Combining samples from a run

campenr · February 13, 2017, 2:55pm

Hi all,

I am analyzing someone else’s data set that I eventually wish to compare with my own samples. The data set contains a group of samples that were size fractionated. During their own analysis they apparently combined the data from all the size fractions of a particular sample into a single merged sample. The merging happens after all the sequence processing, once a shared file has been generated. I am trying to recreate their analysis. When I create the merged sample, I end up with a shared file containing a sample that has essentially every OTU, and any mothur command I run on this shared file promptly eats up all my memory and crashes the PC if I don’t catch it in time.

First, am I right in believing that the nature of the resulting shared file, with one sample containing almost every OTU, is the reason my memory usage is so high? If so, do you think it would be worth getting some time on a system with enough memory to crunch this shared file, just to try it? Or are the memory requirements likely to be too high for this to be doable?

Second, are there any thoughts on merging samples from a run to create a combined sample? I personally am uncertain about it, as each of the size fractions may have experienced its own biases, and in merging them we likely get a community profile different to what would have been observed if there had only been one sample in the first place.

Any musings on the above points would be greatly appreciated.

Cheers
Richard

pschloss · February 16, 2017, 8:48pm

If you want to compare your data to theirs, you’ll have to process them all together. The formatting of the merged shared file is probably wrong, and that is what is causing the software to explode.

Pat

campenr · February 17, 2017, 3:34pm

Hi Pat,

Thanks for replying.

I realized my mistake: when I created the new shared file, numSeqs was being saved as a float (with a decimal point), which was what was breaking mothur.
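For anyone hitting the same problem, here is a minimal sketch of how such a merge might be done so that every count is written back as an integer. It is not the script used in this thread. It assumes the usual tab-delimited shared layout (label, Group, numOtus, then one count column per OTU), and the function name `merge_groups` and the group names in the example are hypothetical.

```python
def merge_groups(shared_text, groups_to_merge, merged_name):
    """Sum the OTU counts of the listed groups into one merged row.

    Assumes a tab-delimited shared file: label, Group, numOtus, then
    one count column per OTU. Counts are always written as integers.
    """
    lines = [ln for ln in shared_text.splitlines() if ln.strip()]
    out = [lines[0].split("\t")]      # header row is copied unchanged
    merged_rows = {}                  # label -> merged row being accumulated

    for line in lines[1:]:
        fields = line.split("\t")
        label, group = fields[0], fields[1]
        # int(float(x)) tolerates values such as "3.0" left by an earlier
        # float conversion and turns them back into whole numbers.
        num_otus = int(float(fields[2]))
        counts = [int(float(x)) for x in fields[3:]]

        if group not in groups_to_merge:
            out.append([label, group, num_otus] + counts)
            continue

        if label not in merged_rows:
            # The merged row takes the position of the first merged group.
            merged_rows[label] = [label, merged_name, num_otus] + [0] * len(counts)
            out.append(merged_rows[label])

        row = merged_rows[label]
        for i, count in enumerate(counts):
            row[3 + i] += count

    return "\n".join("\t".join(str(v) for v in r) for r in out) + "\n"


if __name__ == "__main__":
    # Hypothetical example: two size fractions of sample A, plus sample B.
    example = (
        "label\tGroup\tnumOtus\tOtu1\tOtu2\tOtu3\n"
        "0.03\tA_small\t3\t1\t0\t2\n"
        "0.03\tA_large\t3\t0\t4\t1.0\n"
        "0.03\tB\t3\t5\t5\t5\n"
    )
    print(merge_groups(example, {"A_small", "A_large"}, "A"), end="")
```

If the merge is done with a dataframe library instead, the same idea applies: cast the count columns to an integer type before writing the file out.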

Cheers
Richard
