Combining samples from a run

Hi all,

I am analyzing someone else’s data set that I eventually want to compare with my own samples. The data set contains a group of samples that were size fractionated. In their own analysis they apparently combined the data from all the size fractions of a given sample into a single merged sample. The merging happens after all the sequence processing, once a shared file has been generated. I am trying to recreate their analysis, and when I create the merged sample I end up with a shared file in which one sample contains essentially every OTU. Any mothur command I run on this shared file promptly eats up all my memory and crashes the PC if I don’t catch it in time.

First, am I right in believing that the nature of the resulting shared file, with one sample containing almost every OTU, is the reason my memory usage is so high? If so, do you think it would be worth getting some time on a system with enough memory to crunch this shared file, just to try it? Or are the memory requirements likely too high for this to be doable?

Second, are there any thoughts on merging the samples in a run to create a combined sample? I am personally uncertain about it, as each size fraction may have experienced its own biases, and merging them likely gives a community profile different from what would have been observed if there had only been one sample in the first place.

Any musings on the above points would be greatly appreciated.


If you want to compare your data to theirs, you’ll have to process them all together. The formatting of the merged shared file is probably wrong, and that is what is causing the software to explode.


Hi Pat,

Thanks for replying.

I realized my mistake: when I created the new shared file, numSeqs was being saved as a float (with the decimal point), and that was what was breaking mothur.
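
For anyone who hits the same problem, here is a minimal Python sketch of the merge step that writes every count back out as an integer. It is not the script I actually used. It assumes the usual tab-delimited shared file layout (label, Group, numOtus, then one column per OTU), and the function name `merge_groups` and the sample names in the usage note are made up for illustration.

```python
def merge_groups(shared_text, groups_to_merge, merged_name):
    """Sum the OTU counts of the named groups into one row per label.

    Assumes a tab-delimited shared file: label, Group, numOtus, OTU columns.
    All numeric fields are written back as integers, never as floats.
    """
    lines = [ln for ln in shared_text.splitlines() if ln.strip()]
    header = lines[0]

    kept = []    # rows that are not being merged, in their original order
    merged = {}  # label -> (numOtus, summed counts)

    for line in lines[1:]:
        fields = line.split("\t")
        label, group = fields[0], fields[1]
        # int(float(x)) accepts both "12" and "12.0" on input
        num_otus = int(float(fields[2]))
        counts = [int(float(x)) for x in fields[3:]]

        if group in groups_to_merge:
            if label in merged:
                prev_num, prev_counts = merged[label]
                summed = [a + b for a, b in zip(prev_counts, counts)]
                merged[label] = (prev_num, summed)
            else:
                merged[label] = (num_otus, counts)
        else:
            kept.append((label, group, num_otus, counts))

    # Append one merged row per label after the untouched samples
    for label, (num_otus, counts) in merged.items():
        kept.append((label, merged_name, num_otus, counts))

    out = [header]
    for label, group, num_otus, counts in kept:
        row = [label, group, str(num_otus)] + [str(c) for c in counts]
        out.append("\t".join(row))
    return "\n".join(out) + "\n"
```

Calling it with something like `merge_groups(text, {"A_small", "A_large"}, "A")` returns the new file contents as a string, with the two hypothetical fractions replaced by a single row named `A`. The point is the explicit `str(int)` on the way out: building the table in a tool that silently promotes counts to floats is how the decimal points got into my file.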