Too much data, a way to combine outputs?


I’m trying to use mothur to analyze a time series with over 100 MiSeq runs. I was following the MiSeq SOP, but I ran out of memory at unique.seqs because I had over 18,000,000 sequences! I figured it’s probably a good idea to run the SOP pipeline on each of the 100 runs and combine the results. Is mothur capable of doing such a thing? Which commands should I be looking at?


That’s … a lot of data.

So a couple of preliminary questions. First, is this V4 data? If not, then you’re never going to get very far with the OTU approach because the error rate will be too high. Second, if it’s time series data, then I would expect a lot of overlap between samples, which should make your life easier.

You can certainly process each run separately until you get a count table. Once you have a count table for every run, I’d then merge the datasets. You might have to merge the count tables on your own. Considering you have 100 MiSeq runs’ worth of data, I’ll assume you know how to work with large datasets and that, if you align the data, you will use a customized alignment file.
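
If you do end up merging the count tables by hand, something like the following Python could be a starting point. It is only a rough sketch, not anything built into mothur. The function name merge_count_tables is made up for illustration. It assumes each table is in the plain tab-delimited layout (a Representative_Sequence column, a total column, then one column per group) and, importantly, that the first column has already been rewritten to a key that is shared across runs, such as the sequence itself. Read names differ from run to run, so raw representative names will not match up on their own.

```python
import csv


def merge_count_tables(tables):
    """Merge several count tables into one list of rows.

    `tables` is an iterable of tables, each an iterable of text lines
    (for example, open file handles). Assumed layout per line:
    key <TAB> total <TAB> count_group1 <TAB> count_group2 ...
    The key must mean the same thing in every run (an assumption).
    """
    groups = []   # group (sample) names in first-seen order
    counts = {}   # key -> {group name: count}

    for lines in tables:
        reader = csv.reader(lines, delimiter="\t")
        header = next(reader)
        run_groups = header[2:]          # skip the key and total columns
        for group in run_groups:
            if group not in groups:
                groups.append(group)
        for row in reader:
            if not row:
                continue                 # tolerate blank lines
            per_group = counts.setdefault(row[0], {})
            for group, value in zip(run_groups, row[2:]):
                per_group[group] = per_group.get(group, 0) + int(value)

    # Recompute totals from the merged per-group counts.
    merged = [["Representative_Sequence", "total"] + groups]
    for key, per_group in counts.items():
        values = [per_group.get(group, 0) for group in groups]
        merged.append([key, sum(values)] + values)
    return merged


if __name__ == "__main__":
    run1 = ["Representative_Sequence\ttotal\tA\tB", "seq1\t3\t2\t1"]
    run2 = ["Representative_Sequence\ttotal\tC", "seq1\t4\t4"]
    for row in merge_count_tables([run1, run2]):
        print("\t".join(str(field) for field in row))
```

This keeps every key in memory, so with this much data you may want to merge a few runs at a time and write the intermediate tables to disk.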