Too much data, a way to combine outputs?

alvinx · December 10, 2013, 8:54pm

Hi,

I’m trying to use mothur to analyze a time series with over 100 MiSeq runs. I was following the MiSeq SOP, but I ran out of memory at unique.seqs because I had over 18,000,000 sequences! I figured it’s probably a good idea to run the SOP pipeline on each of the 100 runs and combine the results. Is mothur capable of doing such a thing? Which commands should I be looking at?

Thanks,
Alvin

pschloss · December 11, 2013, 8:59pm

That’s … a lot of data.

So a couple of preliminary questions. First, is this V4 data? If not, then you’re never going to get very far with the OTU approach because the error rate will be too high. Second, if it’s time series data, then I would expect a lot of overlap between samples, which should make your life easier.

You can certainly process each run separately until you get a count table. Once you have a count table for every run, I’d merge the datasets. You might have to merge the count tables on your own. Considering you have 100 MiSeq runs’ worth of data, I’ll assume you know how to work with large datasets and that, if you align the data, you will use a customized alignment file.
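For readers who want to try the manual merge, here is a minimal Python sketch of one way it could be done. It is not a mothur command and not necessarily what was meant above. It assumes each count table is in the full tab-separated layout (an ID column, a `total` column, then one column per group) and, importantly, that sequence IDs are comparable across runs, for example because each unique sequence was renamed to the same identifier in every run. The function names `parse_count_table` and `merge_count_tables` are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def parse_count_table(text):
    """Parse one count table into (group_names, {seq_id: {group: count}}).

    Assumes the layout: ID column, 'total' column, then one column per group.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    groups = header[2:]  # skip the ID column and the 'total' column
    counts = {}
    for row in reader:
        if not row:  # ignore blank lines
            continue
        counts[row[0]] = {g: int(c) for g, c in zip(groups, row[2:])}
    return groups, counts


def merge_count_tables(texts):
    """Merge several count tables (given as strings) into one table string.

    Assumption: the same sequence carries the same ID in every run.
    Counts for a group that appears in several tables are added together.
    """
    all_groups = []  # keeps first-seen column order
    merged = defaultdict(lambda: defaultdict(int))
    for text in texts:
        groups, counts = parse_count_table(text)
        for g in groups:
            if g not in all_groups:
                all_groups.append(g)
        for seq_id, per_group in counts.items():
            for g, c in per_group.items():
                merged[seq_id][g] += c

    lines = ["\t".join(["Representative_Sequence", "total"] + all_groups)]
    for seq_id in sorted(merged):
        row = [merged[seq_id].get(g, 0) for g in all_groups]
        # Recompute the total from the per-group counts
        lines.append("\t".join([seq_id, str(sum(row))] + [str(c) for c in row]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    run1 = "Representative_Sequence\ttotal\tA\tB\nseq1\t5\t3\t2\nseq2\t1\t1\t0\n"
    run2 = "Representative_Sequence\ttotal\tC\nseq1\t4\t4\nseq3\t2\t2\n"
    print(merge_count_tables([run1, run2]), end="")
```

With 100 runs you would read each table from disk rather than from strings. If the sequence IDs are not consistent between runs, they would first need to be mapped to a shared key (such as the sequence itself) before a merge like this makes sense.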
