Suggestions for large files for uchime de novo

Hi Pat,
I am using mothur to chimera-screen MiSeq data. My plan is to run both uchime reference and uchime de novo and remove any sequence called chimeric by either method. I run into a problem when a sample has more than 100,000 unique sequences. Although we aim for ~5,000 to 10,000 sequences per sample, a handful of samples often end up with far more reads than the rest, and once a sample exceeds 100,000 unique sequences, the uchime de novo run time stretches to several days or even weeks, depending on how large that sample gets. I have tried both 8 and 20 processors, with little decrease in run time.
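For concreteness, the two passes look roughly like this (file names are placeholders, silva.gold.align is just one common reference choice, and the accnos names are illustrative; I use whichever accnos files the run actually produces):

# reference-based pass against a curated reference alignment
chimera.uchime(fasta=sample.unique.fasta, count=sample.count_table, reference=silva.gold.align, processors=8)
# de novo pass, which uses the abundances in the count table
chimera.uchime(fasta=sample.unique.fasta, count=sample.count_table, processors=8)
# remove everything flagged by either pass
remove.seqs(fasta=sample.unique.fasta, count=sample.count_table, accnos=sample.unique.ref.uchime.accnos)
remove.seqs(fasta=current, count=current, accnos=sample.unique.denovo.uchime.accnos)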

Can you give me any advice on how to increase the speed, or do you recommend using the uchime reference step alone instead?

Thank you,
Kathie Mihindukulasuriya

Are you using pre.cluster? If so, I suspect the problem is more fundamental and comes down to the quality of your data. Even if you were to get through chimera.uchime and emerge with samples that still had 100k unique reads, it would take a very long time to get through the clustering step. You might check this out…

http://blog.mothur.org/2014/09/11/Why-such-a-large-distance-matrix%3F/
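If you aren't running pre.cluster yet, it goes right before the chimera check and collapses near-identical reads, which is usually what brings the unique counts down to something manageable. A rough sketch (file names are placeholders; diffs=2 assumes ~250 bp reads, following the rule of thumb of about 1 difference per 100 bp):

# merge rare reads into more abundant reads that are within 2 mismatches
pre.cluster(fasta=sample.unique.fasta, count=sample.count_table, diffs=2)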

Hi,
Thanks, Pat. I am doing the pre.cluster step and am only working on unique reads. The data I have so far have not been of optimal quality (we are still working out production bugs).

I worry that even with better data in the future, I may still run into large files if the MiSeq behaves like the 454, where a couple of samples sometimes get far more reads than the rest. If those samples happen to have a long tail of real taxa, I could hit the same problem.

Thank you very much for your quick and helpful replies,
Kathie