Running the chimera.uchime command on a larger file (~1.2 million seqs), using my count file as a reference. To date, the command has been running for more than a month on an 18S rRNA data set. Wondering if anyone has experienced comparable runtimes, or has seen the processing rate decrease continuously as the file nears completion (30 seqs/sec at the start versus 1 seq per 7 secs at 97% completion).
Whoa, that’s a long time. How many processors did you set? How long is the region in the 18S gene that you’re sequencing? What sequencing platform are you using?
That particular dataset was running on 8 processors, but I’ve experienced similar runtimes using 32.
~450 bp from a 2x250 MiSeq run.
Sure takes a long time, but the data look great.
The problem is likely that your reads do not fully overlap (I know this is hard to design for 18S), so you have a high error rate, which effectively inflates the number of unique sequences and makes everything take longer and use more RAM. See:
Also, just to be clear, you’re giving it a count or group file, right? Any sense how many groups it has processed?
Related to the blog post above, I suspect that even if it gets through chimera.uchime, you won’t be able to form OTUs. It will likely be necessary to use a phylotype-based approach.
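To put a rough number on the overlap point made earlier in the thread, here is a back-of-the-envelope sketch in Python. It assumes the ~450 bp region and 2x250 reads mentioned above, and that both reads are used at full length (quality trimming would shrink the overlap further). The function names are made up for illustration; they are not part of mothur.

```python
def overlap_bp(read_length: int, amplicon_length: int) -> int:
    """Number of amplicon bases covered by BOTH reads of a pair.

    Assumes both reads are full length and start at opposite ends
    of the amplicon. Returns 0 if the reads do not meet at all.
    """
    covered_twice = 2 * read_length - amplicon_length
    # Overlap cannot be negative, and cannot exceed the amplicon itself.
    return min(amplicon_length, max(0, covered_twice))


def overlap_fraction(read_length: int, amplicon_length: int) -> float:
    """Fraction of the amplicon that is covered by both reads."""
    return overlap_bp(read_length, amplicon_length) / amplicon_length


# Values taken from this thread: 2x250 reads, ~450 bp region.
overlap = overlap_bp(250, 450)
fraction = overlap_fraction(250, 450)
print(overlap)             # 50
print(round(fraction, 2))  # 0.11
```

So under these assumptions only about 50 bp (roughly a ninth of the region) is read twice, and the remaining ~400 bp rests on a single read with no second read to correct errors against, which fits the explanation above for the inflated number of unique sequences.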