This is more of a curiosity question than anything else. Because I have to use long (~450 bp) sequences, my uchime commands tend to take a long time to run (up to a month). Due to a problem with the MiSeq, the sequencing centre I use put my samples on a HiSeq, so I now have twice my normal number of sequences. Uchime is therefore taking a VERY long time to run (>1 month).
Here is the question. I’ve noticed that the process is already slow at first, while I am still getting output to the screen like this:
'00:00 18Mb 0.1% Reading 10118_projecttrim.contigs.good.unique.good.filter.unique.precluster.temp22118.temp
WARNING: Ignoring gaps in FASTA file ‘10118_projecttrim.contigs.good.unique.good.filter.unique.precluster.temp22118.temp’
00:00 24Mb 100.0% Reading 10118_projecttrim.contigs.good.unique.good.filter.unique.precluster.temp22118.temp
00:00 24Mb 12.5k sequences
37:18 12Mb 100.0% 6188/12472 chimeras found (49.6%)
It took 2239 secs to check 12473 sequences from group TroutRTGEpooDOM.’
The screen output goes through all of the samples in this way and after about a week it stops delivering output to the screen.
However, the longest part of the process seems to come after this point. At this stage I get no new output to the screen, but the process doesn’t end; it just sits there. This part takes weeks, and the sizes of my uchime output files don’t seem to change from day to day over that time, which makes me feel like it isn’t really doing very much. It must be doing something, though, because for the process VIRT is 15.6g, RES is 12g and SHR is 80.
I know from past runs that it will probably finish successfully, and I don’t have a memory problem. I was just wondering what exactly the command is doing at this point, given that it has already gone through the chimera detection stage for the samples, and why this step takes so long in comparison.
Thanks for your help,