Uchime processes

Somebodyatthedoor · October 12, 2015, 8:42am

Hi,

This is more of an interest question than anything else. Because I have to use long (~450 bp) sequences, my uchime commands tend to take a long time to run (up to a month). Due to a problem with the MiSeq, the sequencing centre I use put my samples on a HiSeq, so I now have twice my normal number of sequences. Uchime is therefore taking a VERY long time to run (>1 month).

Here is the question. I’ve noticed that the process is already slow at the start, while it is printing output to the screen like this:

00:00 18Mb 0.1% Reading 10118_projecttrim.contigs.good.unique.good.filter.unique.precluster.temp22118.temp
WARNING: Ignoring gaps in FASTA file ‘10118_projecttrim.contigs.good.unique.good.filter.unique.precluster.temp22118.temp’
00:00 24Mb 100.0% Reading 10118_projecttrim.contigs.good.unique.good.filter.unique.precluster.temp22118.temp
00:00 24Mb 12.5k sequences
37:18 12Mb 100.0% 6188/12472 chimeras found (49.6%)

It took 2239 secs to check 12473 sequences from group TroutRTGEpooDOM.

The output goes through all of the samples in this way, and after about a week it stops printing anything to the screen.

However, the longest part of the process seems to come after this point. No new output appears on screen, but the process doesn’t end; it just sits there. This stage takes weeks, and the sizes of my uchime output files don’t seem to change from day to day, which makes me feel it isn’t really doing very much. It must be doing something, though, as the process shows VIRT 15.6g, RES 12g and SHR 80.

I know from past runs that it will probably finish successfully, and I don’t have a memory problem. I was just wondering what exactly the command is doing at this point, once it has finished chimera detection for all the samples, and why this stage takes so much longer?

Thanks for your help,

Laura

westcott · October 12, 2015, 2:58pm

Good question. After the chimera detection process completes, mothur looks at the dereplicate parameter. By default this is false, meaning that if a sequence is found to be chimeric in one sample, it is treated as chimeric in all samples. This involves parsing the results of the uchime program. I will add a feature request to our list to look at ways to improve the speed of this process. Thanks for bringing this to our attention, Sarah.
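To make the default (dereplicate=f) rule concrete, here is a minimal Python sketch. It is not mothur's implementation; it only models the behaviour described above. The count table is assumed to be a dict mapping each unique sequence name to its per-sample counts, and `flags` is a hypothetical set of (sequence, sample) pairs that uchime called chimeric. All names here are illustrative.

```python
from typing import Dict, Set, Tuple

# Hypothetical model of a count table: sequence name -> {sample: count}
CountTable = Dict[str, Dict[str, int]]

def remove_if_flagged_anywhere(counts: CountTable,
                               flags: Set[Tuple[str, str]]) -> CountTable:
    """dereplicate=f style (as modelled here): a sequence flagged chimeric
    in any one sample is removed from every sample."""
    flagged_seqs = {seq for seq, _sample in flags}
    return {seq: dict(per_sample)
            for seq, per_sample in counts.items()
            if seq not in flagged_seqs}

counts = {
    "seqA": {"S1": 10, "S2": 5},
    "seqB": {"S1": 3, "S2": 7},   # flagged in S1 only
    "seqC": {"S1": 2},            # flagged in its only sample
}
flags = {("seqB", "S1"), ("seqC", "S1")}

print(remove_if_flagged_anywhere(counts, flags))
# {'seqA': {'S1': 10, 'S2': 5}}
```

Here seqB disappears entirely even though it was only flagged in S1.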

Somebodyatthedoor · October 13, 2015, 8:25am

I ran the command with dereplicate=t, as I am following the MiSeq SOP. Doesn’t that mean it wouldn’t perform the parsing step?

Cheers,

Laura

westcott · October 15, 2015, 3:06pm

Yes, the parsing step would be skipped. If you are running the command with a count file, mothur will create a new count file in which each sequence’s counts are zeroed out in the samples where it was found to be chimeric. Any sequence found to be chimeric in all samples is removed completely. Perhaps we can speed up this process as well.
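The dereplicate=t count-file update described above can be sketched the same way. Again this is a hedged illustration, not mothur's code: the count table is assumed to be a dict of sequence name to per-sample counts, and `flags` is a hypothetical set of (sequence, sample) pairs called chimeric. Counts are zeroed only where a sequence was flagged, and a sequence is dropped when no sample keeps a non-zero count.

```python
from typing import Dict, Set, Tuple

# Hypothetical model of a count table: sequence name -> {sample: count}
CountTable = Dict[str, Dict[str, int]]

def zero_flagged_samples(counts: CountTable,
                         flags: Set[Tuple[str, str]]) -> CountTable:
    """dereplicate=t style (as modelled here): zero a sequence's count only in
    the samples where it was flagged; drop it if every count ends up zero."""
    updated: CountTable = {}
    for seq, per_sample in counts.items():
        kept = {sample: (0 if (seq, sample) in flags else n)
                for sample, n in per_sample.items()}
        if sum(kept.values()) > 0:
            updated[seq] = kept
    return updated

counts = {
    "seqA": {"S1": 10, "S2": 5},
    "seqB": {"S1": 3, "S2": 7},   # flagged in S1 only
    "seqC": {"S1": 2},            # flagged in its only sample
}
flags = {("seqB", "S1"), ("seqC", "S1")}

print(zero_flagged_samples(counts, flags))
# {'seqA': {'S1': 10, 'S2': 5}, 'seqB': {'S1': 0, 'S2': 7}}
```

Unlike the dereplicate=f case, seqB survives with its S2 reads, and only seqC (chimeric everywhere it occurs) is removed.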

Somebodyatthedoor · October 27, 2015, 9:23am

Unfortunately, my command was accidentally killed by a server problem before it could complete this last step, so it did not create a count file. However, it had already created the accnos file, so I ran remove.seqs using the fasta and count files created by the pre.cluster command. This appears to have successfully created a fasta and count file with the chimeras in the accnos file removed.

I just wanted to check whether people think this is an OK thing to do. It seems odd: if creating the count file was what took the uchime command so long, why does remove.seqs create one in such a short time? Aren’t they doing pretty much the same thing?
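One reason the two results may not be interchangeable, offered as a hedged sketch: removing the accnos sequences from the pre.cluster count table drops each listed sequence from every sample, whereas the uchime count file described earlier in the thread zeroes counts per sample. The sketch assumes (this is not confirmed in the thread) that the dereplicate=t accnos file lists every sequence flagged in at least one sample, and that remove.seqs drops listed sequences from the whole count table. The names and data are hypothetical.

```python
from typing import Dict, Set

# Hypothetical model of a count table: sequence name -> {sample: count}
CountTable = Dict[str, Dict[str, int]]

def remove_listed(counts: CountTable, accnos: Set[str]) -> CountTable:
    """Assumed remove.seqs-like behaviour: drop every listed sequence from all samples."""
    return {seq: dict(c) for seq, c in counts.items() if seq not in accnos}

def total_reads(counts: CountTable) -> int:
    """Sum of all counts across all sequences and samples."""
    return sum(sum(c.values()) for c in counts.values())

counts = {
    "seqA": {"S1": 10, "S2": 5},
    "seqB": {"S1": 3, "S2": 7},   # flagged chimeric in S1 only
    "seqC": {"S1": 2},            # flagged chimeric in its only sample
}
accnos = {"seqB", "seqC"}  # assumed: every sequence flagged in at least one sample

after_remove = remove_listed(counts, accnos)
print(total_reads(after_remove))  # 15
```

Under per-sample zeroing, seqB’s 7 reads in S2 would survive, leaving 22 reads rather than 15. So, under these assumptions, the remove.seqs route behaves more like dereplicate=f than dereplicate=t.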

Laura

vebaev · May 13, 2016, 9:16am

Hi,
I also have sequences of a little over 400 nt, and I’ve been waiting on the uchime command for 12 h… :shock: … is that normal? I have 7 samples (MiSeq, 2x300) and I’m following the SOP.

campenr · May 13, 2016, 9:52pm

This is relevant reading for any analysis using 2x300 PE reads on MiSeq.
