Long processing time with chimera.uchime?

Hi, I am running my 21 18S rRNA samples all together with the silva.seed_v123 database. I am seeing about 23 h of processing time for a single group with the chimera.uchime command. Could you let me know whether this is normal?

It’s hard to diagnose problems if we don’t have the exact command you are running. Can you provide more details? Also, you seem to be firing off a bunch of posts to forums. If you could ask a single question about a single problem that would make things much easier for us to follow.

Hi Pat,

Sorry for posting so many questions in the forums. I will manage my queries.

Please have a look at the command below. The pipeline took 29 h to find chimeras in group JCT40_3_S25.

mothur > chimera.uchime(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.count_table, dereplicate=t)

Using 1 processors.

Checking sequences from stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta …

It took 5843 secs to check 16062 sequences from group JCT28_2_S24.
It took 1158 secs to check 6828 sequences from group JCT3_3_S23.
It took 105137 secs to check 75188 sequences from group JCT40_3_S25.

That’s an unfortunate run time for that group, but I’m not sure it’s a bug. If you compare the times for each sample, you’ll notice a clear relationship between the number of sequences in a sample and its run time. For example, JCT3_3_S23 has ~7,000 sequences and took ~1,200 seconds to run. JCT28_2_S24 has a bit more than twice as many sequences and took ~5,800 seconds. Your ‘problem’ sample has about 5 times as many sequences as JCT28_2_S24, so it will naturally take much longer to run.

I don’t know the exact growth rate of the uchime algorithm, but since it involves comparing all of your rare sequences to all the abundant sequences, I would expect the run time to grow faster than linearly (roughly quadratically) as samples get bigger.
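As a rough check on that expectation, the three timings in the log above can be used to estimate how run time scales with the number of sequences. The sketch below assumes a simple power law, time ~ n^k, which is an assumption for illustration and not a documented property of uchime; the helper name `scaling_exponent` is made up here. With only three data points the result is indicative at best.

```python
import math

# (number of sequences, seconds) per group, copied from the mothur log above
timings = {
    "JCT3_3_S23": (6828, 1158),
    "JCT28_2_S24": (16062, 5843),
    "JCT40_3_S25": (75188, 105137),
}

def scaling_exponent(small, large):
    """Return k such that time ~ n**k between two (n, seconds) points.

    Assumes a power-law relationship, which is only a working hypothesis.
    """
    (n1, t1), (n2, t2) = small, large
    return math.log(t2 / t1) / math.log(n2 / n1)

# Sort by sequence count and compare each group with the next larger one.
groups = sorted(timings.values())
exponents = [scaling_exponent(a, b) for a, b in zip(groups, groups[1:])]

for k in exponents:
    print(f"estimated exponent: {k:.2f}")
```

Both estimates come out a little under 2, so on these numbers the run time grows roughly with the square of the number of sequences in a group.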

I agree with dwaite, that’s pretty normal for checking tens of thousands of seqs per sample. I’ve considered subsampling the samples that win the pooling lottery and get an order of magnitude more seqs than I need.
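To illustrate the subsampling idea, here is a minimal Python sketch that caps one group at a fixed number of reads by drawing reads at random without replacement. It is not mothur code (mothur has its own sub.sample command for this); the function name `subsample_counts`, the toy sequence names, and the abundances are all hypothetical.

```python
import random

def subsample_counts(counts, size, seed=0):
    """Randomly keep `size` reads from a {sequence_name: abundance} mapping.

    Reads are drawn without replacement, so abundant sequences stay
    abundant in proportion. Groups already at or below `size` are
    returned unchanged.
    """
    total = sum(counts.values())
    if total <= size:
        return dict(counts)
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    # Expand to one entry per read so every read is equally likely to be kept.
    reads = [name for name, n in counts.items() for _ in range(n)]
    kept = {}
    for name in rng.sample(reads, size):
        kept[name] = kept.get(name, 0) + 1
    return kept

# Hypothetical toy group: three unique sequences with made-up abundances.
group = {"seqA": 700, "seqB": 250, "seqC": 50}
capped = subsample_counts(group, size=100)
print(sum(capped.values()), "reads kept")
```

Expanding to one entry per read is simple but memory-hungry; for real data sets a weighted draw on the counts would be the more economical choice.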