shhh.flows with 18S rRNA takes weeks

We have already scoured the forum, but we could not find anything similar…

We are analyzing 454 pyrosequencing data of 18S rRNA genes to investigate the diversity of eukaryotic protists (mainly phytoplankton) in the Arctic Ocean, using the V4 region (~600 bp fragments).

To denoise the different samples (~30,000-80,000 sequences each), we tried to run shhh.flows in mothur (version 1.22.2) with 8 processors for each sample. However, the denoising took more than a month, so we cancelled it. The same holds true for mothur v1.29.2, and we also tried using only one processor.

As an alternative, we used the large option to split the dataset (large=1000). That job finished in a few minutes. However, it would be great if we could find a way to denoise the whole dataset. Is there any way to denoise large 18S datasets without splitting?

Thanks for this forum!

Kristin

Kristin,

Are you running trim.flows with the default minflows and maxflows, and are you doing the analysis one sample at a time (using the file= option)?

Pat

Hi Pat,
Thanks for your answer! Now it works much faster!
Before, I ran trim.flows without setting minflows and maxflows (one sample at a time). However, I did not use the filename.trim.flow file as the flow file for shhh.flows. :-/
Now I ran it with the minflows (300) and maxflows (670) options, and it should be finished in 2-3 days.

Thank you!!

Hmmm… Not sure what was going on. I would strongly discourage you from using Quince's 360/720 approach: it actually does very little to reduce the error rates. In contrast, if you use 450/450 you will get much better error rates (and it should also run much, much faster). We discussed this in our PLoS ONE paper.
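For anyone following along, the workflow Pat describes might look something like the batch below. This is only a sketch: the input file names (GQY1XT001.flow, GQY1XT001.oligos) are placeholders for your own data, and you should check your mothur version's wiki pages for the full parameter lists.

```
# Trim flowgrams to the fixed 450/450 window discussed above;
# flow= and oligos= point at your own (placeholder) files.
trim.flows(flow=GQY1XT001.flow, oligos=GQY1XT001.oligos, minflows=450, maxflows=450, processors=8)

# trim.flows writes a *.flow.files listing, which shhh.flows can
# read via the file= option to denoise one sample at a time.
shhh.flows(file=GQY1XT001.flow.files, processors=8)
```

Using file= with the .flow.files output (rather than pointing shhh.flows at the whole raw flow file) is what keeps each denoising job small enough to finish in hours rather than weeks.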