Suggested new step for the Schloss SOP

roey.angel · March 30, 2012, 12:28pm

Hi,
I noticed that even after all the denoising-cleaning-trimming-chopping-chimera-slaying-you-name-it, there are still some bad (really bad) sequences that manage to slip through all the filters.
There aren’t a lot of them, maybe a couple of dozen for half a plate of 454, but they can still cause biases downstream, particularly with phylometric indices, because these bad sequences make really long branches in the trees.
Here’s how I get rid of them:
Right after generating the ‘final.xxx’ files at the end of ‘Reducing sequencing error’:

Generate an approximate ML tree using FastTree (this takes only a few minutes; see http://microbesonline.org/fasttree/)

system(FastTree -gtr -nt < final.fasta > final.ml.tre)
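As a cross-check on eyeballing the tree, the tips with the longest terminal branches in final.ml.tre could also be listed with a short script. This is only a rough sketch: the function name and the default of 50 tips are made up for illustration, it assumes plain unquoted sequence names in the Newick file, and it looks at terminal branches only, so a group of bad sequences hanging together off one long internal branch would be missed.

```python
import re

# A tip is a label that directly follows "(" or ",".
# Internal-node labels (e.g. FastTree support values) follow ")" and are skipped.
TIP_PATTERN = re.compile(r"[(,]\s*([^\s(),:;]+):([0-9.eE+-]+)")


def longest_tips(newick, n=50):
    """Return the n tips with the longest terminal branches as (name, length)."""
    tips = [(name, float(length)) for name, length in TIP_PATTERN.findall(newick)]
    # Longest terminal branch first
    tips.sort(key=lambda tip: tip[1], reverse=True)
    return tips[:n]


# Hypothetical usage on the tree produced by the FastTree call:
# with open("final.ml.tre") as handle:
#     for name, length in longest_tips(handle.read(), n=50):
#         print(name, length)
```

The names at the top of that list are only candidates; they still need to be checked in the tree and by BLAST before anything is removed.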

Create an ARB database with final.fasta and import the tree you just generated
Switch to radial tree view; the bad sequences, if there are any, will stick out as really long branches
Mark those long branches and export them as fasta (you might also want to export their accession numbers as .nds)
I call these suspected.seqs.fasta and suspected.seqs.nds
BLAST those sequences. I run a remote BLAST search using a locally installed BLAST+:

system(blastn -task blastn -remote -db nr -query suspected.seqs.fasta -evalue 0.00001 -dust no -max_target_seqs 1 -html -out output.blastn.html)
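If there are more than a handful of suspects, the hits could be pre-sorted with a script before looking at them by eye. The sketch below assumes the same search was rerun with -outfmt 6 (the standard 12-column tabular output of BLAST+) in place of -html; the function name and both thresholds are arbitrary choices for illustration, not recommended values. Queries with no hit at all do not appear in tabular output, and a chimera can still have a high-identity top hit, so this does not replace the inspection.

```python
import csv
import io


def flag_weak_hits(tabular_text, min_identity=90.0, min_length=200):
    """Map query id -> reason for top hits that look weak in -outfmt 6 text.

    Column order assumed: qseqid, sseqid, pident, length, ...
    The thresholds are illustrative assumptions only.
    """
    flagged = {}
    seen = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) < 4 or row[0].startswith("#"):
            continue  # skip blank, short, or comment lines
        query, identity, length = row[0], float(row[2]), int(row[3])
        if query in seen:
            continue  # only the first (best) alignment per query is considered
        seen.add(query)
        reasons = []
        if identity < min_identity:
            reasons.append(f"identity {identity}% < {min_identity}%")
        if length < min_length:
            reasons.append(f"alignment length {length} < {min_length}")
        if reasons:
            flagged[query] = "; ".join(reasons)
    return flagged
```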

Manually inspect the BLAST results for obvious chimeras and other bad sequences (really low match, very bad alignments, etc.)
List their accession numbers (or, more correctly, keep them in suspected.seqs.nds and erase all the good ones), and save the result as validated.bad.seqs.nds
Run:

remove.seqs(accnos=validated.bad.seqs.nds, fasta=final.fasta)
remove.seqs(accnos=validated.bad.seqs.nds, name=final.names)
remove.seqs(accnos=validated.bad.seqs.nds, group=final.groups)
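A quick sanity check after the remove.seqs calls is to confirm that none of the listed names is still present in the filtered fasta. The sketch below assumes the filtered file is called final.pick.fasta (check the file name mothur actually reports) and that the accnos file holds one sequence name per line; the helper names are invented for illustration.

```python
def read_accnos(text):
    """Sequence names from an accnos-style listing, one name per line."""
    return {line.split()[0] for line in text.splitlines() if line.strip()}


def fasta_ids(text):
    """Sequence names from FASTA headers (first word after '>')."""
    return {
        line[1:].split()[0]
        for line in text.splitlines()
        if line.startswith(">") and line[1:].strip()
    }


def leftover_ids(accnos_text, fasta_text):
    """Names that should have been removed but are still in the fasta."""
    return sorted(read_accnos(accnos_text) & fasta_ids(fasta_text))


# Hypothetical usage; final.pick.fasta is an assumed output name:
# with open("validated.bad.seqs.nds") as a, open("final.pick.fasta") as f:
#     print(leftover_ids(a.read(), f.read()))
```

An empty list means every listed sequence is gone from the fasta; the same idea could be applied to the first column of the names and groups files.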

Done!

Roey

pschloss · April 3, 2012, 1:58pm

Roey, what are the sequences? I suspect they have low bootstrap support to “Bacteria” or are mitochondria/chloroplasts.

Pat

roey.angel · April 4, 2012, 10:37am

Hi Pat,
Some were obvious chimeras, others were just bad-quality sequences (bad enough to show low-quality alignments in BLAST, mostly due to homopolymers).
It's hard to generalize since there were really just a few (I had 35 sequences out of over 120,000).
These probably don’t affect too many downstream calculations; my only concern was a serious bias in phylometric calculations (not tested, though).

Roey
