Reasons for using Silva as reference for classify.seqs

yangfa1 · March 15, 2016, 1:45pm

I decide to use Silva taxonomy for classify.seqs step. So, I made unaligned fasta file from silva.nr_v123.align simply by removing ‘.’ and ‘-‘. We do not need aligned fasta file for classify.seqs, right?

Could you please comment on my reasons below?

In the whole data process, now I use only one taxonomy (Silva) instead of two (Silva and PDS). I guess this may the results more self-consistent.
More important, there are more sequences in Silva (172418) than in PDS (10801). I am afraid that using PDS may results more reads being wrongly classified as ‘unknown’ than using Silva. I am debating whether I should throw away reads classified as ‘unknown’. Could you please comment on this?
Thanks a lot.

dwaite · March 15, 2016, 8:17pm

Hey,

With regards to alignment - the aligner ignores gaps in the sequences so there’s no harm in removing the alignment gaps before classifying. The only reason you may want those is if you want to filter your reference database to only account for the region of your amplicon. There was a paper published a few years ago that showed doing this improves accuracy of assignment and reduces the amount of unclassified sequences in your data set.

For SILVA vs RDP (vs Greengenes) there’s no right answer as to which taxonomy to use. Ultimately they both pull data from the same sources but they use different approaches to how they assign taxonomy and what they include in their database. The rule of thumb I use is that SILVA is more open to candidate phyla than RDP. Last time I checked there were about 80 phyla/candidate phyla in the SILVA taxonomy and about 40 in RDP.

My view is you should definitely throw away anything that can’t classify to the domain/phylum level.

Topic		Replies	Views
How to use Silva 128 for classification? Commands in mothur	1	908	April 23, 2017
Classify seqs Theory behind mothur	3	5169	September 10, 2014
Using silva as reference database in MiSeq SOP	6	2395	December 8, 2020
Using both Silva and Greengenes in the analysis? Theory behind mothur	1	1585	November 1, 2016
Which reference to use for classify.seqs? Commands in mothur	2	1356	March 14, 2016

Reasons for using Silva as reference for classify.seqs

Related topics