I’ve got an OTU table showing a classification of each OTU at every taxonomic level (from Phylum to Genus). I would like to know the similarity % that Mothur uses to classify the OTUs within each taxonomic level (e.g., 97% at genus level).
Many thanks in advance.
The percentage thresholds are pretty meaningless, and good reviewers will slap your hand if you try to claim that a given threshold corresponds to a taxonomic rank. classify.seqs does not use a similarity cutoff at all; it uses the Bayesian approach described here…
mothur > classify.seqs(citation)
Wang Q, Garrity GM, Tiedje JM, Cole JR (2007). Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol 73: 5261-7. [ for Bayesian classifier ]
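To make it concrete why no similarity % is involved, here is a rough Python sketch of the idea behind a naive Bayesian k-mer classifier with bootstrap confidence. It is a toy under stated assumptions, not mothur's code: the function names (kmers, train, log_score, classify) are made up for illustration, the smoothing formulas and the 1/8 word subsampling follow my reading of Wang et al. (2007), and the reference sequences are invented.

```python
import math
import random
from collections import defaultdict

K = 8  # word size reported by Wang et al. (2007); assumed here


def kmers(seq, k=K):
    """Return the set of overlapping k-mers (words) in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(references):
    """references: dict mapping genus name -> list of reference sequences."""
    seqs_with_word = defaultdict(int)  # reference seqs containing each word
    genus_word = {g: defaultdict(int) for g in references}
    for genus, seqs in references.items():
        for seq in seqs:
            for w in kmers(seq):
                seqs_with_word[w] += 1
                genus_word[genus][w] += 1
    return {
        "n_total": sum(len(v) for v in references.values()),
        "seqs_with_word": seqs_with_word,
        "genus_word": genus_word,
        "genus_size": {g: len(v) for g, v in references.items()},
    }


def log_score(words, genus, model):
    """Sum of log P(word | genus) with the smoothing described in the paper."""
    m = model["genus_size"][genus]
    total = 0.0
    for w in words:
        prior = (model["seqs_with_word"].get(w, 0) + 0.5) / (model["n_total"] + 1)
        total += math.log((model["genus_word"][genus].get(w, 0) + prior) / (m + 1))
    return total


def classify(query, model, trials=100, seed=0):
    """Return (best genus, bootstrap confidence in percent)."""
    words = sorted(kmers(query))
    if not words:
        raise ValueError("query is shorter than the word size")
    genera = list(model["genus_size"])
    best = max(genera, key=lambda g: log_score(words, g, model))
    rng = random.Random(seed)
    n_sample = max(1, len(words) // K)  # subsample about 1/8 of the words
    hits = 0
    for _ in range(trials):
        sample = [rng.choice(words) for _ in range(n_sample)]
        pick = max(genera, key=lambda g: log_score(sample, g, model))
        hits += pick == best
    return best, 100.0 * hits / trials


if __name__ == "__main__":
    # Invented toy references, not real 16S sequences.
    refs = {
        "GenusA": ["ACCACAACCCACACAACACCCAACACACCAACCCACAACA"],
        "GenusB": ["GTTGTGGTTTGTGTGGTGTTTGGTGTGTTGGTTTGTGGTG"],
    }
    print(classify(refs["GenusA"][0], train(refs)))
```

The number reported next to each name is how often the subsampled words agree with the full assignment, not a percent identity to any reference.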
I read the paper you mentioned in your last reply, but I still have doubts about the following point. Consider an OTU table based on the SILVA database. This table shows a classification of the OTUs from Phylum level to Genus level. The clustering was done at 97% similarity, with an average sequence length of 406 bases. Can we then say that every individual OTU represents a given genus? I’m pretty sure this is the case, because the OTU table gives you the actual name of the genus at the last level of classification reached.
I’d be very grateful if you could finally resolve this issue. I’m looking forward to your reply.
An OTU is "operational" because we don’t know what a bacterial species is. Many have agreed that a 3% OTU roughly corresponds to a bacterial species (very roughly in some cases). If you taxonomically identify your OTU representative sequence, the level that is identified depends completely on the database that you use for classification. The SILVA people have decided that 16S really only gives reliable genus-level identification across all bacteria, so they have only included identification to the genus level (I think this conservative approach is good).
Try thinking of taxonomic classification of your OTUs as similar to significant figures in calculations. Mathematically you may be able to calculate a value to the millionths place, but if you used a measurement that’s only precise to the tenths, your calculated value should only be reported to the tenths. You may find a database that identifies reference sequences down to the substrain, but if you are matching 400 bp of 16S you absolutely shouldn’t report that substrain name, because you can’t have confidence in that fine an identification with just partial 16S.
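The significant-figures idea can be sketched in a few lines of Python: report a taxonomy only down to the deepest rank you can trust. This is an illustration under assumptions, not a mothur feature: trim_taxonomy is a hypothetical helper, the input is assumed to look like a mothur-style taxonomy string with a bootstrap value in parentheses after each name, and the cutoff of 80 and the limit of 6 ranks (down to genus) are example values you would choose yourself.

```python
import re


def trim_taxonomy(tax_string, cutoff=80, max_ranks=6):
    """Keep ranks from the top down until confidence drops below the cutoff.

    tax_string is assumed to look like
    "Bacteria(100);Firmicutes(100);Clostridia(99);"
    """
    kept = []
    for field in tax_string.strip().strip(";").split(";"):
        match = re.fullmatch(r"(.+)\((\d+)\)", field)
        if match is None:
            break  # unexpected field format: stop rather than guess
        name, conf = match.group(1), int(match.group(2))
        if conf < cutoff or len(kept) == max_ranks:
            break  # everything below this rank is not reported
        kept.append(name)
    return kept


if __name__ == "__main__":
    # Invented example line; the confidence values are made up.
    line = ("Bacteria(100);Firmicutes(100);Clostridia(99);"
            "Clostridiales(97);Lachnospiraceae(85);Blautia(62);")
    print(";".join(trim_taxonomy(line)))
```

With the example line, the genus is dropped because its support is below the cutoff, so the OTU would be discussed at the family level.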
Many thanks for your reply.
I understand that the SILVA classification is just a convention, and that the similarity % associated with each taxonomic level depends on the convention you use. But given that I’ve used SILVA with 97% similarity, I want to make sure that I can assign a genus name (from SILVA) to every OTU that was successfully classified at the genus level. I know that this does not necessarily mean that they are actual bacterial genera (because we’d have to agree on how a genus is defined), but we should be able to talk about bacterial genera within the SILVA taxonomy framework.
What do you think?
I think I agree, mostly. I just don’t care very much about the names attached to an OTU.