Classify seqs

Dear Pat,

Please help me out!
I aligned my seqs against the greengenes database and then classified them with the RDP_09 trainset. Is this a serious problem for publication? I must admit that at first I didn’t realize I had different options for classifying my sequences. I followed the 454 SOP and thought it was fine to use the RDP trainset, but then I read it in more detail and saw that you recommend SILVA not only for the alignment but also for classifying the seqs.

Any suggestions?

Dear Astrid,
If I were reviewing a paper that classifies the organisms in a sample based on 16S rRNA sequences, the amount of work I would expect in the classification would depend on the sample type, what was found, and how much the conclusions of the paper rely on the classification. For example, if someone is comparing the gut microbiome of grass-fed cattle to that of grain-fed cattle, and the difference is quantitative, with the relative levels of Escherichia vs. Clostridia changing, then the classification is very simple and any method should work. On the other hand, if the sequences come from an exotic location and/or include very unusual organisms with significant diversity, and a major point of the paper is classifying these newly discovered organisms, then I would want to see one or more phylogenetic trees and perhaps other analyses of the data.
In my opinion, there is no "one analysis for all purposes" answer to this question.

Hello Brian,

Thanks for your post; it’s helpful :)

As a reviewer, I wouldn’t care much about which taxonomy reference was used; I think the names are all pretty artificial anyway. But you said you aligned your sequences to greengenes. That would bother me, because we have shown that the greengenes reference alignment is awful compared with the SILVA reference alignment.

Pat