If I get an OTU classified as, for example, “Unclassified gammaproteobacteria”, that means the representative read could not be classified below that level. However, other reads in that OTU are often longer than the representative and give a better classification. So it would be great if, for OTUs that are “unclassified” at some level, Mothur could classify ALL the reads in that OTU, perhaps by running the RDP classifier. Then, if some reads managed to classify to genus level and they all agree on the same genus, Mothur could reclassify the entire OTU as that genus.
classify.otu takes the taxonomy for each sequence in an OTU and reports the classification on which at least 50% of the sequences agree. It isn’t clear whether you’re already using this. Alternatively, you could get a representative sequence (get.oturep) and then classify that sequence. The problem with both your proposal and the get.oturep approach is that you can get lucky (or unlucky) and pick a sequence that classifies better (or worse) than everything else in the OTU. That would give you a false impression of the consensus taxonomy for the entire OTU. The idea behind the consensus taxonomy is that the result isn’t biased by what you expect it to be.
I see the logic here, but wouldn’t you agree that if, say, 20% of the sequences in an OTU agree on a single genus and no other sequences “disagree” (i.e. classify to anything else), we can safely classify the entire OTU?
But 80% of the sequences disagree.
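The consensus rule discussed above, and why a 20% minority does not win, can be sketched in Python. This is a simplified illustration, not mothur's classify.otu source. It assumes each sequence's taxonomy is a semicolon-separated lineage with confidence scores already stripped, treats the label "unclassified" as a vote that counts toward the total but can never win, and uses a hypothetical `cutoff` parameter expressed as a fraction (0.5, matching the 50% described in the reply). The toy OTU mirrors the 20% case: two of ten reads reach genus, eight stop at class.

```python
from collections import Counter

UNCLASSIFIED = "unclassified"


def consensus_taxonomy(taxonomies, cutoff=0.5):
    """Simplified consensus lineage for one OTU (not mothur's own code).

    taxonomies: one semicolon-separated lineage per sequence, with any
        confidence scores already stripped (an assumption of this sketch).
    cutoff: hypothetical parameter; the fraction of ALL sequences in the
        OTU that must share a label for that level to be accepted.
    """
    if not taxonomies:
        return []
    lineages = [t.strip(";").split(";") for t in taxonomies]
    depth = max(len(lin) for lin in lineages)
    total = len(lineages)
    consensus = []
    for level in range(depth):
        # A lineage that stops early counts as unclassified at deeper levels.
        labels = [lin[level] if level < len(lin) else UNCLASSIFIED
                  for lin in lineages]
        # Most frequent label; ties go to the label seen first.
        label, count = Counter(labels).most_common(1)[0]
        if label == UNCLASSIFIED or count / total < cutoff:
            # No qualifying majority: this level and all deeper ones stay
            # unclassified. Unclassified reads still count in `total`.
            consensus.extend([UNCLASSIFIED] * (depth - level))
            break
        consensus.append(label)
    return consensus


full = "Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Pseudomonadaceae;Pseudomonas"
partial = "Bacteria;Proteobacteria;Gammaproteobacteria;unclassified;unclassified;unclassified"

# Toy OTU: 2 of 10 reads (20%) reach genus; the other 8 stop at class.
otu = [full] * 2 + [partial] * 8
print(consensus_taxonomy(otu))
# ['Bacteria', 'Proteobacteria', 'Gammaproteobacteria', 'unclassified', 'unclassified', 'unclassified']
```

Because the eight reads that stop at class still count toward the total, the two genus-level reads supply only 20% of the votes at order level, so the consensus stops at Gammaproteobacteria. With six full-length and four partial lineages instead, the same function would return the full Pseudomonas lineage.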
Using Mothur, we've obtained several fasta files with the chimeric sequences that uchime found removed. What we would like to do now is classify our sequences into OTUs. To compare with other projects we've already finished, we would like to obtain OTUs labelled with the RDP database's classification numbers (for example, OTU 136 for Acidobacteria Gp6). Is it possible to do this using Mothur? We only have the fasta files, nothing else; I imagine we need at least the database.
Thank you!