Why do .an.unique_list.0.03.cons.taxonomy results contain several OTUs with the same classification?


When I check the OTU classification results in “test.trim.contigs.trim.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.0.03.cons”, several OTUs have identical classifications, for instance:

Otu0007 294 Bacteria(100);Firmicutes(100);Bacilli(100);Lactobacillales(100);Streptococcaceae(100);Streptococcus(100);
Otu0008 195 Bacteria(100);Firmicutes(100);Bacilli(100);Lactobacillales(100);Streptococcaceae(100);Streptococcus(100);
Otu0009 115 Bacteria(100);Firmicutes(100);Bacilli(100);Lactobacillales(100);Streptococcaceae(100);Streptococcus(100);
Otu0010 256 Bacteria(100);Firmicutes(100);Bacilli(100);Lactobacillales(100);Streptococcaceae(100);Streptococcus(100);

Do OTUs 7, 8, 9 and 10 belong to the same genus? If so, can I summarize them together, and why are they reported separately? If not, why do they all have the same name?


This is because OTUs are not the same as taxonomy. OTUs are defined by the differences between sequences within your data set, whereas taxonomy is defined by the similarity of those sequences to an external reference database.

I see you used 97% similarity as your OTU definition, which is the common proxy for species-level differentiation. The results you’re showing here, though, are all genus-level classifications. Those OTUs may all be different species (or strains), but a genus-level classification still gives them the same name. More realistically though, OTUs and taxonomy are simply different metrics for describing the same data. For example, Otu0007 and Otu0008 could be ~4% different from each other, yet when tested against your database each could still have the same reference sequence as its closest match.
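To make that last point concrete, here is a toy Python sketch. It is only an illustration under loud assumptions: the sequences, the `OtherGenus` reference, and the helper names are all made up, and the simple mismatch fraction and nearest-reference lookup are stand-ins, not how mothur actually computes distances or classifies sequences.

```python
# Toy illustration only: made-up sequences and a simplified distance.
# This is NOT mothur's distance calculation or classifier.

def distance(a, b):
    """Fraction of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

# Hypothetical substitution table so a mutated base always differs from the original.
SWAP = {"A": "C", "C": "A", "G": "T", "T": "G"}

def mutate(seq, positions):
    """Return a copy of seq with the bases at the given positions substituted."""
    chars = list(seq)
    for p in positions:
        chars[p] = SWAP[chars[p]]
    return "".join(chars)

# Hypothetical 100-base reference "database" with two entries.
references = {
    "Streptococcus": "ACGT" * 25,
    "OtherGenus": "TGCA" * 25,
}

# Two query sequences, each 2% away from the Streptococcus reference,
# but mutated at different positions.
otu_a = mutate(references["Streptococcus"], [0, 1])
otu_b = mutate(references["Streptococcus"], [50, 51])

def closest_reference(seq):
    """Name of the reference with the smallest distance to seq."""
    return min(references, key=lambda name: distance(seq, references[name]))

pairwise = distance(otu_a, otu_b)   # 4 mismatches out of 100 positions
same_otu = pairwise <= 0.03         # False: too far apart for a 0.03 cutoff

print(pairwise, same_otu)
print(closest_reference(otu_a), closest_reference(otu_b))
```

The two queries are 4% apart, so a 0.03 cutoff keeps them in separate OTUs, yet both sit closest to the same reference and so get the same name.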

If you want taxonomy-based clustering, the phylotype command bins your sequences by their taxonomy instead of by distance, and you can then build a shared table from that with make.shared and your count data.
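If you only want a quick look at how many sequences fall under each name, the counts in the pasted lines can also be summed outside mothur. The Python sketch below is a rough illustration, not a replacement for phylotype: the function names are hypothetical, it assumes whitespace-separated columns of OTU label, size, and taxonomy as in the lines above (a real file may also start with a header line you would need to skip), and it only gives totals across the whole data set, not a per-sample shared table.

```python
import re
from collections import defaultdict

# The four lines pasted in the question (OTU label, size, taxonomy string).
cons_taxonomy_lines = [
    "Otu0007 294 Bacteria(100);Firmicutes(100);Bacilli(100);Lactobacillales(100);Streptococcaceae(100);Streptococcus(100);",
    "Otu0008 195 Bacteria(100);Firmicutes(100);Bacilli(100);Lactobacillales(100);Streptococcaceae(100);Streptococcus(100);",
    "Otu0009 115 Bacteria(100);Firmicutes(100);Bacilli(100);Lactobacillales(100);Streptococcaceae(100);Streptococcus(100);",
    "Otu0010 256 Bacteria(100);Firmicutes(100);Bacilli(100);Lactobacillales(100);Streptococcaceae(100);Streptococcus(100);",
]

def deepest_taxon(taxonomy):
    """Return the last taxonomic level with its '(confidence)' suffix removed."""
    levels = [level for level in taxonomy.strip().split(";") if level]
    return re.sub(r"\(\d+\)$", "", levels[-1])

def sum_by_deepest_taxon(lines):
    """Add up OTU sizes for every OTU that shares the same deepest taxon."""
    totals = defaultdict(int)
    for line in lines:
        # Split into at most three fields so the taxonomy string stays whole.
        _otu, size, taxonomy = line.split(None, 2)
        totals[deepest_taxon(taxonomy)] += int(size)
    return dict(totals)

totals = sum_by_deepest_taxon(cons_taxonomy_lines)
print(totals)  # {'Streptococcus': 860}
```

Keep in mind that summing this way discards the OTU-level distinction, which is exactly the information the separate rows are there to preserve.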

That makes a lot of sense. Thanks very much :slight_smile: