It’s Friday and here we are again with a question.
As we understand it, the term OTU is quite vague. It is not directly linked to a taxonomic rank; instead, it is open to the investigator’s definition.
When doing the OTU-based analysis as described in the SOP, the resulting OTUs are based on a distance matrix (we ran dist.seqs and cluster). As a result, sequences from different genera can end up in the same OTU. The next step classifies the OTUs taxonomically. How is that possible when one OTU can include different genera?
We think we understand the concept of “OTU”, but we don’t see the advantage of an OTU-based analysis over a phylotype-based analysis.
I think the easiest way to think of it is that OTU is a within-data definition. You’re comparing your sequences with each other and creating clusters of sequences which are most similar to each other.
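To make the "within-data" idea concrete, here is a minimal Python sketch. It is not what mothur does internally: the toy sequences, the `mutate` helper, the mismatch-fraction distance, and the single-linkage rule are all assumptions chosen for illustration. The only thing taken from the discussion is the 3% cutoff.

```python
from itertools import combinations

def mutate(seq, positions):
    """Return a copy of seq with a substitution at each given position (toy helper)."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)

def distance(a, b):
    """Fraction of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def cluster(seqs, cutoff=0.03):
    """Single-linkage clustering: join any two sequences within the cutoff.

    This is a simplification for illustration, not mothur's clustering method.
    """
    parent = {name: name for name in seqs}

    def find(x):
        # Follow parent links to the cluster representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if distance(seqs[a], seqs[b]) <= cutoff:
            parent[find(a)] = find(b)

    otus = {}
    for name in seqs:
        otus.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in otus.values())

# Four made-up 100-base sequences; no reference database is involved.
base = "ACGT" * 25
seqs = {
    "seq1": base,
    "seq2": mutate(base, [3, 50]),                     # 2% from seq1
    "seq3": mutate(base, range(10, 20)),               # 10% from seq1
    "seq4": mutate(base, list(range(10, 20)) + [70]),  # 1% from seq3
}
otus = cluster(seqs)
print(otus)  # [['seq1', 'seq2'], ['seq3', 'seq4']]
```

Note that the grouping depends only on how the sequences compare with each other; nothing here says which genus either OTU belongs to.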
When doing taxonomy, you’re comparing your sequences to an external reference and seeing what the most similar match is. The trick here is that not all matches are equal.
As a kind of thought experiment, say you have two sequences which are 96% identical to each other. Those would fall into different OTUs under the SOP (more than 3% difference), but if you classified them against a database they might both be assigned to the same genus with matches of different quality: one might be a 99% match and the other a 95% match. Since those are the best hits, all you’ll see is that the two OTUs are called the same genus.
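The thought experiment can be played through with toy numbers. The sketch below is only an illustration under loud assumptions: the reference "database" holds two made-up entries (`GenusA`, `GenusB`), and classification is a plain best-identity lookup, which is simpler than the classifier mothur actually uses.

```python
def mutate(seq, positions):
    """Return a copy of seq with a substitution at each given position (toy helper)."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)

def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def classify(query, references):
    """Return (label, identity) of the most similar reference (best hit only)."""
    hits = ((label, identity(query, ref)) for label, ref in references.items())
    return max(hits, key=lambda hit: hit[1])

# Hypothetical two-entry reference set; real databases are far larger.
genus_a = "ACGT" * 25
references = {
    "GenusA": genus_a,
    "GenusB": mutate(genus_a, range(0, 20)),
}

# Two made-up query sequences that share one substitution.
query1 = mutate(genus_a, [10])                  # 1 difference from GenusA
query2 = mutate(genus_a, [10, 30, 50, 70, 90])  # 5 differences from GenusA

pair_identity = identity(query1, query2)
same_otu = (1 - pair_identity) <= 0.03          # 3% cutoff from the discussion
hit1 = classify(query1, references)
hit2 = classify(query2, references)

print(f"query1 vs query2: {pair_identity:.0%} identical, same OTU: {same_otu}")
print(f"query1 best hit: {hit1[0]} at {hit1[1]:.0%}")
print(f"query2 best hit: {hit2[0]} at {hit2[1]:.0%}")
```

The two queries end up in different OTUs (96% identical to each other), yet both are labelled with the same genus because only the best hit is reported, once at 99% and once at 95%.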
There are all sorts of reasons this can happen. The big ones are that the databases are nowhere near complete (people don’t really generate full-length 16S sequences any more, so the databases have stagnated a bit) and that with amplicons you’re really only examining a few hundred bases of information. In many circumstances that’s just not enough to reliably distinguish between groups.
Thank you both for the extensive answers to our question!
Still, we have to decide between OTU-based and phylotype-based analysis, e.g. for alpha diversity. Since mothur offers both approaches, both seem to be legitimate. The papers we have seen mostly used the OTU-based analysis. In our dataset, either approach gives the same number of OTUs or genera. Maybe this depends on the dataset being investigated?