OTUs vs ASVs

I would like to revive this topic after reading the ISME perspective by the DADA2 people (https://www.nature.com/articles/ismej2017119). In my opinion it merits some more conceptual attention on the mothur forum. Although the discussion has been started before on this forum (e.g. "OTUs or sequences?" on ASVs and oligotyping, and "OTU classification and minimum entropy decomposition" on MED), I feel that there is no consensus on when to use one or the other.
In the past, my experience with ESVs has been limited to oligotyping a specific taxon (working on a relatively abundant OTU representing it) across different conditions, and sometimes seeing different oligotypes pop up between conditions (i.e. the use suggested by @dwaite in "OTU classification and minimum entropy decomposition").

Nevertheless, I was recently clustering 350 samples of full-overlap V4 data on a fairly powerful machine and again getting absurdly large distance matrices (http://blog.mothur.org/2014/09/11/Why-such-a-large-distance-matrix/, using the MiSeq SOP with mothur 1.40.5), and painfully realizing that, for samples to be comparable, they always need to be clustered together in one go. One of the points that Callahan, McMurdie and Holmes make in their ISME perspective is that with ASVs this is a non-issue, as a given ASV can be inferred on a single-sample basis and compared to any number of other samples (i.e. the grouping is independent of dataset size). I think that this is not entirely fair, because a beta-diversity metric will be inflated if new ASVs occur in new samples (i.e. because the number of zeros will increase in the original set, as those ASVs do not occur there), just as it would with OTUs.
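
To make the single-sample argument concrete, here is a minimal sketch of what "comparable without re-clustering" could look like in practice. It is not DADA2 or mothur code: the helper name `merge_asv_tables`, the sample names and the toy sequences are all made up for illustration, and it assumes every run was trimmed to exactly the same amplicon region so that identical sequences can serve as labels.

```python
def merge_asv_tables(per_sample):
    """Merge per-sample ASV counts into one zero-filled table.

    per_sample: dict mapping sample name -> {exact sequence: read count}.
    Returns (sequences, table), where table maps each sample name to a
    list of counts in the same order as `sequences`.
    """
    # The exact sequence itself is the label, so the union of all
    # sequences defines the columns; nothing is re-clustered.
    sequences = sorted({seq for counts in per_sample.values() for seq in counts})
    table = {
        sample: [counts.get(seq, 0) for seq in sequences]
        for sample, counts in per_sample.items()
    }
    return sequences, table


# Hypothetical toy data: two samples processed independently.
run1 = {"sampleA": {"ACGTACGT": 10, "ACGTTCGT": 5}}
run2 = {"sampleB": {"ACGTACGT": 7, "TTGTACGA": 3}}

sequences, table = merge_asv_tables({**run1, **run2})
for sample, counts in table.items():
    print(sample, dict(zip(sequences, counts)))
```

Note that the merged table gains a zero for every ASV a sample does not contain, which is exactly where the question about added zeros and beta-diversity comes in; the same zero-filled table can be passed to whichever metric one wants to check.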

In all fairness, I have not yet thoroughly evaluated ASVs myself (of course the DADA2 people claim it is amazing, but everyone believes their tool is amazing :grinning:). I will look at some mock (ZymoBIOMICS community standard) data in the coming weeks with DADA2 and mothur for V3-V4, and hopefully also for full-overlap V4, to get a better idea, but I’d like to have a more conceptual discussion on the value of (97%) OTUs vs ASVs.

For instance: what is the prevailing opinion on ecological consistency/robustness (e.g. https://www.ncbi.nlm.nih.gov/pubmed/24763141 and https://msphere.asm.org/content/2/2/e00073-17) and biological consistency/interpretation of OTUs vs. ASVs (e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5812548/ and the discussion between Schloss and Edgar mentioned above)? Likewise, how well does the assumption that “biological sequences are more likely to be repeatedly observed than are error-containing sequences” (Callahan et al. (2017) ISME) hold, e.g. given the divergence between rRNA operons raised in the comment by Fierer et al.?

I agree that it probably doesn’t make sense to use a marker gene study to distinguish pathogenic species/strains from others in the same genus based on ASVs/ESVs (given the resolution of most single marker genes and the read length of MiSeq). But I do think that the argument being made about interoperability/re-usability of ASVs is one that deserves a second look.

As a side note: with the advent of full-length 16S NGS amplicon sequencing (e.g. on the PacBio platform, https://peerj.com/articles/1869/) and decreasing error rates, would ASVs become more relevant?
