OTUs & Ecological Consistency

Sebastian · April 25, 2014, 12:38pm

Dear mothur community,

I’d like to shamelessly advertise a paper we just published that may be interesting to some of you (for the others, sorry for the spam):

Schmidt TSB, Matias Rodrigues JF, von Mering C (2014) Ecological Consistency of SSU rRNA-Based Operational Taxonomic Units at a Global Scale. PLoS Computational Biology 10: e1003594. doi:10.1371/journal.pcbi.1003594.

We investigated whether OTUs are “ecologically meaningful”, in the sense that they cluster sequences of similar ecological affiliation. In their 2013 paper, Koeppel & Wu had questioned whether OTUs make sense ecologically:

Koeppel AF, Wu M (2013) Surprisingly extensive mixed phylogenetic and ecological signals among bacterial Operational Taxonomic Units. Nucleic Acids Research 41: 5175–5188. doi:10.1093/nar/gkt241.

The lack of a consensus bacterial species concept greatly hampers our ability to understand and organize bacterial diversity. Operational taxonomic units (OTUs), which are clustered on the basis of DNA sequence identity alone, are the most commonly used microbial diversity unit. Although it is understood that OTUs can be phylogenetically incoherent, the degree and the extent of the phylogenetic inconsistency have not been explicitly studied. Here, we tested the phylogenetic signal of OTUs in a broad range of bacterial genera from various phyla. Strikingly, we found that very few OTUs were monophyletic, and many showed evidence of multiple independent origins. Using previously established bacterial habitats as benchmarks, we showed that OTUs frequently spanned multiple ecological habitats. We demonstrated that ecological heterogeneity within OTUs is caused by their phylogenetic inconsistency, and not merely due to 'lumping' of taxa resulting from using relaxed identity cut-offs. We argue that ecotypes, as described by the Stable Ecotype Model, are phylogenetically and ecologically more consistent than OTUs and therefore could serve as an alternative unit for bacterial diversity studies. In addition, we introduce QuickES, a new wrapper program for the Ecotype Simulation algorithm, which is capable of demarcating ecotypes in data sets with tens of thousands of sequences.

They used very fine-scale ecological descriptions and very small datasets and found that their proposed algorithm to simulate “ecotypes” performed better than traditional OTU clustering. However, ecotype simulation is highly parametric, assumes that the Stable Ecotype model of bacterial speciation is true for all microbial taxa, and the software has throughput problems even with medium-sized sequence datasets. We took a different approach, assessing OTU ecological consistency for a global, highly diverse dataset and for ecological descriptions at different resolutions. We found that OTUs were generally, but not perfectly, consistent. In other words, they’re probably “good enough” for most practical purposes.
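
To make the idea of “ecological consistency” a bit more concrete, here is a minimal sketch of one possible score. Please note that this is not the measure used in the paper: the function name `otu_purity`, the habitat labels and the toy OTUs are all made up for illustration. It simply asks what share of sequences carry the majority habitat label of their OTU.

```python
from collections import Counter


def otu_purity(otus, habitat):
    """Size-weighted share of sequences that carry their OTU's majority habitat.

    otus    -- list of OTUs, each a list of sequence IDs (hypothetical input format)
    habitat -- dict mapping sequence ID -> habitat label (hypothetical annotation)
    """
    total = sum(len(members) for members in otus)
    majority = sum(
        # Count of the most frequent habitat label within this OTU.
        Counter(habitat[seq] for seq in members).most_common(1)[0][1]
        for members in otus
    )
    return majority / total


# Toy annotation: five sequences from three habitats (illustrative only).
habitat = {"s1": "soil", "s2": "soil", "s3": "marine", "s4": "marine", "s5": "gut"}

# Two alternative clusterings of the same five sequences.
otus_a = [["s1", "s2"], ["s3", "s4"], ["s5"]]
otus_b = [["s1", "s2", "s3"], ["s4", "s5"]]

purity_a = otu_purity(otus_a, habitat)
purity_b = otu_purity(otus_b, habitat)
```

One caveat with a score like this: singleton OTUs are trivially “pure”, so any real comparison would need a null model (for example, shuffled habitat labels) rather than the raw number.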

Moreover, we compared ecological consistency for several different methods (average, complete and single linkage hierarchical clustering, plus cd-hit and uclust). The ecological signals we used can be interpreted as a (sequence- and taxonomy-independent, external) measure of clustering “quality”. It’s hard to define any meaningful benchmark for the “goodness” of OTUs - Patrick Schloss has done some important work in that direction in the past. We think that “ecological consistency” is a useful way of looking at “OTU quality” that can complement such earlier studies.
Somewhat to our surprise, we found that complete linkage (rather than average linkage) provided the most “ecologically consistent” OTUs. We are currently following up on this, and in several other tests we have seen that complete linkage may indeed perform best. In other words: our study corroborates the idea that (hierarchical) complete linkage clustering is a good choice if you want “ecologically consistent” OTUs.
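
For readers less familiar with how the linkage criterion changes OTU membership, here is a small pure-Python sketch. It is a toy, not mothur's implementation and not the pipeline from the paper: the function `cluster_otus`, the four-sequence distance table and the 0.03 cutoff are assumptions chosen only to make the three linkage rules diverge.

```python
from itertools import combinations


def cluster_otus(dist, n, cutoff, linkage):
    """Agglomerative clustering of n sequences, stopping at a distance cutoff.

    dist    -- dict mapping frozenset({i, j}) -> pairwise distance (1 - identity)
    linkage -- "single", "average" or "complete"
    """
    clusters = [{i} for i in range(n)]

    def between(a, b):
        d = [dist[frozenset((i, j))] for i in a for j in b]
        if linkage == "single":
            return min(d)        # nearest pair decides
        if linkage == "complete":
            return max(d)        # furthest pair decides
        return sum(d) / len(d)   # average over all cross-cluster pairs

    while len(clusters) > 1:
        # Find the closest pair of clusters under the chosen linkage.
        (x, y), best = min(
            (((a, b), between(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > cutoff:
            break
        clusters[x] |= clusters[y]
        del clusters[y]
    return sorted(sorted(c) for c in clusters)


# Toy distances between four sequences (made-up values).
pairs = {(0, 1): 0.01, (1, 2): 0.02, (0, 2): 0.035,
         (2, 3): 0.029, (1, 3): 0.04, (0, 3): 0.05}
dist = {frozenset(k): v for k, v in pairs.items()}

results = {m: cluster_otus(dist, 4, 0.03, m)
           for m in ("single", "average", "complete")}
# single   -> [[0, 1, 2, 3]]   (chaining pulls everything together)
# average  -> [[0, 1, 2], [3]]
# complete -> [[0, 1], [2, 3]] (every pair within an OTU is within the cutoff)
```

With complete linkage, every pair of sequences inside an OTU is guaranteed to be within the cutoff, which is one plausible reason it might yield tighter OTUs, although this toy example does not show anything about ecology by itself.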

As said above, this is on the one hand to shamelessly advertise the paper ;). On the other hand, I thought I would put it out here, as this seems to be a great place to get some (critical) feedback on the work, and some discussion with people who demarcate OTUs from real-life datasets every day.

Best,

Sebastian

vingomez · April 29, 2014, 3:09pm

Hi Sebastian,

Glad to see you and others in the microbial community involved in addressing this and many other ecological questions.

You briefly acknowledged the importance of testing this idea on data generated from partial 16S sequences (regions: V4 or V1-V3, etc). As you may be aware, the majority of the work we (i.e. the microbial ecology community) have produced and published is based on this type of data (for now).

Did you perform any preliminary analyses, or do you have an idea whether the results (the use of complete linkage) will be consistent with those you obtained with full-length sequences?

Thanks
Vicente

Sebastian · May 2, 2014, 1:12pm

Hi Vicente,

thanks for your feedback!

The short answer: we didn’t run the same tests on shortread data, but we did some other tests.

We used a dataset of full-length sequences for several reasons:

(i) Every 16S subregion, or set of subregions, behaves (slightly) differently in OTU demarcation. This has been shown quite nicely e.g. by Patrick Schloss (PLOS Comp Biol, 2010) and also by Kim et al (Journal of Microbiol Methods, 2011). We wanted to design a broadly applicable test set that would not only be “true” for a subset of targeted subregions.

(ii) Depending on sequencing platform and pre-filtering, published shortread datasets can have highly divergent sequence length and sequence quality. We restricted our study to (near) full-length sequences and applied rather strict quality criteria, in order to obtain a consistently high-quality test set.

(iii) The dataset we used resembles in scope and pre-processing very much the reference sets provided e.g. by RDP, Greengenes and SILVA. These are used in many different contexts, e.g. also for ‘reference-based OTU picking’.

(iv) We wanted to use a dataset of very broad ecological scope. We could have composed our own test dataset from available shortread datasets, but that would have been (even more) biased towards individual environments. Instead, we grabbed all data for (high-quality, full-length) sequences available via GenBank / RefSeq.

(v) I’m no expert on sequencing technology, but from what I hear the next generation of platforms (or is it next-next-gen in the meantime, or next-next-next-gen?!) will achieve read lengths of ≥1,000 bp. Hopefully, full-length 16S sequencing at very high throughput will soon be possible.

Nevertheless, we repeated an analysis very similar to the one detailed in the paper on shortread sequences (V23, V35 and V6) extracted from the full-length dataset. Results were consistent.
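
To illustrate what “extracted from the full-length dataset” means in practice, and why subregions can behave differently from full-length sequences, here is a toy sketch. The two aligned sequences and the column ranges are invented; real V-region boundaries depend on the alignment used and are not given here.

```python
def p_distance(a, b):
    """Fraction of mismatching columns, skipping columns where either sequence has a gap."""
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no comparable columns")
    return sum(x != y for x, y in columns) / len(columns)


def extract_region(aligned_seq, start, end):
    """Cut an aligned sequence down to alignment columns [start, end)."""
    return aligned_seq[start:end]


# Two made-up aligned "full-length" sequences; they differ in columns 11 and 16.
full_a = "ACGTACGTACGTACGTACGT"
full_b = "ACGTACGTACGAACGTTCGT"

# Hypothetical subregion boundaries, chosen only for this example.
region_1 = (0, 10)
region_2 = (10, 20)

d_full = p_distance(full_a, full_b)
d_region_1 = p_distance(extract_region(full_a, *region_1),
                        extract_region(full_b, *region_1))
d_region_2 = p_distance(extract_region(full_a, *region_2),
                        extract_region(full_b, *region_2))
```

In this toy case the same pair of sequences looks identical in one region and twice as divergent as the full-length comparison in the other, so a fixed identity cutoff can group them differently depending on the region sequenced.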

Moreover, I have in the meantime looked into a couple of other factors regarding shortread clustering. The results are unpublished, but they point to hierarchical complete and average linkage clustering as being very reasonable choices.

Best,

Sebastian
