Dear mothur community,
I’d like to shamelessly advertise a paper we just published that may be of interest to some of you (to everyone else, sorry for the spam):
Schmidt TSB, Matias Rodrigues JF, von Mering C (2014) Ecological Consistency of SSU rRNA-Based Operational Taxonomic Units at a Global Scale. PLoS Computational Biology 10: e1003594. doi:10.1371/journal.pcbi.1003594.
We investigated whether OTUs are “ecologically meaningful”, in the sense that they cluster sequences of similar ecological affiliation. In their 2013 paper, Koeppel & Wu questioned whether OTUs make sense ecologically:
Koeppel AF, Wu M (2013) Surprisingly extensive mixed phylogenetic and ecological signals among bacterial Operational Taxonomic Units. Nucleic Acids Research 41: 5175–5188. doi:10.1093/nar/gkt241.
They used very fine-scale ecological descriptions and very small datasets, and found that their proposed algorithm to simulate “ecotypes” performed better than traditional OTU clustering. However, ecotype simulation is highly parametric, assumes that the Stable Ecotype model of bacterial speciation holds for all microbial taxa, and relies on software that has throughput problems even with medium-sized sequence datasets. We took a different approach, assessing the ecological consistency of OTUs for a global, highly diverse dataset and for ecological descriptions at different resolutions. We found that OTUs were generally, but not perfectly, consistent; in other words, they’re probably “good enough” for most practical purposes.
Moreover, we compared ecological consistency for several different clustering methods (average, complete and single linkage hierarchical clustering, plus cd-hit and uclust). The ecological signals we used can be interpreted as an external measure of clustering “quality” that is independent of both sequence and taxonomy. It’s hard to define any meaningful benchmark for the “goodness” of OTUs; Patrick Schloss has done some important work in that direction in the past. We think that “ecological consistency” is a useful way of looking at “OTU quality” that can complement such earlier studies.
Somewhat to our surprise, we found that complete linkage (rather than average linkage) provided the most “ecologically consistent” OTUs. We are currently following up on this, and in several other tests we have seen that complete linkage may indeed perform best. In other words: our study corroborates the idea that (hierarchical) complete linkage clustering is a good choice if you want “ecologically consistent” OTUs.
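For anyone who wants a feel for how the linkage rule and an external label interact, here is a toy sketch. It is not the pipeline or the consistency measure from the paper. The function names `cluster` and `purity`, the four-sequence distance matrix, the 3% threshold and the habitat labels are all made up for illustration, and "purity" (the fraction of sequences that share their OTU's majority habitat) is only a simple stand-in for an ecological consistency score.

```python
from collections import Counter
from itertools import combinations

def cluster(dist, threshold, linkage="average"):
    """Naive agglomerative clustering on a full distance matrix.

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than `threshold`.
    """
    # How the distance between two clusters is summarised for each rule.
    summarise = {
        "single": min,                         # nearest pair of members
        "complete": max,                       # farthest pair of members
        "average": lambda d: sum(d) / len(d),  # mean over all member pairs
    }[linkage]

    def cluster_distance(a, b):
        return summarise([dist[i][j] for i in a for j in b])

    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]),
        )
        if cluster_distance(clusters[x], clusters[y]) > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

def purity(clusters, labels):
    """Fraction of items whose label matches the majority label of their cluster."""
    hits = sum(
        Counter(labels[i] for i in c).most_common(1)[0][1] for c in clusters
    )
    return hits / len(labels)

# Hypothetical data: four sequences on a "chain", pairwise distances in percent.
positions = [0, 2, 4, 6]
dist = [[abs(a - b) for b in positions] for a in positions]
labels = ["soil", "soil", "marine", "marine"]  # made-up habitat annotations

for rule in ("single", "average", "complete"):
    otus = cluster(dist, threshold=3, linkage=rule)
    print(rule, otus, purity(otus, labels))
```

The toy data are constructed so that single linkage chains all four sequences into one mixed-habitat OTU, while average and complete linkage split them into two habitat-pure OTUs. That only illustrates the mechanics; it says nothing about how the methods rank on real datasets.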
As I said above, this is on the one hand a shameless advertisement for the paper ;). On the other hand, I thought I would put it out here because this seems to be a great place to get some (critical) feedback on the work, and some discussion with people who demarcate OTUs from real-life datasets every day.
Best,
Sebastian