Robustness and Reproducibility in the Demarcation of OTUs

Sebastian · September 1, 2014, 5:42pm

Dear mothur community,

I’d like to (again) shamelessly advertise a paper that has just been published online, and which may be of some interest to some of you:

Schmidt TSB, Matias Rodrigues JF, von Mering C (2014). Limits to Robustness and Reproducibility in the Demarcation of Operational Taxonomic Units. Environ Microbiol, doi:10.1111/1462-2920.12610
http://onlinelibrary.wiley.com/doi/10.1111/1462-2920.12610/abstract

ncbi.nlm.nih.gov

Limits to robustness and reproducibility in the demarcation of operational taxonomic units.

TS Schmidt, JF Matias Rodrigues and C von Mering, Environmental microbiology, May 2015

The demarcation of operational taxonomic units (OTUs) from complex sequence data sets is a key step in contemporary studies of microbial ecology. However, as biologically motivated 'optimal' OTU-binning algorithms remain elusive, many conceptually distinct approaches continue to be used. Using a global data set of 887 870 bacterial 16S rRNA gene sequences, we objectively quantified biases introduced by several widely employed sequence clustering algorithms. We found that OTU-binning methods often provided surprisingly non-equivalent partitions of identical data sets, notably when clustering to the same nominal similarity thresholds; and we quantified the resulting impact on ecological data description for a well-defined human skin microbiome data set. We observed that some methods were very robust to varying clustering thresholds, while others were found to be highly susceptible even to slight threshold variations. Moreover, we comprehensively quantified the impact of the choice of 16S rRNA gene subregion, as well as of data set scope and context on algorithm performance. Our findings may contribute to an enhanced comparability of results across sequence-processing pipelines, and we arrive at recommendations towards higher levels of standardization in established workflows.

We have investigated six different methods for ‘de novo’ OTU demarcation: hierarchical average linkage (AL), complete linkage (CL), single linkage (SL) clustering, as well as CD-HIT, UCLUST and UPARSE. Our basic question was: how comparable (or not) are the results provided by these methods? When analyzing a 16S dataset, how much interpretation bias is introduced by the choice of clustering method alone? How robust / sensitive is clustering to minor parameter changes, such as minute threshold changes, or changes in clustering context? How similar are clusterings based on subregions only when compared to full-length clusterings?
We ran a series of tests to look into these questions. We focused on cluster composition to measure similarity between partitions: do two given methods tend to bin the same sequences together, or not?
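
To make the idea of comparing partitions by cluster composition concrete, here is a minimal sketch using the plain pair-counting Rand index. This is only an illustration and not necessarily the measure used in the paper; the function name `rand_index` and the toy sequence IDs and OTU labels are made up for the example.

```python
from itertools import combinations


def rand_index(partition_a, partition_b):
    """Fraction of sequence pairs on which two partitions agree.

    Each partition maps a sequence ID to an OTU label (hypothetical format).
    A pair "agrees" if both partitions bin the two sequences together,
    or both keep them apart.
    """
    ids = sorted(set(partition_a) & set(partition_b))
    if len(ids) < 2:
        raise ValueError("need at least two shared sequences")
    agree = 0
    total = 0
    for s, t in combinations(ids, 2):
        same_a = partition_a[s] == partition_a[t]
        same_b = partition_b[s] == partition_b[t]
        agree += same_a == same_b
        total += 1
    return agree / total


# Toy example: the same four sequences clustered by two hypothetical methods.
method_1 = {"seq1": "otu1", "seq2": "otu1", "seq3": "otu2", "seq4": "otu2"}
method_2 = {"seq1": "A", "seq2": "A", "seq3": "A", "seq4": "B"}
score = rand_index(method_1, method_2)
```

A score of 1.0 means both methods bin exactly the same sequences together, whatever the OTU labels are called. The loop over all pairs is quadratic in the number of sequences, so for large data sets one would count pairs from a contingency table instead.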

Our results indicate that AL and CL, and somewhat surprisingly also CD-HIT, provided highly robust and reproducible clustering, whereas SL, UCLUST and UPARSE were more sensitive to even slight parameter changes. For example, simply moving from a 97% cutoff to 97.2% led to significantly different clusterings for the latter heuristics.
We observed that clustering for all methods was generally ‘replicable’: when repeating the exact same clustering run, results were replicated. However, not all methods were also ‘reproducible’; that is, they did not necessarily provide concordant results under slightly changed computational setups.

For more details, feel free to take a look at the manuscript.

I am posting this here because we are genuinely interested in getting feedback from you guys. I guess the mothur community is the right ‘target audience’ for this kind of study. So if you have any questions, please feel free to ask.

Best,

Sebastian

patientwind · February 11, 2015, 12:02am

Can’t find your paper. Not with the links provided here, not on Wiley, or Web of Science. Not even with a google search. FYI

pschloss · February 11, 2015, 2:14pm

Just checked and it worked for me.
