Robustness and Reproducibility in the Demarcation of OTUs

Dear mothur community,

I’d like to (again) shamelessly advertise a paper that has just been published online and may be of interest to some of you:

Schmidt TSB, Matias Rodrigues JF, von Mering C (2014). Limits to Robustness and Reproducibility in the Demarcation of Operational Taxonomic Units. Environ Microbiol, doi:10.1111/1462-2920.12610
http://onlinelibrary.wiley.com/doi/10.1111/1462-2920.12610/abstract

We have investigated six different methods for ‘de novo’ OTU demarcation: hierarchical average linkage (AL), complete linkage (CL), single linkage (SL) clustering, as well as CD-HIT, UCLUST and UPARSE. Our basic question was: how comparable (or not) are the results provided by these methods? When analyzing a 16S dataset, how much interpretation bias is introduced by the choice of clustering method alone? How robust / sensitive is clustering to minor parameter changes, such as minute threshold changes, or changes in clustering context? How similar are clusterings based on subregions only when compared to full-length clusterings?
We ran a series of tests to look into these questions. We focused on cluster composition to measure similarity between partitions – do two given methods tend to bin the same sequences together, or not?
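
To make the "do two methods bin the same sequences together?" idea concrete, here is a minimal sketch of a pair-counting agreement score (the plain Rand index). This is only an illustration of the general idea and not necessarily the measure used in the paper; the function name `rand_index` and the toy OTU assignments are made up for this example.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of sequence pairs on which two clusterings agree.

    A pair "agrees" if both clusterings put the two sequences in the
    same OTU, or both put them in different OTUs.
    """
    if labels_a.keys() != labels_b.keys():
        raise ValueError("both clusterings must cover the same sequences")
    agree = total = 0
    for s, t in combinations(sorted(labels_a), 2):
        same_a = labels_a[s] == labels_a[t]
        same_b = labels_b[s] == labels_b[t]
        agree += same_a == same_b
        total += 1
    return agree / total if total else 1.0


# Hypothetical toy data: sequence ID -> OTU label under two methods.
method_1 = {"seq1": "A", "seq2": "A", "seq3": "B", "seq4": "B"}
method_2 = {"seq1": "X", "seq2": "X", "seq3": "X", "seq4": "Y"}

score = rand_index(method_1, method_2)
print(score)
```

Note that the OTU labels themselves do not need to match between methods; only the co-membership of sequence pairs matters. For real datasets a chance-corrected variant would usually be preferable, since the plain Rand index is inflated when there are many small clusters.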

Our results indicate that AL and CL, and somewhat surprisingly also CD-HIT, provided highly robust and reproducible clusterings, whereas SL, UCLUST and UPARSE were more sensitive to even slight parameter changes. For example, simply moving from a 97% cutoff to 97.2% led to significantly different clusterings for the latter methods.
We observed that clustering for all methods was generally ‘replicable’ – when repeating the exact same clustering run, results were replicated. However, not all methods were also ‘reproducible’ – that is, they did not necessarily provide concordant results under slightly changed computational setups.
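
As a toy illustration of how a small threshold change can alter a single linkage clustering, here is a minimal sketch. It treats SL at a fixed cutoff as "connected components of all pairs at or above the identity cutoff". The pairwise identities and the helper `single_linkage` are invented for this example and are not taken from the paper or from any tool's implementation.

```python
def single_linkage(identity, names, cutoff):
    """Group sequences into OTUs; any pair with identity >= cutoff is linked."""
    parent = {n: n for n in names}

    def find(n):
        # Follow parent pointers to the representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (a, b), ident in identity.items():
        if ident >= cutoff:
            parent[find(a)] = find(b)

    otus = {}
    for n in names:
        otus.setdefault(find(n), set()).add(n)
    return sorted(sorted(members) for members in otus.values())


# Hypothetical pairwise identities (fractions, not percent).
names = ["s1", "s2", "s3", "s4"]
identity = {
    ("s1", "s2"): 0.990,
    ("s2", "s3"): 0.971,  # the only link bridging the two groups
    ("s3", "s4"): 0.985,
    ("s1", "s3"): 0.960,
    ("s1", "s4"): 0.950,
    ("s2", "s4"): 0.955,
}

otus_970 = single_linkage(identity, names, 0.970)
otus_972 = single_linkage(identity, names, 0.972)
print(otus_970)
print(otus_972)
```

In this made-up case a single borderline pair (s2, s3) decides whether there is one OTU or two, which is the kind of chaining effect that can make SL sensitive to minute cutoff changes.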

For more details, feel free to take a look at the manuscript :wink:

I am posting this here because we are genuinely interested in getting feedback from you. I guess the mothur community is the right ‘target audience’ for this kind of study. So if you have any questions, please feel free to ask :slight_smile:

Best,


Sebastian

Can’t find your paper. Not with the links provided here, not on Wiley, not on Web of Science. Not even with a Google search. FYI

Just checked and it worked for me.