Dear mothur community,
I’d like to (again) shamelessly advertise a paper that has just been published online and that may be of interest to some of you:
Schmidt TSB, Matias Rodrigues JF, von Mering C (2014). Limits to Robustness and Reproducibility in the Demarcation of Operational Taxonomic Units. Environ Microbiol, doi:10.1111/1462-2920.12610
We have investigated six different methods for ‘de novo’ OTU demarcation: hierarchical average linkage (AL), complete linkage (CL) and single linkage (SL) clustering, as well as CD-HIT, UCLUST and UPARSE. Our basic question was: how comparable (or not) are the results provided by these methods? When analyzing a 16S dataset, how much interpretation bias is introduced by the choice of clustering method alone? How robust or sensitive is clustering to minor parameter changes, such as minute threshold changes or changes in clustering context? And how similar are clusterings based only on subregions to those based on full-length sequences?
We ran a series of tests to look into these questions. We focused on cluster composition to measure similarity between partitions: do two given methods tend to bin the same sequences together, or not?
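For readers who want to try this kind of comparison on their own data, here is a minimal sketch of one standard way to score "do two methods bin the same sequences together": classify every sequence pair by whether each partition places it in the same OTU, then compute the Rand index and the adjusted Rand index (ARI). This is an illustrative assumption, not necessarily the measure used in the manuscript; the function names and the toy OTU labels are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def pair_counts(labels_a, labels_b):
    """Classify every sequence pair by whether partition A and/or B put it in one OTU."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same sequences")
    both = only_a = only_b = neither = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1
        elif same_a:
            only_a += 1
        elif same_b:
            only_b += 1
        else:
            neither += 1
    return both, only_a, only_b, neither


def rand_index(labels_a, labels_b):
    """Fraction of sequence pairs on which the two partitions agree."""
    both, only_a, only_b, neither = pair_counts(labels_a, labels_b)
    total = both + only_a + only_b + neither
    return 1.0 if total == 0 else (both + neither) / total


def adjusted_rand_index(labels_a, labels_b):
    """Rand index corrected for chance (1.0 = identical partitions, about 0 = chance level)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same sequences")
    total = comb(len(labels_a), 2)
    if total == 0:
        return 1.0
    # Pairs kept together by both partitions, by A, and by B
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / total
    maximum = (together_a + together_b) / 2
    if maximum == expected:  # both all singletons, or both a single OTU
        return 1.0
    return (together_both - expected) / (maximum - expected)


# Hypothetical OTU assignments of the same five sequences by two methods
method_x = ["otu1", "otu1", "otu2", "otu2", "otu3"]
method_y = ["A", "A", "B", "C", "C"]
print(round(rand_index(method_x, method_y), 3))           # 0.8
print(round(adjusted_rand_index(method_x, method_y), 3))  # 0.375
```

Both scores depend only on which sequences share an OTU, not on the OTU names, so the two methods' labels never need to be matched up. The adjustment for chance matters here because, under any method, most sequence pairs end up in different OTUs, which inflates the raw Rand index.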
Our results indicate that AL and CL, and, somewhat surprisingly, also CD-HIT provided highly robust and reproducible clusterings, whereas SL, UCLUST and UPARSE were more sensitive to even slight parameter changes. For example, simply moving from a 97% to a 97.2% cutoff led to significantly different clusterings for the latter three methods.
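As a toy illustration of why a small cutoff shift can matter for single linkage in particular, here is a sketch on a hypothetical four-sequence distance matrix (not data from the paper). Single linkage puts two sequences in the same OTU whenever any chain of pairwise distances at or below the cutoff connects them, so a single pair distance that falls between two cutoffs can merge or split whole OTUs. Distance cutoffs of 0.03 and 0.028 correspond to 97% and 97.2% identity; the function name is illustrative only.

```python
def single_linkage_otus(dist, cutoff):
    """Single-linkage OTUs: link every pair with distance <= cutoff, return OTU ids."""
    n = len(dist)
    parent = list(range(n))  # union-find forest, one tree per OTU

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= cutoff:
                parent[find(i)] = find(j)  # merge the two OTUs

    # Relabel roots as consecutive OTU ids in order of first appearance
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Hypothetical pairwise distances: two tight pairs bridged by one 0.029 link
dist = [
    [0.0, 0.01, 0.039, 0.049],
    [0.01, 0.0, 0.029, 0.039],
    [0.039, 0.029, 0.0, 0.01],
    [0.049, 0.039, 0.01, 0.0],
]
print(single_linkage_otus(dist, 0.03))   # [0, 0, 0, 0]  (97%: one OTU)
print(single_linkage_otus(dist, 0.028))  # [0, 0, 1, 1]  (97.2%: two OTUs)
```

Average and complete linkage look at all pairs between two groups rather than the single closest one, which is one intuitive reason why they can be less prone to this kind of chaining; the toy example does not by itself show how large the effect is on real data.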
We observed that clustering was generally ‘replicable’ for all methods: when the exact same clustering run was repeated, the results were replicated. However, not all methods were also ‘reproducible’; that is, some did not necessarily provide concordant results under slightly changed computational setups.
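To make the replicable/reproducible distinction concrete, here is a sketch of a simplified greedy centroid scheme (hypothetical, and not the actual UCLUST, UPARSE or CD-HIT algorithm). Each sequence, taken in a given input order, joins the first existing centroid within the cutoff or otherwise founds a new OTU. Re-running with the same order always gives the same answer, while presenting the same sequences in a different order can give a different partition. Input order is only one example of a changed setup, assumed here for illustration.

```python
def greedy_centroid_otus(dist, cutoff, order):
    """Greedy clustering: join the first centroid within cutoff, else become a new centroid."""
    centroids = []               # sequence indices acting as OTU centroids
    labels = [None] * len(dist)  # OTU id per sequence
    for i in order:
        for otu, c in enumerate(centroids):
            if dist[i][c] <= cutoff:
                labels[i] = otu
                break
        else:  # no centroid close enough: sequence i founds a new OTU
            centroids.append(i)
            labels[i] = len(centroids) - 1
    return labels


# Hypothetical distances: 0-1 and 1-2 are close, 0-2 is not
dist = [
    [0.0, 0.02, 0.04],
    [0.02, 0.0, 0.02],
    [0.04, 0.02, 0.0],
]
print(greedy_centroid_otus(dist, 0.03, [0, 1, 2]))  # [0, 0, 1]
print(greedy_centroid_otus(dist, 0.03, [0, 1, 2]))  # [0, 0, 1]  (same run: replicated)
print(greedy_centroid_otus(dist, 0.03, [1, 0, 2]))  # [0, 0, 0]  (new order: different OTUs)
```

The run is fully deterministic, so it is replicable in the sense above; the change comes only from which sequence happens to become a centroid first.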
For more details, feel free to take a look at the manuscript.
I am posting this here because we are genuinely interested in getting feedback from you guys; I guess the mothur community is the right ‘target audience’ for this kind of study. So if you have any questions, please feel free to ask.