I’m a new user of mothur and I’ve been reading about how OTUs are generated. Last year Eren et al. published a paper that generates OTUs by minimum entropy decomposition (MED): information-rich base positions are used to split a group of sequences into smaller groups iteratively, and the small groups left at the end are the final OTUs. The conventional approach implemented in mothur, as I understand it, first bins sequences at a taxonomic level (e.g. Order) and then does de novo clustering within each bin. (Please let me know if any of these statements are inaccurate.)
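To check my own understanding of the MED idea, here is a minimal Python sketch of how I picture it. It is not the authors' implementation: the function names and the `min_entropy` cutoff are hypothetical, it assumes equal-length aligned reads, and it splits on only the single highest-entropy position at each step, which is a simplification of what the paper describes.

```python
import math
from collections import Counter, defaultdict

def column_entropy(seqs, pos):
    """Shannon entropy (in bits) of the bases found at one alignment position."""
    counts = Counter(s[pos] for s in seqs)
    n = len(seqs)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def split_on_max_entropy(seqs):
    """Find the most variable position and group the reads by their base there."""
    length = len(seqs[0])
    entropies = [column_entropy(seqs, i) for i in range(length)]
    best = max(range(length), key=entropies.__getitem__)
    groups = defaultdict(list)
    for s in seqs:
        groups[s[best]].append(s)
    return best, entropies[best], dict(groups)

def decompose(seqs, min_entropy=0.2):
    """Split recursively until no position exceeds the (hypothetical) entropy cutoff."""
    best, h, groups = split_on_max_entropy(seqs)
    if h <= min_entropy or len(groups) == 1:
        return [seqs]  # this group is treated as a final node/OTU
    nodes = []
    for group in groups.values():
        nodes.extend(decompose(group, min_entropy))
    return nodes

# Made-up toy reads that differ only at position 2.
reads = ["ACGT", "ACGT", "ACCT", "ACCT"]
nodes = decompose(reads)
```

With these toy reads only position 2 varies, so the sketch ends with two groups that differ by a single base, which is the kind of fine separation I am asking about below.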
Two questions come to mind:
- Is MED potentially a “better” approach than the current one? Or is MED good only in the sense that it can tease apart very similar sequences into separate OTUs?
- How do you deal with multiple copies of the 16S gene in one organism? I think I read once that the number of 16S gene copies within one species can range up to tens of copies, pseudogenes included. Would this make organismal abundances inferred from 16S reads inaccurate?
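To make the second question concrete, here is a toy Python sketch of the bias I have in mind. The taxon names, read counts, and copy numbers are all made up for illustration, and the function is hypothetical, not something from mothur. It simply divides each taxon's read count by an assumed 16S copy number before computing relative abundance.

```python
def copy_corrected_abundance(read_counts, copy_numbers):
    """Divide reads by 16S copy number, then renormalise to relative abundance."""
    corrected = {taxon: read_counts[taxon] / copy_numbers[taxon]
                 for taxon in read_counts}
    total = sum(corrected.values())
    return {taxon: value / total for taxon, value in corrected.items()}

# Made-up example: taxon_A carries 7 copies of the gene, taxon_B carries 1.
read_counts = {"taxon_A": 700, "taxon_B": 300}
copy_numbers = {"taxon_A": 7, "taxon_B": 1}

raw = {t: n / sum(read_counts.values()) for t, n in read_counts.items()}
corrected = copy_corrected_abundance(read_counts, copy_numbers)
```

In this made-up case the raw reads suggest taxon_A makes up 70% of the community, but after dividing by copy number it drops to 25%. Of course this only works if the copy number of each taxon is actually known, which is part of what I am asking.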
Ref:
Eren et al., ISME J 2015: http://www.nature.com/ismej/journal/v9/n4/full/ismej2014195a.html