combined OTU and taxonomic approach?

Hi -
I have an idea that I wanted to pose to the group. I have noticed in my dataset that sequences clustered into the same OTU can sometimes be assigned to quite distinct phylogenetic groups (such as different genera). Why not first bin sequences by taxonomy and then cluster them into OTUs within each taxonomic group?

I’m not sure how dramatic an effect this would have, but it seems like a reasonable way to use the different kinds of information carried by different portions of the sequence. Portions of the sequence that carry phylogenetic information could be used to bin by taxonomy, and the more variable regions could then be used to create OTUs.
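To make the proposal concrete, here is a minimal sketch of "bin by taxonomy, then cluster within bins". It assumes equal-length aligned sequences, `;`-separated lineage strings, a simple p-distance, and greedy centroid clustering. All function names are hypothetical, and greedy clustering stands in for whatever clustering method you actually use.

```python
from collections import defaultdict

def p_distance(a: str, b: str) -> float:
    """Fraction of mismatched positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)

def taxon_at_rank(taxonomy: str, rank: int) -> str:
    """Return the taxon at a given depth (0 = domain) of a ';'-separated lineage."""
    levels = [t for t in taxonomy.split(";") if t]
    return levels[rank] if rank < len(levels) else "unclassified"

def greedy_cluster(names, seqs, cutoff):
    """Join each sequence to the first centroid within `cutoff`, else start a new OTU."""
    otus = []  # list of (centroid_name, member_names)
    for name in names:
        for centroid, members in otus:
            if p_distance(seqs[name], seqs[centroid]) <= cutoff:
                members.append(name)
                break
        else:
            otus.append((name, [name]))
    return [members for _, members in otus]

def bin_then_cluster(seqs, taxonomy, rank=1, cutoff=0.03):
    """Bin sequences by their taxon at `rank`, then cluster each bin on its own,
    so no OTU can ever span two taxonomic bins."""
    bins = defaultdict(list)
    for name in seqs:
        bins[taxon_at_rank(taxonomy[name], rank)].append(name)
    otus = []
    for taxon in sorted(bins):
        otus.extend(greedy_cluster(bins[taxon], seqs, cutoff))
    return otus

# Toy data (made up): s3 is identical to s2 but classified to a different phylum.
seqs = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAA", "s3": "ACGTACGTAA"}
taxonomy = {
    "s1": "Bacteria;Firmicutes;",
    "s2": "Bacteria;Firmicutes;",
    "s3": "Bacteria;Bacteroidetes;",
}
print(greedy_cluster(list(seqs), seqs, 0.15))      # one OTU holding all three
print(bin_then_cluster(seqs, taxonomy, 1, 0.15))   # s3 split off into its own OTU
```

The toy example shows the trade-off in question. Binning first guarantees taxonomically "pure" OTUs, but the result is only as good as the classification used to make the bins.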

What do you think?


If you want to speed things up, you can split the dataset by binning sequences at the phylum level and then cluster each bin separately. Going further, as I think you’re suggesting, and binning all the way down to the “genus” level runs into the difficulty that not all genera behave nicely. For example, the Clostridia aren’t necessarily monophyletic, so binning on them can produce some odd results. Frankly, I’m currently a fan of clustering into OTUs first and then getting a consensus classification for each OTU. We’re finishing up a command that splits the sequences by taxonomy and then clusters them into OTUs within each group, if you are interested in taking it for a test drive.
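For the "OTUs first, then consensus classification" route, here is a minimal sketch of one way to derive an OTU's consensus lineage: walk down the ranks and keep the most common lineage prefix while at least a threshold fraction of members share it. The majority-vote rule, the 0.51 default threshold, and the function name are all assumptions for illustration. This is not the command mentioned above.

```python
from collections import Counter

def consensus_taxonomy(lineages, threshold=0.51):
    """Return the deepest lineage prefix shared by at least `threshold` of the OTU's members."""
    if not lineages:
        return "unclassified"
    # Compare full prefixes (not bare names) so identical genus names under
    # different parents are never merged.
    split = [tuple(t for t in lin.split(";") if t) for lin in lineages]
    depth = max(len(s) for s in split)
    consensus = ()
    for rank in range(1, depth + 1):
        counts = Counter(s[:rank] for s in split if len(s) >= rank)
        path, n = counts.most_common(1)[0]
        if n / len(split) < threshold:
            break  # no majority at this rank: stop at the previous rank
        consensus = path
    return ";".join(consensus) if consensus else "unclassified"

otu = [
    "Bacteria;Firmicutes;Clostridia;Clostridium;",
    "Bacteria;Firmicutes;Clostridia;Clostridium;",
    "Bacteria;Firmicutes;Clostridia;Ruminococcus;",
]
print(consensus_taxonomy(otu))  # 2 of 3 members agree at genus, so the genus is kept
```

When members split evenly at a rank, the consensus simply stops at the last rank they agree on. That is one way an OTU spanning distinct genera, as described in the original question, would show up.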



