A problem in clustering

When I clustered sequences into OTUs at a distance cutoff of 0.03 with the average-neighbor method and obtained a representative sequence for each OTU with the command “get.oturep”, I frequently found that the similarity between representative sequences from different OTUs was above 97%.

I am concerned that this means the number of OTUs was overestimated.

How can I reduce this effect?

Thank you.


There is no perfect method for clustering sequences. If you look at our 2011 AEM paper, you’ll see that, of all the methods, average neighbor performs best.
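For anyone wondering why representatives of different OTUs can still be more than 97% similar: average neighbor only keeps two OTUs apart when the average distance between all of their sequences exceeds the cutoff, so individual pairs (including the representatives) can be closer than 0.03. The toy sketch below illustrates this. It is not mothur's code: the positions, the function names, and the rule for picking a representative (smallest total distance to the other members) are assumptions made for illustration only.

```python
# Toy data (an assumption): ten sequences placed on a line. Distances are in
# units of 0.1%, so 30 corresponds to the 0.03 cutoff.
positions = [0, 1, 20, 21, 22, 47, 48, 49, 68, 69]
dist = [[abs(p - q) for q in positions] for p in positions]
CUTOFF = 30


def mean_distance(dist, a, b):
    """Average pairwise distance between the members of clusters a and b."""
    return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))


def average_neighbor(dist, cutoff):
    """Greedy average-linkage clustering. Stops once the closest pair of
    clusters is, on average, farther apart than the cutoff."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = mean_distance(dist, clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > cutoff:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def representative(dist, cluster):
    """Hypothetical rule: member with the smallest total distance to the rest."""
    return min(cluster, key=lambda i: sum(dist[i][j] for j in cluster))


otus = average_neighbor(dist, CUTOFF)
reps = [representative(dist, otu) for otu in otus]
between_otus = mean_distance(dist, otus[0], otus[1])
between_reps = dist[reps[0]][reps[1]]

print("OTUs:", otus)
print("Representatives:", reps)
print("Average distance between OTUs:", between_otus)
print("Distance between representatives:", between_reps)
```

With these toy numbers the clustering stops at two OTUs whose average distance is 43.4 (4.34%), yet the two representatives are only 29 apart (2.9%, i.e. 97.1% similar). So seeing representatives above 97% similarity does not by itself mean the OTUs should have been merged under the average-neighbor criterion.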

Thank you for your kind reply. I understand.