cluster cutoff and creating OTUs at multiple sequence similarity thresholds

Even after reading the FAQ ( … eighbor.3F), I don’t completely understand why the cluster command changes the cutoff.

Can you explain: “If one cell has a distance of 0.03 and the cell it is getting merged with has a distance above 0.05 then the cutoff is reset to 0.03, because it’s not possible to merge at a higher level and keep all the data” in more detail? Why do you “lose data” when you start clustering at higher cutoff levels?

I am trying to create OTUs at multiple sequence dissimilarity thresholds, from 0.00 to 0.25 in 0.01 (1%) increments. When I try to set the cutoff to 0.25 using the cluster command, it always changes the cutoff to 0.14. Is it possible to do what I am trying to do here (create OTUs with dissimilarity thresholds up to 30%)? If so, how should I go about doing this?


I’d encourage you to do the clustering by hand for a small dataset, once with and once without the distances above the threshold.
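If it helps, here is a rough Python sketch of that by-hand exercise. It is not mothur's code: the function name average_neighbor, the four-sequence dataset, and the rule for lowering the cutoff are all assumptions, modeled only on the FAQ sentence quoted in the question. It runs an average neighbor clustering twice on the same distances, once keeping every distance and once discarding those above the cutoff first.

```python
def average_neighbor(pair_dist, cutoff, keep_above_cutoff):
    """Toy average neighbor clustering (a sketch, not mothur's implementation).

    pair_dist maps (name, name) tuples to distances. When keep_above_cutoff
    is False, distances above the cutoff are discarded before clustering,
    which mimics working from a sparse distance matrix.
    Returns (merges, effective_cutoff); merges holds (height, members).
    """
    names = sorted({name for pair in pair_dist for name in pair})
    clusters = [frozenset([n]) for n in names]
    # Cluster-to-cluster distances; a missing key means "unknown".
    d = {
        frozenset([frozenset([a]), frozenset([b])]): v
        for (a, b), v in pair_dist.items()
        if keep_above_cutoff or v <= cutoff
    }
    merges = []
    while d:
        pair, height = min(d.items(), key=lambda kv: kv[1])
        if height > cutoff:
            break
        a, b = tuple(pair)
        merged = a | b
        del d[pair]
        for c in clusters:
            if c == a or c == b:
                continue
            dac = d.pop(frozenset([a, c]), None)
            dbc = d.pop(frozenset([b, c]), None)
            if dac is not None and dbc is not None:
                # Both distances known: size-weighted average.
                d[frozenset([merged, c])] = (len(a) * dac + len(b) * dbc) / len(merged)
            elif dac is not None or dbc is not None:
                # One distance was discarded, so the average cannot be computed.
                # Assumed rule: trust the clustering only up to the known distance.
                known = dac if dac is not None else dbc
                cutoff = min(cutoff, known)
        clusters = [c for c in clusters if c != a and c != b] + [merged]
        merges.append((height, merged))
    return merges, cutoff


# Hypothetical distances for four sequences; requested cutoff is 0.05.
distances = {
    ("A", "B"): 0.01, ("A", "C"): 0.03, ("B", "C"): 0.06,
    ("A", "D"): 0.20, ("B", "D"): 0.20, ("C", "D"): 0.20,
}

full_merges, full_cutoff = average_neighbor(distances, 0.05, keep_above_cutoff=True)
sparse_merges, sparse_cutoff = average_neighbor(distances, 0.05, keep_above_cutoff=False)

print("full:  ", len(full_merges), "merges, cutoff", full_cutoff)
print("sparse:", len(sparse_merges), "merges, cutoff", sparse_cutoff)
```

With every distance available, A and B merge at 0.01, and the average distance from that pair to C is (0.03 + 0.06) / 2 = 0.045, so C joins them below the 0.05 cutoff. With the 0.06 distance discarded, that average cannot be computed after A and B merge, so the sketch makes only one merge and lowers the cutoff to 0.03. That missing average is the "lost data" in this toy version.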

I’m also not sure why you’d want OTUs at a level of 0.30. If you want phylum or order-level names or whatever, you’d be best served by using classify.seqs and the phylotype approach. If you need the distances for some reason, then you’d be best off using cluster.classic with a phylip-formatted distance matrix and not using a cutoff.