setting the clustering cutoff

Hello. When using the cluster command, mothur is changing the cutoff value to suit her needs. For example, I ran dist.seqs with cutoff=0.10, but when I then ran cluster, mothur changed the cutoff to ~0.05.

mothur > cluster(,
Reading matrix:     |||||||||||||||||||||||||||||||||||||||||||||||||||
changed cutoff to 0.0475441

Output File Names:

It took 1050 seconds to cluster

Why does mothur do this, or how can I prevent mothur from doing this? Thanks a lot!

This is one of our most common questions. Here's Pat's explanation: “This is a product of using the average neighbor algorithm with a sparse distance matrix. When you run cluster, the algorithm looks for pairs of sequences to merge in the rows and columns that are getting merged together. Let’s say you set the cutoff to 0.05. If one cell has a distance of 0.03 and the cell it is getting merged with has a distance above 0.05 then the cutoff is reset to 0.03, because it’s not possible to merge at a higher level and keep all the data. All of the sequences are still there from multiple phyla. Incidentally, although we always see this, it is a bigger problem for people that include sequences that do not fully overlap.”

I would recommend increasing the cutoff to 0.25, so that even after mothur lowers it, the final cutoff is more likely to stay at or above the level you want to analyze.
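To make Pat's explanation concrete, here is a minimal Python sketch of one merge step on a sparse distance matrix. It is only a toy illustration of the idea, not mothur's actual code: the function name merge_sparse, the sequence labels A to D, and the distances are all made up, and it uses a plain average of two cells, which matches average neighbor only when both clusters hold a single sequence.

```python
def merge_sparse(dists, a, b, cutoff):
    """Merge clusters a and b in a sparse distance dict.

    dists maps frozenset({x, y}) -> distance and only holds pairs whose
    distance was at or below the original cutoff (hypothetical toy data).
    Returns (new_dists, possibly_lowered_cutoff).
    """
    merged = f"{a}+{b}"
    others = {x for pair in dists for x in pair} - {a, b}
    # Keep every cell that does not involve a or b.
    new = {p: d for p, d in dists.items() if a not in p and b not in p}
    for c in sorted(others):
        da = dists.get(frozenset((a, c)))
        db = dists.get(frozenset((b, c)))
        if da is not None and db is not None:
            # Both cells are known, so the averaged distance can be computed.
            new[frozenset((merged, c))] = (da + db) / 2
        elif da is not None or db is not None:
            # One cell was dropped as "above the cutoff": the average is
            # unknown, so the cutoff falls back to the known distance.
            known = da if da is not None else db
            cutoff = min(cutoff, known)
    return new, cutoff


# Toy matrix built with cutoff 0.05; the B-C distance was above it and is absent.
dists = {
    frozenset(("A", "B")): 0.01,
    frozenset(("A", "C")): 0.03,
    frozenset(("A", "D")): 0.04,
    frozenset(("B", "D")): 0.02,
}
new_dists, new_cutoff = merge_sparse(dists, "A", "B", cutoff=0.05)
print(new_cutoff)  # prints 0.03
```

In this example A and B merge, the A-C cell (0.03) has no B-C partner, and the cutoff drops from 0.05 to 0.03, just as in Pat's example. Starting from a larger cutoff keeps more cells in the matrix, so fewer merges run into a missing partner.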