hard cluster cutoff [v1.19.4]

Hi all,
There seems to be a problem with the hard=t (or hard=T) option in the cluster command.
Mothur [v1.19.4] seems to ignore this option and changes the cutoff to an “arbitrary number”.
See below:

mothur > cluster(name=1.TCA.454Reads.final.names, cutoff=0.03, hard=t)
Using 1.TCA.454Reads.final.dist as input file for the phylip parameter.
Reading matrix: |||||||||||||||||||||||||||||||||||||||||||||||||||

changed cutoff to 0.0180974


The changed cutoff has nothing to do with the hard cutoff option. It is a “feature” of using average neighbor with a cutoff. Because the algorithm tries to merge rows/columns in the distance matrix where one of the rows/columns may have a distance above the cutoff, it becomes necessary to lower the cutoff so that both the rows and the columns are below it. I would suggest setting a larger cutoff in dist.seqs, say cutoff=0.15, and then letting cluster take care of the cutoff. The cutoff of 0.03 in the cluster command is applied at the read-in step. hard=t is the default.
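To make the average-neighbor point concrete, here is a minimal Python sketch of UPGMA-style average neighbor on a sparse matrix that stores only distances below the cutoff. It is not mothur's source code: the function name `average_neighbor_sparse`, the data layout, and the exact rule for how far to lower the cutoff (the size-weighted average of the known distance and the current cutoff, which is a lower bound on the unknown merged distance) are assumptions made for illustration.

```python
def average_neighbor_sparse(n, dists, cutoff):
    """Hypothetical sketch of average-neighbor (UPGMA) clustering on a sparse matrix.

    n:      number of sequences, labelled 0..n-1
    dists:  {(i, j): d} holding ONLY pairs whose distance is below cutoff
    cutoff: distances at or above this value were never stored
    Returns (merges, cutoff); merges is a list of (distance, a, b, new_id).
    """
    # Applying the cutoff at read-in: keep only pairs below it, keyed order-free.
    matrix = {frozenset(pair): d for pair, d in dists.items() if d < cutoff}
    size = {i: 1 for i in range(n)}
    next_id = n
    merges = []

    while True:
        # Closest pair that is still below the (possibly lowered) cutoff.
        candidates = [(d, tuple(sorted(k))) for k, d in matrix.items() if d < cutoff]
        if not candidates:
            break
        d_ab, (a, b) = min(candidates)
        sa, sb = size.pop(a), size.pop(b)
        new = next_id
        next_id += 1

        new_entries = {}
        for c in size:  # every remaining cluster other than a and b
            da = matrix.get(frozenset((a, c)))
            db = matrix.get(frozenset((b, c)))
            if da is not None and db is not None:
                # Both distances known: exact size-weighted average.
                new_entries[frozenset((new, c))] = (sa * da + sb * db) / (sa + sb)
            elif da is not None or db is not None:
                # One distance is missing, so it was >= cutoff. The true average is
                # at least this lower bound, but its exact value is unknown.
                if da is not None:
                    known, s_known, s_missing = da, sa, sb
                else:
                    known, s_known, s_missing = db, sb, sa
                lower_bound = (s_known * known + s_missing * cutoff) / (sa + sb)
                # Lower the cutoff so the unknown value is again "above the cutoff".
                cutoff = min(cutoff, lower_bound)
            # Both missing: the merged distance is also >= cutoff; store nothing.

        # Drop every entry that involved a or b, then add the merged row.
        matrix = {k: d for k, d in matrix.items() if a not in k and b not in k}
        matrix.update(new_entries)
        size[new] = sa + sb
        merges.append((d_ab, a, b, new))

    return merges, cutoff
```

For example, with three sequences where d(0,1)=0.01 and d(0,2)=0.02 were stored but d(1,2) was not (it was at or above 0.03), merging 0 and 1 leaves the new cluster's distance to 2 unknown, and this sketch lowers the cutoff from 0.03 to 0.025. The hard option plays no part in that drop, which is the behaviour described above.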

Hope this helps,

Hi Pat,
Thanks for your reply!
I see why the clustering method makes a difference here.

However, even when I set a cutoff in dist.seqs
[dist.seqs(fasta=1.TCA.454Reads.trim.chop.unique.good.filter.pick.filter.unique.precluster.fasta, output=lt, cutoff=0.05, processors=12)]
Mothur still clusters all the way to a distance of 0.58.
When I look at the “.dist” file, it seems that setting the dist.seqs cutoff had no effect.


When output=lt or output=square, mothur ignores the cutoff. This is because the phylip format requires each sequence to have a distance to every other sequence.
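A minimal Python sketch of why the cutoff can only be honoured in a sparse layout. The helper names `to_column` and `to_lower_triangle` are hypothetical, the spacing and 4-decimal formatting are simplified assumptions rather than mothur's exact output, and dropping pairs "at or above" the cutoff is an assumed comparison. A column-style file lists one "nameA nameB distance" line per pair, so pairs can be omitted; a phylip lower-triangle row must list a distance to every earlier sequence, so nothing can be left out.

```python
def to_column(names, dist, cutoff):
    """Sparse column layout: one 'nameA nameB distance' line per kept pair.
    Pairs at or above the cutoff are simply omitted (assumed comparison)."""
    lines = []
    for i in range(len(names)):
        for j in range(i):
            if dist[i][j] < cutoff:
                lines.append(f"{names[i]} {names[j]} {dist[i][j]:.4f}")
    return "\n".join(lines)


def to_lower_triangle(names, dist):
    """Phylip lower-triangle layout: a count line, then row i lists distances to
    ALL sequences 0..i-1. There is no slot to skip, so a cutoff cannot apply."""
    lines = [str(len(names))]
    for i, name in enumerate(names):
        row = " ".join(f"{dist[i][j]:.4f}" for j in range(i))
        lines.append(f"{name} {row}".rstrip())
    return "\n".join(lines)


names = ["A", "B", "C"]
dist = [[0.0, 0.01, 0.02],
        [0.01, 0.0, 0.50],
        [0.02, 0.50, 0.0]]
print(to_column(names, dist, cutoff=0.05))  # the 0.50 pair is dropped
print(to_lower_triangle(names, dist))       # the 0.50 pair must still appear
```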

This makes sense.