Hello,
First, let me thank you for the wonderful contribution you’ve made with this software. It is helpful and easy to use. That said, I recently upgraded to v.1.16.1 and I may have identified a bug in either the read.dist() or cluster() commands.
My understanding is that cluster() should produce the same results (aside from variation due to tie-breaking) regardless of whether the user sets a cutoff. However, when I read a distance matrix into mothur with the cutoff option (e.g., read.dist(phylip=in_matrix.phy, cutoff=0.15)), cluster() returns a different number of OTUs at certain thresholds at or below the specified cutoff (here, 0.15) than when I run read.dist() without a cutoff (e.g., read.dist(phylip=in_matrix.phy)). I am using average linkage clustering, and the outcome is the same whether I run cluster() with or without its own cutoff (e.g., cluster(method=average, cutoff=0.15) versus cluster(method=average)).
An example of the difference in mothur's output follows. For each run, I have printed the first two columns of the *.an.list file: the first column is the cutoff threshold and the second is the number of OTUs. The commands I used to generate each set of results are printed above it.
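In case it helps anyone reproduce this, here is a rough Python sketch of how those two columns can be pulled out of a list file. It assumes each line of the *.an.list file is whitespace-delimited, with the threshold label first and the OTU count second, as described above. The function name read_otu_counts is a placeholder of mine and not part of mothur.

```python
def read_otu_counts(list_path):
    """Return (label, number_of_otus) pairs from a mothur-style list file.

    Assumption: each non-empty line starts with the label (e.g. "unique"
    or "0.03") followed by the OTU count, separated by whitespace.
    """
    counts = []
    with open(list_path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            counts.append((fields[0], int(fields[1])))
    return counts
```

Calling read_otu_counts() on each run's list file and printing the pairs gives tables in the same layout as the results below.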
##With cutoff option##
#Commands
mothur > set.dir(output=…/db/samples/test_16S_reads/otus/BAC/)
Changing output directory to /Users/sharpton/projects/OTU/gittest/db/samples/test_16S_reads/otus/BAC/
mothur > read.dist(phylip=…/db/samples/test_16S_reads/matrix/test_16S_reads_SSU_BAC_FT_pseudo_pruned.phymat, cutoff=0.15)
********************###########
Reading matrix: |||||||||||||||||||||||||||||||||||||||||||||||||||
It took 0 secs to read
mothur > cluster(method=average, cutoff=0.15)
changed cutoff to 0.0759733
Output File Names:
/Users/sharpton/projects/OTU/gittest/db/samples/test_16S_reads/otus/BAC/test_16S_reads_SSU_BAC_FT_pseudo_pruned.an.sabund
/Users/sharpton/projects/OTU/gittest/db/samples/test_16S_reads/otus/BAC/test_16S_reads_SSU_BAC_FT_pseudo_pruned.an.rabund
/Users/sharpton/projects/OTU/gittest/db/samples/test_16S_reads/otus/BAC/test_16S_reads_SSU_BAC_FT_pseudo_pruned.an.list
#Results
unique 140
0.00 70
0.01 61
0.02 56
0.03 48
0.04 42
0.05 38
0.06 34
0.07 32
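Note that the largest clustering threshold reported is 0.07, even though I requested a cutoff of 0.15 (mothur printed "changed cutoff to 0.0759733" when cluster() ran).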
##Without cutoff option##
#Commands
mothur > set.dir(output=…/db/samples/test_16S_reads/otus/BAC/)
Changing output directory to /Users/sharpton/projects/OTU/gittest/db/samples/test_16S_reads/otus/BAC/
mothur > read.dist(phylip=…/db/samples/test_16S_reads/matrix/test_16S_reads_SSU_BAC_FT_pseudo_pruned.phymat)
********************###########
Reading matrix: |||||||||||||||||||||||||||||||||||||||||||||||||||
It took 0 secs to read
mothur > cluster(method=average, cutoff=0.15)
Output File Names:
/Users/sharpton/projects/OTU/gittest/db/samples/test_16S_reads/otus/BAC/test_16S_reads_SSU_BAC_FT_pseudo_pruned.an.sabund
/Users/sharpton/projects/OTU/gittest/db/samples/test_16S_reads/otus/BAC/test_16S_reads_SSU_BAC_FT_pseudo_pruned.an.rabund
/Users/sharpton/projects/OTU/gittest/db/samples/test_16S_reads/otus/BAC/test_16S_reads_SSU_BAC_FT_pseudo_pruned.an.list
#Results
unique 140
0.00 70
0.01 61
0.02 56
0.03 48
0.04 42
0.05 38
0.06 34
0.07 32
0.08 30
0.09 24
0.11 23
0.12 22
0.13 20
0.14 19
Note that results are given for cutoff values up to 0.14 (0.15 has the same distribution of OTUs as 0.14 in this case).
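To make the comparison explicit, here is a small Python sketch that diffs the two result tables above. The numbers are copied directly from the output in this post; the variable names are my own.

```python
# OTU counts per label, copied from the two result tables in this post.
with_cutoff = {
    "unique": 140, "0.00": 70, "0.01": 61, "0.02": 56, "0.03": 48,
    "0.04": 42, "0.05": 38, "0.06": 34, "0.07": 32,
}
without_cutoff = {
    "unique": 140, "0.00": 70, "0.01": 61, "0.02": 56, "0.03": 48,
    "0.04": 42, "0.05": 38, "0.06": 34, "0.07": 32, "0.08": 30,
    "0.09": 24, "0.11": 23, "0.12": 22, "0.13": 20, "0.14": 19,
}

# Labels reported by both runs, and those whose OTU counts disagree.
shared = [label for label in without_cutoff if label in with_cutoff]
mismatched = [label for label in shared
              if with_cutoff[label] != without_cutoff[label]]

# Labels reported only when read.dist() was run without a cutoff.
missing = [label for label in without_cutoff if label not in with_cutoff]

print("mismatched shared labels:", mismatched)
print("labels missing from the cutoff run:", missing)
```

For the labels that both runs report, the OTU counts agree; the runs differ in that the labels from 0.08 through 0.14 appear only when read.dist() is run without the cutoff option.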
It’s possible that I am misunderstanding the role of the cutoff option in read.dist(), but these observations seem to contradict the description in the manual. If there is any additional information I can provide, please let me know.
Best,
Thomas