read.dist() and/or cluster() error when using cutoff

sharpton · February 14, 2011, 10:37pm

Hello,

First, let me thank you for the wonderful contribution you’ve made with this software. It is helpful and easy to use. That said, I recently upgraded to v.1.16.1 and I may have identified a bug in either the read.dist() or cluster() commands.

My understanding is that cluster() should produce the same results (aside from variation due to breaking ties) regardless of whether the user sets a cutoff. However, when I read a distance matrix into mothur with the cutoff option (e.g., read.dist(phylip="in_matrix.phy", cutoff=0.15)), cluster() returns a different number of OTUs for certain cutoffs up to and including the specified cutoff (e.g., 0.15) than when I run read.dist() without a cutoff (e.g., read.dist(phylip="in_matrix.phy")). I am using average linkage clustering, and I see the same behavior whether I run cluster() with a cutoff (cluster(method=average, cutoff=0.15)) or without one (cluster(method=average)).

An example of the difference in output by mothur follows. Here, I have printed the first two columns of the *.an.list files. The first column corresponds to the cutoff threshold and the second column corresponds to the number of OTUs. I have printed the commands I used to generate the output above each set of results.

##With cutoff option##
#Commands

mothur > set.dir(output=…/db/samples/test_16S_reads/otus/BAC/)
Changing output directory to /Users/sharpton/projects/OTU/gittest/db/samples/test_16S_reads/otus/BAC/

mothur > read.dist(phylip=…/db/samples/test_16S_reads/matrix/test_16S_reads_SSU_BAC_FT_pseudo_pruned.phymat, cutoff=0.15)
********************###########
Reading matrix: |||||||||||||||||||||||||||||||||||||||||||||||||||

It took 0 secs to read

mothur > cluster(method=average, cutoff=0.15)
changed cutoff to 0.0759733

Output File Names:
/Users/sharpton/projects/OTU/gittest/db/samples/test_16S_reads/otus/BAC/test_16S_reads_SSU_BAC_FT_pseudo_pruned.an.sabund
/Users/sharpton/projects/OTU/gittest/db/samples/test_16S_reads/otus/BAC/test_16S_reads_SSU_BAC_FT_pseudo_pruned.an.rabund
/Users/sharpton/projects/OTU/gittest/db/samples/test_16S_reads/otus/BAC/test_16S_reads_SSU_BAC_FT_pseudo_pruned.an.list

#Results

unique 140
0.00 70
0.01 61
0.02 56
0.03 48
0.04 42
0.05 38
0.06 34
0.07 32

Note how the maximum clustering threshold provided is 0.07.

##Without cutoff option##
#Commands

mothur > set.dir(output=…/db/samples/test_16S_reads/otus/BAC/)
Changing output directory to /Users/sharpton/projects/OTU/gittest/db/samples/test_16S_reads/otus/BAC/

mothur > read.dist(phylip=…/db/samples/test_16S_reads/matrix/test_16S_reads_SSU_BAC_FT_pseudo_pruned.phymat)
********************###########
Reading matrix: |||||||||||||||||||||||||||||||||||||||||||||||||||

It took 0 secs to read

mothur > cluster(method=average, cutoff=0.15)

Output File Names:
/Users/sharpton/projects/OTU/gittest/db/samples/test_16S_reads/otus/BAC/test_16S_reads_SSU_BAC_FT_pseudo_pruned.an.sabund
/Users/sharpton/projects/OTU/gittest/db/samples/test_16S_reads/otus/BAC/test_16S_reads_SSU_BAC_FT_pseudo_pruned.an.rabund
/Users/sharpton/projects/OTU/gittest/db/samples/test_16S_reads/otus/BAC/test_16S_reads_SSU_BAC_FT_pseudo_pruned.an.list

#Results

unique 140
0.00 70
0.01 61
0.02 56
0.03 48
0.04 42
0.05 38
0.06 34
0.07 32
0.08 30
0.09 24
0.11 23
0.12 22
0.13 20
0.14 19

Note that results are given for cutoff values up to 0.14 (0.15 has the same distribution of OTUs as 0.14 in this case).
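
The comparison above can be scripted. The following is a minimal Python sketch, assuming each line of a *.an.list file is whitespace-delimited with the threshold label in the first column and the OTU count in the second, as described earlier in this post. The function names otu_counts and missing_labels are hypothetical helpers, not part of mothur, and the demo lines carry only the two columns printed above rather than full list-file rows.

```python
def otu_counts(lines):
    """Map each threshold label to its OTU count (first two columns only)."""
    counts = {}
    for line in lines:
        fields = line.split()
        # Skip blank lines and anything whose second column is not a count.
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        counts[fields[0]] = int(fields[1])
    return counts


def missing_labels(reference, other):
    """Labels present in `reference` but absent from `other`, in file order."""
    return [label for label in reference if label not in other]


# Demo data: the first two columns shown in the two result sets above.
with_cutoff = otu_counts([
    "unique 140", "0.00 70", "0.01 61", "0.02 56", "0.03 48",
    "0.04 42", "0.05 38", "0.06 34", "0.07 32",
])
without_cutoff = otu_counts([
    "unique 140", "0.00 70", "0.01 61", "0.02 56", "0.03 48",
    "0.04 42", "0.05 38", "0.06 34", "0.07 32", "0.08 30",
    "0.09 24", "0.11 23", "0.12 22", "0.13 20", "0.14 19",
])

missing = missing_labels(without_cutoff, with_cutoff)
print("Thresholds missing when read.dist() used a cutoff:", missing)
```

To run it on real output, pass an open file handle instead of a list, e.g. otu_counts(open(path)), where path points at the *.an.list file.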

It’s possible that I am misunderstanding the role of the cutoff option in read.dist(), but these observations seem to contradict the description in the manual. If there is any additional information I can provide, please let me know.

Best,
Thomas

pschloss · February 15, 2011, 11:08am

Thanks, Thomas. Yes, we know about this, and it isn’t exactly a bug. mothur stores the distance matrix without the distances above your cutoff. This can cause issues for average neighbor because it averages distances, and in some cases it may be looking for distances that have been excluded. If this happens, mothur is smart enough to drop the cutoff to the range where it can see all of the distances it needs. If you want the results that depend on removed distances, you need to increase the cutoff in read.dist(). Sorry we haven’t gotten to putting up a hand-worked example of this…

Pat
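
A small toy illustration of the point above, written in Python. This is only a sketch under stated assumptions, not mothur's actual implementation: the three sequences A, B, C and their distances are made up, the helpers truncate and average_distance are hypothetical, and the sketch only shows that an average can become uncomputable once distances above the cutoff are dropped. It does not reproduce how mothur picks the lowered cutoff.

```python
from itertools import product


def truncate(dists, cutoff):
    """Keep only distances at or below the cutoff (assumed sparse storage)."""
    return {pair: d for pair, d in dists.items() if d <= cutoff}


def average_distance(cluster_a, cluster_b, dists):
    """Mean pairwise distance between two clusters.

    Returns None if any required pairwise distance is missing.
    """
    values = []
    for a, b in product(cluster_a, cluster_b):
        pair = frozenset((a, b))
        if pair not in dists:
            return None  # a needed distance was excluded by the cutoff
        values.append(dists[pair])
    return sum(values) / len(values)


# Made-up distances for three sequences (illustrative only).
full = {
    frozenset(("A", "B")): 0.04,
    frozenset(("A", "C")): 0.10,
    frozenset(("B", "C")): 0.20,
}
stored = truncate(full, cutoff=0.15)  # d(B, C) = 0.20 is dropped

# A and B merge first (0.04). The next step needs the average to C.
avg_full = average_distance({"A", "B"}, {"C"}, full)
avg_stored = average_distance({"A", "B"}, {"C"}, stored)

print("full matrix:     ", avg_full)    # mean of 0.10 and 0.20
print("truncated matrix:", avg_stored)  # None: d(B, C) is unavailable
```

With the full matrix, the merge of {A, B} with C happens at an average distance of about 0.15. With the truncated matrix, that average depends on a distance that was never stored, so any result at that level cannot be trusted, which is the situation in which the cutoff has to come down.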

sharpton · February 24, 2011, 6:50pm

Hi Pat,

Thanks for clarifying this point; it makes complete sense and explains the observation that average neighbor results are obtained for previously excluded cutoff values when the read.dist cutoff is increased. I’ll appropriately amend my code to accommodate this feature.

Best,
Thomas
