cutoff not working correctly in cluster command

bansal_raman · January 2, 2012, 2:07am

When I use the cluster command with a set cutoff value, it automatically changes the cutoff value. Here is an example (from the log file):

mothur > cluster(column=adultfinal.dist, name=adult.names, cutoff=0.10)
********************###########
Reading matrix: |||||||||||||||||||||||||||||||||||||||||||||||||||

changed cutoff to 0.028546

Output File Names:
adultfinal.an.sabund
adultfinal.an.rabund
adultfinal.an.list

It took 29964 seconds to cluster.

I had cutoff value set to 0.10, but it changed to 0.028546.
Any suggestions please?
Thanks

westcott · January 6, 2012, 7:40pm

This is one of our common questions. Here’s Pat’s explanation: “This is a product of using the average neighbor algorithm with a sparse distance matrix. When you run cluster, the algorithm looks for pairs of sequences to merge in the rows and columns that are getting merged together. Let’s say you set the cutoff to 0.05. If one cell has a distance of 0.03 and the cell it is getting merged with has a distance above 0.05, then the cutoff is reset to 0.03, because it’s not possible to merge at a higher level and keep all the data. All of the sequences are still there from multiple phyla. Incidentally, although we always see this, it is a bigger problem for people that include sequences that do not fully overlap.”

I would suggest increasing your cutoff.
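To make the 0.05 / 0.03 example above concrete, here is a toy Python sketch. It is not mothur's actual code; the function name `merged_distance` and the use of `None` for a distance that was dropped from the sparse matrix are assumptions made purely for illustration.

```python
def merged_distance(d_a, d_b, cutoff):
    """Toy illustration only, not mothur's implementation.

    d_a, d_b: distances from two clusters being merged (A and B) to a
    third cluster C. None means the distance was above the cutoff and
    is therefore missing from the sparse matrix (an assumed encoding).

    Returns (average distance or None, possibly lowered cutoff).
    """
    if d_a is not None and d_b is not None:
        # Both distances are known, so the average can be computed.
        return (d_a + d_b) / 2, cutoff

    known = [d for d in (d_a, d_b) if d is not None]
    if known:
        # One distance is missing: the true average is unknown, so
        # nothing above the known distance can be trusted any more.
        return None, min(cutoff, known[0])

    # Neither distance was kept; nothing to merge, cutoff unchanged.
    return None, cutoff


# Cutoff 0.05, one cell at 0.03, the other above the cutoff (missing).
avg, new_cutoff = merged_distance(0.03, None, 0.05)
print(avg, new_cutoff)
```

In this sketch the cutoff falls to 0.03 as soon as one of the two cells is missing, which is the situation Pat describes.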

umforb25 · February 29, 2012, 12:38am

I set a cutoff of 0.05, and it changed the cutoff to 0.0482271. Is this a big problem? You say to increase the cutoff, what would you recommend increasing it to?

Thanks!

westcott · February 29, 2012, 10:48am

It is not a problem. The suggestion to increase the cutoff is for cases where the cutoff drops below the level you wanted to look at :).

umforb25 · February 29, 2012, 12:58pm

Ok great.

But what does this cutoff actually mean … sorry, relatively new to this.
And does it cause a problem if I’m comparing this dataset with a cutoff of 0.4 to a different dataset that has a cutoff of 0.5?

Thanks!

westcott · March 2, 2012, 1:30pm

The cutoff is used to boost speed and save memory. It does this by “ignoring” distances above the cutoff. For example, if you know that you are only interested in OTUs formed at a distance below 0.10, why keep a distance that is greater than 0.10? Because of the average neighbor clustering method, you might want to keep the distances smaller than 0.25, but you don’t need all of the distances. What do you mean by comparing the two datasets?
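As a rough picture of what “ignoring” distances above the cutoff means, here is a minimal Python sketch. The sequence names, the distance values, and the helper `apply_cutoff` are all made up for illustration, and whether mothur treats a distance exactly equal to the cutoff as kept or dropped is an assumption here.

```python
def apply_cutoff(distances, cutoff):
    """Keep only the pairs whose distance is at or below the cutoff.

    Hypothetical helper: this only illustrates the idea of a sparse
    distance matrix, it is not how mothur stores or reads distances.
    """
    return {pair: d for pair, d in distances.items() if d <= cutoff}


# Made-up pairwise distances between three hypothetical sequences.
distances = {
    ("seqA", "seqB"): 0.02,
    ("seqA", "seqC"): 0.12,
    ("seqB", "seqC"): 0.30,
}

kept = apply_cutoff(distances, 0.25)
print(len(distances), "distances before,", len(kept), "after the cutoff")
```

With a cutoff of 0.25 the 0.30 distance is never stored, which is where the speed and memory savings come from. A cutoff of 0.10 would also drop the 0.12 distance.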
