Confused about cluster.split cutoff parameter

Hi Pat,

I’m confused about the cutoff parameter in the cluster.split function. I understand that the taxlevel parameter pre-bins sequences using the taxonomy information, and that clustering within each bin saves memory. What exactly does the cutoff parameter do, and why is it set to 0.15 in the MiSeq SOP? Is this just another measure to reduce memory usage?

Specifically, I am interested in generating OTUs at a 99% similarity cutoff. To do this, do I simply run make.shared and classify.otu with a label of 0.01?

Thanks,
Michelle

The cutoff parameter is used to reduce the size of the distance matrices generated after the split: any distance above the cutoff is discarded, leaving a sparse distance matrix. When the clustering runs on a sparse matrix, each merge needs the distances between the sequences in the rows and columns being merged. Say you set the cutoff to 0.05, and one cell has a distance of 0.03 while the cell it is being merged with had a distance above 0.05 and so was never stored. The cutoff is then reset to 0.03, because it is not possible to merge at a higher level and still keep all the data. No sequences are lost; all of the sequences from the various phyla are still there. For a worked example, see http://www.mothur.org/w/images/7/7c/AverageNeighborCutoffChange.pdf.
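To make the mechanism concrete, here is a toy sketch (not mothur’s actual code, and with made-up sequence names and distances) of why a sparse distance matrix forces the cutoff down: when two sequences are merged, computing the average distance from the merged cluster to every other sequence requires both component distances, and if one of them was above the cutoff and therefore never stored, nothing above the merge level can be trusted.

```python
# Toy model of average-neighbor clustering on a sparse distance matrix.
# Distances above CUTOFF were never written to the matrix, so they are
# simply absent from the dict below.

CUTOFF = 0.05

# Pairwise distances for four hypothetical sequences A-D. The A-D, B-D
# and C-D distances were above 0.05, so the sparse matrix omits them.
dists = {
    ("A", "B"): 0.03,
    ("A", "C"): 0.04,
    ("B", "C"): 0.045,
}

def dist(x, y):
    """Look up a stored distance; None means 'above the cutoff, unknown'."""
    return dists.get((x, y)) or dists.get((y, x))

def merge(c1, c2, others, cutoff):
    """Average-neighbor merge of clusters c1 and c2.

    If the distance from any remaining sequence to one side of the merge
    is missing, the true average is unknowable, so the usable cutoff is
    lowered to the level of the merge itself."""
    merge_level = min(dist(a, b) for a in c1 for b in c2)
    for seq in others:
        halves = [dist(seq, m) for m in c1 + c2]
        if any(h is None for h in halves):
            # Mirrors the "Cutoff was X changed cutoff to Y" message.
            cutoff = min(cutoff, merge_level)
    return c1 + c2, cutoff

# Merge A and B (distance 0.03). D's distances to both are missing,
# so the cutoff drops from 0.05 to the merge level, 0.03.
cluster, cutoff = merge(["A"], ["B"], others=["C", "D"], cutoff=CUTOFF)
print(cutoff)
```

Run it and the reported cutoff drops to 0.03, which is exactly the behavior behind the “changed cutoff” messages discussed below: the data is intact, but results above the new cutoff are simply not reported.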

Hi, so I ran cluster.split following the SOP, except with a cutoff of 0.03, because I have a small computer and I didn’t want to waste processing power calculating OTUs for distances I didn’t need.

Sometimes I got this message: “Cutoff was 0.035 changed cutoff to 0.03” (or to 0.02). Does this mean that the OTUs for some of my phyla are clustered at 98% similarity rather than 97% (in the 0.02 case), and that the 0.035 vs. 0.03 message is simply a result of how the numbers are handled? Most importantly, can I say that my OTUs (when I run make.shared next) are clustered at 97%? I’d like to know whether I’m handling the data incorrectly.

Thanks so much for your help,

Liz

Update: I tried the command again with cutoff = 0.15, just in case 0.03 was introducing errors, and got this message in the logfile associated with the command:

Clustering Weese2.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta.0.dist
Cutoff was 0.155 changed cutoff to 0.03
Cutoff was 0.155 changed cutoff to 0.01
It took 4842 seconds to cluster
Merging the clustered files…
It took 4 seconds to merge.

Output File Names:
Weese2.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list

And there is no data for the 0.02 or 0.03 distances in the list file. What has happened? Is there a random element that changes the clustering results between runs?