cluster.split cluster=f, then what

Kendra · February 1, 2013, 10:51pm

I’m trying to cluster a huge distance matrix (130 GB; lots of diverse samples; I have 277k OTUs at 3% and now want to cluster at 5% and 10%). I’m using an HPC with 250 GB of RAM but a 3-day wall-time limit, and that’s not long enough to finish cluster.split. So I was thinking of trying the cluster=f option, clustering each phylum’s distance matrix, then concatenating the results. Theoretically that should work, correct?

westcott · February 5, 2013, 12:53pm

The cluster.split command has 3 major parts: split the matrix, cluster each piece, and assemble the results. You can run cluster.split with cluster=f to split the matrix, then run cluster on each matrix individually. When you assemble the list files into one file, watch out for the changing cutoff in the average neighbor method: ignore clusters that are at a higher distance than the smallest changed cutoff among the pieces. The list files may also have “missing” distances. This is because mothur only prints the clusters at distances where something has changed (see http://www.mothur.org/wiki/Frequently_asked_questions#Why_is_data_missing_for_some_distance_levels.3F). For a missing distance, you can use the clusters at the next smallest distance.
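The assembly step could be scripted along these lines. This is only a rough sketch, not something mothur provides: the function names (parse_list, pick_label, merge_at) are made up for illustration, and it assumes each list file row is tab-separated as label, number of OTUs, then one column per OTU with comma-separated sequence names. The changed cutoffs are assumed to have been noted by hand from each cluster run. It also assumes that a piece with no label at or below the target can fall back to its "unique" row.

```python
def parse_list(text):
    """Return {label: [otu, ...]} from the text of one list file.

    Assumed row layout: label <tab> numOtus <tab> otu1 <tab> otu2 ...
    """
    rows = {}
    for line in text.splitlines():
        fields = line.strip().split("\t")
        # Skip blank or short rows and any header row starting with "label".
        if len(fields) < 3 or fields[0] == "label":
            continue
        rows[fields[0]] = fields[2:]
    return rows


def pick_label(labels, target):
    """Return the largest numeric label <= target, or None if there is none."""
    best = None
    for label in labels:
        try:
            value = float(label)
        except ValueError:  # non-numeric labels such as "unique"
            continue
        if value <= target and (best is None or value > float(best)):
            best = label
    return best


def merge_at(list_texts, target, changed_cutoffs=()):
    """Concatenate the OTUs of every piece at distance `target`.

    changed_cutoffs: cutoffs that the cluster runs reported as changed
    (hypothetical input, collected by hand from the run output).
    """
    if changed_cutoffs and target > min(changed_cutoffs):
        raise ValueError("target is above the smallest changed cutoff")
    otus = []
    for text in list_texts:
        rows = parse_list(text)
        label = pick_label(rows, target)
        if label is None:
            # Assumption: nothing merged at or below target, so use "unique".
            if "unique" not in rows:
                raise ValueError("no usable label in one of the list files")
            label = "unique"
        otus.extend(rows[label])
    return otus


if __name__ == "__main__":
    piece_a = "unique\t3\tA\tB\tC\n0.03\t2\tA,B\tC\n0.05\t1\tA,B,C\n"
    piece_b = "unique\t2\tD\tE\n0.04\t1\tD,E\n"
    merged = merge_at([piece_a, piece_b], 0.05)
    # Write one combined row in the same assumed layout.
    print("\t".join(["0.05", str(len(merged))] + merged))
```

In the example, the second piece has no 0.05 row, so its 0.04 clusters are used, which is the "next smallest distance" rule. Reading the real files from disk and choosing the output label format are left out on purpose.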
