cluster.split cluster=f, then what

I’m trying to cluster a huge distance matrix (130 GB, lots of diverse samples; I have 277k OTUs at 3% and now want to cluster at 5% and 10%). I’m using an HPC with 250 GB of RAM but a 3-day wall time limit, and that’s not long enough to finish cluster.split. So I was thinking of trying the cluster=f option, clustering each phylum’s distance matrix, then concatenating the results. Theoretically that should work, correct?

The cluster.split command has three major parts: splitting the matrix, clustering each piece, and assembling the results. You can run cluster.split with cluster=f to split the matrix, then run cluster on each matrix individually. When you assemble the list files into one file, watch out for the changing cutoff in the average neighbor method: the cutoff can change separately for each piece, so you want to ignore clusters at distances higher than the smallest changed cutoff. The list files may also have “missing” distances. This is because mothur only prints the clusters at distances where something has changed (see http://www.mothur.org/wiki/Frequently_asked_questions#Why_is_data_missing_for_some_distance_levels.3F). If a list file has no line for the distance you want, you can use the clusters at the next smallest distance.
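In case it helps, here is a rough Python sketch of the assembly step, not anything mothur ships. It assumes each list file line is tab-separated as label, number of OTUs, then one field per OTU with sequence names joined by commas, and that a label of "unique" can be treated as distance 0. The function names and the `smallest_changed_cutoff` argument are made up for illustration; you would read that value off the cluster output for each piece yourself.

```python
from typing import Dict, List


def parse_list_lines(lines: List[str]) -> Dict[float, List[str]]:
    """Map each distance label to its OTUs (one comma-joined string per OTU)."""
    by_label: Dict[float, List[str]] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank lines and the "label numOtus ..." header row, if present.
        if len(fields) < 3 or fields[0] == "label":
            continue
        # Assumption: treat the "unique" label as distance 0.
        label = 0.0 if fields[0] == "unique" else float(fields[0])
        by_label[label] = [otu for otu in fields[2:] if otu]
    return by_label


def otus_at(by_label: Dict[float, List[str]], target: float) -> List[str]:
    """Return the OTUs at the largest printed distance that is <= target."""
    usable = [d for d in by_label if d <= target]
    if not usable:
        raise ValueError(f"no clusters at or below {target}")
    return by_label[max(usable)]


def merge_pieces(pieces: List[Dict[float, List[str]]], target: float,
                 smallest_changed_cutoff: float) -> str:
    """Build one list-file line for `target` from all of the pieces."""
    # Distances above the smallest changed cutoff are not comparable across pieces.
    if target > smallest_changed_cutoff:
        raise ValueError("target is above the smallest changed cutoff")
    merged: List[str] = []
    for by_label in pieces:
        merged.extend(otus_at(by_label, target))
    return "\t".join([f"{target:.2f}", str(len(merged))] + merged)
```

To use it, read each piece with something like `parse_list_lines(open(path).readlines())` and call `merge_pieces` once per distance you care about (0.05, 0.10). A piece with no line at 0.05 falls back to its next smallest distance, as described above.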