Stuck at cluster.split

Rjacob · February 5, 2024, 8:54pm

I have 25 cultured soil samples sequenced at the V3-V4 region that I am trying to get through cluster.split. I am also very new to any type of sequencing analysis, so any specific help that can be given will be much appreciated.
I have read why V3-V4 is not a good idea and the "Why do I have such a large distance matrix" article, and I have some specific questions about how I can make the most of the situation I’m in.

Can I get this data through cluster.split and how?
I was able to make the dist file with a cutoff of 0.1, and the resulting file is 125 GB. Is there any way that I can get this through cluster.split? And if so, what is the best way to do that? I have access to a supercomputing institute with many different partitions to choose from. Two of the highest powered are a GPU computer with 128 cores, 1000 GB, and a job time limit of 24 hours, and a cpu computer with 128 cores, 2000 GB, and a job time limit of 96 hours. If it’s possible to get this data through cluster.split, what would be the computing parameters (how many cores, RAM, etc.) as well as the specific commands/parameters in cluster.split (opticlust vs. average, cutoffs, taxlevel, etc.) that would give me the highest likelihood of success?
Would it be better to just use the phylotype based approach?
I have read that this may be more feasible in the circumstances I’m in. If this approach is better, then what would be the best way to implement it? I understand that I need to redo the classify.seqs command before this, but will this approach affect how I execute the downstream commands such as normalization, alpha and beta diversity, etc.?

pschloss · February 9, 2024, 1:53pm

Hi there,

It’s most likely going to be best to use the phylotype-based approach.
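For orientation, here is a minimal sketch of the phylotype route as it appears in the MiSeq SOP. The file names (final.taxonomy, final.count_table, final.tx.list) are the SOP's placeholder names and are assumptions here, so substitute your own. label=1 is the value used in the SOP's example and may need adjusting for your data.

```
# Assumed file names, taken from the MiSeq SOP; replace with your own.
# Bin sequences into phylotypes from their taxonomic classification.
phylotype(taxonomy=final.taxonomy)

# Build the shared file from the phylotype list (label=1 as in the SOP example).
make.shared(list=final.tx.list, count=final.count_table, label=1)

# Get the consensus taxonomy for each phylotype.
classify.otu(list=final.tx.list, count=final.count_table, taxonomy=final.taxonomy, label=1)
```

No distance matrix is calculated in this route, which is why it avoids the large dist file.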

If you want to try using cluster.split, then I would follow the instructions in the MiSeq SOP. You’re building the full distance matrix first and then splitting it. In the SOP, we split the sequences first, and then the distance matrix and the clustering are done separately on each split. You should also be using a cutoff of 0.03.
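As a rough sketch, the SOP's version of this step looks like the command below. The file names are the SOP's placeholders and are assumptions here, and taxlevel=4 is the SOP's example value rather than a recommendation tuned to this dataset.

```
# Assumed file names, taken from the MiSeq SOP; replace with your own.
# Giving cluster.split the fasta, count, and taxonomy files (rather than a
# precomputed dist file) lets it split by taxonomy first, then calculate
# distances and cluster within each split.
cluster.split(fasta=final.fasta, count=final.count_table, taxonomy=final.taxonomy, taxlevel=4, cutoff=0.03)
```

Because distances are only calculated within each taxonomic split, the single 125 GB matrix is never built.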

mothur doesn’t use GPUs. You likely want the “cpu computer” although the 96 hours might be limiting.

Pat

