Issues with splitting

LisaS · August 18, 2023, 6:13am

Continuing the discussion from Clustering for an low diversity, large dataset:

Hi Pat,

Following on from the issue linked above. We have now managed to get past the first taxon in our split command (after 27 days of wall time on our supercomputer); however, when mothur tried to start running split on the next taxon, it couldn’t find some of the input files. Obviously this is strange since it used those same input files to run split on the first taxon.

I’ve attached screenshots of the output logfile here:

My two questions are:

1. Any idea what’s happened here?
2. Is there a way to run this step again (cluster.split with cluster=f) and skip the first taxon? Can I feed the output from the first taxon (the .0.dist file) into this command and carry on from there? It took 27 days for the first taxon to run, so I’d prefer to expedite it when we try again!

Thanks in advance,

Lisa.

pschloss · August 21, 2023, 5:11pm

Hi Lisa,

The most likely culprit is still probably the high error rate of the data. Could you try a larger diffs value in pre.cluster? I typically advocate for 1 nt of difference per 100 nt of sequence. This would hopefully collapse a lot more sequences together. Unfortunately, there isn’t a way to skip one taxonomic group.

Pat
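
For anyone applying the rule of thumb above (1 nt of difference per 100 nt of sequence), here is a minimal Python sketch that turns a typical read length into a diffs value and prints a pre.cluster command string. The helper names, the file names, and the choice to round down are assumptions for illustration only; they are not part of mothur, and the right length to use is whatever your aligned, trimmed sequences actually are.

```python
def recommended_diffs(seq_length_nt: int) -> int:
    """Apply the '1 nt per 100 nt' rule of thumb.

    Rounding down is an assumption made here; the thread does not
    say how to round lengths that are not multiples of 100.
    """
    if seq_length_nt <= 0:
        raise ValueError("sequence length must be positive")
    return seq_length_nt // 100


def pre_cluster_command(fasta: str, count: str, seq_length_nt: int) -> str:
    """Build a pre.cluster command string for pasting into mothur.

    The fasta and count arguments are placeholder file names.
    """
    diffs = recommended_diffs(seq_length_nt)
    return f"pre.cluster(fasta={fasta}, count={count}, diffs={diffs})"


if __name__ == "__main__":
    # Hypothetical file names and read length, for illustration only.
    print(pre_cluster_command("example.fasta", "example.count_table", 450))
```

With a 450 nt read length this suggests diffs=4; shorter reads give proportionally smaller values.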

