Issues with splitting

Continuing the discussion from Clustering for an low diversity, large dataset:

Hi Pat,

Following on from the issue linked above, we have now managed to get past the first taxon in our split command (after 27 days of wall time on our supercomputer). However, when mothur tried to start running split on the next taxon, it couldn’t find some of the input files. This is strange, since it used those same input files to run split on the first taxon.

I’ve attached screenshots of the output logfile here:




My two questions are:

  1. Any idea what’s happened here?
  2. Is there a way to run this step again (cluster.split with cluster=f) and skip the first taxon? Can I feed the output from the first taxon (the .0.dist file) into this command and carry on from there? It took 27 days for the first taxon to run, so I’d prefer to expedite it when we try again!

Thanks in advance,

Lisa.

Hi Lisa,

The most likely culprit is still the high error rate of the data. Could you try a larger diffs value in pre.cluster? I typically advocate for 1 nt of difference per 100 nt of sequence. This would hopefully collapse a lot more sequences together. Unfortunately, there isn’t a way to skip one taxonomic group.
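
As a rough illustration of the 1 nt per 100 nt rule of thumb, here is a minimal Python sketch that turns a sequence length into a diffs value and prints the corresponding pre.cluster call. The helper names, the file names, and the choice to round down are assumptions made for the example, not part of mothur; substitute your own fasta and count files.

```python
def diffs_for_length(seq_length: int) -> int:
    """Allow 1 nt of difference per 100 nt of sequence.

    Rounding down is an assumption of this sketch (e.g. 250 nt -> 2).
    """
    if seq_length <= 0:
        raise ValueError("seq_length must be positive")
    return seq_length // 100


def pre_cluster_command(fasta: str, count: str, seq_length: int) -> str:
    """Build a mothur pre.cluster command string with the derived diffs value."""
    diffs = diffs_for_length(seq_length)
    return f"pre.cluster(fasta={fasta}, count={count}, diffs={diffs})"


if __name__ == "__main__":
    # Hypothetical file names and read length, for illustration only.
    print(pre_cluster_command("example.fasta", "example.count_table", 450))
```

The printed line can be pasted into a mothur session or batch file once the placeholder file names are replaced.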

Pat
