Unable to complete cluster.split due to unspecified error

I have been trying to process my sequences on a remote HPC, but cluster.split exits without any specific error message. This happens consistently across multiple attempts. Is there any way to resolve this, or at least to figure out what the problem is?

mothur > cluster.split(fasta=pv16s.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, count=pv16s.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.pick.count_table, taxonomy=pv16s.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.15, processors=96)

After the above command has been running for several hours, stdout shows the following quoteblock before the job terminates:


Primary job terminated normally, but 1 process returned
a non-zero exit code… Per user-direction, the job has been aborted.

Clustering pv16s.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta.26.dist

Clustering pv16s.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta.14.dist

Clustering pv16s.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta.27.dist

mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[39164,1],42]
Exit code: 1

None of the above appears in the logfile. Instead, the logfile ends abruptly with the following lines:

Running command: dist.seqs(fasta=pv16s.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta.75.temp, processors=96, cutoff=0.155)

Using 96 processors.
/******************************************/

Output File Names:
pv16s.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta.75.dist

It took 0 seconds to calculate the distances for 3 sequences.
It took 9165 seconds to split the distance file.

Reading pv16s.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta.5.dist

I have also tried running this on 1 processor. The job then terminates with exit code 271, again without any error message, and both stdout and the logfile end with lines like those in the second quoteblock.

I suspect you’re running out of RAM and the process is crashing. I’d encourage you to try using 4 processors instead of 96.
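For example, this is the command from the original post with only the processors value changed to 4. All file names and other parameters are copied unchanged; 4 is simply the suggested starting point, not a value tuned to this dataset or cluster:

```
mothur > cluster.split(fasta=pv16s.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, count=pv16s.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.pick.count_table, taxonomy=pv16s.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.15, processors=4)
```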

Pat