I’m trying to set up a Slurm .sh file to run the cluster.split function of Mothur (linux_7 build, version 1.46.1) on a cluster (9 nodes, 44 CPUs, 777 GB RAM).
I have tested two kinds of scripts so far: one that assumes a distributed-memory configuration for multiprocessing and one that assumes a shared-memory configuration.
Here is part of the distributed-memory configuration I tried:
[…]
#SBATCH -n 25
#SBATCH --ntasks-per-node=25
#SBATCH --mem=190000
[…]
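For concreteness, the whole submission boils down to something like the sketch below; the job name, partition, walltime and mothur batch-file name are made-up placeholders, not my actual ones:

#!/bin/bash
#SBATCH --job-name=cluster_split
#SBATCH -p normal
#SBATCH -t 48:00:00
#SBATCH -n 25
#SBATCH --ntasks-per-node=25
#SBATCH --mem=190000

# stability.batch (placeholder name) holds the mothur commands, ending with something like
# cluster.split(fasta=current, count=current, taxonomy=current, taxlevel=4, cutoff=0.03, processors=25)
mothur stability.batch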
All 25 processors and 140 GB of RAM (of the 190 GB reserved) were used during this run, which ended with cluster.split failing with the following error message: [ERROR]: Could not open stability.trim.contigs.trim.good.unique.good.filter.unique.precluster.pick.renamed.pick.0.dist (the largest of the .dist files).
This error showed up several times in the mothur logfile while the other temporary .dist files were being processed smoothly.
The shared memory configuration was the following:
[…]
#SBATCH -n 1
#SBATCH -c 25
#SBATCH --mem=190000
[…]
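Again for reference, the shared-memory version is essentially the sketch below (same made-up placeholders as above), with the processors value inside the mothur batch file matching the -c request:

#!/bin/bash
#SBATCH --job-name=cluster_split
#SBATCH -p normal
#SBATCH -t 48:00:00
#SBATCH -n 1
#SBATCH -c 25
#SBATCH --mem=190000

# a single multi-threaded mothur process; processors=25 inside stability.batch
# (placeholder name) matches the 25 CPUs requested with -c above
mothur stability.batch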
In this case, the cluster.split command only used 15 of the 25 reserved processors and about 10 GB of RAM, and appeared to run slower; much slower, in fact, than the same job run on a smaller cluster (20 cores, 125 GB). That leads me to think this configuration could be optimized.
Considering all this, my question is: what would be the best way to set up the Slurm parameters?
The limit on our server is 4 GB per CPU (which is actually what @Alexandre_Thibodeau also has: 128 GB across 32 CPUs), but we request it in bulk (not per CPU). I'm not sure whether this is something the people managing the system prefer, or whether both ways are allowed and the bulk style is simply the only one they included in the examples they distribute.
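For clarity, the two request styles look roughly like this (the CPU count is just an example, and a script would use one or the other, since --mem and --mem-per-cpu are mutually exclusive):

# bulk request: one total memory figure for the whole job
#SBATCH -c 25
#SBATCH --mem=100G

# per-CPU request: Slurm multiplies the per-CPU figure by the CPU count (25 x 4G = 100G here)
#SBATCH -c 25
#SBATCH --mem-per-cpu=4G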