How to run cluster.split on HPC

jaya · March 31, 2022, 10:08am

I am going to run the second step of cluster.split on my IBM HPC using the following script:

#!/bin/bash
#BSUB -R "rusage[mem=800GB]"
./mothur "#cluster.split(file=final.file, count=final.count_table, cutoff=0.03, processors=10)"

Here is the output of lshosts on my HPC. I can use hpc-cmp-01 and hpc-cmp-02 for my computing work.

HOST_NAME   type     model   cpuf   ncpus  maxmem  maxswp  server  RESOURCES
hpc-mgt-01  LINUXPP  POWER8  250.0  20     128G    31.9G   Yes     (mg)
hpc-mgt-02  LINUXPP  POWER8  250.0  20     128G    31.9G   Yes     (mg)
hpc-cmp-01  LINUXPP  POWER8  250.0  20     1T      31.9G   Yes     ()
hpc-cmp-02  LINUXPP  POWER8  250.0  20     1T      31.9G   Yes     ()
hpc-cmp-03  LINUXPP  POWER8  250.0  40     314.5G  31.9G   Yes     ()
hpc-cmp-04  LINUXPP  POWER8  250.0  40     314.5G  31.9G   Yes     ()

Alexandre_Thibodeau · March 31, 2022, 2:28pm

Hello, this is what I am doing on my server. The modules I am loading are those required to call mothur correctly within my working environment; they may differ for you. I would use “current” in the batch file instead of the complete path and file name; it will save you errors. Hope it helps.

#!/bin/bash

#SBATCH --time=24:00:00
#SBATCH --account=def-myaccount
#SBATCH --mem=128000M
#SBATCH --mail-user=me@umontreal.ca
#SBATCH --cpus-per-task=32
#SBATCH --mail-type=BEGIN
#SBATCH --mail-type=END
#SBATCH --mail-type=FAIL
#SBATCH --mail-type=REQUEUE
#SBATCH --mail-type=ALL
#SBATCH --output=projectname.out

cd "$SCRATCH/path_where_my_data_are"   # placeholder: replace with your own data directory

module purge

module load gcc/9.3.0
module load mothur/1.47.0
module load vsearch/2.15.2

mothur myproject.batch

seff $SLURM_JOBID

sstat $SLURM_JOBID
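A minimal sketch of one way to keep mothur's processors value in step with what the scheduler actually granted, so the two numbers cannot drift apart. SLURM sets SLURM_CPUS_PER_TASK when --cpus-per-task is requested; the function name build_cluster_split_cmd is hypothetical, and the cluster.split arguments are simply the ones from the first post, not a recommendation.

```shell
#!/bin/bash
# Build a mothur command-line-mode string for cluster.split.
# The processor count comes from the first argument if given,
# otherwise from SLURM_CPUS_PER_TASK, otherwise it falls back to 1.
build_cluster_split_cmd() {
    local procs="${1:-${SLURM_CPUS_PER_TASK:-1}}"
    printf '#cluster.split(file=final.file, count=final.count_table, cutoff=0.03, processors=%s)' "$procs"
}
```

Inside a job script this could be used as mothur "$(build_cluster_split_cmd)" in place of a hard-coded processors value.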

pschloss · March 31, 2022, 6:37pm

800GB - That’s a lot of memory! What problem are you running into? Are you using slurm or pbs?

Pat

jaya · April 1, 2022, 5:53am

Hello, could you please clarify whether the vsearch executable is required to run the cluster.split step? I don’t have it on the HPC where I am running this particular step.

jaya · April 1, 2022, 6:00am

I am using an IBM machine with LSF (similar to PBS/SLURM) for resource management. I already have all the “final.93.opti_mcc.list” type of files. The job has been running for the last 24 hours and I don’t know when it is going to finish. (Please note that I don’t have the vsearch executable in my folder.) Here is what I get about my job.

bjobs -l 1154

Job <1154>, User , Project , Status , Queue , Command <#!/bin/bash;#BSUB -R "rusage[mem=800GB]";./mothur "#cluster.split(file=final.file, count=final.count_table, cutoff=0.03, processors=10)">, Share group charged
Thu Mar 31 17:14:52: Submitted from host , CWD <$HOME/simplestat/clustersplit>, Requested Resources <rusage[mem=819200.00]>;
Thu Mar 31 17:14:52: Started 1 Task(s) on Host(s) , Allocated 1 Slot(s) on Host(s) , Execution Home </home/ibm>, Execution CWD </home/ibm/simplestat/clustersplit>;
Fri Apr 1 11:27:38: Resource usage collected.
The CPU time used is 148675 seconds.
MEM: 757.3 Gbytes; SWAP: 0 Mbytes; NTHREAD: 5
PGID: 148682; PIDs: 148682 148683 148687 148688

MEMORY USAGE:
MAX MEM: 757.3 Gbytes; AVG MEM: 574 Gbytes

GPFSIO DATA:
READ: ~0 bytes; WRITE: ~0 bytes

SCHEDULING PARAMETERS:
r15s r1m r15m ut pg io ls it tmp swp mem
loadSched - - - - - - - - - - -
loadStop - - - - - - - - - - -

RESOURCE REQUIREMENT DETAILS:
Combined: select[type == local] order[r15s:pg] rusage[mem=819200.00]
Effective: select[type == local] order[r15s:pg] rusage[mem=819200.00]
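The bjobs output above reports "Allocated 1 Slot(s)" although mothur was started with processors=10. A minimal sketch, assuming a standard LSF setup, of a job file that asks for ten slots on a single host and restricts the job to the two 1T nodes named in the first post. The job name, the output file name and the file name cluster_split.lsf are placeholders, and whether the memory reservation is counted per job or per slot depends on the site configuration, so that part should be checked with the cluster administrators.

```shell
#!/bin/bash
# Write an LSF job file whose slot request matches mothur's processors value.
PROCS=10   # passed to both "#BSUB -n" and mothur's processors=

cat > cluster_split.lsf <<EOF
#!/bin/bash
#BSUB -J cluster_split
#BSUB -n ${PROCS}
#BSUB -R "span[hosts=1] rusage[mem=800GB]"
#BSUB -m "hpc-cmp-01 hpc-cmp-02"
#BSUB -o cluster_split.%J.out
./mothur "#cluster.split(file=final.file, count=final.count_table, cutoff=0.03, processors=${PROCS})"
EOF

# Submit the generated file with: bsub < cluster_split.lsf
```

Here span[hosts=1] keeps all slots on one machine, which mothur needs because it runs as a single multi-threaded process, and -m limits the candidate hosts to hpc-cmp-01 and hpc-cmp-02.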

Alexandre_Thibodeau · April 1, 2022, 12:34pm

I use vsearch for chimera detection.

Alexandre_Thibodeau · April 1, 2022, 12:36pm

So the question: what is in your “final” files? Can you post the summary?

jaya · April 1, 2022, 1:00pm

I have already used vsearch for chimera search (I performed that step on a different machine).

pschloss · April 5, 2022, 4:38pm

I suspect your problem is poor quality data. If your reads don’t fully overlap (full overlap is what you get with, e.g., 2x250 reads of the V4 region) or if you had a bad sequencing run, you are likely to see results like these.
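One way to follow up on this is to look at the length distribution of the sequences going into cluster.split. mothur's summary.seqs command reports this; the sketch here is a rough plain-shell equivalent for a quick check outside mothur. The function name fasta_length_histogram is hypothetical, and it assumes an ordinary FASTA file in which alignment gap characters (- and .) should not count towards the length.

```shell
#!/bin/bash
# Print a length histogram (length<TAB>count) for the sequences in a FASTA file.
# Gap characters (- and .) and whitespace are ignored; multi-line records are summed.
fasta_length_histogram() {
    awk '
        /^>/ { if (seen) counts[len]++; len = 0; seen = 1; next }
        { gsub(/[-. \t\r]/, ""); len += length($0) }
        END { if (seen) counts[len]++; for (l in counts) print l "\t" counts[l] }
    ' "$1" | sort -n
}
```

Calling fasta_length_histogram on the FASTA file behind final.count_table (a file name the thread does not give) shows whether the lengths cluster tightly around one value, as expected for fully overlapping reads, or are spread widely.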
