mothur

Help with slurm script to run mothur in batch mode

Hi!
I have access to a cluster with 60 GB of RAM and 12 processors, and I’ve been asked not to use it in interactive mode. I have never used mothur in batch mode, let alone slurm or a cluster.

-My home folder is on the main node, with 100 GB of storage for my files (/home/svazquez)
-I have to run mothur from anaconda3; they have the latest version, so I’m fine.
-The administrator placed the mothur databases on a separate storage node (/share/databases/references/mothur)
-To have enough storage capacity during the run, I should use a third node, compute-0-1, and copy the files from my home folder into its /scratch folder
-The files with the raw reads should be stored on the storage node, in a different folder from the databases (/rawdata/reads)

I don’t know how to write the slurm script and the mothur batch file, following the MiSeq SOP (or any pipeline for 2x300 bp MiSeq amplicon reads), so that they work with the files stored in these different paths.

Can anyone help me with this? Please? :weary:

Thank you!!!

Susi

Hi Susi,
You’ll need to work with your cluster admin to get the finer points worked out, but here’s my .slurm script. You will also need to set the paths to your reference files appropriately in your batch file. I strongly recommend building your batch file on your PC first, getting that working, then uploading your batch file and the SOP practice dataset to your cluster and using those until you’ve got everything functional. Once everything is pointed the right way, your batch file should work whether you’re on a cluster or your PC. My .slurm file is below.
-Adam Mumford

#!/bin/bash
#SBATCH --mail-type=ALL # choose when you want to be emailed: BEGIN, END, FAIL, REQUEUE, or ALL
#SBATCH --mail-user=you@yourdomain.com # add your email address
#SBATCH -o %j-Mothur_NVME.out # name of output file (the %j inserts the jobid)
module load mothur/1.39.5-gcc # load required modules - if you want just the latest version
# use "module load mothur"
PROJECT_DIR="/cxfs/projects/usgs/path/to/projects"
CASE_DIR="/home/you/yourworkingdirectory"
CASE_FILE="yourbatch.batch"

# Let's take advantage of CXFS and use the cxfscp command
# This will only work on the UV partition and the Data Transfer nodes
# It may take some testing to tune these options
# This was tested with ~450 GB input data
CPCMD="/usr/cluster/bin/cxfscp -r -d -b 30 -t 30"
echo "Copying ${CASE_DIR} to Local Scratch"
# LOCAL_SCRATCH is not set in this script; presumably the cluster environment provides it
${CPCMD} ${PROJECT_DIR}/${CASE_DIR} ${LOCAL_SCRATCH}
cd ${LOCAL_SCRATCH}/${CASE_DIR}
sync
echo "Running Mothur"
srun mothur ./${CASE_FILE}
EXIT_CODE=$?
if [ ${EXIT_CODE} -ne 0 ]
then
    echo "Mothur Run Failed with exit code ${EXIT_CODE} !!!"
    # We won't exit here because we want to save what progress
    # we made and to diagnose the failure
else
    echo "Finished Mothur run"
fi

# We don't need to copy stuff that is not important to the results
echo "Cleaning up directory"
rm -fr *.{temp,map,fq,zip}
rm -fr __MACOSX/
echo "Moving results to ${PROJECT_DIR}/${CASE_DIR}_${SLURM_JOBID}"
cd ${LOCAL_SCRATCH}
${CPCMD} ${CASE_DIR} ${PROJECT_DIR}/${CASE_DIR}_${SLURM_JOBID}
exit ${EXIT_CODE}
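
For the layout described in the original question (working folder under /home/svazquez, reads in /rawdata/reads, scratch on compute-0-1), the same stage-in, run, stage-out pattern might look roughly like the sketch below. It is untested on any real cluster and rests on several assumptions: the subfolder name mothur_run, the per-job scratch folder, the batch file name stability.batch, the .fastq extension, and mothur being on PATH once the site's anaconda3 environment is active are all hypothetical. The snippet writes the job script to a file so that its syntax can be checked before anything is submitted.

```shell
# Write a Slurm job script adapted to the paths mentioned in this thread.
# The quoted 'EOF' keeps ${...} literal so they are expanded when the job runs.
cat > mothur_sop.slurm <<'EOF'
#!/bin/bash
# Hypothetical job name
#SBATCH --job-name=mothur_sop
#SBATCH --nodes=1
#SBATCH --ntasks=1
# The 12 processors mentioned in the question
#SBATCH --cpus-per-task=12
# Assumption: /scratch is local to this node
#SBATCH --nodelist=compute-0-1
# %j inserts the job ID
#SBATCH -o %j-mothur.out

HOME_DIR="/home/svazquez/mothur_run"          # hypothetical folder holding the batch file
READS_DIR="/rawdata/reads"                    # raw reads on the storage node
RUN_DIR="/scratch/${USER}/${SLURM_JOB_ID}"    # hypothetical per-job scratch folder
BATCH_FILE="stability.batch"                  # hypothetical batch file name

# Stage in: batch file and reads go to scratch
mkdir -p "${RUN_DIR}"
cp "${HOME_DIR}/${BATCH_FILE}" "${RUN_DIR}/"
cp "${READS_DIR}"/*.fastq "${RUN_DIR}/"       # assumption: reads end in .fastq
cd "${RUN_DIR}" || exit 1

# Assumption: mothur is on PATH (e.g. via the site's anaconda3 environment)
srun mothur "./${BATCH_FILE}"
EXIT_CODE=$?

# Stage out even on failure so the mothur log files can be inspected
rm -f "${RUN_DIR}"/*.fastq                    # do not copy the raw reads back
mkdir -p "${HOME_DIR}/results_${SLURM_JOB_ID}"
cp -r "${RUN_DIR}"/. "${HOME_DIR}/results_${SLURM_JOB_ID}/"
exit ${EXIT_CODE}
EOF

# Parse the generated script without running it
bash -n mothur_sop.slurm && echo "syntax OK"
```

Once the paths have been confirmed with the cluster admin, the file would typically be submitted with `sbatch mothur_sop.slurm` and watched with `squeue -u $USER`.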

Hi Adam

Thank you so much!!! I will try to figure it out with the cluster admin, and if I have questions I’ll write back.

One thing I do not fully understand: what is project_dir? Is it the folder with my raw reads, or is that case_dir? In my case, my working directory is in my home directory, but the raw reads are in a different location and the silva databases are in a third one. How should I point to those? And is that done in the slurm script or in the batch file?

Thank you!! Cheers,

susi

Hi Susi,
These are good questions. My responses follow, but your best bet is to work these out with your cluster admin.
Generally, your project_dir will be the directory holding your raw data files and your mothur batch file. In my case, this is a subdirectory of my home directory. My .slurm script is set up to copy this to the case_dir on the cluster. On the cluster I use, this is done for speed and efficiency; work with your cluster admin to determine where this is and how best to do it (if your cluster uses it at all). You’ll point to the location of your silva databases within your mothur script. As an example, here is the classify.seqs call from one of my mothur .batch files:
classify.seqs(fasta=current, count=current, reference=~/Working_Silva_Files/silva.nr_v128_v4.align, taxonomy=~/Working_Silva_Files/silva.nr_v128.tax, cutoff=80, probs=F)
Good Luck!
-Adam
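
As a rough illustration of the advice above (reference paths live in the mothur batch file, not in the slurm script), the sketch below generates a batch file from the shell so that the database folder is written in as an absolute path. The file name stability.batch and the prefix are hypothetical, the first two commands follow the opening of the MiSeq SOP, and the silva file names are simply borrowed from the classify.seqs example above; what is actually in /share/databases/references/mothur should be checked first. The classify.seqs line is left commented out because the intermediate SOP steps are omitted here.

```shell
# Folder named in the original question; the file names used below are assumptions.
REF_DIR="/share/databases/references/mothur"
PROCESSORS=12

# Unquoted EOF: the shell fills in ${REF_DIR} and ${PROCESSORS} while writing the file.
cat > stability.batch <<EOF
make.file(inputdir=., type=fastq, prefix=stability)
make.contigs(file=stability.files, processors=${PROCESSORS})
# ... remaining MiSeq SOP steps go here ...
# classify.seqs(fasta=current, count=current, reference=${REF_DIR}/silva.nr_v128_v4.align, taxonomy=${REF_DIR}/silva.nr_v128.tax, cutoff=80, probs=F)
EOF

# Show the generated batch file for review
cat stability.batch
```

Because the reads are read from the current directory (`inputdir=.`), this batch file is meant to be run from the folder the reads were copied into, while the databases stay where the administrator put them.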
