align.seqs within .pbs script / ssh job request

Hey everyone,

I'm trying to run align.seqs on my 16S rRNA gene data set with the silva.nr_v128.align reference, using WestGrid's bugaboo computer cluster for align.seqs and other memory-intensive commands:
https://www.westgrid.ca/support/running_jobs
https://www.westgrid.ca/support/quickstart/bugaboo

Command is:

align.seqs(fasta=stability.trim.contigs.good.unique.fasta, reference=../mothur/mothur/silva.nr_v128.align, processors=12)

What should I do next with the script/job request? Should I add the command above to the script below so that mothur runs it in command-line mode?
Script (mothur.pbs):

#!/bin/bash
#PBS -S /bin/bash
## Script for running mothur module 1.39.5 on bugaboo@WestGrid.ca

#PBS -l nodes=1:ppn=12,walltime=48:00:00,pmem=2000mb,mem=30000mb
cd "$PBS_O_WORKDIR"

echo "Current WD is `pwd`"
echo "running on hostname `hostname`"
echo "Captain's Log: `date`"

module load mothur/1.39.5
mothur

Am I better off making the memory requests on the qsub command line and just describing my mothurly needs in the .pbs file?
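For reference, in Torque/PBS the same resource list can be passed to qsub with -l at submission time, and options given on the command line generally override matching #PBS directives in the script. A minimal sketch, assuming a Torque/PBS-style qsub and the mothur.pbs file name from this post; the resource values are copied from the script above, not tuned. Where qsub isn't available it only prints the command it would run.

```shell
#!/bin/bash
# Sketch (assumes a Torque/PBS-style qsub): request resources at submission
# time instead of hard-coding #PBS -l lines. Values copied from mothur.pbs above.
job_script="mothur.pbs"
# One comma-separated list with no spaces after the commas.
resources="nodes=1:ppn=12,walltime=48:00:00,pmem=2000mb,mem=30000mb"

if command -v qsub >/dev/null 2>&1; then
    # On the cluster: submit the job; qsub prints the new job ID.
    qsub -l "$resources" "$job_script"
else
    # No scheduler on this machine: just show what would be submitted.
    echo "qsub -l $resources $job_script"
fi
```

Keeping the resources on the command line makes it easy to resubmit the same script with a longer walltime or more memory without editing it.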

The server uses a shared-memory version of mothur, which is limited to 12 processors per node. Is it worthwhile to compile an MPI version of mothur on the server and run an MPI-enabled .pbs script to use 20+ processors? And is this task too memory-intensive to run mothur interactively, given that interactive sessions have longer queues and limited walltimes? For scale, I have about 70 paired forward/reverse fastq files (16S rRNA gene sequencing data for the V4-V5 hypervariable region).
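Running mothur non-interactively inside a batch job sidesteps the interactive limits: besides the "#..." command-line form, mothur can read commands from a batch file, one per line, and run them in order. A sketch, assuming a hypothetical batch file named stability.batch in the job's working directory; the align.seqs line is copied from this post, and the summary.seqs step is only an illustrative follow-up on the aligned output.

```shell
#!/bin/bash
# Sketch: put mothur commands in a batch file (one command per line) so the
# job runs without an interactive mothur session.
# "stability.batch" is a hypothetical name; paths are copied from the post.
batch_file="stability.batch"

cat > "$batch_file" <<'EOF'
align.seqs(fasta=stability.trim.contigs.good.unique.fasta, reference=../mothur/mothur/silva.nr_v128.align, processors=12)
summary.seqs(fasta=stability.trim.contigs.good.unique.align)
EOF

# In the .pbs script, after "module load mothur/1.39.5", the mothur line would be:
#   mothur stability.batch
echo "Wrote $batch_file; run it with: mothur $batch_file"
```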

Any help is greatly appreciated!

Hi,

The MPI version is pretty worthless.

The mothur line in your script should probably be something like this…

mothur "#align.seqs(fasta=stability.trim.contigs.good.unique.fasta, reference=../mothur/mothur/silva.nr_v128.align, processors=12)"
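Dropped into the job script, that line might look like the sketch below. It assumes Torque/PBS directives and the mothur/1.39.5 module named in the question, and uses cd "$PBS_O_WORKDIR" (assumed to be the intent of the original #PBS_O_WORKDIR line) so the job starts in the directory qsub was run from. Writing the script from a quoted heredoc and checking it with bash -n catches unbalanced quotes before the job sits in the queue.

```shell
#!/bin/bash
# Sketch: write mothur.pbs and syntax-check it before submitting.
# Assumes Torque/PBS directives and the mothur/1.39.5 module named in the post.
cat > mothur.pbs <<'EOF'
#!/bin/bash
#PBS -S /bin/bash
#PBS -l nodes=1:ppn=12,walltime=48:00:00,pmem=2000mb,mem=30000mb
## Script for running mothur 1.39.5 on bugaboo (WestGrid)

# Start in the directory the job was submitted from.
cd "$PBS_O_WORKDIR" || exit 1

echo "Current WD is $(pwd)"
echo "Running on host $(hostname)"
echo "Captain's Log: $(date)"

module load mothur/1.39.5
# Command-line mode: the leading # makes mothur run the quoted command, then quit.
mothur "#align.seqs(fasta=stability.trim.contigs.good.unique.fasta, reference=../mothur/mothur/silva.nr_v128.align, processors=12)"
EOF

# bash -n parses the script without executing it, so a missing quote fails here.
bash -n mothur.pbs && echo "mothur.pbs: syntax OK"
```

The quoted 'EOF' delimiter keeps $PBS_O_WORKDIR and the $(...) substitutions literal in the written file, so they are expanded on the compute node when the job runs rather than when the file is generated.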

Pat