Hey everyone,
I'm trying to run align.seqs on a 16S rRNA gene data set, using the silva.nr_v128.align reference file. I want to use WestGrid’s bugaboo computer cluster for align.seqs and other memory-intensive commands.
https://www.westgrid.ca/support/running_jobs
https://www.westgrid.ca/support/quickstart/bugaboo
Command is:
align.seqs(fasta=stability.trim.contigs.good.unique.fasta, reference=../mothur/mothur/silva.nr_v128.align, processors=12)
I’m wondering what to do next with the script / job request, i.e. should I add the above command to the script below so that mothur runs it in ‘command line’ mode?
Script (mothur.pbs):
#!/bin/bash
#PBS -S /bin/bash
##Script 4 running mothur module 1.39.5 on the bugaboo@WestGrid.ca
#PBS -l nodes=1:ppn=12,walltime=48:00:00,pmem=2000mb,mem=30000mb
cd "$PBS_O_WORKDIR"
echo "Current WD is `pwd`"
echo "running on hostname `hostname`"
echo "Captain's Log: `date`"
module load mothur/1.39.5
mothur
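As far as I can tell, a bare `mothur` on the last line would start interactive mode and wait for input, so the job would need either batch mode (`mothur <batchfile>`) or command line mode (`mothur "#command"`). Here is a rough sketch of what I think the end of the script could look like, after the `module load` line. The batch file name `align.batch` is something I made up, and the guard around the mothur call is my own addition, so treat this as an untested guess rather than the right way to do it:

```shell
# Sketch only: put the mothur command(s) in a batch file, then run it non-interactively.
# "align.batch" is a hypothetical file name; the paths are the ones from my command above.
cat > align.batch <<'EOF'
align.seqs(fasta=stability.trim.contigs.good.unique.fasta, reference=../mothur/mothur/silva.nr_v128.align, processors=12)
EOF

if command -v mothur >/dev/null 2>&1; then
    # Batch mode: mothur reads the commands from the file, runs them, and exits.
    mothur align.batch
    # Command line mode alternative (note the leading # inside the quotes):
    # mothur "#align.seqs(fasta=stability.trim.contigs.good.unique.fasta, reference=../mothur/mothur/silva.nr_v128.align, processors=12)"
else
    echo "mothur not found on PATH; did 'module load mothur/1.39.5' run?" >&2
fi
```

Batch mode seems easier to extend, since the other memory-intensive commands could go in the same file, one per line.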
Am I better off making the memory requests as part of the qsub command, and just writing a .pbs file with my mothurly needs described inside?
The server uses a shared-memory version of mothur, which is limited to 12 processors per node. Is it worthwhile to compile an MPI version of mothur on the server and run an MPI-enabled .pbs script (to use 20+ processors)? Or is this task too memory intensive to run mothur in interactive mode, which has longer queues and limited walltimes? I have about 70 paired fwd/rev fastq files (16S rRNA gene sequencing data for the V4V5 hypervariable region).
Any help is greatly appreciated!