Running with MPI

I have a few questions about running mothur processes on different machines with MPI. After seeing some odd timing results I decided to run a test with the Costello data. I have a MacBook Pro (2 cores, 4 GB RAM), a Mac Pro (dual quad-core, 8 cores, 8 GB RAM), and my little monster, a Cray CX1 (7 nodes of dual quad-cores, 56 cores in total, with 96 GB on the head node). I run RAxML and mpiBLAST on the Cray, and I've installed the Intel compilers and compiled OpenMPI with them.

On the Macs I compiled with 64BIT_VERSION ?= yes and USEMPI ?= yes. Running the Costello analysis up to align.seqs(candidate=stool.trim.unique.fasta, template=silva.bacteria.fasta, processors=2) (processors=8 on the Mac Pro) gives me
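For reference, those build settings just amount to flipping two options in mothur's makefile and rebuilding (a sketch; the two variable names are the ones mentioned above, and the assumption is that OpenMPI's mpic++ wrapper is on the PATH):

```shell
# In mothur's makefile, enable these two options before building:
#   64BIT_VERSION ?= yes
#   USEMPI ?= yes
# then rebuild from a clean state so every object file picks them up:
make clean
make
```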

It took 197 secs to align 15247 sequences. on the MacBook, and

It took 152 secs to align 15247 sequences. on the Mac Pro. Faster, but not hugely so. On the Mac Pro I can only see one core really working, whilst the MacBook shows a more equal distribution of CPU usage.

The Cray I'm not so sure about. Compiling with 64BIT_VERSION ?= yes returns an error because the Intel compiler doesn't have a -arch x86_64 flag. Since it compiles 64-bit by default, I commented that flag out; it then compiles with a few warnings about changing the sign of an integer, but it completes. The Cray runs RHEL. I assume I shouldn't use the processors= option in the mothur command.
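For the record, the workaround is just commenting out the Apple-style flag in the makefile (a sketch; the exact variable the flag is attached to may differ between mothur versions, so treat the names here as placeholders):

```shell
# In mothur's makefile, the 64-bit section appends a flag icpc doesn't know:
#   CXXFLAGS += -arch x86_64    # <- comment this out for the Intel compiler
# icpc emits 64-bit objects by default on x86_64 RHEL, so nothing replaces it.
```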

Running

mpirun --hostfile /home/forsterr/test/hostfile_8_8 -np 28 mothur

(I can only use half the processors at the moment, otherwise I trip the 20-amp circuit in the server room, but that's another story!) starts up 28 mothur processes and I get a mothur > prompt. Interestingly, top shows them running at near 100% even though they aren't calculating anything yet.
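An OpenMPI hostfile for a setup like this is just one line per node with a slot count, along these lines (a sketch; the hostnames are placeholders modeled on the error output above, and slots=4 is the assumption that gets -np 28 to use half of the 56 cores):

```shell
# hostfile_8_8 (sketch): one line per compute node, slots = cores to use there.
compute-00-00 slots=4
compute-00-01 slots=4
compute-00-02 slots=4
# ...and so on for the remaining nodes; 7 nodes x 4 slots lets
# "mpirun -np 28" fill exactly half of the 56 cores.
```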

Running mothur > align.seqs(candidate=stool.trim.unique.fasta, template=silva.bacteria.fasta)



Reading in the silva.bacteria.fasta template sequences… DONE.
Aligning sequences from stool.trim.unique.fasta …
[compute-00-00.private.dns.zone:12553] *** An error occurred in MPI_Bcast
[compute-00-00.private.dns.zone:12553] *** on communicator MPI COMMUNICATOR 4 DUP FROM 0
[compute-00-00.private.dns.zone:12553] *** MPI_ERR_TRUNCATE: message truncated
[compute-00-00.private.dns.zone:12553] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)

mpirun has exited due to process rank 13 with PID 12556 on
node compute-00-00 exiting without calling “finalize”. This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).

With my own data set I have been able to get past this point by copying the output files over from the Mac and then running the rest of the analysis up to dist.seqs and cluster. It doesn't seem to run very fast, though, even as the fans ramp up and I get a few temperature warnings. (It's running cluster at the moment.)

Should I be starting the mothur processes from the command line instead of in interactive mode for commands that are MPI-enabled (commands that have the processors= option)? For example:

mpirun --hostfile /home/forsterr/test/hostfile_8_8 -np 28 mothur "#align.seqs(candidate=stool.trim.unique.fasta, template=silva.bacteria.fasta)"

Yeah, that's probably what you need to do for this particular command. In general, keep in mind that the only command in the Costello workflow, up to the point you've reached, that uses multiple processors is align.seqs. So you should only see one processor in use except while align.seqs is running, and that might be wicked quick.
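Putting that together, one way to run it is to do the MPI-enabled alignment non-interactively across the cluster and then finish the serial steps in a single mothur process (a sketch; the file names follow the Costello example above, and the later commands' arguments are left elided to be filled in per the SOP):

```shell
# MPI-enabled step: run align.seqs in batch mode across the cluster.
mpirun --hostfile /home/forsterr/test/hostfile_8_8 -np 28 \
    mothur "#align.seqs(candidate=stool.trim.unique.fasta, template=silva.bacteria.fasta)"

# The remaining steps don't benefit from MPI, so run them in one
# ordinary mothur process (arguments omitted here):
mothur "#dist.seqs(...); cluster(...)"
```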

Hope this helps, enjoy the little monster…
Pat