I have a few questions about running mothur processes on different machines with MPI. After seeing some odd timing results I decided to do a test with the Costello data. I have a MacBook Pro (2 cores, 4 GB RAM), a MacPro (dual quad-core, 8 cores, 8 GB RAM), and my little monster, a Cray CX1 (7 nodes, dual quad-core, 56 cores in total, with 96 GB on the head node). I run RAxML and mpiBlast on the Cray, and I've installed the Intel compilers and compiled OpenMPI with them.
On the Macs I compiled with 64BIT_VERSION ?= yes and USEMPI ?= yes. Running the Costello analysis up to align.seqs(candidate=stool.trim.unique.fasta, template=silva.bacteria.fasta, processors=2), or processors=8 on the MacPro, gives me
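For completeness, this is roughly how I set up the Mac builds (a sketch assuming mothur's stock makefile, where these variables default to no; the sed call is just one way to make the edit, and you can of course change the lines by hand instead):

```shell
# Build sketch for the Macs: enable 64-bit and MPI in mothur's makefile,
# then rebuild from a clean tree. (Assumes the stock makefile ships with
# these variables set to "no"; edit by hand if your copy differs.)
sed -i.bak \
    -e 's/^64BIT_VERSION ?= no/64BIT_VERSION ?= yes/' \
    -e 's/^USEMPI ?= no/USEMPI ?= yes/' \
    makefile
make clean && make
```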
It took 197 secs to align 15247 sequences. on the MacBook, and
It took 152 secs to align 15247 sequences. on the MacPro. Faster, but not by much. On the MacPro I can only see one core really working, whilst the MacBook shows a more equal distribution of CPU usage.
The Cray I'm not so sure about. Compiling the program with 64BIT_VERSION ?= yes returns an error because the Intel compiler doesn't have a -arch x86_64 flag. Since it compiles 64-bit by default, I commented that flag out; it then compiles with a few warnings about changing the sign of an integer, but the build completes. The Cray runs RHEL. I assume I shouldn't use the processors= option in the mothur command there.
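For the record, the change I made was just commenting out the Mac-specific arch flag in the makefile (a sketch; the exact lines in your copy may differ):

```makefile
# icpc has no -arch option and targets x86_64 by default on this machine,
# so the OS X arch flag is commented out for the Intel build:
#CXXFLAGS += -arch x86_64
#LDFLAGS += -arch x86_64
```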
Running mpirun --hostfile /home/forsterr/test/hostfile_8_8 -np 28 mothur
(I can only use half the processors at the moment, otherwise I trip the 20 amp circuit in the server room, but that's another story!)
the program starts up 28 mothur processes and I get a mothur > prompt. Interestingly, top shows them all running at near 100% CPU even though they aren't calculating anything yet.
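In case it matters, the hostfile uses the standard OpenMPI format, something like this (compute-00-00 is a real node name from my logs below; the rest of the file follows the same pattern):

```text
# OpenMPI hostfile (sketch): one node per line, slots = MPI ranks per node
compute-00-00 slots=8
compute-00-01 slots=8
# ... one line per compute node
```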
Running mothur > align.seqs(candidate=stool.trim.unique.fasta, template=silva.bacteria.fasta)
Reading in the silva.bacteria.fasta template sequences… DONE.
Aligning sequences from stool.trim.unique.fasta …
[compute-00-00.private.dns.zone:12553] *** An error occurred in MPI_Bcast
[compute-00-00.private.dns.zone:12553] *** on communicator MPI COMMUNICATOR 4 DUP FROM 0
[compute-00-00.private.dns.zone:12553] *** MPI_ERR_TRUNCATE: message truncated
[compute-00-00.private.dns.zone:12553] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
mpirun has exited due to process rank 13 with PID 12556 on
node compute-00-00 exiting without calling “finalize”. This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
I have been able to get past this point with my own data set by copying the files over from the Mac and then running the rest of the analysis up to dist.seqs and cluster. However, it doesn't seem to run very fast, although the fans ramp up and I get a few temperature warnings. (It's running cluster at the moment.)
Should I be starting the mothur processes from the command line (batch mode) instead of interactive mode for commands that are MPI-enabled (commands that have the processors= option)? Such as:
mpirun --hostfile /home/forsterr/test/hostfile_8_8 -np 28 mothur “#align.seqs(candidate=stool.trim.unique.fasta, template=silva.bacteria.fasta)” ?