Problem with MPI version of shhh.flows

I compiled the MPI version of mothur with gcc 4.6.0 and openmpi 1.4.4. After a few hiccups with the libstdc++ libraries I was able to get mothurMPI running. I tried it on one sample using this command:

 /share/apps/mpi/gcc460/openmpi-1.4.3/bin/mpirun --hostfile host_8_8 -np 28 --bynode /share/apps/mothur/mothurMPI "#shhh.flows(file=DNA_2.flow.files)"

It starts and gets to this stage:

TERM environment variable not set.
TERM environment variable not set.
TERM environment variable not set.
Unable to open LookUp_Titanium.pat. Trying mothur's executable location /share/apps/mothur/mothurLookUp_Titanium.pat

Getting preliminary data...

>>>>> Processing DNA_2.LB3-1.flow (file 1 of 1) <<<<<
Reading flowgrams...
Identifying unique flowgrams...
Calculating distances between flowgrams...

mothurMPI starts as planned on each node, runs at 100% CPU, and never finishes. I tried the same files on my Mac with 8 processors, without MPI, and it finished in a few seconds.

I then quit the process on the Linux box and tried this command, to use just the processors on one node:

/share/apps/mpi/gcc460/openmpi-1.4.3/bin/mpirun --hostfile host_8_8 -np 8  /share/apps/mothur/mothurMPI "#shhh.flows(file=DNA_2.flow.files)"

Amazingly, the run finishes in a second. So it looks like there is a problem with the first command and the TERM environment. I can run other MPI programs, like raxml and mpiBLAST, with no problems. Does anyone have any ideas as to how to solve this?

Are you having the same issue with mothur's other MPI-enabled commands?

Hmm, I'm not sure. The shhh.flows command is the only one that I've run in MPI mode in a while. I'll test another one and see.

I did get around this problem by running trim.flows and then using a qsub script to submit the shhh.flows command for each sample file to an individual node, with the number of processes capped at 8 (8 cores per node) and jobs not allowed to span nodes. I then concatenate the results and go on my merry way. This seems to be the best way to speed up the shhh.flows procedure, as each node works on an individual sample at the same time. It really makes a difference when there are over 10,000 sequences in each sample file; with 3,000 sequences, the procedure seems pretty fast run serially without MPI.
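A rough sketch of that per-sample approach, in case it helps anyone. It assumes a Torque/PBS-style scheduler where `qsub -l nodes=1:ppn=8` requests 8 cores on a single node (SGE and other schedulers spell the resource request differently), a non-MPI mothur binary at /share/apps/mothur/mothur, and that trim.flows has already written the per-sample .flow files. The names submit_shhh_jobs, MOTHUR and DRY_RUN are made up for illustration; with DRY_RUN=1 (the default here) it only prints the job commands instead of submitting them.

```shell
#!/bin/sh
# Hypothetical helper: one shhh.flows job per sample .flow file.
# Assumptions (not from the thread): Torque/PBS-style qsub, mothur path below.
MOTHUR="${MOTHUR:-/share/apps/mothur/mothur}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the job commands only, do not submit

submit_shhh_jobs() {
    for flow in "$@"; do
        # Job body: start in the submission directory, then run shhh.flows
        # on a single sample using the 8 cores of one node.
        job=$(printf 'cd "$PBS_O_WORKDIR" && %s "#shhh.flows(flow=%s, processors=8)"' \
            "$MOTHUR" "$flow")
        if [ "$DRY_RUN" = "1" ]; then
            printf '%s\n' "$job"
        else
            # qsub reads the job script from stdin when no file is given.
            printf '%s\n' "$job" |
                qsub -l nodes=1:ppn=8 -N "shhh_$(basename "$flow" .flow)"
        fi
    done
}

submit_shhh_jobs "$@"
```

Called as `sh submit_shhh.sh *.flow` it prints one command per sample; setting DRY_RUN=0 hands each one to qsub. Concatenating the per-sample outputs afterwards is left out of the sketch.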

Bob

Is this still an issue in 1.27?

I have compiled that version with GCC and OpenMPI. The program gets to:

Calculating distances between flowgrams...

and then hangs with the same symptoms described above.

If I use strace to see what is happening, the program is polling a bunch of file descriptors which all point to sockets.
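For anyone who wants to repeat that check on a hung process, here is a small Linux-only sketch. `strace -p PID` attaches to the running process; the helper below (count_socket_fds is a made-up name) walks /proc/PID/fd and counts how many open descriptors are sockets. It assumes you have permission to read the target process's /proc entries.

```shell
#!/bin/sh
# Hypothetical helper: count the open file descriptors of a process
# that point at sockets, using the Linux /proc filesystem.
count_socket_fds() {
    pid=$1
    n=0
    for fd in /proc/"$pid"/fd/*; do
        # Each fd entry is a symlink; sockets show up as "socket:[inode]".
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            socket:*) n=$((n + 1)) ;;
        esac
    done
    echo "$n"
}

# Example (replace 12345 with the PID of the hung mothur process):
# strace -p 12345          # watch the system calls it is stuck in
# count_socket_fds 12345   # how many of its descriptors are sockets
```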

Any ideas?

Cheers

Loris

PS: I’m an IT guy installing the software on a cluster without any understanding of microbial ecology.

A similar problem happens here too. I compiled mothur using openmpi 1.4.3, and my OS is Ubuntu 10.04 LTS. When I launch mothur using the following command:

/bio/openmpi-1.4.3/bin/mpirun --hostfile hosts -np 4 /bio/bin/mothur

The output looks like this:

TERM environment variable not set.
TERM environment variable not set.
TERM environment variable not set.



mothur v.1.27.0 Last updated: 8/8/2012

by
Patrick D. Schloss

Department of Microbiology & Immunology
University of Michigan
pschloss@umich.edu
http://www.mothur.org

When using, please cite:
Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.

Distributed under the GNU General Public License

Type ‘help()’ for information on the commands that are available

Type ‘quit()’ to exit program



mothur >

Then I issue the following command:

mothur > fastq.info(fastq=f200.fastq)
fastq.info(fastq=f200.fastq)


Output File Names: f200.fasta f200.qual

It works just fine. Then I run a command using 4 processors:

mothur > summary.seqs(fasta=f200.fasta, processors=4)
summary.seqs(fasta=f200.fasta, processors=4)

Using 4 processors.

Then the program just hangs there without returning to the command line prompt.

When I launch mothur on just one node without using mpirun, the program works fine and can use all 24 processors on the node. For my datasets, using 8 processors is fast enough, so I do not need to go beyond 24 processors by launching mothur across multiple nodes.

Has anybody found out why this is happening, and is there a way to fix it? Thanks!

This is an older post. Which command are you having trouble with? Are you using our current version, 1.31.1?

Hi,

I am running:

mpirun -np 8 "#set.dir(input=./fasta/by_primer_denoised/flows, output=./fasta); shhh.flows(file=H2Y6SP102.flow.files, order=B)"

It's hanging after:

Reading flowgrams...
Identifying unique flowgrams...
Calculating distances between flowgrams...

This is with mothur v.1.30.2.
Was this fixed in v.1.31.1?

Thanks!

Can you try it without the set.dir command? How long has it been "hanging"?

Hi, I've tried it without the set.dir and it works!
I guess I could just change directories beforehand, thanks!

I will fix that issue for our next release. The set.dir command was only allowing the main process to see the directory change, so the other processes couldn't find the files. Thanks for bringing this issue to our attention.
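Until then, the workaround mentioned above can be sketched like this: change into the flow directory in the shell before mpirun starts, and leave set.dir out of the command string. This assumes the directory is on a filesystem that every node sees at the same path, so the MPI processes all start in the same working directory. The mothurMPI name on the PATH and the run_shhh_flows wrapper are assumptions for illustration; the directory and file names are the ones from the post above. Output files will land in the flow directory rather than ./fasta, so they may need to be moved afterwards.

```shell
#!/bin/sh
# Workaround sketch: cd first, then run shhh.flows without set.dir.
# MPIRUN, MOTHUR_MPI and run_shhh_flows are hypothetical names; adjust them
# to match the local install.
MPIRUN="${MPIRUN:-mpirun}"
MOTHUR_MPI="${MOTHUR_MPI:-mothurMPI}"

run_shhh_flows() {
    flow_dir=$1     # directory holding the .flow.files file
    flow_files=$2   # name of the .flow.files file inside that directory
    # Subshell: the caller's working directory is left unchanged.
    (
        cd "$flow_dir" || exit 1
        "$MPIRUN" -np 8 "$MOTHUR_MPI" "#shhh.flows(file=$flow_files, order=B)"
    )
}

# Example call with the paths from the post above:
# run_shhh_flows ./fasta/by_primer_denoised/flows H2Y6SP102.flow.files
```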

Hi Sarah,

I'm using the latest MPI-enabled version (1.34.4) of mothur for shhh.flows. I can work around this issue by removing set.dir, as you mentioned earlier in this thread. Is it correct to assume that the fix has not yet been included in later versions of mothur?

Thank you.

Daniel