Time required for shhh.flows

I’m trying to use shhh.flows on some new data. There are 16 samples and about 20,000 sequences per sample. I’ve used shhh.flows in the past on smaller samples (about 5,000 sequences per sample) and it finished in a reasonable time.

I’m using the MPI version on 49 processors and it has been running for over a week on the first sample. It’s using virtually no memory and hasn’t written any result files; it’s still on “calculating distances between flowgrams…” for the first sample. Is this normal? What would be an upper limit on the number of sequences for shhh.flows to finish in a reasonable time (2, 3, or 5 days)? Thanks

Using an Intel E7300 dual core with only 2 GB of RAM on Windows XP, so not the most recent of computers, it takes about 25-30 minutes for 30,000 sequences…

I am using Mothur 1.23.0 and have also noticed a problem with how long shhh.flows takes to run.
I do not remember it taking this much time, and I suspect the change came with the latest version, but I really cannot say for sure. I updated to fix a problem with some other component of Mothur.

I have just killed the job and am going to rerun it with a previous version. I should be able to let you know tomorrow whether that fixes anything.

EDIT: I just realized I had 1.23.0, not 1.23.1. I don’t know whether I should test with 1.23.1 or 1.22.2…

Yeah, that seems slow. Are you sure you’re giving it file=???.flow.files?

OK, it seems to be a problem with the MPI version. I killed the job and it’s now running fine on the regular version with 8 processors.
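
For reference, the equivalent non-MPI invocation would be something along these lines (assuming the mothur binary is run from its own directory, with the same flow file list used below):

./mothur "#shhh.flows(file=pyro03.flow.files, processors=8)"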

The MPI command I ran was:

mpirun --hostfile host_8_8 -np 49 --bynode /share/apps/mothur/mothurMPI_1.22 "#shhh.flows(file=pyro03.flow.files)"

It did come back with

TERM environment variable not set.
TERM environment variable not set.
TERM environment variable not set.

Maybe that is causing some communication problems.

Bob

You should try it without MPI. We’ve actually noticed that the parallelization doesn’t really speed things up much.

Right, that’s what I’m doing now. As an aside, I posted this problem in the mothur bugs index last year and had forgotten about it! I did attempt to use the mpirun flag -x TERM=linux; that gets rid of the “TERM environment variable not set” messages, but the program still hangs.
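
Combined with the command above, that would look something like this (same host file and binary path as before):

mpirun --hostfile host_8_8 -np 49 --bynode -x TERM=linux /share/apps/mothur/mothurMPI_1.22 "#shhh.flows(file=pyro03.flow.files)"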

Bob

I’ve also been trying the MPI version, using the files from the SchlossSOP page with Mothur 1.23.1 on Mac OS X 10.6. The threaded version is much, much faster than MPI even with more CPUs: 45 minutes using the 8 cores on one computer vs. 6 hours so far (on file 4 of 11) with 16 cores across two computers.

Threaded command: ./mothur "#shhh.flows(file=GQY1XT001.flow.files, processors=8)"
MPI command: mpirun -np 16 -machinefile hostfile mothur-mpi "#shhh.flows(file=GQY1XT001.flow.files)"

Is the much slower result a problem with something I’m doing? Otherwise I’ll switch to using the non-MPI version.
Much thanks
Matt

I’d switch. Are you running MPI across multiple nodes? How are the nodes connected? If you are running it across multiple nodes, there can be a slow-down from transferring the data back and forth. The next release of mothur, due in the next week or so, takes a different approach to the parallelization, which should limit the data swapping.

Pat

I am running across multiple nodes. They’re connected via gigabit ethernet. I’ll use the standard version now and try the new version when it’s released.
Thanks for the reply,
Matt