Time required for shhh.flows

I’m trying to use shhh.flows on some new data. There are 16 samples and about 20,000 sequences per sample. I’ve used shhh.flows in the past on smaller samples (about 5,000 sequences per sample) and it finished in a reasonable time.

I’m using the MPI version on 49 processors and it has been running for over a week on the first sample. It’s using virtually no memory and hasn’t written any result files; it’s still on “calculating distances between flowgrams…” for the first sample. Is this normal? What would be an upper limit on the number of sequences for shhh.flows to finish in a reasonable time (2, 3, or 5 days)? Thanks

Using an Intel E7300 dual core with only 2 GB of RAM on Windows XP, so not the most recent of computers, it takes about 25-30 minutes for 30,000 sequences…

I am using Mothur 1.23.0 and have also noticed a problem with how long shhh.flows takes to run.
I do not remember it taking this much time, and I suspect the change came with the latest version, but I really cannot say for sure. I updated to fix a problem with some other component of Mothur.

I have just killed the job and am going to rerun it with a previous version. I should be able to let you know tomorrow whether that fixes anything.

EDIT: I just realized I had 1.23.0, not 1.23.1. I don’t know whether I should test with 1.23.1 or 1.22.2…

Yeah, that seems slow. Are you sure you’re giving it file=???.flow.files?

OK, it seems to be a problem with the MPI version. I killed the job and it’s now running fine on the regular version with 8 processors.
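
For reference, the equivalent non-MPI invocation would be something along these lines (assuming the mothur binary is run from its own directory, with the same flow file list used below):

./mothur "#shhh.flows(file=pyro03.flow.files, processors=8)"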

The MPI command I ran was:

mpirun --hostfile host_8_8 -np 49 --bynode /share/apps/mothur/mothurMPI_1.22 "#shhh.flows(file=pyro03.flow.files)"

It did come back with

TERM environment variable not set.
TERM environment variable not set.
TERM environment variable not set.

Maybe that is causing some communication problems.

Bob

You should try it without MPI. We’ve actually noticed that the parallelization doesn’t really speed things up much.

Right, that’s what I’m doing now. As an aside, I posted this problem in the mothur bugs index last year and had forgotten about it! I did attempt to use the mpirun flag -x TERM=linux; that gets rid of the “TERM environment variable not set” messages, but the program still hangs.
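
Combined with the command above, that would look something like this (same host file and binary path as before):

mpirun --hostfile host_8_8 -np 49 --bynode -x TERM=linux /share/apps/mothur/mothurMPI_1.22 "#shhh.flows(file=pyro03.flow.files)"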

Bob

I’ve also been trying the MPI version, using the files from the SchlossSOP page with Mothur 1.23.1 on Mac OS X 10.6. The threaded version is much, much faster than MPI even with more CPUs: 45 minutes using the 8 cores on one computer vs. 6 hours so far (on file 4 of 11) with 16 cores across two computers.

Threaded command: ./mothur "#shhh.flows(file=GQY1XT001.flow.files, processors=8)"
MPI command: mpirun -np 16 -machinefile hostfile mothur-mpi "#shhh.flows(file=GQY1XT001.flow.files)"

Is the much slower result a problem with something I’m doing? Otherwise I’ll switch to using the non-MPI version.
Much thanks
Matt

I’d switch. Are you running MPI across multiple nodes? How are the nodes connected? If you are running it across multiple nodes, there can be a slow-down from transferring the data back and forth. The next release of mothur, due in the next week or so, takes a different approach to the parallelization, which should limit the data swapping.

Pat

I am running across multiple nodes. They’re connected via gigabit ethernet. I’ll use the standard version now and try the new version when it’s released.
Thanks for the reply,
Matt