Multiple processors and MPI

I have a few questions about which compiled version of mothur is appropriate to run. As far as I can tell, the commands that can use multiple processors (multiple cores) are align.seqs, chimera.slayer, chimera.uchime, classify.seqs, dist.seqs, filter.seqs, screen.seqs, and trim.seqs. According to a note in the forum, trim.seqs will run with multiple cores but not with MPI enabled. Is this true for all the commands?

When I run any command using an MPI-enabled build of mothur on my Linux machine, I never see it use more than one processor or thread. When run on my Mac, it seems to use multiple processors. Would a build of mothur without MPI enabled be able to run on multiple cores on a Linux box?

Which commands would be able to use MPI when launched from an mpirun command line or from a shell script using SGE?

Thanks, Bob

Interestingly, I ran a screen.seqs command on my dataset, and at the beginning of the execution it did indeed successfully “fork” into 8 processes for a short time. It also printed this Open MPI warning:

An MPI process has executed an operation involving a call to the
“fork()” system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

Local host: (PID 18476)

If you are absolutely sure that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.

Question 1 - Which commands can use multiple processors?

align.seqs, chimera.bellerophon, chimera.ccode, chimera.check, chimera.pintail, chimera.slayer, chimera.uchime, classify.seqs, cluster.split, dist.seqs, filter.seqs, indicator, dist.shared, metastats, pairwise.seqs, parsimony, phylo.diversity, rarefaction.single, screen.seqs, summary.seqs, summary.shared, trim.seqs, unifrac.unweighted and unifrac.weighted.

Question 2 - Which of the multiple processors commands are MPI enabled?

pairwise.seqs, classify.seqs, dist.seqs, filter.seqs, align.seqs, chimera.ccode, chimera.check, chimera.slayer, chimera.uchime, chimera.pintail, chimera.bellerophon, screen.seqs, summary.seqs and cluster.split

Question 3 - MPI and fork()

This is an issue we will correct. fork() is what mothur uses for parallelization in the non-MPI-enabled Mac and Linux builds, and it should not be called in the MPI-enabled version.

I am not sure why you are only seeing 1 thread on your linux box. How are you launching mothur on the linux machine? Are you using mpirun? If you are running mothur on a single machine with multiple cores, it runs faster with MPI NOT enabled. You can just set processors=numberOfCores. I hope this helps.
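As a minimal sketch of the processors=numberOfCores suggestion, the Python snippet below builds a mothur command-line call with processors set to the number of cores the machine reports. The file name example.fasta and the assumption that a non-MPI binary called mothur is on the PATH are placeholders for illustration, not details from this thread.

```python
import os


def mothur_command(command, cores=None, **params):
    """Return an argv list that runs one mothur command with processors set.

    `command` is a mothur command name such as "screen.seqs"; `params` are
    its other parameters. The binary name "mothur" is an assumption.
    """
    if cores is None:
        # os.cpu_count() can return None, so fall back to a single core.
        cores = os.cpu_count() or 1
    params["processors"] = cores
    args = ", ".join(f"{key}={value}" for key, value in params.items())
    # In command-line mode mothur takes the command as one '#'-prefixed argument.
    return ["mothur", f"#{command}({args})"]


if __name__ == "__main__":
    # example.fasta is a hypothetical input file.
    print(mothur_command("screen.seqs", fasta="example.fasta"))
```

The returned list can be passed to subprocess.run, which avoids shell quoting problems with the parentheses in the mothur command.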



Thank you very much. Yes, that helps.

I’ll compile two versions (one mpi, one not) with slightly different names and invoke them as appropriate.
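A rough sketch of how the "invoke them as appropriate" step could look in Python, assuming the two builds are installed as mothur (non-MPI) and mothur_mpi (MPI). Both names are hypothetical, and the set of MPI-enabled commands is simply copied from the list given earlier in this thread. It only chooses a launch line and does not run anything.

```python
# Commands listed as MPI-enabled earlier in this thread.
MPI_ENABLED = {
    "pairwise.seqs", "classify.seqs", "dist.seqs", "filter.seqs",
    "align.seqs", "chimera.ccode", "chimera.check", "chimera.slayer",
    "chimera.uchime", "chimera.pintail", "chimera.bellerophon",
    "screen.seqs", "summary.seqs", "cluster.split",
}


def build_argv(command_text, nprocs, multi_node):
    """Pick a launch line for one mothur command.

    command_text: full mothur command, e.g. "screen.seqs(fasta=example.fasta)".
    nprocs: number of MPI processes to request from mpirun.
    multi_node: True when the job spans several machines.
    """
    name = command_text.split("(", 1)[0].strip()
    if multi_node and name in MPI_ENABLED:
        # Hypothetical MPI build name; -np sets the number of processes.
        return ["mpirun", "-np", str(nprocs), "mothur_mpi", f"#{command_text}"]
    # Single machine, or a command without MPI support: use the non-MPI
    # build and let the caller put processors=N in command_text.
    return ["mothur", f"#{command_text}"]
```

For example, build_argv("trim.seqs(fasta=example.fasta, processors=8)", 8, True) falls back to the non-MPI build because trim.seqs is not in the MPI-enabled list.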


Just an update: we have resolved the issue with the fork() call in the MPI-enabled version of the screen.seqs command. The change will be part of version 1.23.0.