I am having some problems with the MPI version of mothur.
Here at CSC, the Finnish supercomputing center, we have been testing mothur for a researcher who would like to run the mothur command pairwise.seqs on a very large dataset.
Using the MPI version of mothur looks like the only option if we want to run the analysis in a reasonable time. As all-against-all pairwise sequence alignment is easy to parallelize, the command actually scales nicely to over 1000 cores on our Cray XC30 system.
However, if we use more than four cores, it looks like part of the results is lost and the distance matrix is incomplete. There are no error messages in the mothur log file, and the distances that do find their way to the distance matrix file are correct, but not all of the distances are present in the result file.
Further, the error is not systematic: sometimes more results are missing, sometimes fewer. Also, when we increase the number of computing cores used, the tendency to lose results seems to increase.
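For reference, this is roughly how the completeness of a result can be checked on a small test set. It is only a minimal sketch: it assumes a three-column distance file (name1 name2 distance, whitespace separated) and that no distance cutoff was applied, so every unordered pair should be present. The function name missing_pairs and the example names and distances are made up for illustration.

```python
from itertools import combinations


def missing_pairs(names, dist_lines):
    """Return the unordered sequence pairs that have no distance line.

    Assumes each line looks like: name1 name2 distance
    """
    # Every unordered pair of distinct sequences is expected exactly once.
    expected = {frozenset(pair) for pair in combinations(names, 2)}
    found = set()
    for line in dist_lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        found.add(frozenset(fields[:2]))
    return expected - found


# Made-up example: 4 sequences give 6 expected pairs, 2 are absent.
names = ["seqA", "seqB", "seqC", "seqD"]
dist_lines = [
    "seqA seqB 0.10",
    "seqA seqC 0.25",
    "seqB seqC 0.30",
    "seqC seqD 0.05",
]
missing = missing_pairs(names, dist_lines)
total = len(names) * (len(names) - 1) // 2
print(len(missing), "of", total, "pairs missing")
```

Holding all pairs in memory only works for small test subsets. For the full dataset, comparing the number of lines in the file with n*(n-1)/2 would be a cheaper first check.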
We have observed the same behavior on both our Cray XC30 supercomputer and our HP SL230s G8 cluster.
Any ideas on how we could fix this?
Kimmo Mattila / CSC