mothurMPI and shhh.flows

Hi,

I apologize if this topic has been covered in other threads, but I have scoured the forum and cannot make this work. I am trying to run shhh.flows on a computing cluster with 250 GB of RAM and 48 CPUs.

First and foremost, I am wondering if MPI mothur is compiling correctly. I am compiling version 1.27.0 with gcc version 4.4.6 (Red Hat 4.4.6-3) with USEMPI set to yes. It seems to compile without any errors; however, I don't see the mothurMPI executable that is mentioned in the Schloss SOP. Instead, I just get a regular old mothur executable. Am I missing something? Here are the contents of my makefile:

###################################################
#
# Makefile for mothur
# Created: June 29, 2010
#
###################################################

#
# Macros
#

USEMPI ?= yes
64BIT_VERSION ?= yes
USEREADLINE ?= yes
CYGWIN_BUILD ?= no
USECOMPRESSION ?= no
MOTHUR_FILES="\"Enter_your_default_path_here\""
RELEASE_DATE = "\"8/8/2012\""
VERSION = "\"1.27.0\""
FORTAN_COMPILER = gfortran
FORTRAN_FLAGS =

# Optimize to level 3:
CXXFLAGS += -O3

ifeq ($(strip $(64BIT_VERSION)),yes)
    #if you are using centos uncomment the following lines
    #CXX = g++44

    #if you are a mac user use the following line
    #TARGET_ARCH += -arch x86_64

    #if you using cygwin to build Windows the following line
    #CXX = x86_64-w64-mingw32-g++
    #CC = x86_64-w64-mingw32-g++
    #FORTAN_COMPILER = x86_64-w64-mingw32-gfortran
    #TARGET_ARCH += -m64 -static

    #if you are a linux user use the following line
    CXXFLAGS += -mtune=native -march=native -m64

    CXXFLAGS += -DBIT_VERSION
    FORTRAN_FLAGS = -m64
endif


CXXFLAGS += -DRELEASE_DATE=${RELEASE_DATE} -DVERSION=${VERSION}

ifeq ($(strip $(MOTHUR_FILES)),"\"Enter_your_default_path_here\"")
else
    CXXFLAGS += -DMOTHUR_FILES=${MOTHUR_FILES}
endif

ifeq ($(strip $(CYGWIN_BUILD)),yes)
    CXXFLAGS += -mno-cygwin
    LDFLAGS += -mno-cygwin
endif

# if you do not want to use the readline library, set this to no.
# make sure you have the library installed

ifeq ($(strip $(USEREADLINE)),yes)
    CXXFLAGS += -DUSE_READLINE
    LIBS = \
      -lreadline\
      -lncurses
endif

ifeq ($(strip $(USEMPI)),yes)
    CXX = mpic++
    CXXFLAGS += -DUSE_MPI
endif

# if you want to enable reading and writing of compressed files, set to yes.
# The default is no. this may only work on unix-like systems, not for windows.

ifeq ($(strip $(USECOMPRESSION)),yes)
    CXXFLAGS += -DUSE_COMPRESSION
endif

#
# INCLUDE directories for mothur
#

CXXFLAGS += -I.

#
# Get the list of all .cpp files, rename to .o files
#

OBJECTS=$(patsubst %.cpp,%.o,$(wildcard *.cpp))
OBJECTS+=$(patsubst %.c,%.o,$(wildcard *.c))
OBJECTS+=$(patsubst %.f,%.o,$(wildcard *.f))

mothur : fortranSource $(OBJECTS) uchime
	$(CXX) $(LDFLAGS) $(TARGET_ARCH) -o $@ $(OBJECTS) $(LIBS)

	strip mothur

uchime:
	cd uchime_src && ./mk && mv uchime .. && cd ..

fortranSource:
	${FORTAN_COMPILER} -c $(FORTRAN_FLAGS) *.f

install : mothur
#	cp mothur ../Release/mothur

%.o : %.c %.h
	$(COMPILE.c) $(OUTPUT_OPTION) $<
%.o : %.cpp %.h
	$(COMPILE.cpp) $(OUTPUT_OPTION) $<
%.o : %.cpp %.hpp
	$(COMPILE.cpp) $(OUTPUT_OPTION) $<


clean :
	@rm -f $(OBJECTS)
	@rm -f uchime

Since there was no mothurMPI, I ran this command to see what would happen. There are 6 flow files containing ~20,000 sequences each after trim.flows.

mpirun -np 48 /data0/opt/Metagenomics/mothur/mothurbase/mothur/mothur "#shhh.flows(file=/data2/outsideusers/escully/gutcommunity/HUI9KWB02.flow.files)"

I get a segmentation fault (signal 11) during the flowgram clustering step. It does not appear that the system is running out of memory, since it is using only 40 GB right before it crashes, so I am wondering whether the program is compiled correctly to use MPI. I have also tried with fewer processors, ranging from 8 to 24, and I get the same error during the flowgram clustering step.

I also tried nonmpi: shhh.flows(file=HUI9KWB02.flow.files, processors=48). It does not crash, but the denoising step is painfully slow. I am talking about 2 weeks to process one flow file.

A second question: could my amplicons be too long for the denoising step (they are ~900 bp)? I have had no issues processing files containing ~20,000 sequences on my workstation with only 24 GB of RAM, but those were ~500 bp and I was able to denoise them within a couple of days.

Any suggestions for improving this would be most appreciated! Thanks so much!

Erin

Hi there,
I have a similar problem: when running the command I got "problem with execution of ./mothurMPI on biogenomics: [Errno 2] No such file or directory".
Could it be that the 1.27 source no longer builds the mothurMPI executable?

The executable created will be called mothur regardless of the makefile settings. Version 1.27 is still MPI-enabled, but it sounds like you may have hit a bug in the MPI implementation of shhh.flows. I will take a look. In general, if you are running mothur on a single machine, our non-MPI version is MUCH faster. Have you tried running the executable version of mothur with shhh.flows(file=HUI9KWB02.flow.files, processors=48), or is that what you meant when you said "I also tried nonmpi: shhh.flows(file=HUI9KWB02.flow.files, processors=48)"? Running a version built with USEMPI ?= yes without mpirun is still slower than running a version built with USEMPI ?= no.
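Since the binary is named mothur either way, one rough way to tell whether a given build was linked against MPI is to look at its shared-library dependencies. The sketch below is only an illustration: it assumes a dynamically linked Linux build, the helper name has_mpi_lib is made up for this example, and the library-name pattern is a guess that should cover common Open MPI and MPICH installs.

```shell
# Succeeds if the `ldd` output read from stdin lists an MPI shared library.
# Assumption: MPI libraries have "libmpi" in their file name (for example
# libmpi.so, libmpi_cxx.so, libmpich.so). A statically linked binary would
# not be detected this way.
has_mpi_lib() {
    grep -q 'libmpi'
}

# Hypothetical usage, run from the directory holding the freshly built binary:
#   ldd ./mothur | has_mpi_lib && echo "linked against MPI" || echo "no MPI library found"
```

To compare against a non-MPI build from the same source tree, running make clean followed by make USEMPI=no should work without editing the makefile, because a variable given on the make command line takes precedence over the USEMPI ?= yes line.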