Running mothur on an HPC cluster with conda

Hi,

I am trying to run mothur on my school's HPC cluster, but it doesn't work when I run it from a batch file.
I also tried interactive mode, but the run fails at make.contigs(file=current, processors=12) with a segmentation fault.
I used the conda install because the mothur version already loaded on my HPC cannot process gz files.

Hi - there’s a lot that could go wrong here. Can you share what you mean by the “code fails”? What error message are you getting? When you run it in interactive mode, are you running it on the head node or a compute node? What does the batch file look like? How are you running it?

If you can give more details I can try to help

Thanks,
Pat

Hi,

I am running the code on a compute node. In the HPC environment my school offers, I first created a conda environment:

$ mamba create --prefix /project/smithada_877/mothur

and installed mothur using conda install.

$ conda install -c bioconda mothur

To run mothur interactively, without submitting a job to the HPC, I requested a compute node with:

 $ salloc --time=1:00:00 --cpus-per-task=12 --mem=16G

Then I activated the environment where mothur was installed:

$ mamba activate /project/smithada_877/mothur

Within that environment I ran the following:

$ mothur stability.batch

This is the logfile output:

Linux version

Using ReadLine

Using Boost

Running 64Bit Version

mothur v.1.40.4

Last updated: 10/04/2023

by

Patrick D. Schloss

Department of Microbiology & Immunology

University of Michigan

http://www.mothur.org

When using, please cite:

Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.

Distributed under the GNU General Public License

Type 'help()' for information on the commands that are available

For questions and analysis support, please visit our forum at https://www.mothur.org/forum

Type 'quit()' to exit program

Batch Mode

mothur > pcr.seqs(fasta=silva.bacteria.fasta, start=11894, end=25319, keepdots=F)

Using 16 processors.

935

935

935

934

935

935

935

934

935

934

935

935

935

935

935

934

[NOTE]: no sequences were bad, removing silva.bacteria.bad.accnos

It took 4 secs to screen 14956 sequences.

Output File Names:

silva.bacteria.pcr.fasta

mothur > rename.file(input=silva.bacteria.pcr.fasta, new=silva.v4.fasta)

Current files saved by mothur:

fasta=silva.bacteria.pcr.fasta

processors=16

mothur > set.logfile(name=test_conda)

mothur > set.logfile(name=test_conda)

mothur > make.file(inputdir=/project/smithada_877/bianca/testmothurbatch/testconda, type=gz, prefix=stability)

Setting input directory to: /project/smithada_877/bianca/testmothurbatch/testconda/

Output File Names:

/project/smithada_877/bianca/testmothurbatch/testconda/stability.files

mothur > make.contigs(file=stability.files)

Using 16 processors.

Segmentation fault

Here are the contents of my stability.batch file:

pcr.seqs(fasta=silva.bacteria.fasta, start=11894, end=25319, keepdots=F)

rename.file(input=silva.bacteria.pcr.fasta, new=silva.v4.fasta)

#This is the Standard Operating Procedure for analysis in the Schloss Lab

set.logfile(name=test_conda)

make.file(inputdir=/project/smithada_877/bianca/testmothurbatch/testconda, type=gz, prefix=stability)

make.contigs(file=stability.files)

screen.seqs(fasta=current, group=current, maxambig=0, maxlength=275)

unique.seqs()

count.seqs(name=current, group=current)

align.seqs(fasta=current, reference=/project/smithada_877/bianca/testmothurbatch/testconda/silva.v4.fasta)

screen.seqs(fasta=current, count=current, start=1968, end=11550, maxhomop=8)

filter.seqs(fasta=current, vertical=T, trump=.)

unique.seqs(fasta=current, count=current)

pre.cluster(fasta=current, count=current, diffs=2)

chimera.vsearch(fasta=current, count=current, dereplicate=t)

remove.seqs(fasta=current, accnos=current)

classify.seqs(fasta=current, count=current, reference=/project/smithada_877/bianca/testmothurbatch/testconda/trainset9_032012.pds.fasta, taxonomy=/project/smithada_877/bianca/testmothurbatch/testconda/trainset9_032012.pds.tax, cutoff=80)

remove.lineage(fasta=current, count=current, taxonomy=current, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)

cluster.split(fasta=current, count=current, taxonomy=current, splitmethod=classify, taxlevel=4, cutoff=0.15)

make.shared(list=current, count=current, label=0.03)

classify.otu(list=current, count=current, taxonomy=current, label=0.03)

phylotype(taxonomy=current)

make.shared(list=current, count=current, label=1)

classify.otu(list=current, count=current, taxonomy=current, label=1)

Thank you so much!

Please let me know if you need more info.

Thank you!
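
One thing that stands out in the log above: mothur reports "Using 16 processors" although the salloc request asked for 12 CPUs. A minimal sketch, assuming a bash shell inside a Slurm allocation, of passing the allocated count to mothur explicitly. SLURM_CPUS_PER_TASK is the variable Slurm sets when --cpus-per-task is given; the helper name mothur_cmd and the fallback to 1 are illustrative choices, not part of mothur. Whether this avoids the segmentation fault is not established here.

```shell
#!/usr/bin/env bash
# Sketch: build a mothur command-line call whose processors= value
# matches the Slurm allocation instead of mothur's own default.

# Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is given;
# fall back to 1 (an assumption) when the variable is missing.
procs="${SLURM_CPUS_PER_TASK:-1}"

# mothur_cmd is a hypothetical helper: it prints the mothur call
# for a given .files file rather than running it.
mothur_cmd() {
    local files="$1"
    printf 'mothur "#make.contigs(file=%s, processors=%s)"\n' "$files" "$procs"
}

# Print the command so it can be inspected before running it.
mothur_cmd stability.files
```

The same idea applies inside stability.batch: adding processors=12 to make.contigs there would keep mothur within the 12 CPUs requested by salloc.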

Thanks - can you post the contents of stability.files?

Hi,
Here are the contents of stability.files:

D165 D165_S174_L001_R1_001.fastq.gz D165_S174_L001_R2_001.fastq.gz
D166 D166_S175_L001_R1_001.fastq.gz D166_S175_L001_R2_001.fastq.gz
D167 D167_S176_L001_R1_001.fastq.gz D167_S176_L001_R2_001.fastq.gz
D168 D168_S177_L001_R1_001.fastq.gz D168_S177_L001_R2_001.fastq.gz
D169 D169_S178_L001_R1_001.fastq.gz D169_S178_L001_R2_001.fastq.gz
D170 D170_S179_L001_R1_001.fastq.gz D170_S179_L001_R2_001.fastq.gz
D171 D171_S180_L001_R1_001.fastq.gz D171_S180_L001_R2_001.fastq.gz
D172 D172_S181_L001_R1_001.fastq.gz D172_S181_L001_R2_001.fastq.gz

Thank you for the fast response!
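
Before re-running make.contigs it may be worth confirming that every file named in stability.files is actually present and non-empty. A minimal sketch, assuming the three-column layout shown above (group, forward reads, reverse reads) with bare file names relative to one directory. check_files_list is a hypothetical helper name, not a mothur command.

```shell
#!/usr/bin/env bash
# check_files_list is a hypothetical helper: for each line of a
# three-column .files file (group, forward reads, reverse reads) it
# reports read files that are missing or empty in the given directory.
# Returns 0 if every file is present and non-empty, 1 otherwise.
check_files_list() {
    local list="$1" dir="${2:-.}" status=0
    local group fwd rev f
    # The "|| [ -n ... ]" part keeps a final line without a newline.
    while read -r group fwd rev || [ -n "$group" ]; do
        [ -z "$group" ] && continue          # skip blank lines
        for f in "$fwd" "$rev"; do
            if [ ! -s "$dir/$f" ]; then
                echo "missing or empty: $f (group $group)"
                status=1
            fi
        done
    done < "$list"
    return "$status"
}
```

For the run in this thread, the call would be check_files_list stability.files /project/smithada_877/bianca/testmothurbatch/testconda; no output and a zero exit status mean all sixteen read files were found.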

Could you possibly try running…

make.contigs(rfastq=D165_S174_L001_R1_001.fastq.gz D165_S174_L001_R2_001.fastq.gz, rfastq=D165_S174_L001_R1_001.fastq.gz D165_S174_L001_R2_001.fastq.gz)

If that doesn’t work, could you decompress those two files and then repeat the command dropping the .gz file extensions from their names?

Pat
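
For the decompression step, a minimal sketch that first tests each archive with gzip -t and then writes a decompressed copy while keeping the original .gz file, so both variants of the command can be tried. gunzip_keep is a hypothetical helper name; gzip -t and gzip -dc are standard gzip options. A damaged archive is only one possible cause of the crash, so a clean result here does not rule out other problems.

```shell
#!/usr/bin/env bash
# gunzip_keep is a hypothetical helper: it tests each .gz file with
# `gzip -t` and, if the archive is intact, writes a decompressed copy
# next to it while leaving the original untouched.
# Returns 1 if any file failed the integrity test.
gunzip_keep() {
    local f status=0
    for f in "$@"; do
        if gzip -t "$f" 2>/dev/null; then
            gzip -dc "$f" > "${f%.gz}"       # strip the .gz suffix
            echo "ok: ${f%.gz}"
        else
            echo "corrupt or not gzip: $f"
            status=1
        fi
    done
    return "$status"
}
```

For the first sample this would be called as gunzip_keep D165_S174_L001_R1_001.fastq.gz D165_S174_L001_R2_001.fastq.gz, run from the directory that contains the files.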

Should I run that command in interactive mode after the make.file command? Or should I modify the stability.batch file and run just up to the make.contigs command?

thank you!

I ran it in interactive mode, keeping the .gz file extension, and got this error:
[ERROR]: The file, ffastq and rfastq or ffasta and rfasta parameters are required.

[ERROR]: If you provide use the rfastq, you must provide a ffastq file.

Using 24 processors.

[ERROR]: did not complete make.contigs.

Then I tried decompressing the gz files, and in interactive mode I ran the following:

make.file(inputdir=/project/smithada_877/bianca/testmothurbatch/testconda, type=fastq, prefix=stability)

Setting input directory to: /project/smithada_877/bianca/testmothurbatch/testconda/

Output File Names:

/project/smithada_877/bianca/testmothurbatch/testconda/stability.files

mothur > make.contigs(rfastq=D165_S174_L001_R1_001.fastq D165_S174_L001_R2_001.fastq, rfastq=D165_S174_L001_R1_001.fastq D165_S174_L001_R2_001.fastq)

[ERROR]: The file, ffastq and rfastq or ffasta and rfasta parameters are required.

[ERROR]: If you provide use the rfastq, you must provide a ffastq file.

Using 24 processors.

[ERROR]: did not complete make.contigs.

Sorry, I botched that, try this…

make.contigs(ffastq=D165_S174_L001_R1_001.fastq.gz, rfastq=D165_S174_L001_R2_001.fastq.gz)

Can you run that in interactive mode? It would be easiest to run that from within the directory that contains those files.

Pat

Hi Pat,

I ran the following and still got an error:

mothur > make.contigs(ffastq=D165_S174_L001_R1_001.fastq.gz, rfastq=D165_S174_L001_R2_001.fastq.gz)

Using 24 processors.
Segmentation fault

Thanks - two things to try…

  1. Can you use processors=1 in make.contigs? Does that give a seg fault?
  2. Can you gunzip these two files and re-run make.contigs as before, but with the .gz removed from the filenames?

Let me know how that goes…
Pat