batch file execution

Hi,

I was trying to process about 400 samples using a batch command file. At some point mothur crashes. The logfile at this point is large (~5 GB) and filled with millions of warnings (see below). Some temp files are also present when the program crashes (sample output also given below). I run my script on a Linux cluster with the following settings:

#$ -l mem_free=32G
qsub -q long.q mymothur.sh

The script ran ok with a smaller subset of samples, but keeps on crashing with the larger set. What am I doing wrong?

thanks!


  1. Batch command file:
    ======================
set.dir(input=/home/pca/mac/microbiome/fecal/raw_data/all_fastq,output=/home/pca/mac/microbiome/fecal/processed_data/r2)
make.contigs(file=r2.stability.files,processors=8)

summary.seqs(fasta=current)

screen.seqs(fasta=current, group=current, maxambig=0, maxlength=275)
unique.seqs()
count.seqs(name=current, group=current)
summary.seqs(count=current)
align.seqs(fasta=current, reference=/home/pca/mac/microbiome/fecal/silva.bacteria/silva.v4.fasta)
summary.seqs(fasta=current)
screen.seqs(fasta=current, count=current, start=1968, end=11550, maxhomop=8)
summary.seqs(fasta=current, count=current)
filter.seqs(fasta=current, vertical=T, trump=.)
unique.seqs(fasta=current, count=current)
pre.cluster(fasta=current, count=current, diffs=2)
chimera.vsearch(fasta=current, count=current, dereplicate=t)
remove.seqs(fasta=current, accnos=current)
classify.seqs(fasta=current, count=current, reference=/home/pca/mac/microbiome/fecal/RDP_ref/trainset14_032015.pds/trainset14_032015.pds.fasta, taxonomy=/home/pca/mac/microbiome/fecal/RDP_ref/trainset14_032015.pds/trainset14_032015.pds.tax, cutoff=80)

remove.lineage(fasta=current, count=current, taxonomy=current, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)
summary.tax(taxonomy=current, count=current)

cluster.split(fasta=current, count=current, taxonomy=current, splitmethod=classify, taxlevel=4, cutoff=0.15)
make.shared(list=current, count=current, label=0.03)
classify.otu(list=current, count=current, taxonomy=current, label=0.03)

#QIIME biom format

make.biom(shared=current, constaxonomy=current)

system(echo "Phylotypes")

phylotype(taxonomy=current)
make.shared(list=current, count=current, label=1)
classify.otu(list=current, count=current, taxonomy=current, label=1)

system(echo "Phylogenetic")

dist.seqs(fasta=current,output=lt,processors=1)
clearcut(phylip=current)


  2. Last few lines of logfile:
    ======================

[WARNING]: group SWB-0197 contains illegal characters in the name. Group names should not include :, -, or / characters. The ':' character is a special character used in trees. Using ':' will result in your tree being unreadable by tree reading software. The '-' character is a special character used by mothur to parse group names. Using the '-' character will prevent you from selecting groups. The '/' character will created unreadable filenames when mothur includes the group in an output filename.


[WARNING]: group SWB-0197 contains illegal characters in the name. Group names should not include :, -, or / characters. The ':' character is a special character used in trees. Using ':' will result in your tree being unreadable by tree reading software. The '-' character is a special character used by mothur to parse group names. Using the '-' character will prevent you from selecting groups. The '/' character will created unreadable filenames when mothur includes the group in an output filename.
[WARNING]: group SWB-0197 contains illegal characters in the name. Group names should not include :, -, or / characters. The ':' character is a special character used in trees. Using ':' will result in your tree being unreadable by tree reading software. The '-' character is a special character used by mothur to parse group names. Using the '-' character will prevent you from selecting groups. The '/' character will created unreadable filenames when mothur includes the group in an output filename.


  3. First two lines of temp file:
    ======================
    M03576_154_000000000-B4JJT_1_2114_26883_16530 103 1 2 1 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 2 0 0 0 0 0 0 7 2 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 2 1 0 2 2 0 0 0 3 0 3 1 0 2 0 2 1 1 3 0 0 0 1 0 0 0 0 1 0 2 1 0 0 1 2 3 2 3 1 0 0 0 1 0 0 0 2 1 2 1 2 0 0 1 0 2 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 2 0 0 0 1 0 2 0 0 0 0 2 0 0 0 0 0 2 0 0 1 5 0 3 1 0 1 0 0 0
    M03576_154_000000000-B4JJT_1_2114_25749_16965 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Those warnings are pretty self-explanatory: your group names contain symbols that can cause downstream conflicts, so it's best to rename them, e.g. change '-' to '_'. This is because '-' is used as a delimiter to separate group names in some functions.
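The renaming suggested above could be scripted along these lines. This is only a sketch: it assumes the file given to make.contigs uses the three-column layout (group, forward fastq, reverse fastq), and the function names and fastq file names are hypothetical. Only the group column is changed; the fastq file names are left alone.

```python
import re

# Characters the mothur warning lists as illegal in group names.
ILLEGAL_GROUP_CHARS = re.compile(r"[:\-/]")


def sanitize_group(name, replacement="_"):
    """Replace ':', '-' and '/' in a group name with the replacement."""
    return ILLEGAL_GROUP_CHARS.sub(replacement, name)


def sanitize_files_lines(lines):
    """Clean the group column of a three-column make.contigs file.

    Assumed layout per line: group, forward fastq, reverse fastq.
    Lines with a different number of columns are returned unchanged.
    """
    cleaned = []
    for line in lines:
        fields = line.split()
        if len(fields) == 3:
            fields[0] = sanitize_group(fields[0])
            cleaned.append("\t".join(fields))
        else:
            cleaned.append(line.rstrip("\n"))
    return cleaned


if __name__ == "__main__":
    # Hypothetical example line; real file names will differ.
    example = ["SWB-0197\tSWB-0197_R1.fastq\tSWB-0197_R2.fastq\n"]
    for row in sanitize_files_lines(example):
        print(row)
```

It is worth checking afterwards that no two groups collapse to the same name (for example, `A-1` and `A_1`).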

What kind of data set are you working with (V4, V4-V5)?

Cheers
Richard

Which version of mothur are you using? If you've upgraded to 1.39, I think these two commands are going to conflict with each other:

cluster.split(fasta=current, count=current, taxonomy=current, splitmethod=classify, taxlevel=4, cutoff=0.15)
make.shared(list=current, count=current, label=0.03)

With cutoff=0.15, I think mothur will now only calculate OTUs at the 0.15 level, so the list file would have no 0.03 label for make.shared to use (Sarah or Pat may point out my error).

The name warnings are important if you are planning on looking at trees, because those characters are going to mess up the tree viewing program.
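If the cutoff/label conflict suggested above is real (it is offered here only as a possibility), a batch file can be checked for it before submitting the job. The sketch below is a hypothetical helper, not part of mothur: it does a simple text parse of the batch commands and reports any make.shared or classify.otu call whose label differs from the cutoff of the preceding clustering command.

```python
import re

CLUSTER_COMMANDS = {"cluster", "cluster.split"}
LABEL_COMMANDS = {"make.shared", "classify.otu"}
# Commands that produce a new list file unrelated to the clustering cutoff.
RESET_COMMANDS = {"phylotype"}


def parse_command(line):
    """Split 'name(key=value, ...)' into (name, params); None for other lines."""
    match = re.match(r"\s*([\w.]+)\((.*)\)\s*$", line)
    if not match:
        return None
    name, body = match.groups()
    params = {}
    for part in body.split(","):
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip()] = value.strip()
    return name, params


def same_level(label, cutoff):
    """Compare numerically when possible, otherwise as plain strings."""
    try:
        return float(label) == float(cutoff)
    except ValueError:
        return label == cutoff


def find_label_mismatches(batch_lines):
    """Return (command, label, cutoff) for labels that differ from the cutoff."""
    mismatches = []
    cutoff = None
    for line in batch_lines:
        parsed = parse_command(line)
        if parsed is None:
            continue  # comments, blank lines, anything not a command
        name, params = parsed
        if name in CLUSTER_COMMANDS and "cutoff" in params:
            cutoff = params["cutoff"]
        elif name in RESET_COMMANDS:
            cutoff = None
        elif name in LABEL_COMMANDS and cutoff is not None and "label" in params:
            if not same_level(params["label"], cutoff):
                mismatches.append((name, params["label"], cutoff))
    return mismatches


if __name__ == "__main__":
    batch = [
        "cluster.split(fasta=current, count=current, taxonomy=current, splitmethod=classify, taxlevel=4, cutoff=0.15)",
        "make.shared(list=current, count=current, label=0.03)",
    ]
    for command, label, cutoff in find_label_mismatches(batch):
        print(f"{command}: label={label} but clustering cutoff={cutoff}")
```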

I am running mothur v.1.39.4 and working on V4 data.

  1. I have corrected the filenames issue ('_' instead of '-') and changed the batch file to:

cluster.split(fasta=current, count=current, taxonomy=current, splitmethod=classify, taxlevel=4, cutoff=0.03)
make.shared(list=current, count=current, label=0.03)
classify.otu(list=current, count=current, taxonomy=current, label=0.03)


2. The program still stalls. The last lines from the logfile are:

mothur > unique.seqs()
Using /home/pca/mac/microbiome/fecal/processed_data/r2/r2.stability.trim.contigs.good.fasta as input file for the fasta parameter.
8991152 213462

Output File Names:
/home/pca/mac/microbiome/fecal/processed_data/r2/r2.stability.trim.contigs.good.names
/home/pca/mac/microbiome/fecal/processed_data/r2/r2.stability.trim.contigs.good.unique.fasta


mothur > count.seqs(name=current, group=current)
Using /home/pca/mac/microbiome/fecal/processed_data/r2/r2.stability.contigs.good.groups as input file for the group parameter.
Using /home/pca/mac/microbiome/fecal/processed_data/r2/r2.stability.trim.contigs.good.names as input file for the name parameter.
I'm not sure if it's running out of memory, even though I allocate 32 GB. The temp files are still in the directory, with no error message in the log file.

How long is it “stalling” for? Does the node crash or does it just sit there?

It crashes. Nothing in the mothur logfile or the cluster error logs indicates any error or exception.
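With no error in either log, one way to test the memory question raised above is to run the batch under a small wrapper that records the exit status and peak memory of the child process. The sketch below uses only the Python standard library and assumes a Unix system; the function name and the batch file name in the usage note are hypothetical.

```python
import resource
import subprocess
import sys


def run_and_report(command):
    """Run a command and return (exit_code, peak_rss_mb).

    A negative exit code means the process was killed by a signal
    (for example -9 for SIGKILL), which leaves no message in the
    program's own log.
    """
    completed = subprocess.run(command)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is the peak resident set size of the largest waited-for
    # child process: kilobytes on Linux, bytes on macOS.
    divisor = 1024 if sys.platform.startswith("linux") else 1024 * 1024
    return completed.returncode, usage.ru_maxrss / divisor


if __name__ == "__main__":
    # Usage: python this_script.py <command> [args...]
    exit_code, peak_mb = run_and_report(sys.argv[1:])
    print(f"exit code: {exit_code}, peak RSS: {peak_mb:.1f} MB")
```

For example, invoking the wrapper with `mothur mybatch.batch` as its arguments (the batch file name is a placeholder) would print the exit status and peak memory after mothur stops. The figure reflects the largest single child process, not the sum across several worker processes.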

Could you send your log file and input files to mothur.bugs@gmail.com?