Segmentation fault (core dumped) in classify.seqs

I’m running the following batch commands on a large set of sequences (after chimera checking: 77k preclustered unique sequences, representing 8.6M seqs). The process failed during classify.seqs.

Here are the lines from my error file:

[compute-2-0:30895] *** Process received signal ***
[compute-2-0:30895] Signal: Segmentation fault (11)
[compute-2-0:30895] Signal code: Address not mapped (1)
[compute-2-0:30895] Failing at address: 0x1219883e0
[compute-2-0:30895] [ 0] /lib64/libpthread.so.0() [0x368ae0f500]
[compute-2-0:30895] [ 1] /opt/openmpi/lib/libmpi.so.1(opal_memory_ptmalloc2_int_malloc+0x7b1) [0x2afdd245a821]
[compute-2-0:30895] [ 2] /opt/openmpi/lib/libmpi.so.1(opal_memory_ptmalloc2_malloc+0x57) [0x2afdd245bb87]
[compute-2-0:30895] [ 3] /usr/lib64/libstdc++.so.6(_Znwm+0x1d) [0x368c6bd09d]
[compute-2-0:30895] [ 4] /usr/lib64/libstdc++.so.6(_Znam+0x9) [0x368c6bd1b9]
[compute-2-0:30895] [ 5] mothur(_ZN8Bayesian12readProbFileERSt14basic_ifstreamIcSt11char_traitsIcEES4_SsSs+0xbc9) [0x4ef399]
[compute-2-0:30895] [ 6] mothur(_ZN8BayesianC1ESsSsSsiiiibb+0x1946) [0x4f3b76]
[compute-2-0:30895] [ 7] mothur(_ZN19ClassifySeqsCommand7executeEv+0x4d7) [0x636027]
[compute-2-0:30895] [ 8] mothur(_ZN11BatchEngine8getInputEv+0x925) [0x7b6825]
[compute-2-0:30895] [ 9] mothur(main+0x126c) [0xa87a9c]
[compute-2-0:30895] [10] /lib64/libc.so.6(__libc_start_main+0xfd) [0x368a21ecdd]
[compute-2-0:30895] [11] mothur() [0x4aba49]
[compute-2-0:30895] *** End of error message ***
/opt/gridengine/default/spool/compute-2-0/job_scripts/53711: line 32: 30895 Segmentation fault (core dumped) mothur maps.to.may.batch

###basic mothur processing of MiSeq sequences using the Caporaso primers

#make.contigs(processors=8, ffastq=Undetermined_S0_L001_R1_001.fastq.gz, rfastq=Undetermined_S0_L001_R2_001.fastq.gz, findex=Undetermined_S0_L001_I1_001.fastq.gz, oligos=indu.oligos)

### make.contigs on each run individually, cat *001.trim.contigs.fasta and *contigs.groups

summary.seqs(fasta=maps.to.may.trim.contigs.fasta, processors=8)

screen.seqs(fasta=current, group=maps.to.may.contigs.groups, summary=current, maxambig=0, maxlength=300)

summary.seqs(fasta=current, group=current)

#reduce fasta size by only keeping one of each sequence, this generates a names file
unique.seqs(fasta=current)

summary.seqs(fasta=current, name=current)

#replaces both the names and group file (which contain the name of every sequence) with a count table, not sure if I like this but am going with it for now
count.seqs(name=current, group=current)

summary.seqs(count=current, fasta=current)

#align to a custom silva db (trimmed to v4 using "pcr.seqs")
align.seqs(fasta=current, reference=silva.v4.fasta)

summary.seqs(fasta=current, count=current)

#remove the seqs that just didn't align (using the numbers from the previous summary.seqs)
screen.seqs(fasta=current, count=current, summary=current, start=8, end=9582, maxhomop=8)

#remove columns from alignment that only contain -
filter.seqs(fasta=current, vertical=T, processors=8)

summary.seqs(fasta=current, count=current)

#pre.cluster to 1% difference to reduce computation time
pre.cluster(fasta=current, diffs=2, count=current)

summary.seqs(fasta=current, count=current)

#removes chimeras only from the samples in which they are called chimeras; to remove them from all samples, change to dereplicate=f
chimera.uchime(fasta=current, count=current, dereplicate=t)

remove.seqs(fasta=current, accnos=current, count=current)

summary.seqs(fasta=current, count=current)

#RDP classifier
classify.seqs(fasta=current, count=current, reference=trainset10_082014.pds.fasta, taxonomy=trainset10_082014.pds.tax, cutoff=60)

#remove all non-target sequences
remove.lineage(fasta=current, count=current, taxonomy=current, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)

#make otus for each Order individually
cluster.split(fasta=current, count=current, taxonomy=current, splitmethod=classify, taxlevel=4, cutoff=0.15, processors=4)

#make 3, 5 and 10% OTU matrix
make.shared(list=current, count=current, label=0.03-0.05-0.10)

#classify each OTU using the RDP classification; 100% means all seqs in that OTU match at that classification level
classify.otu(list=current, count=current, taxonomy=current)

get.oturep(fasta=current, count=current, list=current)

#check number of sequences in each sample
count.groups(shared=current)

#alpha diversity
summary.single(shared=current, calc=nseqs-sobs-coverage-shannoneven-invsimpson, subsample=10000)

#beta diversity
dist.shared(shared=current, calc=braycurtis-jest-thetayc, subsample=10000)

Here’s the end of the logfile:

mothur > classify.seqs(fasta=current, count=current, reference=trainset10_082014.pds.fasta, taxonomy=trainset10_082014.pds.tax, cutoff=60)
Using maps.to.may.trim.contigs.good.unique.good.filter.precluster.uchime.pick.pick.count_table as input file for the count parameter.
Using maps.to.may.trim.contigs.good.unique.good.filter.precluster.pick.fasta as input file for the fasta parameter.

Using 8 processors.
Reading template taxonomy… DONE.

What version of mothur are you using? Have you tried just running the classify.seqs command alone? Have you tried reducing the number of processors?

Hi Sarah
The one that failed was 1.34. I’d meant to run it on 1.35.1 but didn’t update my path correctly. Once I updated the path, it ran to completion using the same number of processors. So it’s a bit of a puzzle: I’m not sure what changed between 1.34 and 1.35 that allowed it to finish.

Did you run the whole batch or just classify.seqs?

I started the batch again at classify.seqs.

I am wondering if perhaps it’s a memory leak somewhere in the batch that is becoming visible due to the dataset’s size. We didn’t change the source in classify.seqs in the last release. The restart would clear out the memory.

The server isn’t being used much right now. Want me to try rerunning it from the top with some sort of memory tracker (and if so, which one)?
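One low-effort way to get a first number is to launch the batch through a small wrapper that reports peak resident memory when the job ends. The following is only a sketch, not a mothur feature: the function name `run_with_peak_memory` and the script name `peak_mem.py` are made up for illustration, and it reports a single peak value rather than locating a leak.

```python
import resource
import subprocess
import sys


def run_with_peak_memory(cmd):
    """Run cmd to completion and return (return_code, peak_rss).

    peak_rss is ru_maxrss for waited-for child processes, i.e. the
    resident set size of the largest child. On Linux the unit is
    kilobytes (on macOS it is bytes).
    """
    completed = subprocess.run(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return completed.returncode, usage.ru_maxrss


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Hypothetical usage: python peak_mem.py mothur maps.to.may.batch
        code, peak = run_with_peak_memory(sys.argv[1:])
        # A process killed by a signal shows up as a negative return code,
        # e.g. -11 for a segmentation fault.
        print(f"return code {code}, peak RSS {peak} (KB on Linux)")
    else:
        print("usage: python peak_mem.py COMMAND [ARGS...]")
```

Comparing the reported peak for the full batch with the peak for a run that starts at classify.seqs would show whether the earlier steps leave the process much larger by the time the classifier loads its reference files.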

I’m interested in tracking this down because I’m going to continue adding to this dataset for at least the next year.

Hi,

I am also getting the same error with classify.seqs (version 1.35.1).
I have tried setting processors to 100, 10 and 1, and I still get the error.

[cxx:08864] *** Process received signal ***
[cxx:08864] Signal: Segmentation fault (11)
[cxx:08864] Signal code: Address not mapped (1)
[cxx:08864] Failing at address: 0x2c837568
[cxx:08864] [ 0] /lib64/libpthread.so.0 [0x3a87a0e7c0]
[cxx:08864] [ 1] /lib64/libc.so.6 [0x3a86e732c5]
[cxx:08864] [ 2] /lib64/libc.so.6(__libc_malloc+0x6e) [0x3a86e74bee]
[cxx:08864] [ 3] /usr/lib64/libstdc++.so.6(_Znwm+0x1d) [0x3a896bd17d]
[cxx:08864] [ 4] /usr/lib64/libstdc++.so.6(_ZNSs4_Rep9_S_createEmmRKSaIcE+0x21) [0x3a8969b801]
[cxx:08864] [ 5] /usr/lib64/libstdc++.so.6 [0x3a8969c555]
[cxx:08864] [ 6] /usr/lib64/libstdc++.so.6(_ZNSsC1ERKSsmm+0x38) [0x3a8969c688]
[cxx:08864] [ 7] mothur(_ZN8Bayesian12readProbFileERSt14basic_ifstreamIcSt11char_traitsIcEES4_SsSs+0x866) [0x568706]
[cxx:08864] [ 8] mothur(_ZN8BayesianC1ESsSsSsiiiibb+0x2a7e) [0x56b75e]
[cxx:08864] [ 9] mothur(_ZN19ClassifySeqsCommand7executeEv+0x130) [0x6f55b0]
[cxx:08864] [10] mothur(_ZN11BatchEngine8getInputEv+0x8e7) [0x12e8a47]
[cxx:08864] [11] mothur(main+0xe59) [0x1333609]
[cxx:08864] [12] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3a86e1d994]
[cxx:08864] [13] mothur(__gxx_personality_v0+0x491) [0x49ad99]
[cxx:08864] *** End of error message ***
/opt/gridengine/default/spool/cxx/job_scripts/5438447: line 30:  8864 Segmentation fault      (core dumped) mothur classify.batch

I am analysing saliva samples from 1019 individuals, and the current input to classify.seqs is:

# of unique seqs: 118868

total # of seqs: 48728217

Does anyone know what is wrong here? Is it a server memory issue?

thanks,

Would you mind installing the newest version of mothur (1.36.1) and seeing if the problem persists?

Hi,

I tried to install the latest version, but it gave a “missing libraries” error for iostreams.a and zlib.a.

Our IT department told me that NO NEW libraries will be added to mars at this time, as they are moving to a new cluster, and asked me to wait until the new cluster is set up (probably by Jan 2016).

I am trying this classify.seqs command with the old version, 1.34.3, and it seems to be working, with no error so far.
progressing…

Hope it’s OK to run the old version for just the classify.seqs step.

:)

thanks,

I never figured out what was wrong, but I haven’t gotten that error in 1.35 or 1.36.

Hi Sajancr,
It looks like a corruption of the shortcut files. Mothur creates shortcut files for the classify.seqs command. Among these is a *.xmer.prob file. From your output it looks like mothur is failing while reading this file.

[cxx:08864] [ 7] mothur(_ZN8Bayesian12readProbFileERSt14basic_ifstreamIcSt11char_traitsIcEES4_SsSs+0x866) [0x568706]
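The class and method in a frame like this can be read straight out of the mangled symbol. As a rough illustration, the sketch below decodes only the leading nested-name part of such symbols; it is not a full demangler (a tool such as `c++filt` does this properly), and the helper names `nested_name` and `summarize_frames` are invented for this example.

```python
import re

# Matches frames such as:
#   [host:pid] [ 7] mothur(_ZN8Bayesian12readProbFileE...+0x866) [0x568706]
FRAME = re.compile(r"\[\s*(\d+)\]\s+(\S+?)\((\w+)\+0x[0-9a-f]+\)")


def nested_name(symbol):
    """Decode the leading _ZN<len><name><len><name>... part of a symbol.

    Symbols that do not follow this simple pattern are returned unchanged.
    Argument types and templates are ignored.
    """
    if not symbol.startswith("_ZN"):
        return symbol
    parts, pos = [], 3
    while pos < len(symbol) and symbol[pos].isdigit():
        end = pos
        while end < len(symbol) and symbol[end].isdigit():
            end += 1
        length = int(symbol[pos:end])
        parts.append(symbol[end:end + length])
        pos = end + length
    # C1/C2/C3 after the class name marks a constructor.
    if parts and re.match(r"C[123]", symbol[pos:]):
        parts.append(parts[-1])
    return "::".join(parts) if parts else symbol


def summarize_frames(trace_text):
    """Return (frame_number, module, readable_name) for frames with a symbol."""
    return [
        (int(number), module, nested_name(symbol))
        for number, module, symbol in FRAME.findall(trace_text)
    ]


line = ("[cxx:08864] [ 7] mothur(_ZN8Bayesian12readProbFileERSt14basic_ifstream"
        "IcSt11char_traitsIcEES4_SsSs+0x866) [0x568706]")
print(summarize_frames(line))  # [(7, 'mothur', 'Bayesian::readProbFile')]
```

Applied to the frames below that one in the trace, the same helper gives `Bayesian::Bayesian` (the constructor) and `ClassifySeqsCommand::execute`, which is the call path leading into the read of the probability file.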

Mothur recreates this file with each new release. If several instances of mothur run at the same time with the same reference files, one instance can overwrite the file while another is reading it, causing crashes or unexpected behavior. To avoid this, run the command once on its own after you update mothur so that the files are created. Once the files exist with the correct version tag, mothur just reads them.
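If jobs are submitted by a scheduler and it is hard to guarantee that single first run, another option is to have every job take a lock before it starts mothur. The sketch below is a generic wrapper, not something mothur provides: `run_exclusively` and the lock path are hypothetical, the lock only helps if every job goes through the same wrapper, and `flock` may not be reliable on some network file systems.

```python
import fcntl
import subprocess
import sys


def run_exclusively(cmd, lock_path):
    """Run cmd while holding an exclusive advisory lock on lock_path.

    Other processes that call this function with the same lock_path
    wait until the lock is released, so the commands run one at a time.
    """
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks while another job holds it
        try:
            return subprocess.run(cmd).returncode
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


if __name__ == "__main__":
    if len(sys.argv) > 2:
        # Hypothetical usage:
        #   python locked_run.py /shared/refs/trainset.lock mothur classify.batch
        print(run_exclusively(sys.argv[2:], sys.argv[1]))
```

This serializes the whole command rather than only the creation of the shortcut files, so it trades throughput for safety. Running the command once by hand after each update, as described above, remains the simpler fix.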

Kindly,
Sarah Westcott