classify.seqs segmentation fault

Hi,

I’m analysing a fairly large dataset (about 20 GB of raw sequences obtained using the Kozich protocol). My first run of the mothur SOP finished without problems. However, when I repeated the analysis (again following the SOP), mothur wouldn’t finish classifying the sequences, the step that comes right after chimera removal.

This is the command as it appears in the generated logfile:

mothur > classify.seqs(fasta=/home/jdelacuesta/microbio/output/microbio.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=/home/jdelacuesta/microbio/output/microbio.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.count_table, reference=/home/jdelacuesta/microbio/db/trainset14_032015.pds.fasta, taxonomy=/home/jdelacuesta/microbio/db/trainset14_032015.pds.tax, cutoff=80, processors=1)

Using 1 processors.
Reading template taxonomy...     DONE.
Reading template probabilities...     DONE.
It took 121 seconds get probabilities.
Classifying sequences from /home/jdelacuesta/microbio/output/microbio.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta ...

And this is the stderr:

/var/spool/torque/mom_priv/jobs/15787.apolo.eafit.edu.co.SC: line 1: !/bin/bash: No such file or directory
TERM environment variable not set.
/var/spool/torque/mom_priv/jobs/15787.apolo.eafit.edu.co.SC: line 13: 10491 Segmentation fault      mothur taxonomia.batch

After some cursing, I decided to re-upload the RDP reference files, thinking they might have been corrupted. That is when I noticed several new files alongside them, byproducts of the first time I ran the analysis:

trainset14_032015.pds.8mer
trainset14_032015.pds.trainset14_032015.pds.8mer.numNonZero
trainset14_032015.pds.trainset14_032015.pds.8mer.prob
trainset14_032015.pds.tree.sum
trainset14_032015.pds.tree.train

Once those files were removed, everything came back to normal and I was able to finish the run.

What are those files, why do they break mothur (or my computer), and would it be possible for future releases to remove them automatically when mothur quits?

Those files are used to speed up classification. If the program stops in the midst of creating them, they are left incomplete and you’ll have problems the next time mothur tries to use them. As you’ve found, if you delete the files and start again, you’ll be good to go.
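
For anyone who wants to script the cleanup before resubmitting a job, here is a minimal Python sketch. It is not part of mothur. The function name `remove_shortcut_files` and the `dry_run` flag are made up for illustration, and the suffix list is taken only from the file names posted above, so other mothur versions or reference sets may produce different names. It assumes the generated files sit in the same directory as the reference FASTA and share its prefix.

```python
from pathlib import Path

# Suffixes copied from the file names listed in this thread (assumption:
# other mothur versions or k-mer sizes may name these files differently).
SHORTCUT_SUFFIXES = (
    ".8mer",
    ".8mer.numNonZero",
    ".8mer.prob",
    ".tree.sum",
    ".tree.train",
)


def remove_shortcut_files(reference_fasta, dry_run=True):
    """Find (and optionally delete) generated files next to a reference FASTA.

    Returns the sorted list of matching paths. With dry_run=True nothing
    is deleted, so the list can be inspected first.
    """
    reference = Path(reference_fasta)
    # "trainset14_032015.pds.fasta" -> "trainset14_032015.pds."
    prefix = reference.stem + "."
    matches = sorted(
        p
        for p in reference.parent.iterdir()
        if p.is_file()
        and p.name.startswith(prefix)
        and p.name.endswith(SHORTCUT_SUFFIXES)
    )
    if not dry_run:
        for p in matches:
            p.unlink()
    return matches
```

Calling `remove_shortcut_files("/home/jdelacuesta/microbio/db/trainset14_032015.pds.fasta")` would only list the candidates. Passing `dry_run=False` deletes them, after which mothur should rebuild the files on the next classify.seqs run. The reference `.fasta` and `.tax` files are never matched because they do not end in any of the listed suffixes.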

pat