mothur

Problem with classify.seqs for fungi in mothur 1.43 on Linux

Dear mothur team,

I have a fungal ITS data set and I am using the pipeline provided at https://github.com/krmaas/bioinformatics/blob/master/mothur.fungal.batch
Previously I used mothur 1.42.3 for Linux and did not have any problems. Now I am using the latest version and have run into the same problem two or three times. My commands and their output are:

Linux version
Using ReadLine,Boost,GSL
mothur v.1.43.0

mothur > make.contigs(file=fungi.txt, processors=8)
Output File Names:
fungi.trim.contigs.fasta
fungi.scrap.contigs.fasta
fungi.contigs.report
fungi.contigs.groups

mothur > summary.seqs(fasta=current)
Using fungi.trim.contigs.fasta as input file for the fasta parameter.
Output File Names:
fungi.trim.contigs.summary

mothur > screen.seqs(fasta=current, group=current, summary=current, maxambig=0, maxhomop=8, minlength=194, maxlength=400)
Using fungi.trim.contigs.fasta as input file for the fasta parameter.
Using fungi.contigs.groups as input file for the group parameter.
Using fungi.trim.contigs.summary as input file for the summary parameter.
Output File Names:
fungi.trim.contigs.good.summary
fungi.trim.contigs.good.fasta
fungi.trim.contigs.bad.accnos
fungi.contigs.good.groups

mothur > summary.seqs(fasta=current)
Using fungi.trim.contigs.good.fasta as input file for the fasta parameter.
Output File Names:
fungi.trim.contigs.good.summary

mothur > unique.seqs(fasta=current)
Using fungi.trim.contigs.good.fasta as input file for the fasta parameter.
Output File Names:
fungi.trim.contigs.good.names
fungi.trim.contigs.good.unique.fasta

mothur > summary.seqs(fasta=current, name=current)
Using fungi.trim.contigs.good.unique.fasta as input file for the fasta parameter.
Using fungi.trim.contigs.good.names as input file for the name parameter.
Output File Names:
fungi.trim.contigs.good.unique.summary

mothur > count.seqs(name=current, group=current)
Using fungi.contigs.good.groups as input file for the group parameter.
Using fungi.trim.contigs.good.names as input file for the name parameter.
Output File Names:
fungi.trim.contigs.good.count_table

mothur > pre.cluster(fasta=current, count=current, diffs=2)
Using fungi.trim.contigs.good.count_table as input file for the count parameter.
Using fungi.trim.contigs.good.unique.fasta as input file for the fasta parameter.
Output File Names:
fungi.trim.contigs.good.unique.precluster.fasta
fungi.trim.contigs.good.unique.precluster.count_table
fungi.trim.contigs.good.unique.precluster.Acaiss1.map
fungi.trim.contigs.good.unique.precluster.Acaiss2.map
fungi.trim.contigs.good.unique.precluster.Acaiss3.map
fungi.trim.contigs.good.unique.precluster.Acaiss4.map
fungi.trim.contigs.good.unique.precluster.Acaiss5.map
fungi.trim.contigs.good.unique.precluster.Afulva1.map
fungi.trim.contigs.good.unique.precluster.Afulva2.map
fungi.trim.contigs.good.unique.precluster.Afulva3.map
fungi.trim.contigs.good.unique.precluster.Afulva4.map
fungi.trim.contigs.good.unique.precluster.Afulva5.map
fungi.trim.contigs.good.unique.precluster.SD1.map
fungi.trim.contigs.good.unique.precluster.SD2.map
fungi.trim.contigs.good.unique.precluster.SD3.map
fungi.trim.contigs.good.unique.precluster.SD4.map
fungi.trim.contigs.good.unique.precluster.SD5.map
fungi.trim.contigs.good.unique.precluster.SW1.map
fungi.trim.contigs.good.unique.precluster.SW2.map
fungi.trim.contigs.good.unique.precluster.SW3.map
fungi.trim.contigs.good.unique.precluster.SW4.map
fungi.trim.contigs.good.unique.precluster.SW5.map
fungi.trim.contigs.good.unique.precluster.Tignis1.map
fungi.trim.contigs.good.unique.precluster.Tignis2.map
fungi.trim.contigs.good.unique.precluster.Tignis3.map
fungi.trim.contigs.good.unique.precluster.Tignis4.map
fungi.trim.contigs.good.unique.precluster.Tignis5.map

mothur > summary.seqs(fasta=current, count=current)
Using fungi.trim.contigs.good.unique.precluster.count_table as input file for the count parameter.
Using fungi.trim.contigs.good.unique.precluster.fasta as input file for the fasta parameter.
Output File Names:
fungi.trim.contigs.good.unique.precluster.summary

mothur > chimera.uchime(fasta=fungi.trim.contigs.good.unique.precluster.fasta, count=fungi.trim.contigs.good.unique.precluster.count_table, dereplicate=t)
Output File Names:
fungi.trim.contigs.good.unique.precluster.denovo.uchime.pick.count_table
fungi.trim.contigs.good.unique.precluster.denovo.uchime.chimeras
fungi.trim.contigs.good.unique.precluster.denovo.uchime.accnos

mothur > remove.seqs(accnos=current, fasta=current)
Using fungi.trim.contigs.good.unique.precluster.denovo.uchime.accnos as input file for the accnos parameter.
Using fungi.trim.contigs.good.unique.precluster.fasta as input file for the fasta parameter.
[WARNING]: This command can take a namefile and you did not provide one. The current namefile is fungi.trim.contigs.good.names which seems to match fungi.trim.contigs.good.unique.precluster.fasta.
Output File Names:
fungi.trim.contigs.good.unique.precluster.pick.fasta

mothur > summary.seqs(fasta=current, count=current)
Using fungi.trim.contigs.good.unique.precluster.denovo.uchime.pick.count_table as input file for the count parameter.
Using fungi.trim.contigs.good.unique.precluster.pick.fasta as input file for the fasta parameter.
Output File Names:
fungi.trim.contigs.good.unique.precluster.pick.summary

mothur > classify.seqs(fasta=current, count=current, taxonomy=UNITEv8_sh_dynamic_s_all.tax, reference=UNITEv8_sh_dynamic_s_all.fasta, cutoff=60)
Using fungi.trim.contigs.good.unique.precluster.denovo.uchime.pick.count_table as input file for the count parameter.
Using fungi.trim.contigs.good.unique.precluster.pick.fasta as input file for the fasta parameter.

Generating search database… DONE.
It took 20 seconds generate search database.
Reading in the UNITEv8_sh_dynamic_s_all.tax taxonomy… DONE.
Calculating template taxonomy tree… DONE.

However, for some reason, when the pipeline reaches classify.seqs, the process is killed, even when I reduce the number of processors. I checked the memory usage and it is not the problem. When I transferred the input files and ran mothur 1.43 for Mac, the command completed without any problem (but it takes forever, which is why I would like to use Linux).
I could not find where the problem is. Could anyone help me, please?
Huge thanks in advance!
Cheers,
Cris

Thanks for reporting this issue and for providing such a detailed explanation. Could you send fungi.trim.contigs.good.unique.precluster.pick.fasta and fungi.trim.contigs.good.unique.precluster.denovo.uchime.pick.count_table to mothur.bugs@gmail.com so I can take a closer look?

Dear Sarah,

Did you receive the files?
Cheers,
Cris

I received the files and am processing them now without error. I am using the UNITEv8_sh_dynamic.* files from the mothur release at https://unite.ut.ee/repository.php.

Are you using our executable version or did you compile from source?

How much RAM do you have?
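
In case it helps answer the RAM question, here is a minimal sketch (not part of mothur) that reads total and available memory on a Linux machine from /proc/meminfo. The helper names parse_meminfo and to_gib are made up for this example.

```python
# Sketch: report total and available memory on Linux by parsing /proc/meminfo.
# parse_meminfo and to_gib are hypothetical helper names, not mothur functions.
from pathlib import Path


def parse_meminfo(text):
    """Return a dict mapping /proc/meminfo field names to their numeric values.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def to_gib(kilobytes):
    """Convert a kB value from /proc/meminfo to GiB."""
    return kilobytes / (1024 * 1024)


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # only present on Linux
        info = parse_meminfo(meminfo.read_text())
        for key in ("MemTotal", "MemAvailable", "SwapTotal"):
            if key in info:
                print(f"{key}: {to_gib(info[key]):.1f} GiB")
    else:
        print("/proc/meminfo not found; this sketch only works on Linux.")
```

Running it right before starting classify.seqs gives a rough idea of how much memory is actually free at that point.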

I was able to produce a crash while reading the reference files when less RAM was available. If the size of the reference is not the issue, then mothur should output something like: “It took xxx seconds get probabilities.” Are you seeing that message?
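
If the run is long and the screen output is gone, a saved log can be checked for that message instead. The following is a rough sketch only: it assumes you have the mothur log saved as a text file and that the message contains the phrase "get probabilities" as quoted above. The function names are invented for this example.

```python
# Sketch: check a saved mothur log for the "get probabilities" message and
# show the last thing that was printed before the process stopped.
# The marker text is taken from the message quoted above; the exact wording
# in a given mothur version is an assumption.
import sys

MARKER = "get probabilities"


def reached_probabilities(log_text):
    """Return True if any line of the log contains the marker phrase."""
    return any(MARKER in line for line in log_text.splitlines())


def last_nonblank_line(log_text):
    """Return the last non-empty line of the log, or '' if there is none."""
    lines = [line.strip() for line in log_text.splitlines() if line.strip()]
    return lines[-1] if lines else ""


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1], errors="replace") as handle:
            text = handle.read()
        print("Reached 'get probabilities':", reached_probabilities(text))
        print("Last line in log:", last_nonblank_line(text))
    else:
        print("usage: python check_log.py <mothur logfile>")
```

If the marker is missing and the last line is still about the taxonomy tree, that would be consistent with the crash happening while the reference is being loaded.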

Are your references (UNITEv8_sh_dynamic_s_all.*) larger than the mothur release UNITEv8_sh_dynamic files?
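
One way to compare the two reference sets is to count the sequences and bases in each FASTA file and look at the file sizes. This is a minimal sketch, not a mothur feature. The function name fasta_stats is made up, and the file paths are whatever you pass on the command line (for example UNITEv8_sh_dynamic_s_all.fasta and the FASTA file from the mothur release).

```python
# Sketch: print sequence count, total bases and file size for each FASTA file
# given on the command line, so two reference sets can be compared.
# fasta_stats is a hypothetical helper name, not part of mothur.
import os
import sys


def fasta_stats(path):
    """Return (number of sequences, total bases, file size in bytes)."""
    n_seqs = 0
    n_bases = 0
    with open(path, errors="replace") as handle:
        for line in handle:
            if line.startswith(">"):
                n_seqs += 1  # each header line starts a new sequence
            else:
                n_bases += len(line.strip())
    return n_seqs, n_bases, os.path.getsize(path)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python compare_refs.py <reference1.fasta> [reference2.fasta ...]")
    for path in sys.argv[1:]:
        seqs, bases, size = fasta_stats(path)
        print(f"{path}: {seqs} sequences, {bases} bases, {size / 1e6:.1f} MB")
```

A much larger sequence count in the _s_all files than in the release files would point toward reference size as the difference between the two runs.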