mothur

Processing fails with 1.43.0

I haven’t been able to complete processing a project with 1.43.0. I’m using the same bash script as I was using with 1.41.x. Two projects have failed at remove.seqs after chimera checking with the following error (different sequence name, same error): [ERROR]: M01676_138_000000000-CB4PJ_1_1101_10080_19438 is not in your count table. Please correct. The other fails at pre.cluster with no error; it just quits. I’ll send you all of the logfiles.

Could you send your input files for the pre.cluster as well?

I noticed that the count file contained 3 empty groups, which was causing the problem. I removed them using the make.table command, and the pre.cluster command was then able to finish without issue.

mothur > make.table(count=zymoTest.trim.contigs.good.good.count_table, compress=t) - removed 3 empty samples

mothur > pre.cluster(fasta=zymoTest.trim.contigs.good.unique.good.filter.fasta, count=current, diffs=2)
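If you want to check a count table for empty groups yourself before running pre.cluster, something like the following shell sketch may help. It is only a rough check and makes assumptions: `empty_groups` is a hypothetical helper name (not a mothur command), and it assumes the full, uncompressed count table layout, i.e. a tab-separated file whose first line is a header of the form `Representative_Sequence`, `total`, then one column per group. It will not read a table written with compress=t.

```shell
#!/bin/bash
# Hypothetical helper: print the name of every group whose counts sum to zero.
# Assumption: uncompressed, tab-separated count table with a header line
# "Representative_Sequence<TAB>total<TAB>group1<TAB>group2...".
empty_groups() {
    awk -F'\t' '
        NR == 1 {                          # header: remember the group names
            for (i = 3; i <= NF; i++) name[i] = $i
            ncol = NF
            next
        }
        {                                  # data rows: add up each group column
            for (i = 3; i <= ncol; i++) sum[i] += $i
        }
        END {                              # report groups that never had a read
            for (i = 3; i <= ncol; i++) if (sum[i] == 0) print name[i]
        }
    ' "$1"
}
```

Calling `empty_groups my.count_table` prints one group name per line and prints nothing when every group has at least one read. The file name here is a placeholder.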

Thanks Sarah, I’ve updated the part of my bash script that follows chimera checking.

Would it be reasonable to add the make.table command before pre.cluster in my usual processing script and just run it every time?

I added a check for missing groups in pre.cluster, so that it will provide an error message instead of crashing. I’d like to correct the issue if it was caused within mothur. Did you modify the table outside of mothur, or did mothur include the empty groups?

Hi Sarah, I didn’t do anything outside of mothur. I’m running this as a batch.

```bash
#!bash
PROJECTNAME=$1

mothur "#make.contigs(file=$PROJECTNAME.file, processors=32);
summary.seqs(fasta=current);
screen.seqs(fasta=current, group=current, summary=current, maxambig=0, maxlength=275);
summary.seqs(fasta=current);
unique.seqs(fasta=current);
summary.seqs(fasta=current, name=current);
count.seqs(name=current, group=current);
align.seqs(fasta=current, reference=silva.nr_v119.v4.align);
summary.seqs(fasta=current, count=current);
screen.seqs(fasta=current, count=current, summary=current, start=1968, end=11550, maxhomop=8);
filter.seqs(fasta=current, vertical=T);
summary.seqs(fasta=current, count=current);
make.table(count=current, compress=t);
pre.cluster(fasta=current, diffs=2, count=current);
summary.seqs(fasta=current, count=current);
chimera.vsearch(fasta=current, count=current, dereplicate=t);
remove.seqs(fasta=current, accnos=current, count=current);
summary.seqs(fasta=current, count=current);
classify.seqs(fasta=current, count=current, reference=silva.nr_v119.v4.align, taxonomy=silva.nr_v119.tax, cutoff=80);
remove.lineage(fasta=current, count=current, taxonomy=current, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota);
summary.tax(taxonomy=current, count=current);
dist.seqs(fasta=current, countends=F, cutoff= 0.03, processors=16);
cluster(column=current, count=current, method=opti);
summary.seqs(processors=32);
make.shared(list=current, count=current);
classify.otu(list=current, count=current, taxonomy=current);
get.oturep(fasta=current, count=current, list=current, method=abundance);
count.groups(shared=current);
summary.single(shared=current, calc=nseqs-sobs-coverage-shannon-shannoneven-invsimpson, subsample=10000);
dist.shared(shared=current, calc=braycurtis-jest-thetayc, subsample=10000);
sub.sample(count=$PROJECTNAME.trim.contigs.good.unique.good.filter.precluster.denovo.uchime.pick.pick.pick.count_table, shared=current, list=$PROJECTNAME.trim.contigs.good.unique.good.filter.precluster.pick.pick.opti_mcc.list, size=10000, persample=true, label=0.03);
sub.sample(taxonomy=$PROJECTNAME.trim.contigs.good.unique.good.filter.precluster.pick.nr_v119.wang.pick.taxonomy, count=$PROJECTNAME.trim.contigs.good.unique.good.filter.precluster.denovo.uchime.pick.pick.pick.count_table, list=$PROJECTNAME.trim.contigs.good.unique.good.filter.precluster.pick.pick.opti_mcc.list, size=10000, persample=true, label=0.03);
summary.tax(taxonomy=current, count=current);
system(mkdir send);
system(cp *shared send);
system(cp *cons.tax* send);
system(cp *pick.tax.summary send);
system(cp *pick.subsample.tax.summary send);
system(cp *.rep.fasta send);
system(cp *lt.ave.dist send);
system(cp *groups.ave-std.summary send);
system(cp mothur.bash send);
system(cp mothur.*.logfile send);"
```

The issue is the combination of chimera.vsearch with dereplicate=t and remove.seqs with the count table included. With dereplicate=f, a sequence that is flagged as chimeric in one group is treated as chimeric in all groups. With dereplicate=t, a sequence flagged as chimeric in a group is removed only from that group. When dereplicate=t, mothur creates a modified count file with the chimeric reads already removed for you, so you do not want to pass that modified count file to the remove.seqs command. Instead try this:

mothur > chimera.vsearch(fasta=current, count=current, dereplicate=t) - remove chimeras from count table and create accnos file for removing them from other files

mothur > remove.seqs(fasta=current, accnos=current) - remove chimeras from the fasta file
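For a batch script, the two commands above can be dropped in where the old chimera step was. Here is a minimal shell sketch of that, under some assumptions: `chimera_step` is a hypothetical helper name that only prints the mothur commands, and the trailing summary.seqs line assumes that `count=current` now points at the count file chimera.vsearch wrote, which is worth confirming in your logfile.

```shell
#!/bin/bash
# Hypothetical helper: print the chimera-removal portion of a mothur batch.
# Note that remove.seqs gets NO count= parameter here, because with
# dereplicate=t the count file from chimera.vsearch is already cleaned.
chimera_step() {
    cat <<'EOF'
chimera.vsearch(fasta=current, count=current, dereplicate=t);
remove.seqs(fasta=current, accnos=current);
summary.seqs(fasta=current, count=current);
EOF
}

# Print the commands so they can be reviewed or pasted into the batch string.
chimera_step
```

These lines need to run in the same mothur session as the earlier steps, since `current` refers to files produced earlier in that session. In practice that means pasting them into the existing `mothur "#...;"` string in place of the old chimera.vsearch and remove.seqs lines.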

Thanks, this worked. One minor point: the resulting fasta doesn’t include “denovo.vsearch” in the name like it did in past versions. Can that be added back in?

1.43 output:

mothur > remove.seqs(fasta=current, accnos=current)
Using zymoTest.trim.contigs.good.unique.good.filter.precluster.denovo.vsearch.accnos as input file for the accnos parameter.
Using zymoTest.trim.contigs.good.unique.good.filter.precluster.fasta as input file for the fasta parameter.
[WARNING]: This command can take a namefile and you did not provide one. The current namefile is zymoTest.trim.contigs.good.names which seems to match zymoTest.trim.contigs.good.unique.good.filter.precluster.fasta.
Removed 31904 sequences from your fasta file.

Output File Names: 
zymoTest.trim.contigs.good.unique.good.filter.precluster.pick.fasta