Issue with classify.otu

Hi again,

To give some context, we are studying the human bladder microbiome, specifically targeting the V4 region of the 16S rRNA gene. When I run the following command:

classify.otu(list=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.taxonomy, label=0.03)

I get the following message:

Your file does not include the label 0.03. I will use 0.01.
0.01 102834

Can someone please explain this to me? Does this mean that no sequences are at least 3% different? Is this an error of some kind? Any insight would be appreciated. Thank you again.
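
For reference, one way to see which labels a list file actually contains is mothur's get.label command. This is only a minimal sketch that reuses the list file name from the classify.otu call above; the labels it reports depend on the mothur version and clustering settings used.

```
# List every label (e.g. unique, 0.01) present in the list file.
# The file name is copied from the classify.otu command in this post.
get.label(list=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list)
```

If 0.03 is not among the reported labels, classify.otu falls back to the closest label it can find, which is what the message above describes.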

Without seeing your previous commands, it’s not possible to answer that question.

I am using the same commands listed in the MiSeq SOP. I have copied the contents of my bash script below:

make.contigs(file=stability.files, processors=8)
#this will read each fastq file in the stability.files and make the contigs. This can take some time, depending on how many files are in the stability file.

summary.seqs(fasta=stability.trim.contigs.fasta)

summary.seqs(fasta=stability.trim.contigs.good.fasta)

screen.seqs(fasta=stability.trim.contigs.fasta, group=stability.contigs.groups, maxambig=0, minlength=275, maxlength=300, processors=8)

unique.seqs(fasta=stability.trim.contigs.good.fasta)

count.seqs(name=stability.trim.contigs.good.names, group=stability.contigs.good.groups)

summary.seqs(count=stability.trim.contigs.good.count_table)

pcr.seqs(fasta=silva.bacteria.fasta, start=11894, end=25319, keepdots=F, processors=8)

system(mv silva.bacteria.pcr.fasta silva.v4.fasta)

summary.seqs(fasta=silva.v4.fasta)

align.seqs(fasta=stability.trim.contigs.good.unique.fasta, reference=silva.v4.fasta)

summary.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table)

screen.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table, summary=stability.trim.contigs.good.unique.summary, start=1968, end=11550, maxhomop=8)

summary.seqs(fasta=current, count=current)

filter.seqs(fasta=stability.trim.contigs.good.unique.good.align, vertical=T, trump=.)

unique.seqs(fasta=stability.trim.contigs.good.unique.good.filter.fasta, count=stability.trim.contigs.good.good.count_table)

pre.cluster(fasta=stability.trim.contigs.good.unique.good.filter.unique.fasta, count=stability.trim.contigs.good.unique.good.filter.count_table, diffs=2)

chimera.uchime(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.count_table, dereplicate=t)
#/usr/local/bin/uchime file does not exist. Checking path…
#[ERROR]: file does not exist. mothur requires the uchime executable.
#[ERROR]: did not complete chimera.uchime.

remove.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta, accnos=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.accnos)

summary.seqs(fasta=current, count=current)

classify.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table, reference=trainset9_032012.pds.fasta, taxonomy=trainset9_032012.pds.tax, cutoff=80)

remove.lineage(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table, taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.taxonomy, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)

#the original commands have files with pick.pick.pick… I think the third pick comes from the error rate assessment, so I took it out.

cluster.split(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.03, processors=12)

I wonder if there’s something funky with cluster.split and mcc? Try dist.seqs and cluster instead (see Classify OTU Labels for an example).

Something else to note is that I have used this exact script on other datasets and experienced no issues. The resulting files are all at the 0.03 cutoff.

Could this be happening because I am using an older version of mothur? Also, what are the implications of using this data as it is? I will not have time to run it through again before my deadline.

It sounds like you are using an old version of mothur, in which case you are getting the average neighbor algorithm by default. I see that you set 0.03 as the cutoff. See this for a brief explanation of why:

https://mothur.org/wiki/Frequently_asked_questions#Why_does_the_cutoff_change_when_I_cluster_with_average_neighbor.3F

I’d encourage you to rerun the dist.seqs and cluster commands with the newest version of mothur using cutoff=0.03, which will use the opticlust algorithm by default. This will run much faster than the previous methods and give you better results.

Pat
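
As a rough sketch of the rerun suggested above, the commands might look like the following in a recent mothur release. The fasta, count, and taxonomy file names are assumptions copied from the script earlier in this thread and will differ if the chimera step is rerun with another tool. `list=current` is used because the name of the list file that cluster writes varies between versions.

```
# Pairwise distances, keeping only those at or below 0.03.
dist.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, cutoff=0.03)

# Cluster from the column-formatted distance file written by dist.seqs.
# Recent versions use opticlust by default, so no method is given here.
cluster(column=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.dist, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, cutoff=0.03)

# Consensus taxonomy per OTU at the 0.03 label, using the list file
# produced by the cluster command in the same session.
classify.otu(list=current, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.taxonomy, label=0.03)
```

These commands need to run in a single mothur session (or batch file) so that `current` resolves to the list file just created.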