mothur

Classify.otu results in many sequences not in the taxonomy file


#1

For many of the sequences in my 18S analysis, when I run the classify.otu command I get a lot of messages like this:

[WARNING]: M02149_449_000000000-AURT8_1_1101_18396_3602 is not in your taxonomy file. I will not include it in the consensus.

I mean a LOT.

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<^>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<^>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<^>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Detected 21788 [WARNING] messages, please review.
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<^>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<^>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<^>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

As you can see in my command file, I removed the Bacteria, Archaea, and unknowns before cluster.split, so I don’t know why there are sequences in the shared file that aren’t in the taxonomy file. Any ideas?

make.contigs(ffastq=Run3_18S_R1.fastq, rfastq=Run3_18S_R2.fastq, rindex=Run3_18S_I1.fastq, oligos=Run3_18S_oligos.txt, processors=2)
summary.seqs(fasta=Run3_18S_R1.trim.contigs.fasta)
get.groups(fasta=Run3_18S_R1.trim.contigs.fasta, group=Run3_18S_R1.contigs.groups, groups=WED01-WED02-WED03-WED04-WED05-WED06-WED07-WED08-WED09-WED10-WED11-WED12-WED13-WED14-WED15-WED16-WED17-WED18-WED19-WED20-WED21-WED22-WED23-WED24-WED25-WED26-WED27-WED28-WED29-WED30-WED31-WED32)
screen.seqs(fasta=Run3_18S_R1.trim.contigs.pick.fasta, group=Run3_18S_R1.contigs.pick.groups, maxambig=0, maxlength=260, processors=2)
unique.seqs(fasta=Run3_18S_R1.trim.contigs.pick.good.fasta)
count.seqs(name=Run3_18S_R1.trim.contigs.pick.good.names, group=Run3_18S_R1.contigs.pick.good.groups)
summary.seqs(fasta=Run3_18S_R1.trim.contigs.pick.good.unique.fasta, count=Run3_18S_R1.trim.contigs.pick.good.count_table)
align.seqs(processors=2, fasta=Run3_18S_R1.trim.contigs.pick.good.unique.fasta, reference=silva.nr_v132.euk.align)
summary.seqs(processors=2, fasta=Run3_18S_R1.trim.contigs.pick.good.unique.align, count=Run3_18S_R1.trim.contigs.pick.good.count_table)
screen.seqs(fasta=Run3_18S_R1.trim.contigs.pick.good.unique.align, count=Run3_18S_R1.trim.contigs.pick.good.count_table, summary=Run3_18S_R1.trim.contigs.pick.good.unique.summary, start=15794, end=16431, maxhomop=6)
summary.seqs(fasta=Run3_18S_R1.trim.contigs.pick.good.unique.good.align, count=Run3_18S_R1.trim.contigs.pick.good.good.count_table)
filter.seqs(fasta=Run3_18S_R1.trim.contigs.pick.good.unique.good.align, vertical=T, trump=.)
unique.seqs(fasta= Run3_18S_R1.trim.contigs.pick.good.unique.good.filter.fasta, count=Run3_18S_R1.trim.contigs.pick.good.good.count_table)
pre.cluster(processors=1, fasta=Run3_18S_R1.trim.contigs.pick.good.unique.good.filter.unique.fasta, count=Run3_18S_R1.trim.contigs.pick.good.good.count_table, diffs=2)
chimera.vsearch(fasta=Run3_18S_R1.trim.contigs.pick.good.unique.good.filter.unique.precluster.fasta, count=Run3_18S_R1.trim.contigs.pick.good.unique.good.filter.unique.precluster.count_table, dereplicate=t)
remove.seqs(fasta=Run3_18S_R1.trim.contigs.pick.good.unique.good.filter.unique.precluster.fasta, accnos=Run3_18S_R1.trim.contigs.pick.good.unique.good.filter.unique.precluster.denovo.vsearch.accnos)
classify.seqs(fasta=Run3_18S_R1.trim.contigs.pick.good.unique.good.filter.unique.precluster.pick.fasta, count=Run3_18S_R1.trim.contigs.pick.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table, reference=silva.nr_v132.euk.align, taxonomy=silva.nr_v132.euk.tax, cutoff=80, processors=2)
remove.lineage(fasta=Run3_18S_R1.trim.contigs.pick.good.unique.good.filter.unique.precluster.pick.fasta, count=Run3_18S_R1.trim.contigs.pick.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table, taxonomy=Run3_18S_R1.trim.contigs.pick.good.unique.good.filter.unique.precluster.pick.euk.wang.taxonomy, taxon=Bacteria-unknown-Archaea)
cluster.split(fasta=Run3_18S_R1.trim.contigs.pick.good.unique.good.filter.unique.precluster.pick.pick.fasta, count=Run3_18S_R1.trim.contigs.pick.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.count_table, taxonomy=Run3_18S_R1.trim.contigs.pick.good.unique.good.filter.unique.precluster.pick.euk.wang.pick.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.03, processors=2, runsensspec=f)
make.shared(list=Run3_18S_R1.trim.contigs.pick.good.unique.good.filter.unique.precluster.pick.pick.opti_mcc.unique_list.list, count=Run3_18S_R1.trim.contigs.pick.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.count_table, label=0.03)
classify.otu(list=Run3_18S_R1.trim.contigs.pick.good.unique.good.filter.unique.precluster.pick.pick.opti_mcc.unique_list.list, count=Run3_18S_R1.trim.contigs.pick.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.count_table, taxonomy=Run3_18S_R1.trim.contigs.pick.good.unique.good.filter.unique.precluster.pick.euk.wang.pick.taxonomy, label=0.03)
count.groups(shared=Run3_18S_R1.trim.contigs.pick.good.unique.good.filter.unique.precluster.pick.pick.opti_mcc.unique_list.shared)
rarefaction.single(shared=Run3_18S_R1.trim.contigs.pick.good.unique.good.filter.unique.precluster.pick.pick.opti_mcc.unique_list.shared)


#2

Have you checked this out? https://mothur.org/wiki/Frequently_asked_questions#File_Mismatches_-_.22.5BERROR.5D:_yourSequence_is_in_fileA_but_not_in_fileB.2C_please_correct..22

What version of mothur are you using?

Have you tried using the ‘current’ option to avoid typos in the long filenames?
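One way to see which file is out of step is to compare the sequence names directly. The snippet below is a minimal sketch, not part of mothur: it assumes a count table with one header line and the sequence name in the first tab-separated column, and a taxonomy file with the sequence name in the first column. The function names and the demo sequence names are made up for illustration.

```python
def read_names(lines, has_header=False):
    """Collect the first tab-separated field of every data line."""
    names = set()
    header_pending = has_header
    for line in lines:
        line = line.strip()
        # Skip blank lines and any '#' comment lines.
        if not line or line.startswith("#"):
            continue
        # Drop the single column-header line, if the file has one.
        if header_pending:
            header_pending = False
            continue
        names.add(line.split("\t")[0])
    return names


def compare(count_lines, taxonomy_lines):
    """Return (names missing from taxonomy, names missing from count table)."""
    in_count = read_names(count_lines, has_header=True)
    in_taxonomy = read_names(taxonomy_lines)
    return sorted(in_count - in_taxonomy), sorted(in_taxonomy - in_count)


# Hypothetical demo data; real input would be the lines of the two files.
count_demo = [
    "Representative_Sequence\ttotal\tWED01\tWED02",
    "seqA\t5\t3\t2",
    "seqB\t1\t1\t0",
]
taxonomy_demo = ["seqA\tEukaryota(100);Opisthokonta(98);"]

missing_from_taxonomy, missing_from_count = compare(count_demo, taxonomy_demo)
print("in count table but not in taxonomy:", missing_from_taxonomy)
print("in taxonomy but not in count table:", missing_from_count)
```

With real files, pass open file handles instead of the demo lists, for example `compare(open(count_path), open(taxonomy_path))`. A long list of names missing from the taxonomy would suggest the count table and taxonomy file come from different steps of the pipeline.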


#3

Thanks for the response. I hadn’t seen that post. I might switch to current to see if that fixes it, but I’ve been keeping the full filenames in the commands for reference so I can troubleshoot. But if that’s causing me problems then it’s working against me…

I’m using mothur v.1.39.5.

What is strange is that I use essentially the same commands file for 16S and 18S. I just find/replace the names, and change the variables as necessary (screen.seqs, etc.). The taxonomy file IS different by the time we assign taxonomy to OTUs, because we’ve selected ONLY the Eukaryotes from the Silva database, so maybe the issue is with that file. I’m running it now with “current” in the classify.otu command to see if the problem goes away.


#4

UPDATE

I’ve changed the classify.otu command to use “current” for all the files, but the result was exactly the same. I wonder if it’s because I’m using 2 of 2 processors for many of the commands. I’ll change it to 1 for everything, let it run overnight, and see if that was the problem.

UPDATE

Changing to 1 processor did not change the outcome.

FINAL UPDATE

I changed all the inputs to “current” wherever possible, and this eliminated the problem. It seems one of the commands wasn’t getting the file it expected, so I was probably just pointing it to the wrong file somewhere in those long filenames.
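For anyone who wants to keep explicit filenames, one way to look for a wrong file is to list what each command in the batch file is given for one parameter. This is a rough sketch under assumptions: it only handles lines of the form command(key=value, key=value), it is not a mothur feature, and the helper names and demo filenames are invented for illustration.

```python
import re

# Matches lines such as: pre.cluster(fasta=a.fasta, count=a.count_table, diffs=2)
COMMAND_RE = re.compile(r"^\s*([\w.]+)\((.*)\)\s*$")


def parse_command(line):
    """Split one batch-file line into (command name, {parameter: value})."""
    match = COMMAND_RE.match(line)
    if match is None:
        return None
    name, arg_text = match.groups()
    params = {}
    for part in arg_text.split(","):
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip()] = value.strip()
    return name, params


def trace_parameter(lines, key):
    """List (command, value) for every command that sets the given parameter."""
    rows = []
    for line in lines:
        parsed = parse_command(line)
        if parsed and key in parsed[1]:
            rows.append((parsed[0], parsed[1][key]))
    return rows


# Hypothetical, shortened filenames standing in for a real batch file.
demo_batch = [
    "unique.seqs(fasta= demo.filter.fasta, count=demo.good.count_table)",
    "pre.cluster(processors=1, fasta=demo.filter.unique.fasta, count=demo.good.count_table, diffs=2)",
    "summary.seqs(fasta=demo.filter.unique.precluster.fasta)",
]

for command, filename in trace_parameter(demo_batch, "count"):
    print(f"{command:15s} {filename}")
```

Reading the count= column of the batch file posted in #1 this way shows, for instance, that unique.seqs and the pre.cluster that follows it are given the same count table name. Whether that was the mismatch here is not established in this thread.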