classify.otu: sequences is not in your taxonomy file

Hi all,
I am facing a problem when running classify.otu for one sample at a time.

remove.seqs(fasta=Mapped_NP_S90.trim.unique.good.filter.unique.precluster.fasta,accnos=Mapped_NP_S90.trim.unique.good.filter.unique.precluster.uchime.accnos,name=Mapped_NP_S90.trim.unique.good.filter.unique.precluster.names)

classify.seqs(fasta=Mapped_NP_S90.trim.unique.good.filter.unique.precluster.pick.fasta,name=Mapped_NP_S90.trim.unique.good.filter.unique.precluster.pick.names,reference=/trainset10_082014.pds.fasta,taxonomy=trainset10_082014.pds.tax, cutoff=80)

dist.seqs(fasta=Mapped_NP_S90.trim.unique.good.filter.unique.precluster.pick.fasta, cutoff=0.20)
cluster(column=Mapped_NP_S90.trim.unique.good.filter.unique.precluster.pick.dist,name=Mapped_NP_S90.trim.unique.good.filter.unique.precluster.pick.names)
classify.otu(list=Mapped_NP_S90.trim.unique.good.filter.unique.precluster.pick.an.list,taxonomy=Mapped_NP_S90.trim.unique.good.filter.unique.precluster.pick.pds.wang.taxonomy,name=Mapped_NP_S90.trim.unique.good.filter.unique.precluster.pick.names,label=0.03)

0.03 612
S90_2111.13277.10444 is not in your taxonomy file. I will not include it in the consensus.
S90_1113.17238.21005 is not in your taxonomy file. I will not include it in the consensus.
S90_1109.9591.12186 is not in your taxonomy file. I will not include it in the consensus.
S90_1110.18233.19089 is not in your taxonomy file. I will not include it in the consensus.
S90_1106.19137.11777 is not in your taxonomy file. I will not include it in the consensus.

And so on

What is the cause of this problem, given that I ran all the steps with the name file? Why are these sequences not in my taxonomy file? If they could not be classified, shouldn't they be listed as unknown?
How can I solve it?

Thank you in advance,
Ashraf

What version of mothur are you using?

Thanks for sending your files. I was able to figure out the source of the issue. When mothur uses multiple processors, it splits the file into chunks to process. It splits the forward fastq file and then searches the other files for the sequence at each split location. Many of the sequence names we see look like @MS7_15058:1:1101:11899:1633#8/1 and @MS7_15058:1:1101:11899:1633#8/2, so mothur looks for the exact name but also for the trimmed name, which is @MS7_15058:1:1101:11899:1633#8/ for both reads. In your case, this caused a match in the wrong spot. I have fixed the code so that it still matches names like those above without producing multiple matches as in your case. The change will be part of our next release. In the meantime, running with processors=1 will avoid this error. Sorry for the inconvenience, and thanks for helping us find and resolve this bug.
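
To illustrate the collision described above, here is a minimal Python sketch. It is not mothur's actual code: the `trimmed` helper is hypothetical and only mimics the idea of dropping the trailing read number. It shows that the forward and reverse names differ as written but become identical once trimmed, so a search on the trimmed name alone cannot tell the two reads apart.

```python
def trimmed(name: str) -> str:
    """Drop a trailing read number ('1' or '2') that follows a final '/'.

    Hypothetical helper for illustration; mothur's real matching logic may differ.
    """
    if len(name) >= 2 and name[-2] == "/" and name[-1] in "12":
        return name[:-1]
    return name


# Example names taken from the post above.
forward = "@MS7_15058:1:1101:11899:1633#8/1"
reverse = "@MS7_15058:1:1101:11899:1633#8/2"

# The exact names differ, so an exact-name lookup is unambiguous.
exact_match = forward == reverse

# After trimming, both reads share one key, so a lookup on the trimmed
# name can hit more than one record.
trimmed_match = trimmed(forward) == trimmed(reverse)

print(exact_match, trimmed_match)
```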

Hi,
I am using mothur v1.36.1 (Mac 64-bit) and I still get the same problem as above. processors=1 is not an option for classify.otu.
My commands are:

remove.seqs(accnos=lima2.trim.good.filter.unique.precluster.denovo.uchime.accnos, fasta=lima2.trim.good.filter.unique.precluster.fasta, name=lima2.trim.good.filter.unique.precluster.names, group=lima2.good.groups)
classify.seqs(fasta=lima2.trim.good.filter.unique.precluster.pick.fasta, name=lima2.trim.good.filter.unique.precluster.pick.names, group=lima2.good.pick.groups, template=trainset14_032015.rdp.fasta, taxonomy=trainset14_032015.rdp.tax, cutoff=80, relabund=T)
remove.lineage(fasta=lima2.trim.good.filter.unique.precluster.pick.fasta, name=lima2.trim.good.filter.unique.precluster.pick.names, group=lima2.good.pick.groups, taxonomy=lima2.trim.good.filter.unique.precluster.pick.rdp.wang.taxonomy, taxon=Mitochondria-Chloroplast-Eukaryota-Archaea-unknown)
dist.seqs(fasta=lima2.trim.good.filter.unique.precluster.pick.pick.fasta, cutoff=0.15)
cluster(column=lima2.trim.good.filter.unique.precluster.pick.pick.dist, name=lima2.trim.good.filter.unique.precluster.pick.pick.names)
make.shared(list=lima2.trim.good.filter.unique.precluster.pick.pick.an.list, group=lima2.good.pick.pick.groups, label=0.03)
classify.otu(list=lima2.trim.good.filter.unique.precluster.pick.pick.an.list, taxonomy=lima2.trim.good.filter.unique.precluster.pick.rdp.wang.pick.taxonomy, label=0.03, persample=T, cutoff=80, group=lima2.good.pick.pick.groups, processors=1)

Error
G1KUOEB03FRRZS is not in your taxonomy file. I will not include it in the consensus.
G1KUOEB03GN9G7 is not in your taxonomy file. I will not include it in the consensus.
G1KUOEB03F1BNV is not in your taxonomy file. I will not include it in the consensus.
G1KUOEB03GJSRP is not in your taxonomy file. I will not include it in the consensus.

Any other suggestions to fix this?
Thanks
Tiff

Have you double-checked the filenames? You could try using the current option to confirm. Are the sequences indeed missing from the taxonomy file?

http://www.mothur.org/wiki/Frequently_asked_questions#File_Mismatches_-_.22.5BERROR.5D:_yourSequence_is_in_fileA_but_not_in_fileB.2C_please_correct..22
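
One way to check whether the sequences really are missing is to compare the names in the list file against the first column of the taxonomy file. The Python sketch below is only a rough outline under stated assumptions: it assumes a tab-separated list file whose rows are `label`, number of OTUs, then one field per OTU with comma-separated names, and a taxonomy file with the sequence name in the first tab-separated column. The function names and the demo files are made up for illustration. It does not read a names or count file, so redundant names that are represented only there would also be reported as missing.

```python
import os
import tempfile


def names_in_list(path, label):
    """Return the sequence names in the row of a list file with this label."""
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if fields[0] == label:
                # fields[1] is the number of OTUs; each later field is one OTU
                # holding comma-separated sequence names.
                return {name for otu in fields[2:] for name in otu.split(",") if name}
    return set()


def names_in_taxonomy(path):
    """Return the sequence names in the first column of a taxonomy file."""
    with open(path) as handle:
        return {line.split("\t")[0].strip() for line in handle if line.strip()}


def missing_from_taxonomy(list_path, taxonomy_path, label="0.03"):
    """Names that appear in the list file row but not in the taxonomy file."""
    return sorted(names_in_list(list_path, label) - names_in_taxonomy(taxonomy_path))


# Demo on two tiny made-up files (not real mothur output).
with tempfile.TemporaryDirectory() as tmp:
    list_path = os.path.join(tmp, "demo.list")
    tax_path = os.path.join(tmp, "demo.taxonomy")
    with open(list_path, "w") as handle:
        handle.write("0.03\t2\tseqA,seqB\tseqC\n")
    with open(tax_path, "w") as handle:
        handle.write("seqA\tBacteria(100);Firmicutes(95);\n")
        handle.write("seqC\tBacteria(100);unclassified;\n")
    missing = missing_from_taxonomy(list_path, tax_path)

print(missing)
```

If the result is empty for your real files, the list and taxonomy files agree and the mismatch is more likely a matter of which filenames were passed to the command.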

Having the same issue as stated above.

When I try adding processors=1, mothur issues an error, and after rerunning the preceding steps with processors=1 to make sure the default had been changed, I still get the warning:

classify.otu(list=stability.trim.contigs.good.unique.good.filter.precluster.pick.an.unique_list.list, count=stability.trim.contigs.good.unique.good.filter.precluster.count_table, taxonomy=stability.trim.contigs.good.unique.good.filter.precluster.pick.rdp.wang.taxonomy, label=0.03 )

MISEQ-LAB244-W7_282_000000000-AEN7G_1_1101_13052_12183 is not in your taxonomy file. I will not include it in the consensus.
etc. etc.

Thanks much for any help!

Thank you, westcott.
When I look in both of these files (from the original commands):
lima2.trim.good.filter.unique.precluster.pick.rdp.wang.taxonomy
lima2.trim.good.filter.unique.precluster.pick.pick.an.0.03.cons.taxonomy
The sequence names (G1KUOEB03GN1RT etc.) aren’t present.


As you suggest, when I rerun commands with '=current':

remove.seqs(accnos=lima2.trim.good.filter.unique.precluster.denovo.uchime.accnos, fasta=lima2.trim.good.filter.unique.precluster.fasta, name=lima2.trim.good.filter.unique.precluster.names, group=lima2.good.groups)
classify.seqs(fasta=current, name=current, group=current, template=trainset14_032015.rdp.fasta, taxonomy=trainset14_032015.rdp.tax, cutoff=80, relabund=T)
remove.lineage(fasta=current, name=current, group=current, taxonomy=current, taxon=Mitochondria-Chloroplast-Eukaryota-Archaea-unknown)
dist.seqs(fasta=current, cutoff=0.15)
cluster(column=current, name=current)
make.shared(list=current, group=current, label=0.03)
classify.otu(list=current, taxonomy=current, name=current, label=0.03, persample=T, cutoff=80, group=current)

I get no errors or problems.
Thank you so much for your help and instructions!
Tiff