classify.otu: sequences is not in your taxonomy file

aashhab · August 9, 2015, 8:30am

Hi all,
I am facing a problem when running classify.otu for one sample at a time.

remove.seqs(fasta=Mapped_NP_S90.trim.unique.good.filter.unique.precluster.fasta,accnos=Mapped_NP_S90.trim.unique.good.filter.unique.precluster.uchime.accnos,name=Mapped_NP_S90.trim.unique.good.filter.unique.precluster.names)

classify.seqs(fasta=Mapped_NP_S90.trim.unique.good.filter.unique.precluster.pick.fasta,name=Mapped_NP_S90.trim.unique.good.filter.unique.precluster.pick.names,reference=/trainset10_082014.pds.fasta,taxonomy=trainset10_082014.pds.tax, cutoff=80)

dist.seqs(fasta=Mapped_NP_S90.trim.unique.good.filter.unique.precluster.pick.fasta, cutoff=0.20)
cluster(column=Mapped_NP_S90.trim.unique.good.filter.unique.precluster.pick.dist,name=Mapped_NP_S90.trim.unique.good.filter.unique.precluster.pick.names)
classify.otu(list=Mapped_NP_S90.trim.unique.good.filter.unique.precluster.pick.an.list,taxonomy=Mapped_NP_S90.trim.unique.good.filter.unique.precluster.pick.pds.wang.taxonomy,name=Mapped_NP_S90.trim.unique.good.filter.unique.precluster.pick.names,label=0.03)

0.03 612
S90_2111.13277.10444 is not in your taxonomy file. I will not include it in the consensus.
S90_1113.17238.21005 is not in your taxonomy file. I will not include it in the consensus.
S90_1109.9591.12186 is not in your taxonomy file. I will not include it in the consensus.
S90_1110.18233.19089 is not in your taxonomy file. I will not include it in the consensus.
S90_1106.19137.11777 is not in your taxonomy file. I will not include it in the consensus.

And so on

What is the cause of this problem, given that I ran all the steps with the name file? Why are these sequences not in my taxonomy file? If they didn’t classify, shouldn’t they be listed as unknown?
How can I solve it?

thank you in advance,
Ashraf

westcott · August 11, 2015, 4:31pm

What version of mothur are you using?

westcott · August 17, 2015, 12:53pm

Thanks for sending your files. I was able to figure out the source of the issue. When mothur uses multiple processors, it splits the file into chunks to process. It splits the forward fastq file and then searches the other files for the sequence at each split location. Since many of the sequence names we see look like @MS7_15058:1:1101:11899:1633#8/1 and @MS7_15058:1:1101:11899:1633#8/2, mothur will look for the exact name, but also for the trimmed names @MS7_15058:1:1101:11899:1633#8/ and @MS7_15058:1:1101:11899:1633#8/. In your case, this caused a match in the wrong spot. I have fixed the error in the code to enable matches for situations like the above without causing multiple matches as in your case. The change will be part of our next release. In the meantime, running with processors=1 will avoid this error. Sorry for the inconvenience, and thanks for helping us find and resolve this bug.
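
The exact-or-trimmed name comparison described above can be illustrated with a short Python sketch. This is not mothur's code: the helpers `trim_read_suffix` and `names_match` are hypothetical names made up for illustration, and the sketch assumes the read number is a single `1` or `2` after the last `/`. It only shows why a forward read name can match its reverse mate once the read number is dropped.

```python
def trim_read_suffix(name: str) -> str:
    """Drop a trailing read number (1 or 2) after the last '/', keeping the slash."""
    head, sep, tail = name.rpartition("/")
    if sep and tail in ("1", "2"):
        return head + sep
    return name  # no recognizable read suffix, leave the name unchanged


def names_match(forward: str, reverse: str) -> bool:
    """Accept an exact match first, then fall back to comparing trimmed names."""
    if forward == reverse:
        return True
    return trim_read_suffix(forward) == trim_read_suffix(reverse)


forward = "@MS7_15058:1:1101:11899:1633#8/1"
reverse = "@MS7_15058:1:1101:11899:1633#8/2"
print(trim_read_suffix(forward))
print(names_match(forward, reverse))
```

Because the trimmed form is shared by both mates, any search that accepts the trimmed form has to make sure it is looking at the right record, which is the kind of mismatch the post describes.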

nelson5 · December 1, 2015, 3:38pm

Hi,
I am using mothur v 1.36.1 mac 64bit and I still get the same problem as above. processors=1 is not an option for classify.otu.
My commands are:

remove.seqs(accnos=lima2.trim.good.filter.unique.precluster.denovo.uchime.accnos, fasta=lima2.trim.good.filter.unique.precluster.fasta, name=lima2.trim.good.filter.unique.precluster.names, group=lima2.good.groups)
classify.seqs(fasta=lima2.trim.good.filter.unique.precluster.pick.fasta, name=lima2.trim.good.filter.unique.precluster.pick.names, group=lima2.good.pick.groups, template=trainset14_032015.rdp.fasta, taxonomy=trainset14_032015.rdp.tax, cutoff=80, relabund=T)
remove.lineage(fasta=lima2.trim.good.filter.unique.precluster.pick.fasta, name=lima2.trim.good.filter.unique.precluster.pick.names, group=lima2.good.pick.groups, taxonomy=lima2.trim.good.filter.unique.precluster.pick.rdp.wang.taxonomy, taxon=Mitochondria-Chloroplast-Eukaryota-Archaea-unknown)
dist.seqs(fasta=lima2.trim.good.filter.unique.precluster.pick.pick.fasta, cutoff=0.15)
cluster(column=lima2.trim.good.filter.unique.precluster.pick.pick.dist, name=lima2.trim.good.filter.unique.precluster.pick.pick.names)
make.shared(list=lima2.trim.good.filter.unique.precluster.pick.pick.an.list, group=lima2.good.pick.pick.groups, label=0.03)
classify.otu(list=lima2.trim.good.filter.unique.precluster.pick.pick.an.list, taxonomy=lima2.trim.good.filter.unique.precluster.pick.rdp.wang.pick.taxonomy, label=0.03, persample=T, cutoff=80, group=lima2.good.pick.pick.groups, processors=1)

Error
G1KUOEB03FRRZS is not in your taxonomy file. I will not include it in the consensus.
G1KUOEB03GN9G7 is not in your taxonomy file. I will not include it in the consensus.
G1KUOEB03F1BNV is not in your taxonomy file. I will not include it in the consensus.
G1KUOEB03GJSRP is not in your taxonomy file. I will not include it in the consensus.

Any other suggestions to fix this?
Thanks
Tiff

westcott · December 1, 2015, 8:26pm

Have you double checked the filenames? You could try using the current option to confirm. Are the sequences indeed missing from the taxonomy file?

http://www.mothur.org/wiki/Frequently_asked_questions#File_Mismatches_-_.22.5BERROR.5D:_yourSequence_is_in_fileA_but_not_in_fileB.2C_please_correct..22
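
One way to answer the question of whether the sequences are really missing is to compare the files directly. The following is a minimal Python sketch, not part of mothur. It assumes the usual tab-delimited layouts: a list file with the label, the OTU count, and then one comma-separated OTU per column; a taxonomy file with the sequence name in the first column; and a names file with the representative name followed by its comma-separated duplicates. It also assumes the taxonomy file lists only representative names, so duplicates of a classified representative are reported separately. All function names and the toy file names are made up for illustration.

```python
import os
import tempfile


def read_taxonomy_names(path):
    """Return the set of sequence names in the first column of a taxonomy file."""
    names = set()
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if fields:
                names.add(fields[0])
    return names


def read_list_names(path, label):
    """Return the set of sequence names in the list-file row for `label`."""
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if fields[0] == label:
                # fields[1] is the OTU count; the remaining columns are OTUs
                return {name for otu in fields[2:] for name in otu.split(",") if name}
    raise ValueError(f"label {label!r} not found in {path}")


def read_name_map(path):
    """Map every sequence name in a names file to its representative name."""
    rep_of = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:
                for name in fields[1].split(","):
                    rep_of[name] = fields[0]
    return rep_of


def missing_from_taxonomy(list_path, taxonomy_path, label, names_path=None):
    """Split list-file names absent from the taxonomy file into two groups:
    duplicates whose representative is classified, and names truly absent."""
    classified = read_taxonomy_names(taxonomy_path)
    rep_of = read_name_map(names_path) if names_path else {}
    covered, missing = [], []
    for name in sorted(read_list_names(list_path, label) - classified):
        if rep_of.get(name) in classified:
            covered.append(name)
        else:
            missing.append(name)
    return covered, missing


# Toy files with made-up names, standing in for real mothur output.
with tempfile.TemporaryDirectory() as tmp:
    list_path = os.path.join(tmp, "toy.list")
    tax_path = os.path.join(tmp, "toy.taxonomy")
    names_path = os.path.join(tmp, "toy.names")
    with open(list_path, "w") as fh:
        fh.write("0.03\t2\tseqA,seqB\tseqC,seqD\n")
    with open(tax_path, "w") as fh:
        fh.write("seqA\tBacteria(100);Firmicutes(95);\n")
    with open(names_path, "w") as fh:
        fh.write("seqA\tseqA,seqB\n")
    covered, missing = missing_from_taxonomy(list_path, tax_path, "0.03", names_path)

print("duplicates of classified representatives:", covered)
print("absent from taxonomy and names files:", missing)
```

If most of the reported names land in the first group, the list and taxonomy files probably agree and the names (or count) file was simply not supplied; if they land in the second group, the taxonomy file likely comes from a different step of the pipeline than the list file.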

renee13 · December 3, 2015, 3:25am

Having the same issue as stated above.

When I try to add processors=1 it issues an error, and after rerunning the preceding steps with processors=1 to make sure the default has been changed, I still get the warning:

classify.otu(list=stability.trim.contigs.good.unique.good.filter.precluster.pick.an.unique_list.list, count=stability.trim.contigs.good.unique.good.filter.precluster.count_table, taxonomy=stability.trim.contigs.good.unique.good.filter.precluster.pick.rdp.wang.taxonomy, label=0.03 )

MISEQ-LAB244-W7_282_000000000-AEN7G_1_1101_13052_12183 is not in your taxonomy file. I will not include it in the consensus.
etc. etc.

Thanks much for any help!

nelson5 · December 7, 2015, 5:42pm

Thank you westcott,
When I look in both these files (from the original commands):
lima2.trim.good.filter.unique.precluster.pick.rdp.wang.taxonomy
lima2.trim.good.filter.unique.precluster.pick.pick.an.0.03.cons.taxonomy
The sequence names (G1KUOEB03GN1RT etc.) aren’t present.

As you suggest, when I rerun the commands with '=current':

remove.seqs(accnos=lima2.trim.good.filter.unique.precluster.denovo.uchime.accnos, fasta=lima2.trim.good.filter.unique.precluster.fasta, name=lima2.trim.good.filter.unique.precluster.names, group=lima2.good.groups)
classify.seqs(fasta=current, name=current, group=current, template=trainset14_032015.rdp.fasta, taxonomy=trainset14_032015.rdp.tax, cutoff=80, relabund=T)
remove.lineage(fasta=current, name=current, group=current, taxonomy=current, taxon=Mitochondria-Chloroplast-Eukaryota-Archaea-unknown)
dist.seqs(fasta=current, cutoff=0.15)
cluster(column=current, name=current)
make.shared(list=current, group=current, label=0.03)
classify.otu(list=current, taxonomy=current, name=current, label=0.03, persample=T, cutoff=80, group=current)

I get no errors or problems.
Thank you so much for your help and instructions!
Tiff
