mothur taxonomy results do not match NCBI BLAST results

I have a question about the cons.taxonomy file generated by mothur. I followed the MiSeq SOP. After removing the mock groups from the data, I kept the final fasta file, count table and taxonomy file for further analysis.

I clustered the sequences into OTUs using the agc method, then created the OTU table (shared file) and finally classified the OTUs:

mothur > cluster(, count=final.count_table, vsearch=/users/desktop/mothur/vsearch, method=agc)

mothur > make.shared(, count=final.count_table, label=0.03)

mothur > classify.otu(, count=final.count_table, taxonomy=final.taxonomy, label=0.03)
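Note that classify.otu does not classify a single sequence per OTU: it combines the classifications of all sequences in the OTU into a consensus. The Python below is a minimal sketch of a simple majority-rule consensus, assuming a 51% agreement cutoff (mothur documents a similar default for classify.otu, but its exact handling of ties and unclassified ranks may differ). The lineages are hypothetical.

```python
from collections import Counter


def consensus_taxonomy(lineages, cutoff=51):
    """Majority-rule consensus over the lineages of all sequences in one OTU.

    lineages: one list of taxon names per sequence, highest rank first.
    cutoff: percent of the OTU's sequences that must agree at a rank.
    Stops at the first rank where no taxon reaches the cutoff.
    """
    total = len(lineages)
    depth = max(len(lineage) for lineage in lineages)
    consensus = []
    for rank in range(depth):
        # Count the taxon each sequence was assigned at this rank.
        votes = Counter(lineage[rank] for lineage in lineages if len(lineage) > rank)
        taxon, count = votes.most_common(1)[0]
        percent = round(100 * count / total)
        if percent < cutoff:
            break
        consensus.append((taxon, percent))
    return consensus


def format_taxonomy(consensus):
    # Same "Name(percent);" style as mothur's taxonomy strings.
    return "".join(f"{taxon}({percent});" for taxon, percent in consensus)


# Hypothetical OTU: three members classified as Pseudomonas, one as Acinetobacter.
pseudomonas = ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
               "Pseudomonadales", "Pseudomonadaceae", "Pseudomonas"]
acinetobacter = ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
                 "Pseudomonadales", "Moraxellaceae", "Acinetobacter"]
members = [pseudomonas] * 3 + [acinetobacter]

print(format_taxonomy(consensus_taxonomy(members)))
```

In this made-up case the OTU is labelled Pseudomonas(75), yet BLASTing its one Acinetobacter member would return a different genus, so the taxonomy of one member and the OTU's consensus need not agree.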

I selected OTU 2 from the list file:
Otu00002 M06339_192_000000000-KBLJK_1_2104_20594_11589
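In a mothur list file each OTU column holds every sequence name in that OTU, separated by commas, so a single name is only one member. A minimal sketch for pulling out all members of one OTU, assuming the tab-separated layout with a single `label numOtus Otu00001 ...` header row used by recent mothur versions; the sequence names are hypothetical:

```python
import csv
import io


def otu_members(list_text, label, otu):
    """Return every sequence name in `otu` at distance `label` of a mothur list file."""
    rows = list(csv.reader(io.StringIO(list_text), delimiter="\t"))
    header = rows[0]            # label, numOtus, Otu00001, Otu00002, ...
    column = header.index(otu)  # raises ValueError if the OTU label is absent
    for row in rows[1:]:
        if row[0] == label:
            # Members of one OTU are comma-separated within a single column.
            return row[column].split(",")
    raise KeyError(f"label {label!r} not found")


# Hypothetical two-OTU list file at label 0.03.
example = (
    "label\tnumOtus\tOtu00001\tOtu00002\n"
    "0.03\t2\tseqA1,seqA2\tseqB1,seqB2,seqB3\n"
)
print(otu_members(example, "0.03", "Otu00002"))
```

For a real run, pass the contents of your own list file (for example `open(path).read()`, with `path` pointing to it).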

I searched for this sequence name, “M06339_192_000000000-KBLJK_1_2104_20594_11589”, in my final.fasta file. It was as follows:


I copied the sequence and ran a standard nucleotide BLAST (blastn), with the following result:

The taxonomy BLAST gives for OTU 2 is not the same as the one in the taxonomy file generated by the classify.otu command:

Can you please explain why there are such mismatches in the data?
Am I checking this correctly? If not, how can I check whether the taxonomy mothur assigned to the sequences in my fasta file matches the BLAST results?
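One way to make the comparison rank by rank is to read the OTU's consensus lineage, with the confidence values mothur prints in parentheses, and compare it to the lineage of the BLAST hit. A minimal sketch, assuming the cons.taxonomy file has the three tab-separated columns OTU, Size and Taxonomy; the file content and the BLAST lineage are typed in by hand and hypothetical. Exact string matching is only a rough check, because SILVA, RDP and NCBI do not always use the same names for the same group.

```python
import csv
import io
import re

# One "Name(confidence)" entry of a mothur taxonomy string.
TAXON = re.compile(r"([^;()]+)\((\d+)\)")


def read_cons_taxonomy(text):
    """Map each OTU label to a list of (taxon, confidence) pairs."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {
        row["OTU"]: [(name, int(conf)) for name, conf in TAXON.findall(row["Taxonomy"])]
        for row in reader
    }


def deepest_agreement(mothur_lineage, blast_lineage):
    """Ranks, from the top down, where mothur and the BLAST hit use the same name."""
    agreed = []
    for (taxon, _confidence), other in zip(mothur_lineage, blast_lineage):
        if taxon.lower() != other.lower():
            break
        agreed.append(taxon)
    return agreed


# Hypothetical cons.taxonomy content and a hand-copied BLAST hit lineage.
cons_text = (
    "OTU\tSize\tTaxonomy\n"
    "Otu00002\t4\tBacteria(100);Proteobacteria(100);Gammaproteobacteria(100);"
    "Pseudomonadales(100);Pseudomonadaceae(75);Pseudomonas(75);\n"
)
blast_hit = ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
             "Pseudomonadales", "Moraxellaceae", "Acinetobacter"]

otu2 = read_cons_taxonomy(cons_text)["Otu00002"]
print(otu2[-1])                            # deepest rank mothur assigned
print(deepest_agreement(otu2, blast_hit))  # ranks both sources agree on
```

A disagreement only at genus or family level, where the mothur confidence is also lower, points to a genuinely ambiguous OTU rather than a broken pipeline.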


Hello! Which reference database did you use, and at what cutoff? The latest version of SILVA with a cutoff of 80 usually gets it right for me. I ran into this kind of problem with older versions of the taxonomy reference databases, using my mock community control: for example, Pseudomonas was classified correctly with RDP but not with SILVA, or vice versa, I don't remember exactly. I had to lower the cutoff to 75 to get this particular genus right. Anyway, how is your positive control looking?
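The cutoff here is the bootstrap confidence threshold: ranks whose confidence falls below it are left unassigned. A minimal sketch of that truncation with hypothetical confidence values, showing how a genus at 78% is dropped at cutoff 80 but kept at 75. The `_unclassified` labels imitate mothur's output style and are an assumption, not mothur's code:

```python
def apply_cutoff(lineage, cutoff):
    """Keep ranks while bootstrap confidence >= cutoff; label the rest unclassified.

    lineage: (taxon, confidence) pairs, highest rank first.
    """
    kept = []
    for taxon, confidence in lineage:
        if confidence < cutoff:
            # In RDP-style bootstrapping confidence does not rise at deeper ranks.
            break
        kept.append(taxon)
    parent = kept[-1] if kept else "unknown"
    return kept + [f"{parent}_unclassified"] * (len(lineage) - len(kept))


# Hypothetical bootstrap values for one sequence.
lineage = [("Bacteria", 100), ("Proteobacteria", 100), ("Gammaproteobacteria", 99),
           ("Pseudomonadales", 95), ("Pseudomonadaceae", 90), ("Pseudomonas", 78)]

print(apply_cutoff(lineage, 80)[-1])  # Pseudomonadaceae_unclassified
print(apply_cutoff(lineage, 75)[-1])  # Pseudomonas
```

Lowering the cutoff buys deeper assignments at the price of more wrong ones, which is why checking it against a mock community is worthwhile.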

BLAST and the RDP call the same sequence different things. Also, you are looking at the names that the scientists who deposited the sequences gave them, which are not necessarily correct or as robust as a true taxonomy.

Finally, the algorithms you are using are fundamentally different. The naive Bayesian classifier used by classify.seqs is far more robust than BLASTing a sequence and taking the top match.
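To illustrate the difference, here is a heavily simplified sketch of an RDP-style naive Bayesian classifier: every reference sequence of every genus contributes word (k-mer) probabilities, the query is scored against every genus, and the call is repeated on random subsets of the query's words to get a bootstrap confidence. A top BLAST hit, by contrast, is the label of one best-matching entry. The references and genus names are toy data, k=4 is used only because the toy sequences are tiny (the RDP classifier uses 8-mers), only the genus level is modeled, and the pseudocount details should be treated as approximate:

```python
import math
import random
from collections import Counter, defaultdict


def kmers(seq, k):
    """Distinct overlapping words of length k in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class NaiveBayesClassifier:
    """Genus-level, RDP-style naive Bayesian classifier (simplified sketch)."""

    def __init__(self, references, k=8):
        # references: list of (sequence, genus) pairs
        self.k = k
        self.genus_size = Counter(genus for _, genus in references)
        self.word_in_genus = defaultdict(Counter)  # genus -> word -> no. of sequences
        word_total = Counter()                     # word -> no. of sequences overall
        for seq, genus in references:
            words = kmers(seq, k)
            self.word_in_genus[genus].update(words)
            word_total.update(words)
        n = len(references)
        # Word prior with a small pseudocount, so unseen words never give log(0).
        self.prior = {w: (c + 0.5) / (n + 1) for w, c in word_total.items()}
        self.unseen_prior = 0.5 / (n + 1)

    def log_prob(self, word, genus):
        p_w = self.prior.get(word, self.unseen_prior)
        m = self.word_in_genus[genus][word]
        return math.log((m + p_w) / (self.genus_size[genus] + 1))

    def best_genus(self, words):
        # Every genus is scored against all of the given words.
        return max(self.genus_size,
                   key=lambda g: sum(self.log_prob(w, g) for w in words))

    def classify(self, seq, iterations=100, seed=0):
        words = sorted(kmers(seq, self.k))
        genus = self.best_genus(words)
        # Bootstrap: repeat the call on random 1/8 subsets of the words.
        rng = random.Random(seed)
        subset = max(1, len(words) // 8)
        wins = Counter(self.best_genus(rng.choices(words, k=subset))
                       for _ in range(iterations))
        return genus, 100 * wins[genus] // iterations


# Toy references and genus names; k=4 only because the sequences are tiny.
refs = [("ACGTACGTTGCA", "GenusA"), ("ACGTACGTTGCC", "GenusA"),
        ("TTTTGGGGCCCC", "GenusB"), ("TTTTGGGGCCCA", "GenusB")]
clf = NaiveBayesClassifier(refs, k=4)
print(clf.classify("ACGTACGTTGCG"))  # (genus, bootstrap confidence in %)
```

The bootstrap percentage is what the cutoff discussed earlier in the thread is compared against; a top BLAST hit carries no comparable measure of how stable the genus call is.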