Taxonomy results of mothur do not match with NCBI blast results

I have a question related to the cons.taxonomy table generated by mothur. I followed the MiSeq SOP. After removing the mock groups from the data, the final fasta file, count table and taxonomy file remained for further analysis.

I clustered the sequences into OTUs using the agc method, then created the OTU table (shared file) and finally classified the OTUs.

mothur > cluster(, count=final.count_table, vsearch=/users/desktop/mothur/vsearch, method=agc)

mothur > make.shared(, count=final.count_table, label=0.03)

mothur > classify.otu(, count=final.count_table, taxonomy=final.taxonomy, label=0.03)

I selected OTU 2 from list file:
Otu00002 M06339_192_000000000-KBLJK_1_2104_20594_11589

I searched for this tag “M06339_192_000000000-KBLJK_1_2104_20594_11589” in my final.fasta file. It was as follows:


I copied the sequence and ran a standard nucleotide BLAST as follows:

The BLAST taxonomy of OTU 2 is not the same as in the taxonomy file generated by the classify.otu command:

Can you please explain why there are such mismatches in the data?
Am I checking this correctly? If not, how can I check whether the taxonomy that mothur generated for the sequences in my fasta file matches the BLAST results?
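The lookup described above (finding a read by name in final.fasta and copying its sequence for BLAST) can be sketched in Python as follows. This is only a minimal sketch: the function names `read_fasta` and `find_sequence` and the toy records are hypothetical, and it assumes the fasta file is an alignment in which gaps are written as `-` or `.`, so they are stripped before the sequence is pasted into BLAST.

```python
from typing import Iterable, Iterator, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the name.
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_sequence(lines: Iterable[str], wanted: str,
                  strip_gaps: bool = True) -> Optional[str]:
    """Return the sequence whose name equals `wanted`, or None if absent."""
    for name, seq in read_fasta(lines):
        if name == wanted:
            if strip_gaps:
                # Assumption: alignment gaps are '-' and '.' characters.
                seq = seq.replace("-", "").replace(".", "")
            return seq
    return None


# Toy records (hypothetical names and sequences, not real data).
DEMO_FASTA = [
    ">readA",
    "..AC-GT..",
    ">readB some description",
    "--ACG",
    "TAC--",
]
```

With a real file, the same call would be `find_sequence(open("final.fasta"), "<read name>")`, where the read name is the one listed for the OTU in the list file.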


Hello! Which database did you use, and at what cutoff? The latest version of Silva with a cutoff of 80 usually gets it right for me. I ran into this kind of problem with older versions of the taxonomy reference databases when checking my mock community control. For example, Pseudomonas was fine with RDP but not with Silva, or vice versa; I do not remember exactly. I had to lower the cutoff to 75 to get that particular genus right. Anyway, how does your positive control look?


BLAST and the RDP call the same sequence different things. Also, you are looking at the names that the scientist who deposited the sequence gave it, which are not necessarily correct or as robust as a true taxonomy.

Finally, the algorithms you are using are fundamentally different. The naive Bayesian classifier used in classify.seqs is far more robust than blasting a sequence and taking the top match.
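To make the difference concrete, here is a toy Python sketch of the general idea behind a naive Bayesian k-mer classifier with bootstrap confidence. It is not mothur's code. The function names, the tiny reference set, the short k-mer length, the smoothing formula and the 1/8 subsampling are all simplifying assumptions based on the published RDP approach. The point is that the call is based on all k-mers in the read and comes with a confidence score, rather than on the name attached to a single best hit.

```python
import math
import random
from collections import Counter, defaultdict


def kmers(seq, k):
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(references, k):
    """Build word counts from a list of (taxon, sequence) references."""
    word_count = Counter()               # references containing each word
    taxon_words = defaultdict(Counter)   # per-taxon count of references with each word
    taxon_size = Counter()               # number of references per taxon
    for taxon, seq in references:
        words = kmers(seq, k)
        word_count.update(words)
        taxon_words[taxon].update(words)
        taxon_size[taxon] += 1
    return len(references), word_count, taxon_words, taxon_size


def score(words, taxon, model):
    """Log-probability of the words given the taxon, with smoothed estimates."""
    n_total, word_count, taxon_words, taxon_size = model
    total = 0.0
    for w in words:
        prior = (word_count[w] + 0.5) / (n_total + 1)
        total += math.log((taxon_words[taxon][w] + prior) / (taxon_size[taxon] + 1))
    return total


def classify(seq, model, k, iters=100, seed=0):
    """Return (best taxon, bootstrap confidence in percent)."""
    words = sorted(kmers(seq, k))
    if not words:
        raise ValueError("sequence is shorter than k")
    taxa = sorted(model[3])
    best = max(taxa, key=lambda t: score(words, t, model))
    rng = random.Random(seed)
    subset_size = max(1, len(words) // 8)  # assumed subsample size
    hits = 0
    for _ in range(iters):
        sample = [rng.choice(words) for _ in range(subset_size)]
        if max(taxa, key=lambda t: score(sample, t, model)) == best:
            hits += 1
    return best, 100 * hits // iters


# Toy references (made-up sequences and taxon names).
DEMO_REFS = [
    ("GenusA", "ACGTACGTTGCAACGTAGCTAGCTAGGATC"),
    ("GenusB", "TTGACCGGTTAACCGGTTAAGGCCTTAAGG"),
]
```

In this sketch, `classify(read, train(DEMO_REFS, 4), 4)` returns the best-scoring taxon together with the percentage of bootstrap replicates that agree with it. A cutoff such as 80 would then be applied to that percentage.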



I used the Silva reference files, release 138.1:
Full-length sequences and taxonomy references (128884 bacteria, 2846 archaea, and 14871 eukarya sequences). This reference could be customized for alignments, but could also be used for classification. The uncompressed version is ~6.8 GB and the compressed version is 241 MB.

I did not set the cutoff parameter in the classify.otu command, so the default of 51 was used.
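As I understand it, the cutoff in classify.otu is a consensus threshold: at each rank, a name is reported only if at least that percentage of the sequences in the OTU carry it. The Python sketch below illustrates that idea only. The function name `consensus_taxonomy`, the input layout and the toy counts are hypothetical, and mothur's actual implementation may differ in its details.

```python
from collections import Counter


def consensus_taxonomy(taxonomies, cutoff=51):
    """Return [(name, percent), ...] for the ranks that reach the cutoff.

    taxonomies: list of (lineage, count) pairs, where lineage is a list of
    rank names from highest to lowest rank and count is the number of
    sequences in the OTU that carry that lineage.
    """
    if not taxonomies:
        return []
    total = sum(count for _, count in taxonomies)
    depth = max(len(lineage) for lineage, _ in taxonomies)
    consensus = []
    prefix = ()
    for level in range(depth):
        votes = Counter()
        for lineage, count in taxonomies:
            # Only lineages that agree with the consensus so far get a vote.
            if len(lineage) > level and tuple(lineage[:level]) == prefix:
                votes[lineage[level]] += count
        if not votes:
            break
        name, n = votes.most_common(1)[0]
        percent = 100 * n // total
        if percent < cutoff:
            break  # lower ranks would be reported as unclassified
        consensus.append((name, percent))
        prefix += (name,)
    return consensus


# Toy OTU with 10 sequences (made-up counts).
DEMO_OTU = [
    (["Bacteria", "Proteobacteria", "Pseudomonas"], 6),
    (["Bacteria", "Proteobacteria", "Acinetobacter"], 3),
    (["Bacteria", "Proteobacteria", "Escherichia"], 1),
]
```

With the toy OTU, a cutoff of 51 keeps the genus because 60% of the sequences agree on it, while a cutoff of 80 stops at the phylum.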

I also tried something else. I obtained the representative sequence of each OTU using the following:

mothur > get.oturep(, count=final.count_table, method=abundance,
You did not provide a label, using 0.03.
0.03 17032

Output File Names:

/users/desktop/mothur/untitled folder/
/users/desktop/mothur/untitled folder/

I ran NCBI BLAST on the OTU sequences from the file. The BLAST results were similar to the taxonomy given in my file. Is this method correct?

Almost. Use a cutoff of 80 (i.e. 0.8) for classify.otu. Follow the online SOP and you should be fine until you get familiar with mothur and start tailoring the pipeline to your liking.


Also, if you want a taxonomy for an OTU, I strongly encourage people to use classify.otu rather than running get.oturep and classifying its output. You can see how we do this in the SOP.

