Taxonomy results from mothur do not match NCBI BLAST results

Hira · September 23, 2022, 10:45am

I have a question about the cons.taxonomy file generated by mothur. I followed the MiSeq SOP. After removing the mock groups from the data, the final fasta file, count table, and taxonomy file remained for further analysis.

I clustered the sequences into OTUs using the agc method, then created the OTU table (shared file), and finally classified the OTUs.

mothur > cluster(fasta=final.ng.fasta, count=final.count_table, vsearch=/users/desktop/mothur/vsearch, method=agc)

mothur > make.shared(list=final.ng.agc.list, count=final.count_table, label=0.03)

mothur > classify.otu(list=final.ng.agc.list, count=final.count_table, taxonomy=final.taxonomy, label=0.03)

I selected OTU 2 from the list file:
Otu00002 M06339_192_000000000-KBLJK_1_2104_20594_11589
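A row of a mothur list file holds every sequence name assigned to each OTU, not just one, so the name above is only one member of Otu00002. The sketch below collects all members of an OTU. It assumes tab-separated columns, a header row starting with `label` (then `numOtus` and the OTU names), and comma-separated sequence names inside each OTU; `otu_members` is a hypothetical helper, not part of mothur.

```python
def otu_members(list_text, label="0.03"):
    """Map each OTU name to its member sequence names for one label of a list file.

    Assumes tab-separated columns, header rows that start with "label",
    and sequence names within one OTU separated by commas.
    """
    rows = [line.split("\t") for line in list_text.splitlines() if line.strip()]
    header = None
    for row in rows:
        if row[0] == "label":
            header = row  # remember the most recent header row
        elif row[0] == label and header is not None:
            # Skip the "label" and "numOtus" columns, pair OTU names with members.
            return {otu: cell.split(",") for otu, cell in zip(header[2:], row[2:])}
    raise KeyError(f"label {label!r} not found in list file")
```

For example, `otu_members(open("final.ng.agc.list").read())["Otu00002"]` would return every sequence name in OTU 2, all of which contribute to its consensus taxonomy.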

I searched for this sequence name “M06339_192_000000000-KBLJK_1_2104_20594_11589” in my final.fasta file and found the following sequence:

>M06339_192_000000000-KBLJK_1_2104_20594_11589
TAC--GT-AG-GGT----GCG-A-G--C--G--T-T--AA-T-CGG-AA--TT-A-C-T--GG-GC--GT-A--AA-GC-GC-GC----G-TA-G-G-T-G-------G--T-TT-G-G-T-------AA----G-A-T-G--------G-A-T--G--TG--A-AA-TC--C-C-CG-G-G--------------CT-C-AA------------C-C-T-G-G-G-A--A-C----T-G--C-A--T--C---C--AT-A-A---C------T--G-C--CT--G-A-C------------------------------------------------------------------------------T-A-G-A-G-T--A----C-GG----TA-G-A----G-G-G-T---GG-T---------GG--A--ATT----T-C-C-T-GT--GT-A-G-CG-GT--G-A-A-A----TG-C-GT-AG--AT-A-TA-----------G-G-----A-A--G-G-A-AC-A-CC-------------AG--T--G--GC-GAA-G--G-C---G---A--C-C-A-C---CTG--G--GC-T-C------------------------A-T------A-C-T--GA--CA--C--T-G--A-GG--T-G-CG-A--AA-G-C-----G-TG--GG-G--AG-C-A-AA--CAGG
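The sequence above comes from an aligned fasta file, so it still contains alignment gap characters. It is safer to remove them before pasting the sequence into BLAST. mothur's degap.seqs command does this for a whole fasta file; a minimal Python sketch for a single sequence, assuming `-` and `.` are the only gap characters (`degap` and `to_fasta` are hypothetical helpers), could look like this:

```python
def degap(aligned):
    """Strip alignment gap characters ('-' and '.') from an aligned sequence."""
    return aligned.replace("-", "").replace(".", "")


def to_fasta(name, aligned, width=60):
    """Return an ungapped FASTA record, wrapped at `width` bases per line."""
    seq = degap(aligned)
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([">" + name] + lines)
```

The output of `to_fasta` can be pasted directly into the BLAST query box.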

I copied the sequence and ran a standard nucleotide BLAST (blastn) search.

The taxonomy that BLAST returned for OTU 2 is not the same as the one in the taxonomy file generated by the classify.otu command.

Can you please explain why there are such mismatches in the data?
Am I checking this correctly? If not, how can I check whether the taxonomy that mothur assigned to the sequences in my fasta file agrees with the BLAST results?

Regards
Hira

Alexandre_Thibodeau · September 26, 2022, 1:24pm

Hello! Which database did you use, and at what cutoff? The latest version of SILVA with a cutoff of 80 usually gets it right for me. I ran into this kind of problem with older versions of the taxonomy reference databases, using my mock community control. For example, Pseudomonas was classified correctly with RDP but not with SILVA, or vice versa; I do not remember exactly. I had to lower the cutoff to 75 to get this particular genus right. Anyway, how does your positive control look?
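classify.seqs reports a bootstrap confidence for each taxonomic level, and the cutoff decides how deep an assignment is reported. The sketch below mimics that truncation on a `Name(confidence);` string. The helper `apply_cutoff` is hypothetical, the confidence values are made up for illustration, and mothur's exact labelling of the dropped levels may differ; it only shows how lowering the cutoff from 80 to 75 can keep a genus-level call.

```python
import re


def apply_cutoff(taxonomy, cutoff=80):
    """Truncate a 'Name(confidence);...' taxonomy at the first level below cutoff."""
    kept = []
    for level in taxonomy.strip(";").split(";"):
        match = re.fullmatch(r"(.+)\((\d+)\)", level)
        if match is None or int(match.group(2)) < cutoff:
            break  # deeper levels are not reported once one level fails
        kept.append(level)
    return ";".join(kept) + ";" if kept else "unclassified;"


# Made-up confidences, for illustration only.
example = ("Bacteria(100);Proteobacteria(98);Gammaproteobacteria(91);"
           "Pseudomonadales(79);Pseudomonadaceae(77);Pseudomonas(76);")
print(apply_cutoff(example, cutoff=80))  # stops at the class level
print(apply_cutoff(example, cutoff=75))  # keeps all levels down to the genus
```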

pschloss · September 30, 2022, 12:52pm

BLAST and the RDP call the same sequence different things. Also, with BLAST you are looking at the name that the scientist who deposited the sequence gave it, which is not necessarily correct or as robust as a true taxonomy.

Finally, the algorithms you are using are fundamentally different. The naive Bayesian classifier used in classify.seqs is far more robust than blasting a sequence and taking the top match.
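To make the contrast concrete, here is a heavily simplified sketch of an RDP-style naive Bayesian classifier: it scores a query's k-mers against every reference taxon and then repeats the call on random subsets of those k-mers to estimate a bootstrap confidence. This is not mothur's code; the word prior, the one-eighth subsample size, and the defaults (8-mers, 100 iterations) are assumptions chosen to mirror commonly described settings, and all function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def kmers(seq, k):
    """Distinct k-mers (words) in an ungapped sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(references, k=8):
    """references: list of (ungapped sequence, taxon name) pairs."""
    by_taxon = defaultdict(list)
    for seq, taxon in references:
        by_taxon[taxon].append(kmers(seq, k))
    n_seqs_with_word = defaultdict(int)
    for word_sets in by_taxon.values():
        for words in word_sets:
            for w in words:
                n_seqs_with_word[w] += 1
    n_total = len(references)
    # Word prior across the whole reference set: (n(w) + 0.5) / (N + 1).
    prior = {w: (n + 0.5) / (n_total + 1) for w, n in n_seqs_with_word.items()}
    counts = {}
    for taxon, word_sets in by_taxon.items():
        c = defaultdict(int)
        for words in word_sets:
            for w in words:
                c[w] += 1
        counts[taxon] = (c, len(word_sets))  # word counts and number of sequences
    return {"k": k, "prior": prior, "counts": counts, "n_total": n_total}


def log_score(words, taxon, model):
    """Sum of log P(word | taxon) over the query words."""
    c, m = model["counts"][taxon]
    unseen = 0.5 / (model["n_total"] + 1)
    return sum(math.log((c.get(w, 0) + model["prior"].get(w, unseen)) / (m + 1))
               for w in words)


def best_taxon(words, model):
    return max(model["counts"], key=lambda t: log_score(words, t, model))


def classify(seq, model, iterations=100, seed=0):
    """Return (taxon, bootstrap confidence in percent) for a query sequence."""
    rng = random.Random(seed)
    words = sorted(kmers(seq, model["k"]))
    if not words:
        return None, 0.0
    call = best_taxon(words, model)
    subset = max(1, len(words) // 8)  # resample one-eighth of the words
    hits = sum(best_taxon(rng.choices(words, k=subset), model) == call
               for _ in range(iterations))
    return call, 100.0 * hits / iterations
```

Unlike a top BLAST hit, every reference sequence of a taxon contributes to its score, and the bootstrap confidence is what the cutoff discussed in this thread is applied to.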

Pat

Hira · October 6, 2022, 5:55pm

I used the SILVA reference files, release 138.1: full-length sequences and taxonomy references (128,884 bacteria, 2,846 archaea, and 14,871 eukarya sequences). This reference can be customized for alignments but can also be used for classification. The uncompressed version is ~6.8 GB and the compressed version is 241 MB.

I did not set the cutoff parameter in the classify.otu command, so the default of 51 was used.
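The classify.otu cutoff is a consensus threshold: an OTU's taxonomy at each level is the taxon shared by at least that percentage of the OTU's reads. This is why the OTU-level taxonomy can differ from a BLAST hit for any single member. The sketch below is a simplified majority-rule consensus under that assumption, weighting each sequence by its count-table abundance; `consensus_taxonomy` is a hypothetical helper, and the real command's output formatting (for example how unclassified levels are labelled) differs in details.

```python
from collections import Counter


def consensus_taxonomy(taxonomies, counts, cutoff=51):
    """Majority-rule consensus of member-sequence taxonomies.

    taxonomies: sequence name -> list of taxon names, domain first.
    counts: sequence name -> number of reads (as in a count table).
    cutoff: minimum percent of reads that must agree at a level.
    """
    total = sum(counts[name] for name in taxonomies)
    depth = max(len(taxa) for taxa in taxonomies.values())
    consensus = []
    for level in range(depth):
        votes = Counter()
        for name, taxa in taxonomies.items():
            if len(taxa) > level:
                # Vote on the full lineage prefix so parents stay consistent.
                votes[tuple(taxa[:level + 1])] += counts[name]
        prefix, reads = votes.most_common(1)[0]
        share = 100 * reads / total
        if share < cutoff:
            break
        consensus.append(f"{prefix[-1]}({round(share)})")
    return ";".join(consensus) + ";" if consensus else "unclassified;"


otu = {"s1": ["Bacteria", "Firmicutes", "Bacilli"],
       "s2": ["Bacteria", "Firmicutes", "Clostridia"],
       "s3": ["Bacteria", "Proteobacteria", "Gammaproteobacteria"]}
reads = {"s1": 5, "s2": 3, "s3": 2}
print(consensus_taxonomy(otu, reads))  # Bacteria(100);Firmicutes(80);
```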

I also tried another approach: I extracted the representative sequence of each OTU with the following command:

mothur > get.oturep(fasta=final.ng.fasta, count=final.count_table, method=abundance, list=final.ng.agc.list)
You did not provide a label, using 0.03.
0.03 17032

Output File Names:

/users/desktop/mothur/untitled folder/final.ng.agc.0.03.rep.count_table
/users/desktop/mothur/untitled folder/final.ng.agc.0.03.rep.fasta
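With method=abundance, get.oturep picks the most abundant sequence in each OTU as its representative, so classifying or BLASTing that one sequence ignores every other read in the OTU. The sketch below illustrates this under that assumption; `abundance_rep` and `rep_read_share` are hypothetical helpers, and mothur's tie-breaking may differ (Python's `max` keeps the first of tied names).

```python
def abundance_rep(members, counts):
    """Return the member sequence with the largest read count."""
    return max(members, key=lambda name: counts.get(name, 0))


def rep_read_share(members, counts):
    """Return the representative and the percent of the OTU's reads it accounts for."""
    rep = abundance_rep(members, counts)
    total = sum(counts.get(name, 0) for name in members)
    return rep, 100 * counts.get(rep, 0) / total
```

In an OTU with reads `{"s1": 5, "s2": 3, "s3": 2}`, the representative `s1` covers only half of the reads, which is one reason to prefer classify.otu for OTU-level taxonomy.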

I ran NCBI BLAST on the OTU sequences in the final.ng.agc.0.03.rep.fasta file. The BLAST results were similar to the taxonomy in my final.ng.agc.0.03.cons.taxonomy file. Is this method correct?

Alexandre_Thibodeau · October 12, 2022, 1:14pm

Almost. Use a cutoff of 0.8 for classify.otu. Follow the online SOP and you should be fine until you are familiar with mothur and start tailoring the pipeline to your liking.

pschloss · October 14, 2022, 3:17pm

Also, if you want a taxonomy for an OTU, I strongly encourage you to use classify.otu rather than running get.oturep and classifying its output. You can see how we do this in the SOP.

Pat
