Classify.seqs vs. assignTaxonomy (DADA2)?

Dinesh · September 3, 2020, 5:10am

Hi there,
I created a custom nifH database and tested it with both classify.seqs and assignTaxonomy on OTUs (USEARCH) and ASVs (DADA2).

I would like to know:

Why does DADA2's assignTaxonomy classify a higher percentage of sequences than mothur (for both OTUs and ASVs), given that both are based on the naive Bayesian k-mer classifier?
Why are DADA2's genus-level assignments so much more diverse than mothur's (for both OTUs and ASVs)?

Regards and Thanks,
Dinesh

pschloss · September 3, 2020, 3:01pm

Hi,

There are many possible reasons why the output would differ, including the reference set and the classification algorithm. When you talk about mothur's classification, you are actually using the naive Bayesian algorithm developed by Wang et al. in AEM. I seem to recall that the USEARCH algorithm was not tested as stringently as the naive Bayesian algorithm: the latter was evaluated with a leave-one-out approach, whereas the former was not.

Pat
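For readers comparing the two tools, here is a minimal Python sketch of the core of the Wang et al. (2007) naive Bayesian classifier that, per this thread, both classify.seqs and assignTaxonomy are based on. Each k-mer word w gets a prior P_w = (n(w) + 0.5) / (N + 1), where n(w) is the number of the N reference sequences containing w; the probability of w in a genus is (m(w) + P_w) / (M + 1), where m(w) counts the M references of that genus containing w; the query goes to the genus with the highest summed log probability. This is a simplification under stated assumptions: a flat genus label instead of a full lineage, a tiny hypothetical reference set, and no bootstrapping step. It is not the code of either tool, and the function names are illustrative.

```python
"""Minimal sketch of the Wang et al. (2007) naive Bayesian k-mer classifier.

Assumptions (illustration only): flat genus labels, toy references,
k = 8, and no bootstrap confidence estimation.
"""
import math
from collections import defaultdict

K = 8  # word (k-mer) size discussed in this thread


def kmers(seq, k=K):
    """Return the set of distinct overlapping k-mers in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(references, k=K):
    """Build word priors and per-genus word counts from (seq, genus) pairs."""
    n_seqs = len(references)
    word_in_seqs = defaultdict(int)  # n(w): references containing word w
    genus_size = defaultdict(int)    # M: references per genus
    genus_word = defaultdict(lambda: defaultdict(int))  # m(w) per genus
    for seq, genus in references:
        genus_size[genus] += 1
        for w in kmers(seq, k):
            word_in_seqs[w] += 1
            genus_word[genus][w] += 1
    # Word-specific prior P_w = (n(w) + 0.5) / (N + 1)
    prior = {w: (n + 0.5) / (n_seqs + 1) for w, n in word_in_seqs.items()}
    return prior, genus_size, genus_word, n_seqs


def log_likelihood(words, genus, model):
    """Sum over query words of log((m(w) + P_w) / (M + 1))."""
    prior, genus_size, genus_word, n_seqs = model
    size = genus_size[genus]
    total = 0.0
    for w in words:
        # Words never seen in training get the prior for n(w) = 0.
        p_w = prior.get(w, 0.5 / (n_seqs + 1))
        total += math.log((genus_word[genus].get(w, 0) + p_w) / (size + 1))
    return total


def classify(query, model, k=K):
    """Assign the genus with the highest summed log probability."""
    words = kmers(query, k)
    return max(model[1], key=lambda g: log_likelihood(words, g, model))


if __name__ == "__main__":
    # Hypothetical toy references, not real nifH sequences.
    refs = [
        ("ACGTACGTACGTAAAA", "GenusA"),
        ("ACGTACGTACGTAAAT", "GenusA"),
        ("TTTTGGGGCCCCAAAA", "GenusB"),
    ]
    model = train(refs)
    print(classify("ACGTACGTACGTAAAA", model))  # GenusA
```

The full method repeats this assignment on random subsets of the query's k-mers to produce a bootstrap confidence for each rank; that step, and any implementation details where the two tools may differ, are deliberately left out here.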

Dinesh · September 3, 2020, 7:41pm

Dear Sir,
Thanks for your reply. USEARCH and DADA2 were used to generate OTUs and ASVs, respectively, from the same dataset. These OTUs/ASVs were then classified with classify.seqs (mothur) and assignTaxonomy (DADA2) using the custom nifH and Zehr nifH databases.

The custom and Zehr databases were compared in the 4 combinations below:

1. OTUs classified using mothur (custom or Zehr database)
2. OTUs classified using DADA2 (custom or Zehr)
3. ASVs classified using mothur (custom or Zehr)
4. ASVs classified using DADA2 (custom or Zehr)

I understand that variation within any one of the 4 combinations may be due to the sequences in the reference database (custom or Zehr).

For example, classifying the OTUs against the custom database with DADA2 (No. 2) gives a higher percentage of classified sequences and higher diversity than mothur (No. 1). The database is the same; only the platform differs. I would like to know the reason, since DADA2 and mothur are both based on the naive Bayesian classifier. I did not use mothur to generate the OTUs.
Thanks and regards
Dinesh S L

pschloss · September 3, 2020, 7:55pm

What do they use as the confidence cutoff? We use the 80% cutoff recommended in the Wang paper.
Pat

Dinesh · September 4, 2020, 4:14pm

Dear sir,
I used an 89% cutoff for nifH classification at the genus level, whereas DADA2 uses minBoot = 50 by default. Could this be why more sequences are classified and the taxonomic assignments differ?
Thanks for spending your valuable time.

Regards,
Dinesh S L
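To see why the cutoff matters, here is a small Python sketch of how a bootstrap confidence cutoff turns per-rank confidence values into a reported lineage. The lineage and confidence values are hypothetical, and `truncate_lineage` is an illustrative helper, not a mothur or DADA2 function. With the numbers in this thread (89 in mothur versus DADA2's default minBoot of 50), the same sequence keeps more ranks at the lower cutoff.

```python
"""Sketch: how a bootstrap confidence cutoff truncates a lineage.

The lineage and confidences are hypothetical values for illustration,
not output from mothur or DADA2.
"""

RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus"]

EXAMPLE_LINEAGE = ["Bacteria", "Proteobacteria", "Alphaproteobacteria",
                   "Rhizobiales", "Bradyrhizobiaceae", "Bradyrhizobium"]
EXAMPLE_CONFIDENCE = [100, 98, 91, 85, 72, 60]  # hypothetical bootstrap %


def truncate_lineage(lineage, confidences, cutoff):
    """Keep ranks from the top down while confidence >= cutoff.

    Once a rank falls below the cutoff, it and every finer rank are
    reported as unclassified (None). In Wang-style bootstrapping the
    confidence never increases at finer ranks, so this top-down stop
    gives the same result as thresholding each rank on its own.
    """
    result = []
    passed = True
    for taxon, conf in zip(lineage, confidences):
        passed = passed and conf >= cutoff
        result.append(taxon if passed else None)
    return result


if __name__ == "__main__":
    for cutoff in (50, 80, 89):
        kept = truncate_lineage(EXAMPLE_LINEAGE, EXAMPLE_CONFIDENCE, cutoff)
        print(cutoff, dict(zip(RANKS, kept)))
```

At 50 every rank survives, at 80 the lineage stops at Order, and at 89 it stops at Class. This is why comparing tools at different cutoffs mostly measures the cutoff rather than the classifier.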

pschloss · September 4, 2020, 4:35pm

Definitely. Try using the same cutoff for both.

Dinesh · September 15, 2020, 6:44am

Dear Sir,
I used a cutoff of 89 in mothur (classify.seqs) and minBoot = 89 in DADA2 for the same input sequences (ASVs).

The classification percentages from kingdom to genus level were:

Rank	mothur (%)	DADA2 (%)
Kingdom	41.64	43.72
Phylum	10.94	10.14
Class	6.30	4.07
Order	4.03	2.46
Family	3.46	1.18
Genus	3.03	0.33

The genera and families identified were the same in both (e.g., 4 genera with each method); only their abundances differed.

The assignTaxonomy parameters for DADA2 are documented here: https://rdrr.io/bioc/dada2/man/assignTaxonomy.html

Note:
Both methods are based on the Wang et al. 2007 AEM paper, and both were run with k-mer size 8 and a cutoff of 89. I am wondering what causes the variation.

Regards,
Dinesh S L

pschloss · September 15, 2020, 1:03pm

The method uses a random number generator to run a bootstrapping procedure. The default is 100 iterations, which is decent, but the resulting confidence score will not exactly match the true value. With 1,000 or 10,000 iterations the confidence score would be more precise, and I suspect the two sets of numbers would be closer.

Pat
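The iteration-count argument can be illustrated with a small Python simulation. It rests on a simplifying assumption, labeled in the code: each bootstrap iteration is treated as an independent trial that returns the top genus with some hypothetical true probability p, so the reported confidence behaves like a binomial proportion with standard error sqrt(p(1 - p)/n). The helper names are illustrative and are not part of mothur or DADA2.

```python
"""Sketch: Monte Carlo error in a bootstrap confidence score.

Assumption (illustration only): each bootstrap iteration independently
returns the true top genus with probability p_true, so the confidence
(percent of agreeing iterations) is a binomial proportion.
"""
import math
import random


def standard_error(p_true, iterations):
    """Analytic standard error of the agreement fraction (0-1 scale)."""
    return math.sqrt(p_true * (1 - p_true) / iterations)


def simulate_confidence(p_true, iterations, rng):
    """One simulated classifier run: percent of iterations that agree."""
    hits = sum(rng.random() < p_true for _ in range(iterations))
    return 100 * hits / iterations


def fraction_below_cutoff(p_true, iterations, cutoff, replicates=300, seed=1):
    """Share of repeated runs whose confidence falls below cutoff (%)."""
    rng = random.Random(seed)
    below = sum(
        simulate_confidence(p_true, iterations, rng) < cutoff
        for _ in range(replicates)
    )
    return below / replicates


if __name__ == "__main__":
    p = 0.90  # hypothetical true agreement, just above an 89% cutoff
    for n in (100, 1000, 10000):
        se_pct = 100 * standard_error(p, n)
        print(n, round(se_pct, 2), fraction_below_cutoff(p, n, cutoff=89))
```

With p = 0.90 the analytic standard error is about 3 percentage points at 100 iterations, about 0.95 at 1,000 and 0.3 at 10,000, so a sequence whose true confidence sits just above an 89% cutoff can land on either side of it from run to run at 100 iterations. The real bootstrap resamples k-mers rather than drawing independent trials, so treat this as intuition for the iteration effect, not a model of either tool.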
