I created a custom nifH database and tested it using both classify.seqs (mothur) and assignTaxonomy (DADA2) on OTUs (USEARCH) and ASVs (DADA2).
I would like to know:
- Why does DADA2's assignTaxonomy classify a higher percentage of sequences than mothur (for both OTUs and ASVs), given that both are based on the naive Bayesian classifier?
- Why are DADA2's genus-level taxonomic assignments much more diverse than mothur's (for both OTUs and ASVs)?
Regards and Thanks,
There are a lot of possible reasons why the output would differ, including the reference set and the classification algorithm. When you talk about mothur’s classification, you are actually using the naive Bayesian algorithm developed by Wang et al. in AEM. I seem to recall that the USEARCH algorithm was not as stringently tested as the naive Bayesian algorithm. The latter used a leave-one-out testing approach, whereas the former did not.
Thanks for your reply. USEARCH and DADA2 were used to generate OTUs and ASVs, respectively, from one dataset. These OTUs/ASVs were then classified with classify.seqs (mothur) and assignTaxonomy (DADA2) using the custom nifH and Zehr nifH databases.
The custom and Zehr databases were compared in the 4 combinations given below:
1. OTUs classified using mothur (custom or Zehr database)
2. OTUs classified using DADA2 (custom or Zehr)
3. ASVs classified using mothur (custom or Zehr)
4. ASVs classified using DADA2 (custom or Zehr)
I understand that variation within any one of the 4 combinations may be due to the sequences in the reference dataset (custom or Zehr).
However, classifying OTUs against the custom database with DADA2 (No. 2), for example, gives a higher percentage of classified sequences and higher diversity than mothur (No. 1). Here the database is the same and only the platform differs. I would like to know the reason, since DADA2 and mothur are both based on the naive Bayesian classifier. I did not use mothur to generate the OTUs.
Thanks and regards
Dinesh S L
What do they use as the cutoff confidence score? We use the recommended 80% cutoff described in the Wang paper.
I used an 89% cutoff for nifH classification at the genus level. DADA2 uses minBoot=50 by default. Could this be the reason for the higher percentage of sequences being classified and the more diverse taxonomic assignments?
Thanks for spending your valuable time.
Dinesh S L
Definitely. Try using the same cutoff for both.
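To illustrate why the cutoff matters so much, here is a minimal Python sketch. The confidence scores are made-up illustrative values, not results from this thread, `percent_classified` is a hypothetical helper, and the sketch assumes an assignment is kept when its bootstrap confidence is at least the cutoff.

```python
def percent_classified(confidences, cutoff):
    """Percentage of sequences whose bootstrap confidence meets the cutoff."""
    if not confidences:
        return 0.0
    kept = sum(1 for c in confidences if c >= cutoff)
    return 100.0 * kept / len(confidences)


# Hypothetical genus-level bootstrap confidences (0-100) for ten sequences.
# These are illustrative values only, not output from mothur or DADA2.
genus_confidence = [100, 97, 92, 88, 85, 74, 66, 58, 51, 30]

# Same scores, two cutoffs: 50 (the DADA2 minBoot default) and 89.
at_50 = percent_classified(genus_confidence, 50)
at_89 = percent_classified(genus_confidence, 89)

print(f"classified at cutoff 50: {at_50:.1f}%")
print(f"classified at cutoff 89: {at_89:.1f}%")
```

With identical scores, the looser cutoff keeps many more genus-level assignments, which would also show up as more distinct genera in the output.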
I have now used 89 as the cutoff for mothur (classify.seqs) and minBoot = 89 in DADA2 for the same input sequences (ASVs).
The classification percentages from Kingdom to Genus levels were,
The genera and families identified were the same in both (e.g. 4 genera in both methods); they differed only in abundance.
The assignTaxonomy parameters in DADA2 are documented at https://rdrr.io/bioc/dada2/man/assignTaxonomy.html.
Both methods are based on the Wang et al. 2007 AEM paper, and both used kmer=8 and cutoff=89, so I am wondering what causes the remaining variation.
Dinesh S L
The method uses a random number generator to run a bootstrapping procedure. The default is 100 iterations, which is decent, but the confidence score isn’t going to be dead on the true value. If you went to 1,000 or 10,000 iterations, the confidence scores would be more precise, and I suspect the two sets of numbers would be closer.
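A rough way to see the size of this effect is the Python sketch below. It is a simplification, not how either tool is implemented: it assumes each bootstrap iteration is an independent trial that agrees with the assigned genus with a fixed probability `p_true`, and the function names are hypothetical.

```python
import math
import random


def bootstrap_confidence(p_true, iters, rng):
    """Estimated confidence (0-100) from `iters` simulated bootstrap iterations.

    Assumption: each iteration independently agrees with the assigned
    taxon with probability p_true.
    """
    hits = sum(1 for _ in range(iters) if rng.random() < p_true)
    return 100.0 * hits / iters


def standard_error(p_true, iters):
    """Binomial standard error of the confidence estimate, in percentage points."""
    return 100.0 * math.sqrt(p_true * (1.0 - p_true) / iters)


rng = random.Random(1)   # fixed seed so the sketch is repeatable
p_true = 0.89            # a sequence sitting right at an 89% cutoff

for iters in (100, 1000, 10000):
    runs = [bootstrap_confidence(p_true, iters, rng) for _ in range(5)]
    print(iters, round(standard_error(p_true, iters), 2), runs)
```

Under this assumption, a sequence whose true confidence is 89% has a standard error of roughly 3 percentage points at 100 iterations, so it can land on either side of an 89% cutoff from one run to the next. Two programs with different random number streams would then disagree on such borderline sequences even with identical settings.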