Database in tutorial vs silva 138.1

junkim83 · February 19, 2022, 9:03am

Hi,
I’m new to mothur.
I’ve finished the MiSeq SOP tutorial on mothur.org.
Out of curiosity, I reran the analysis on the tutorial fastq files using SILVA 138.1 (the mothur-compatible release).

Compared with the tutorial, it gave me different results. Of course I expected some differences, but I have a few concerns.

Here are my questions.

(1) In the tutorial, the SILVA alignment database was used for align.seqs and the RDP database was used for classify.seqs. Why use different databases in a single analysis?

(2) So I re-analysed the tutorial files using SILVA 138.1 for both align.seqs and classify.seqs. It ran without problems, but gave me different results. I understand that some differences are expected, but these were very large. Do they come from using a different database for classify.seqs than the tutorial did? When I analyze my own data, should I use SILVA 138.1 for align.seqs and RDP 18 for classify.seqs, or is it OK to use SILVA 138.1 for both?

Below are the two results. For example: (1) the most abundant phylum is Bacteroidetes in the tutorial version but Bacteroidota in the reanalysis; (2) TM7 and Tenericutes are found in the tutorial version but not in the SILVA 138.1 version; (3) Patescibacteria is found in the SILVA 138.1 version but not in the tutorial version. Which one should I trust?

[Results using the tutorial databases (SILVA for align.seqs, RDP for classify.seqs)]

[Results using SILVA 138.1 for both align.seqs and classify.seqs]
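One practical consequence of the naming differences listed above is that tables built from the two databases cannot be compared name by name. The sketch below is a hypothetical helper, not a mothur command: it collapses mothur-style taxonomy strings to phylum counts after applying a user-supplied synonym map. Only the Bacteroidetes/Bacteroidota pair from this post is filled in; the map, the assumed input format (semicolon-separated ranks with bootstrap values in parentheses) and the counts are illustrative assumptions.

```python
from collections import Counter

# Hypothetical, deliberately incomplete synonym map: old name -> name used by
# the other database. Only the pair mentioned in the post is included.
PHYLUM_SYNONYMS = {
    "Bacteroidetes": "Bacteroidota",
}


def phylum_of(taxonomy):
    """Return the phylum (second rank) from a mothur-style taxonomy string.

    Assumes ranks are separated by ';' and may carry a bootstrap value in
    parentheses, e.g. 'Bacteria(100);Bacteroidetes(100);...'.
    """
    ranks = [r for r in taxonomy.strip().split(";") if r]
    if len(ranks) < 2:
        return "unclassified"
    return ranks[1].split("(")[0]


def phylum_counts(assignments, synonyms=PHYLUM_SYNONYMS):
    """Sum abundances per phylum after renaming known synonyms."""
    counts = Counter()
    for taxonomy, n in assignments:
        name = phylum_of(taxonomy)
        counts[synonyms.get(name, name)] += n
    return counts


# Hypothetical (taxonomy, abundance) pairs from the two classifications.
rdp = [("Bacteria(100);Bacteroidetes(100);Bacteroidia(100);", 500),
       ("Bacteria(100);Firmicutes(100);Clostridia(99);", 300)]
silva = [("Bacteria(100);Bacteroidota(100);Bacteroidia(100);", 480),
         ("Bacteria(100);Firmicutes(100);Clostridia(98);", 320)]

print(phylum_counts(rdp))
print(phylum_counts(silva))
```

A lookup table only helps with pure renames. Taxa that one database groups differently from the other (which may be what lies behind the TM7, Tenericutes and Patescibacteria differences) cannot be reconciled this way.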

pschloss · February 22, 2022, 5:39pm

To some degree, the choice of classification database is a matter of personal preference. The two classification databases contain different sequences, and even the same sequence can have different names in the two databases. RDP is closely tied to Bergey’s Manual and the official bacterial taxonomy, whereas SILVA does more to derive taxon names from a tree-based approach. My general recommendation is that, unless there’s another reason, you pick the database that gives the fewest unclassified sequences and go from there.

Pat
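Pat's rule of thumb can be checked directly by classifying the same sequences against each database and comparing how many stay unclassified at the rank you care about. The following is a minimal Python sketch. It assumes the tab-separated `name<TAB>taxonomy` layout of a classify.seqs `.taxonomy` file, and it assumes that unclassified ranks carry a label containing "unclassified". The function name and the toy records are hypothetical.

```python
def fraction_unclassified(lines, rank, counts=None):
    """Return the fraction of sequences that are unclassified at a given rank.

    lines  -- lines of a .taxonomy file ("name<TAB>taxonomy")
    rank   -- 0-based rank index (0 = kingdom, 1 = phylum, ... 4 = family)
    counts -- optional {name: abundance} dict to weight each sequence;
              without it every line counts once
    A rank is treated as unclassified if it is missing or its label contains
    "unclassified" (an assumption about how the labels look).
    """
    total = 0
    unclassified = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, taxonomy = line.split("\t", 1)
        weight = counts.get(name, 1) if counts else 1
        # Drop the "(bootstrap)" suffix from each rank label.
        ranks = [r.split("(")[0] for r in taxonomy.split(";") if r]
        total += weight
        if rank >= len(ranks) or "unclassified" in ranks[rank].lower():
            unclassified += weight
    return unclassified / total if total else 0.0


# Hypothetical classifications of the same two sequences by each database.
silva_tax = [
    "seq1\tBacteria(100);PhylumA(100);ClassA(100);OrderA(99);FamilyA(95);GenusA(90);",
    "seq2\tBacteria(100);PhylumB(100);ClassB(100);OrderB(97);FamilyB(88);FamilyB_unclassified(88);",
]
rdp_tax = [
    "seq1\tBacteria(100);PhylumA(100);ClassA(100);OrderA(98);FamilyA(93);GenusA(85);",
    "seq2\tBacteria(100);PhylumB(100);ClassB(100);OrderB(96);OrderB_unclassified(96);OrderB_unclassified(96);",
]

FAMILY = 4
results = {
    "silva": fraction_unclassified(silva_tax, FAMILY),
    "rdp": fraction_unclassified(rdp_tax, FAMILY),
}
best = min(results, key=results.get)  # database with fewest unclassified
print(results, "->", best)
```

To use real output, pass an open file handle for your own `.taxonomy` file instead of a list. Passing abundances from your count table as `counts` weights the comparison by reads rather than by unique sequences. The rank at which you compare is a judgment call; genus and family often differ more between databases than phylum does.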

junkim83 · February 23, 2022, 12:47pm

Great! Now I understand!

Thanks, as always!

Jun

Alexandre_Thibodeau · February 25, 2022, 1:35pm

To add to this: use controls!!

For our gut microbiota work, we now use SILVA 138. The latest iteration of RDP gave awful results on our positive control: we had to lower the bootstrap cutoff to 70 to get the composition about right, which is awful.
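classify.seqs takes a `cutoff` parameter for the bootstrap confidence, and ranks whose support falls below it are reported as unclassified. The sketch below approximates that effect on a confidence-annotated taxonomy string, to show why a lower cutoff yields deeper (but less supported) names. It is an illustration under stated assumptions, not mothur's implementation; the function name, the `<last kept>_unclassified` label and the example string are hypothetical.

```python
import re

# One rank label with its bootstrap value, e.g. "Firmicutes(98)".
RANK = re.compile(r"^(?P<name>[^()]+)\((?P<conf>\d+(?:\.\d+)?)\)$")


def apply_cutoff(taxonomy, cutoff):
    """Keep ranks down to the first one whose bootstrap is below `cutoff`.

    Every rank from that point on is replaced with the hypothetical label
    '<last kept name>_unclassified' (or 'unknown_unclassified' if none kept).
    """
    ranks = [r for r in taxonomy.split(";") if r]
    kept = []
    for rank in ranks:
        m = RANK.match(rank)
        # Stop at the first rank lacking a confidence value or below cutoff.
        if m is None or float(m.group("conf")) < cutoff:
            break
        kept.append(m.group("name"))
    last = kept[-1] if kept else "unknown"
    return kept + [f"{last}_unclassified"] * (len(ranks) - len(kept))


# Hypothetical assignment, e.g. from a run made with a low cutoff.
example = "Bacteria(100);PhylumA(98);ClassA(91);OrderA(84);FamilyA(75);GenusA(72);"
print(apply_cutoff(example, 80))
print(apply_cutoff(example, 70))
```

Because ranks that already failed the cutoff used in the original run are no longer named in the output, a post-hoc filter like this can only make an existing assignment stricter. To try a lower value such as 70, you would rerun classify.seqs with that cutoff.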

junkim83 · February 26, 2022, 1:42am

Hello Alexandre,

By positive control, do you mean a mock community?

Best,
Jun

Alexandre_Thibodeau · February 26, 2022, 11:46am

Yep!
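To make "get the composition about right" measurable, one option is to compare the classified mock community with its known composition and pick the database or cutoff that comes closest. This Python sketch uses Bray-Curtis dissimilarity, a common choice that I have swapped in here; the thread does not say which measure, if any, was used. The expected proportions and read counts are hypothetical placeholders; the real expected values should come from your mock community's documentation.

```python
def relative_abundance(counts):
    """Convert {taxon: read count} into {taxon: proportion}."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two {taxon: abundance} dicts.

    0 means identical composition; 1 means no taxa in common.
    """
    taxa = set(p) | set(q)
    num = sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in taxa)
    den = sum(p.get(t, 0.0) + q.get(t, 0.0) for t in taxa)
    return num / den if den else 0.0


# Hypothetical expected genus proportions of a mock community.
expected = {"GenusA": 0.5, "GenusB": 0.3, "GenusC": 0.2}

# Hypothetical genus-level read counts for the mock, classified at two cutoffs.
observed = {
    80: {"GenusA": 40, "GenusB": 30, "OrderX_unclassified": 30},
    70: {"GenusA": 48, "GenusB": 32, "GenusC": 20},
}

scores = {c: bray_curtis(relative_abundance(obs), expected)
          for c, obs in observed.items()}
for cutoff, score in sorted(scores.items()):
    print(f"cutoff={cutoff}: Bray-Curtis vs expected = {score:.3f}")
```

A lower score only says the mock is reproduced more closely. As the comment about the 70 cutoff suggests, the settings that fit the mock best may still mean less-supported assignments in real samples.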


Rishikesh · March 4, 2022, 8:20am

Where can we obtain these different databases in FASTA format, other than the database files provided for mothur?

Alexandre_Thibodeau · March 7, 2022, 3:44pm

Morning. Sorry, I don’t understand your question.
Mothur-formatted databases are available on GitHub, under references. As for positive controls, it depends on whether you go with a commercial or a homemade solution, but in either case you can check the materials and methods of relevant articles to see what others use.

Best of success

system · March 17, 2022, 3:45pm

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.
