Using silva as reference database in MiSeq SOP

arvalve · December 1, 2020, 2:24pm

Hi,

I’m planning on using the silva database for my metagenomic analysis and was hoping to get a few things clarified -

The silva.bacteria.fasta currently available in the MiSeq SOP was compiled using which version of the silva database? Also apart from this, where can I get the .fasta files for the database?
The silva v132 full length database downloaded from https://mothur.org/wiki/silva_reference_files/, has only a .align file and a .tax file. Can I use the .align file instead of the fasta file for customizing the database using pcr.seqs and use that output in the “reference=” parameter instead of the fasta file as instructed in the MiSeq SOP?
I would like someone to verify whether my understanding is correct - the full length database available at the above link is basically the full silva database aligned using a few sequences that are available in the SEED database. What is the purpose of doing so? And, if I were to use these 2 databases for aligning a query sequence, how would I go about it?

Thanks in advance for all the help.

EDIT: Additionally I want to know why in MiSeq SOP, the alignment is done with Silva, while the Bayesian classifier is done with RDP. Is there any dis-advantage of using Silva itself?

leocadio · December 1, 2020, 5:16pm

Just the easy replies:

The .align file is the fasta version, aligned. I am not sure what fasta you are asking about. You can then run pcr.seqs on the .align file - I think this is how it is explained elsewhere.

Not sure about the alignment procedure, but having the alignment is the best way to then compare all sequences to each other… And to know where the limits of your sequences are, if they are missing the beginning or the end… And many more things. The classifier uses RDP, which is a method, against the SILVA database. Again, I am not sure what you are asking here.

arvalve · December 2, 2020, 2:02am

Thanks for your reply. The MiSeq SOP mentions the use of the silva database as a .fasta format file, which is customized using the pcr.seqs and then aligned with the query sequences. But the reference files available for download had only .align files, so I was just wondering if the .align and .fasta files are equivalent and can be used in place of the other.
My other query was again about the MiSeq SOP: the alignment step uses the silva reference sequences, but in the classify.seqs step the RDP files are used. The silva database contains its own .tax file, so why are the RDP tax and sequence files preferred over silva’s, especially considering that silva was used in the previous step?

leocadio · December 2, 2020, 2:36pm

I am not sure about the SOP - sorry. I will let the real MOTHUR people answer that question : )

But, yes, the align file is a fasta file.

pschloss · December 3, 2020, 5:30pm

For alignment (i.e. align.seqs), there’s no suitable alternative to using the silva reference alignments to align your sequences. greengenes’s and RDP’s alignments are horrible. For classification (i.e. classify.seqs) you can use whatever reference you want. Some prefer silva or greengenes to RDP because they are larger and have more information for as yet uncultured taxa. Some prefer the RDP because its taxonomy is based on the authoritative Bergey’s taxonomy. Some prefer silva over greengenes because silva is still getting updated whereas greengenes is not.

As was mentioned, *.align files are fasta files. You’ll know something is a fasta file if the first line for each sequence starts with a > character. You’ll know something is aligned if you see . and - characters in the sequence data and if the sequences are all the same length.
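
If it helps, those two checks can be sketched in a few lines of Python. This is only an illustration and not part of mothur; the function names `read_fasta` and `looks_aligned` are made up for this example, and the sequences are toy data.

```python
import io


def read_fasta(handle):
    """Parse fasta text into a list of (name, sequence) tuples.

    Raises ValueError if sequence data appears before any '>' header,
    which means the input is not a fasta file.
    """
    records = []
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is None:
            raise ValueError("not a fasta file: data before the first '>' header")
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def looks_aligned(records):
    """Return True if all sequences have the same length and at least one
    contains '.' or '-' gap characters."""
    if not records:
        return False
    lengths = {len(seq) for _, seq in records}
    has_gaps = any("." in seq or "-" in seq for _, seq in records)
    return len(lengths) == 1 and has_gaps


# Toy data: the first set is padded to equal length with '.' and '-'.
aligned_text = ">seqA\n..AC-GT..\n>seqB\n..ACGGT..\n"
unaligned_text = ">seqA\nACGT\n>seqB\nACGGT\n"

aligned_result = looks_aligned(read_fasta(io.StringIO(aligned_text)))
unaligned_result = looks_aligned(read_fasta(io.StringIO(unaligned_text)))
```

To try it on a real file, pass an open file handle for a .align or .fasta file to `read_fasta` instead of the `io.StringIO` objects.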

Pat

arvalve · December 3, 2020, 5:45pm

Thanks so much. That clears up my doubts.

gwidmer · December 8, 2020, 1:31pm

I can relate to the point made by arvalve. It seems odd that the output of align.seqs is .align and not .align.fasta. I remember being confused when I started using mothur. When you enter the aligned file into filter.seqs, filter asks for a fasta file, yet your input ends in .align. Perhaps a lack of consistency that might be worth fixing?

Giovanni Widmer
