Silva custom database

dt0740 · January 16, 2013, 1:23pm

Hi,

I have another question concerning a customized (updated) version of the SILVA database.

After reading a few other posts in this forum, I created a taxonomy file and a nogap.fasta file by parsing the SSURef_111_tax_silva_trunc.fasta file and splitting it into these two files, separately for each of Eukaryotes, Archaea, and Bacteria. These are for the classify.seqs command.
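For anyone wanting to do the same, here is a rough sketch of that parsing step in Python. It is not the script I actually used. It assumes each FASTA header looks like `>ID taxonomy;path;here` (ID, one space, then a semicolon-separated taxonomy string), and that mothur wants a tab-separated taxonomy file with a trailing semicolon and no spaces in taxon names. The function names and the example records are made up for illustration.

```python
def _make_record(header, chunks):
    # Header is assumed to be "ID<space>taxonomy string"
    seq_id, _, taxonomy = header.partition(" ")
    # Drop spaces and gap characters so the sequence is unaligned ("nogap")
    sequence = "".join(chunks).replace(" ", "").replace("-", "").replace(".", "")
    return seq_id, taxonomy.strip(), sequence


def parse_silva_fasta(lines):
    """Yield (seq_id, taxonomy, sequence) tuples from SILVA-style FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield _make_record(header, chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield _make_record(header, chunks)


def split_by_domain(lines):
    """Group records by the first taxonomy level (e.g. Eukaryota, Archaea, Bacteria)."""
    by_domain = {}
    for seq_id, taxonomy, sequence in parse_silva_fasta(lines):
        domain = taxonomy.split(";")[0]
        by_domain.setdefault(domain, []).append((seq_id, taxonomy, sequence))
    return by_domain


def tax_line(seq_id, taxonomy):
    """Format one taxonomy line: ID, tab, taxonomy ending in a semicolon."""
    tax = taxonomy.replace(" ", "_")
    if not tax.endswith(";"):
        tax += ";"
    return f"{seq_id}\t{tax}"


def fasta_entry(seq_id, sequence):
    """Format one unaligned FASTA entry with the sequence on a single line."""
    return f">{seq_id}\n{sequence}"


# Hypothetical input, only to show the shape of the data
example = [
    ">SEQ1.1.1500 Eukaryota;Fungi;Some species",
    "ACGU-AC",
    "GG..",
    ">SEQ2.1.1400 Bacteria;Firmicutes",
    "AC GU",
]
by_domain = split_by_domain(example)
```

To process a real file, pass the open file handle to `split_by_domain` and write `tax_line(...)` and `fasta_entry(...)` for each record into one .tax and one .fasta file per domain.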

Another thing which, I guess, should improve results, is an updated alignment. For this I took the file SSURef_111_tax_silva_full_align_trunc.fasta and basically reformatted it to remove spaces and line breaks within the sequence. This is for the align.seqs command.
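The reformatting of the alignment could look roughly like the sketch below. Again, this is an illustration rather than my exact script: it only removes whitespace and line breaks inside each sequence and keeps the gap characters (`-` and `.`), because the alignment columns have to stay intact. The length check at the end is my own addition, on the assumption that every sequence in a reference alignment should have the same number of columns. Function names and the example data are hypothetical.

```python
def unwrap_alignment(lines):
    """Yield (header, aligned_sequence) with each sequence joined onto one line.

    Whitespace inside the sequence is removed; gap characters are kept.
    """
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            # split() with no arguments drops spaces and tabs inside the line
            chunks.append("".join(line.split()))
    if header is not None:
        yield header, "".join(chunks)


def alignment_lengths(records):
    """Return the set of distinct aligned lengths (ideally a single value)."""
    return {len(sequence) for _, sequence in records}


# Hypothetical wrapped alignment, only to show the transformation
example = [
    ">seqA some;taxonomy",
    "AC -G",
    "..U",
    ">seqB",
    "A-CG",
    "..U",
]
records = list(unwrap_alignment(example))
lengths = alignment_lengths(records)
```

If `lengths` contains more than one value, the reformatting has probably dropped or merged columns somewhere, and the file should be checked before using it with align.seqs.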

Now, as a test, I wanted to reproduce the silva102 files for Eukaryotes provided by mothur. For this I did the same thing as above with the SSURef_102_… files downloaded from the SILVA archive. I expected that the eukaryota.SSURef_102_SILVA_NR_99.tax file (produced by me) and the silva.eukarya.silva.tax file would be identical. However, my tax file has 31,809 entries, while the silva.eukarya.silva.tax file has only 1,238 entries. For example, the entry AB026819.394.2191 is present in my file (which is based on SSURef_102_SILVA_NR_99) but missing from silva.eukarya.silva.tax. Now I’m a bit confused. How is this possible?
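A comparison like this can be done with a few lines of Python. The sketch below assumes both taxonomy files have the sequence ID as the first whitespace-separated field on each line; the function names and the tiny example inputs are hypothetical and do not reflect the real file contents.

```python
def read_tax_ids(lines):
    """Collect the sequence IDs (first field) from taxonomy-file lines."""
    ids = set()
    for line in lines:
        line = line.strip()
        if line:
            ids.add(line.split(None, 1)[0])
    return ids


def compare_ids(mine, reference):
    """Return (only_in_mine, only_in_reference, shared) as sorted lists."""
    return (
        sorted(mine - reference),
        sorted(reference - mine),
        sorted(mine & reference),
    )


# Hypothetical stand-ins for the two .tax files
my_tax = [
    "ID_A\tEukaryota;Fungi;",
    "ID_B\tEukaryota;Metazoa;",
    "ID_C\tEukaryota;Alveolata;",
]
mothur_tax = [
    "ID_B\tEukaryota;Metazoa;",
]
only_mine, only_reference, shared = compare_ids(
    read_tax_ids(my_tax), read_tax_ids(mothur_tax)
)
```

With real files, pass the open file handles to `read_tax_ids`. The lengths of the three lists show how many entries are unique to each file and how many overlap.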

Thanks in advance…

dt0740 · January 16, 2013, 3:13pm

Ok, reading documentation can always be a good idea. I guess this is the explanation for the reduction in size (from http://www.mothur.org/wiki/Silva_reference_files):

“The actual reference alignment that SILVA uses with their SINA aligner is called the SEED alignment. We don’t know what this actually is. We have tried to duplicate it by identifying the unique sequences in the SSURef database (v102) that have a 100% quality score to the SEED alignment and that go from the end of the traditional 8f/27f primer to the beginning of the traditional 1492r primer.”

There are two restrictions here (a 100% quality score to the SEED alignment, and spanning from the end of the traditional 8f/27f primer to the beginning of the traditional 1492r primer), which together must cause this reduction in size.

Should I proceed the same way to create my custom database?

By the way, a huge thank-you to Pat Schloss for mothur, which is really a beautiful data analysis tool.
