Hi,
I have another question concerning a customized (updated) version of the SILVA database.
After reading a few other posts in this forum, I created a taxonomy file and a nogap.fasta file by parsing SSURef_111_tax_silva_trunc.fasta and splitting it into those two files (taxonomy and ungapped sequences), separately for Eukaryotes, Archaea and Bacteria. These are for the classify.seqs command.
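A minimal sketch of this kind of split, not an exact script. It assumes each SILVA header looks like `>ID taxonomy;path`, with the domain as the first taxonomy level, and that gaps are written as `.` or `-`. The function names are made up for illustration, and taxon names are copied unchanged.

```python
from collections import defaultdict

GAP_CHARS = ".-"  # assumption: the only gap symbols used in the SILVA file


def parse_fasta(lines):
    """Yield (header, sequence) pairs; wrapped sequence lines are joined."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            # drop any spaces inside the sequence line
            chunks.append("".join(line.split()))
    if header is not None:
        yield header, "".join(chunks)


def split_by_domain(lines):
    """Return {domain: (taxonomy_rows, fasta_rows)}.

    Assumption: the header is 'ID<space>taxonomy', and the first
    semicolon-separated taxonomy level is the domain.
    """
    out = defaultdict(lambda: ([], []))
    strip_gaps = str.maketrans("", "", GAP_CHARS)
    for header, seq in parse_fasta(lines):
        seq_id, _, taxonomy = header.partition(" ")
        domain = taxonomy.split(";", 1)[0].strip()
        if not taxonomy.endswith(";"):
            taxonomy += ";"  # taxonomy rows end with a semicolon
        tax_rows, fasta_rows = out[domain]
        tax_rows.append(f"{seq_id}\t{taxonomy}")
        fasta_rows.append(f">{seq_id}\n{seq.translate(strip_gaps)}")
    return dict(out)
```

Passing an open file handle for SSURef_111_tax_silva_trunc.fasta as `lines` and writing each domain's two lists to separate files would give one taxonomy file and one nogap.fasta file per domain.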
Another thing that, I guess, should improve results is an updated alignment. For this I took the file SSURef_111_tax_silva_full_align_trunc.fasta and reformatted it to remove spaces and line breaks within the sequences. This is for the align.seqs command.
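The reformatting step could look roughly like this. It is a sketch under the assumption that the file is ordinary FASTA with wrapped sequence lines. Gap characters are kept, because the alignment is the point here. The function names are hypothetical.

```python
def unwrap_alignment(lines):
    """Yield header lines unchanged and each sequence as one line.

    Spaces and line breaks inside a sequence are removed; gap
    characters such as '.' and '-' are preserved.
    """
    chunks = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if chunks:
                yield "".join(chunks)
                chunks = []
            yield line
        else:
            piece = "".join(line.split())
            if piece:
                chunks.append(piece)
    if chunks:
        yield "".join(chunks)


def aligned_lengths(unwrapped):
    """Return the set of sequence lengths; an alignment should give one value."""
    return {len(row) for row in unwrapped if not row.startswith(">")}
```

Checking that `aligned_lengths` returns a single value is a cheap sanity test that no sequence was truncated or merged during reformatting.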
Now, as a test, I wanted to reproduce the silva102 files for Eukaryotes provided by mothur. For this I did the same thing as above with the SSURef_102_… files downloaded from the SILVA archive. I expected that the eukaryota.SSURef_102_SILVA_NR_99.tax file (produced by me) and the silva.eukarya.silva.tax file would be identical. However, my tax file has 31,809 entries, while the silva.eukarya.silva.tax file has only 1,238 entries. For example, the entry AB026819.394.2191 is in my file (which is based on SSURef_102_SILVA_NR_99) but missing from silva.eukarya.silva.tax. Now I’m a bit confused. How is this possible?
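A comparison like this can be sketched as follows, assuming both taxonomy files hold one entry per line with the sequence ID as the first whitespace-delimited field. The helper names are made up for illustration.

```python
def read_ids(lines):
    """Collect the first whitespace-delimited field of each non-empty line."""
    return {line.split(None, 1)[0] for line in lines if line.strip()}


def compare_ids(mine, reference):
    """Return (only_in_mine, only_in_reference, shared) as sorted lists."""
    a, b = read_ids(mine), read_ids(reference)
    return sorted(a - b), sorted(b - a), sorted(a & b)


# Possible usage with the two files named above:
# with open("eukaryota.SSURef_102_SILVA_NR_99.tax") as mine, \
#         open("silva.eukarya.silva.tax") as ref:
#     only_mine, only_ref, shared = compare_ids(mine, ref)
#     print(len(only_mine), len(only_ref), len(shared))
```

If the IDs in one file carry a different suffix or format than in the other, the shared set will come out smaller than it really is, so it is worth looking at a few IDs from each file by eye first.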
Thanks in advance…