Silva.bacteria.fasta file

Hi

I’m struggling to obtain the newest version of the SILVA reference file (v 138.2) from the SILVA reference files link. I can’t download ARB to my PC. Has anyone managed to follow the README file to obtain the silva.bacteria.fasta file required to run the MiSeq SOP?

Is there any reason why each new silva.bacteria.fasta file with the updated SILVA version isn’t readily provided (like v102 is provided at the top of the MiSeq SOP workflow)? Perhaps they are and I just missed it…?

Thanks in advance for any help!

James

Hi James,

There’s no need for you to use ARB unless you’re doing some type of further customization. The files linked on the Silva reference files page are the fasta files you need to run the alignment. They are what the code in the README files generates. Does this help?

Pat

Hi Pat,

Thanks for your response.

I’m struggling to get my head around the difference between the silva.bacteria.fasta file from “SILVA-based bacterial reference alignment (v102)” and the release 138.2 files.

The release 138.2 download gives silva.seed_v138_2.align and silva.seed_v138_2.tax.

Is the silva.bacteria.fasta used as the example in the SOP a reformatted/trimmed version of a larger silva.seed_vxyz.align file? Therefore, to obtain the V4 region for v138.2, must I preprocess the silva.seed_v138_2.align file, or can I directly substitute it for the silva.bacteria.fasta file in the pcr.seqs command?

E.g. should I now run:
mothur > pcr.seqs(fasta=silva.seed_v138_2.align, start=11895, end=25318, keepdots=F)
compared to my previous work where I ran:
mothur > pcr.seqs(fasta=silva.bacteria.fasta, start=11895, end=25318, keepdots=F)
(former v138.2, latter v102)
Should the start/end coordinates be changed for v138.2?

Then, later on, we run mothur > remove.lineage( ... taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota) to remove the non-bacterial sequences from the aligned and trimmed silva.seed_v138_2.align.

James

EDIT -
I have now run:
pcr.seqs(fasta=sequence.fasta, oligos=pcrTest.oligos)
align.seqs(fasta=sequence.pcr.fasta, reference=silva.seed_v138_2.align)
summary.seqs(fasta=sequence.pcr.align)
using an E. coli sequence and primers 515f (GTGCCAGCMGCCGCGGTAA) and 806r (GGACTACHVGGGTWTCTAAT). The reported start and end coordinates are 13862 and 23444, the same as in the SOP when using silva.bacteria.fasta, so I will keep the original coordinates of 11895 and 25318 for consistency.
My question still stands as to whether I can directly substitute silva.seed_v138_2.align for silva.bacteria.fasta in pcr.seqs. I now think yes and that I am overthinking it, but confirmation would be great. I’ll get there in the end!
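
For anyone who wants to double-check the coordinates outside mothur, here is a minimal Python sketch of the idea behind the check above. It assumes that the start and end reported by summary.seqs are the 1-based alignment columns of the first and last base in the aligned sequence, and that both "." and "-" count as gaps. The function name alignment_span is made up for illustration and is not part of mothur.

```python
# Assumption: "." and "-" are the gap characters used in the aligned file.
GAP_CHARS = ".-"


def alignment_span(aligned_seq):
    """Return (start, end) as 1-based alignment columns of the first and
    last base in an aligned sequence, or None if the sequence is all gaps."""
    positions = [
        column
        for column, char in enumerate(aligned_seq, start=1)
        if char not in GAP_CHARS
    ]
    if not positions:
        return None
    return positions[0], positions[-1]


# Toy aligned sequence (not real SILVA data): bases occupy columns 6 to 10.
toy = "...--ACG-T--.."
print(alignment_span(toy))  # (6, 10)
```

Applied to the aligned E. coli amplicon in sequence.pcr.align, a function like this should give the same pair of columns that summary.seqs reports, if the assumption about how start and end are defined holds.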

Thanks,
James

Hi James,

Yeah, the silva.*.align file is a drop-in replacement for silva.bacteria.fasta. Initially we split the three domains into separate files, but now they’re all in one file. I’d suggest not using remove.lineage on the silva files. Instead, run it as shown in the SOP. Also, we find that using the numeric coordinates in pcr.seqs works better than the primers, since no primer set matches 100% of the sequences in the database.
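
If you want a quick sanity check that the numeric coordinates carry over between reference files, a rough Python sketch like the one below may help. It only confirms that every sequence in a reference alignment has the same number of columns and that the start and end values fall inside that width. Matching widths are necessary but not sufficient for the coordinates to mean the same thing, so the E. coli check is still the stronger test. The helper names (read_fasta, alignment_width, coordinates_fit) are hypothetical and not part of mothur.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file.
    Sequence lines wrapped over several lines are joined."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def alignment_width(path):
    """Return the number of alignment columns shared by all sequences.
    Raises ValueError if the file is empty or the widths differ."""
    widths = {len(seq) for _, seq in read_fasta(path)}
    if len(widths) != 1:
        raise ValueError(f"expected one alignment width, found {sorted(widths)}")
    return widths.pop()


def coordinates_fit(path, start, end):
    """True if 1-based start/end columns lie inside the alignment."""
    width = alignment_width(path)
    return 1 <= start <= end <= width


# Example usage (file names taken from this thread; adjust paths as needed):
# print(alignment_width("silva.bacteria.fasta"))
# print(alignment_width("silva.seed_v138_2.align"))
# print(coordinates_fit("silva.seed_v138_2.align", 11895, 25318))
```

If the two files report the same width and the coordinates fit, the pcr.seqs command with start=11895 and end=25318 can be pointed at the newer file unchanged.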

Hope this helps!
Pat