V4 region of Silva Database

I am not sure if my question belongs to this category and apologize in advance if this is not the right section. I am trying to just get the V4 regions of several sequences in the Silva Database. What I have done so far is used the E.coli 16s sequence and used the 515f/806r primers to get the E.coli V4 region. I then used align.seqs to align the Silva Database sequences to the E.coli V4. Then I ran summary seqs on the alignment to get the start and end for the Silva Database sequences. However, below is the output I am getting :

Start End NBases Ambigs Polymer NumSeqs
Minimum: 0 0 0 0 1 1
2.5%-tile: 1 5 1 0 1 11217
25%-tile: 1 253 239 0 4 112164
Median: 1 253 253 0 4 224327
75%-tile: 1 253 253 0 5 336490
97.5%-tile: 253 254 254 1 6 437437
Maximum: 253 280 280 29 23 448653
Mean: 32.0223 236.832 205.529 0.0541331 4.23424

of Seqs: 448653

I am just wondering if there is a problem with my logic.

If your ecoli sequence is not aligned, then you probably don’t want to align the silva database against it. You want to stop a step back and align your ecoli against silva.bacteria.fasta (or whatever).


Pat thanks for your reply. My ecoli sequence is not aligned (http://rdp.cme.msu.edu/hierarchy/detail.jsp?seqid=S000000258&format=fasta). So my first step was to align the ecoli trimmed sequence (removed nucleotides before and after the 515f/806r primers). I aligned the ecoli trimmed sequence to the subset of Silva aligned sequences that I downloaded from the Silva website. Here are the commands I am using:

  1. align.seqs(candidate=ecoli_v4.fasta, reference=arb_silva_de_2014_10_20_id209537_with_gaps_no_newlines.fasta) #arb_silva_de_2014_10_20_id209537_with_gaps_no_newlines.fasta is the subset of Silva aligned sequences
  2. summary.seqs(fasta=ecoli_v4.align)
  3. pcr.seqs(fasta=arb_silva_de_2014_10_20_id209537_with_gaps_no_newlines.fasta,ecoli=ecoli_v4.align,keepdots=false) #use the ecoli_v4.align to only get the v4 region from the subset of Silva aligned sequences

However, I am getting the error in step 2. Thanks for your help.

I’m not sure what arb_silva_de_2014_10_20_id209537_with_gaps_no_newlines.fasta is. Can you use our silva reference alignments? These are available at http://www.mothur.org/wiki/Silva_reference_alignment

Apologies for the confusion. arb_silva_de_2014_10_20_id209537_with_gaps_no_newlines.fasta is a download from Silva that only contains the sequences from Silva that are only associated with the phylum Firmicutes. Here is the link that I got the file from http://www.arb-silva.de/browser/. Is there a way to subset your silva reference alignment to just get the sequences assigned to Firmicutes?

Thanks again.

You can run get.lineage with taxon=Firmicutes