V4 region of Silva Database

munasur12 · October 17, 2014, 3:19am

I am not sure whether my question belongs in this category, and I apologize in advance if this is not the right section. I am trying to extract just the V4 region from several sequences in the Silva database. So far I have taken the E. coli 16S sequence and used the 515f/806r primers to get the E. coli V4 region. I then used align.seqs to align the Silva database sequences to the E. coli V4 sequence, and ran summary.seqs on the alignment to get the start and end positions for the Silva sequences. However, this is the output I am getting:

             Start    End      NBases   Ambigs     Polymer  NumSeqs
Minimum:     0        0        0        0          1        1
2.5%-tile:   1        5        1        0          1        11217
25%-tile:    1        253      239      0          4        112164
Median:      1        253      253      0          4        224327
75%-tile:    1        253      253      0          5        336490
97.5%-tile:  253      254      254      1          6        437437
Maximum:     253      280      280      29         23       448653
Mean:        32.0223  236.832  205.529  0.0541331  4.23424
# of Seqs:   448653

I am just wondering if there is a problem with my logic.
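
For readers following along, the primer-trimming step described above (cutting the E. coli 16S sequence at 515f/806r) can be sketched in Python. This is only an illustration, not what mothur does internally. The primer sequences are the commonly published 515F/806R sequences and are an assumption here, since the post does not list them; the function names are made up for this example.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also covers the degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# Assumed primer sequences (not given in the thread).
PRIMER_515F = "GTGCCAGCMGCCGCGGTAA"
PRIMER_806R = "GGACTACHVGGGTWTCTAAT"


def reverse_complement(seq):
    """Reverse-complement a sequence that may contain IUPAC codes."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a degenerate primer into a regular expression."""
    return "".join(IUPAC[base] for base in primer)


def extract_amplicon(seq, forward=PRIMER_515F, reverse=PRIMER_806R):
    """Return the region between the primers (primers excluded), or None."""
    seq = seq.upper().replace("U", "T")
    pattern = (
        primer_regex(forward)
        + "(.+?)"
        + primer_regex(reverse_complement(reverse))
    )
    match = re.search(pattern, seq)
    return match.group(1) if match else None


# Toy sequence: flank + forward primer + insert + reverse primer site + flank.
toy = "AAAA" + "GTGCCAGCAGCCGCGGTAA" + "TTTTGGGGCCCC" + "ATTAGATACCCTGGTAGTCC" + "AAAA"
print(extract_amplicon(toy))  # prints TTTTGGGGCCCC
```

The sketch requires exact primer matches (apart from the degenerate positions), so a real sequence with a mismatch in a primer site would return None.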

pschloss · October 20, 2014, 9:41pm

If your ecoli sequence is not aligned, then you probably don’t want to align the silva database against it. You want to take a step back and align your ecoli sequence against silva.bacteria.fasta (or whichever reference alignment you are using).

Pat
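
To make the idea concrete: once the E. coli V4 fragment has been aligned against an aligned reference, its first and last non-gap columns give the coordinates of the V4 region in that alignment. The Python sketch below illustrates the concept only; it is not mothur's implementation of summary.seqs or pcr.seqs. It assumes alignments in which '.' and '-' mark gaps, and the function names are hypothetical.

```python
def alignment_span(aligned_seq, gap_chars=".-"):
    """Return (start, end, nbases) using 1-based alignment columns.

    Returns (0, 0, 0) if the sequence contains only gap characters.
    """
    positions = [
        i for i, ch in enumerate(aligned_seq, start=1) if ch not in gap_chars
    ]
    if not positions:
        return (0, 0, 0)
    return (positions[0], positions[-1], len(positions))


def trim_to_span(aligned_seq, start, end):
    """Keep only the alignment columns start..end (1-based, inclusive)."""
    return aligned_seq[start - 1:end]


# Hypothetical aligned E. coli V4 fragment and one aligned reference sequence.
ecoli_v4_aligned = "....AC-GT..."
reference_aligned = "GGGGACTGTCCC"

start, end, nbases = alignment_span(ecoli_v4_aligned)
print(start, end, nbases)                           # prints 5 9 4
print(trim_to_span(reference_aligned, start, end))  # prints ACTGT
```

Because every sequence in an aligned reference shares the same columns, the same start and end can be applied to all of them.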

munasur12 · October 21, 2014, 3:47pm

Pat, thanks for your reply. My ecoli sequence is not aligned (http://rdp.cme.msu.edu/hierarchy/detail.jsp?seqid=S000000258&format=fasta). So my first step was to align the trimmed ecoli sequence (with the nucleotides before and after the 515f/806r primers removed) to the subset of Silva aligned sequences that I downloaded from the Silva website. Here are the commands I am using:

align.seqs(candidate=ecoli_v4.fasta, reference=arb_silva_de_2014_10_20_id209537_with_gaps_no_newlines.fasta) #arb_silva_de_2014_10_20_id209537_with_gaps_no_newlines.fasta is the subset of Silva aligned sequences
summary.seqs(fasta=ecoli_v4.align)
pcr.seqs(fasta=arb_silva_de_2014_10_20_id209537_with_gaps_no_newlines.fasta,ecoli=ecoli_v4.align,keepdots=false) #use the ecoli_v4.align to only get the v4 region from the subset of Silva aligned sequences

However, I am getting the error in step 2. Thanks for your help.

pschloss · October 21, 2014, 4:57pm

I’m not sure what arb_silva_de_2014_10_20_id209537_with_gaps_no_newlines.fasta is. Can you use our silva reference alignments? These are available at http://www.mothur.org/wiki/Silva_reference_alignment

munasur12 · October 21, 2014, 5:27pm

Pat,
Apologies for the confusion. arb_silva_de_2014_10_20_id209537_with_gaps_no_newlines.fasta is a download from Silva that contains only the sequences assigned to the phylum Firmicutes. I got the file from http://www.arb-silva.de/browser/. Is there a way to subset your silva reference alignment to get just the sequences assigned to Firmicutes?

Thanks again.

pschloss · October 24, 2014, 7:20pm

You can run get.lineage with taxon=Firmicutes
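
For anyone curious what this kind of taxon-based subsetting amounts to, here is a rough Python sketch. It is not how get.lineage is implemented. It assumes a taxonomy file with one `name<TAB>level1;level2;...;` entry per line, optionally with confidence scores such as `Firmicutes(100)`, and the function names are hypothetical.

```python
import re


def names_in_lineage(taxonomy_lines, taxon):
    """Collect sequence names whose lineage contains the given taxon."""
    keep = set()
    for line in taxonomy_lines:
        line = line.strip()
        if not line:
            continue
        name, lineage = line.split("\t", 1)
        # Drop optional confidence scores, e.g. "Firmicutes(100)".
        levels = [
            re.sub(r"\(\d+\)$", "", level)
            for level in lineage.split(";")
            if level
        ]
        if taxon in levels:
            keep.add(name)
    return keep


def filter_fasta(fasta_lines, keep):
    """Return the FASTA lines belonging to sequences named in keep."""
    kept_lines = []
    writing = False
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # The first whitespace-delimited token is taken as the name.
            header = line[1:].split()
            writing = bool(header) and header[0] in keep
        if writing:
            kept_lines.append(line)
    return kept_lines


# Toy input, invented for this example.
taxonomy = [
    "s1\tBacteria;Firmicutes;Bacilli;",
    "s2\tBacteria;Proteobacteria;",
    "s3\tBacteria;Firmicutes(100);Clostridia(98);",
]
fasta = [">s1", "AC-GT", ">s2", "GG-CC", ">s3 extra", "TT-AA"]

firmicutes = names_in_lineage(taxonomy, "Firmicutes")
print(filter_fasta(fasta, firmicutes))
```

The sketch matches the taxon name exactly at any level of the lineage, so "Firmicutes" would not match a hypothetical level named "Firmicutes_unclassified".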
