Miseq SOP pcr.seqs

Hi,
When using pcr.seqs (i.e., following the Miseq SOP) how do you know where your sequences start and end against the reference database?

Thanks

1 Like

Hi Pat,

I am also not clear on how to find the position the sequences alignment agaisnt the reference database. The numbers used for the V4 region in the MiSeq SOP are start=11894, end=25319, but I’m just not sure what these are.

Thanks,

Claire

These are the positions within the silva.bacteria.fasta alignment. You would need to find your primer in the alignment and then see where your primers are mapping to. Alternatively, you could run pcr.seqs on an E. coli sequence with your primers, align the product to silva.bacteria.fasta and then run summary.seqs on the algined sequence. The start and stop would be the coordinates of what you want.

1 Like

Thank you!

It took me a while to figure out how to follow the instructions of pschloss.
So, I thought that I would post how to accomplish this task in a more specific manner
to further help beginners like me.

mothur “#pcr.seqs(fasta=ecoli.16srrna.fasta, oligos=pcrTest.oligos)”

A)
I obtained my ecoli 16S rRNA sequence from NCBI and saved it as a fasta file.
http://www.ncbi.nlm.nih.gov/nuccore/174375?report=fasta

B)
pcrTest.oligos is a two line file containing the primers. See http://www.mothur.org/wiki/Oligos_File for more details.
forward ACTCCTACgggAggCAgCAg
reverse GGACTACHVGGGTWTCTAAT

mothur “#align.seqs(fasta=ecoli.16srrna.pcr.fasta, reference=silva.bacteria.fasta)”

mothur “#summary.seqs(fasta=ecoli.16srrna.pcr.align)”

Results
My primers amplify the V3 and V4 regions and their 5’ positions correspond to positions 6428 and 23444 in the 50000 bp SILVA alignment. See below.

Start End NBases Ambigs Polymer NumSeqs
Minimum: 6428 23444 429 0 6 1
2.5%-tile: 0 0 0 0 0 1
25%-tile: 0 0 0 0 0 1
Median: 0 0 0 0 0 1
75%-tile: 0 0 0 0 0 1
97.5%-tile: 0 0 0 0 0 1
Maximum: 6428 23444 429 0 6 1
Mean: 6428 23444 429 0 6

of Seqs: 1

Output File Names:
ecoli.16srrna.pcr.summary

1 Like

Note: I used the reference from the tutorial, which is much smaller and outdated. For this purpose
(of finding the primer locations), using this reference is fine. However, when actually processing
data through MOTHUR, use the most updated one (Release 119).
http://www.mothur.org/wiki/Silva_reference_files “Full length sequences and taxonomy references”

1 Like

Useful cheat sheet ccolling!

Hello,

I tried reproducing this, but I only obtained an empty “E.coli_16S_RNA.pcr.fasta” file and everything was in “E.coli_16S_RNA.scrap.pcr.fasta” file.
I downloaded the 16S from the link of ccooling and saved it as “E.coli_16S_RNA.fasta”.

Here’s my script and my primer file::

And here is the Output file:

mothur > set.dir(input=/stn4/ul/monthly/landreot/1step/Silva)
Mothur’s directories:
inputDir=/stn4/ul/monthly/landreot/1step/Silva/

mothur > set.dir(output=/stn4/ul/monthly/landreot/1step/Silva/Output)
Mothur’s directories:
outputDir=/stn4/ul/monthly/landreot/1step/Silva/Output/

mothur > pcr.seqs(fasta=E.coli_16S_RNA.fasta, oligos=pcrTest.oligos)

Using 1 processors.

Output File Names:
/stn4/ul/monthly/landreot/1step/Silva/Output/E.coli_16S_RNA.pcr.fasta
/stn4/ul/monthly/landreot/1step/Silva/Output/E.coli_16S_RNA.bad.accnos
/stn4/ul/monthly/landreot/1step/Silva/Output/E.coli_16S_RNA.scrap.pcr.fasta


It took 0 secs to screen 1 sequences.

mothur > align.seqs(fasta=E.coli_16S_RNA.fasta, reference=silva.bacteria.fasta)

Using 1 processors.

Reading in the silva.bacteria.fasta template sequences… [ERROR]: template is not aligned, aborting.
DONE.
It took 2 to read 637 sequences.

mothur > summary.seqs(fasta=ecoli.16srrna.pcr.align)
Unable to open ecoli.16srrna.pcr.align. Trying default /software/EcologyEvolution/mothur/1.36.1/bin/ecoli.16srrna.pcr.align
Unable to open /software/EcologyEvolution/mothur/1.36.1/bin/ecoli.16srrna.pcr.align. Trying output directory /stn4/ul/monthly/landreot/1step/Silva/Output/ecoli.16srrna.pcr.align
Unable to open /stn4/ul/monthly/landreot/1step/Silva/Output/ecoli.16srrna.pcr.align

Using 1 processors.
[ERROR]: did not complete summary.seqs.

mothur > quit()

From what I understand, it is the pcr.seqa() command which doesn’t work properly. But I can’t figure why

Hi there,

I’m almost 100% confident that

forward TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG
reverse GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCT

…are not your primers. Looks like you might have the adapter primers in there a well. You only want to include the part of the primer that anneals to the 16S rRNA gene.

Pat

Indeed you were right. After asking my supervisor, it indeed wasn’t the correct primer sequences.

After some search, I corrected it:

But the script still doesn’t work:

Any idea of the source of the problem this time?

All best,
Pressurized

Can you re-download the silva.bacteria.fasta reference and try again? Looks like something weird happened to it.

Hi,

I followed all the steps as given by ccolling (Thanks ccolling)
Instead of silva database I used green genes.
As mentioned by Pressurized, I am also getting empty “E.coli_16S_RNA.pcr.fasta” file and everything is in “E.coli_16S_RNA.scrap.pcr.fasta” file.
I used the “E.coli_16S_RNA.scrap.pcr.fasta” for alignment and the sumamry results are
Using 1 processors.

Start End NBases Ambigs Polymer NumSeqs
Minimum: 2264 6850 1009 0 6 1
2.5%-tile: 0 0 0 0 0 1
25%-tile: 0 0 0 0 0 1
Median: 0 0 0 0 0 1
75%-tile: 0 0 0 0 0 1
97.5%-tile: 0 0 0 0 0 1
Maximum: 2264 6850 1009 0 6 1
Mean: 2264 6850 1009 0 6

of Seqs: 1

I have v4 region of 16S.
I was wondering if I am doing it correct by using “E.coli_16S_RNA.scrap.pcr.fasta”.
The “E.coli_16S_RNA.scrap.pcr.fasta” contains

gi|174375|gb|J01859.1|ECORRD|rt fpdiffs=0(match) rpdiffs=1000(noMatch) E.coli 16S ribosomal RNA
TACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTG…

Thanks!

If the sequence is winding up in your scrap file, that suggests that your primers aren’t correct or the sequence is in the wrong direction.

Hi pressurized, the command

mothur > align.seqs(fasta=E.coli_16S_RNA.fasta, reference=silva.bacteria.fasta)

should be

mothur > align.seqs(fasta=E.coli_16S_RNA.pcr.fasta> , reference=silva.bacteria.fasta)

I am fairly sure this was the source of your error. Best of luck.