Execute pcr.seqs

1106351165 · August 14, 2021, 4:02pm

I very confound that how should I know that the start position and the end position when I need to align my sequences to the reference alignment.

SJSalter · August 16, 2021, 2:20pm

Hello Huang Yanfang!

pcr.seqs cuts down the length of the reference file to match your expected region. However it is not a compulsory step, you can simply align to the full length reference file.

pcr.seqs may be appropriate if:

you are building a pipeline for multiple datasets covering the same amplicon region,
and/or
you are running out of disk space when you try to run a full length alignment.

First try aligning against the full length database (eg. “silva.bacteria.fasta” or “silva.seed_v132.align”) to see if it works. If you choose to make a custom alignment region, you could estimate the start and end locations based on alignment of your amplicon primers, or (if your full dataset is prohibitively large) by aligning a subsample of your data against the full length database.

You could do the latter as follows:
First extract the first 1000 reads from each fastq file.
Run these fastqs through the first few steps of the Mothur SOP.
Then:
align.seqs(fasta=first1000.unique.fasta, reference=silva.seed_v132.align)
summary.seqs(fasta=current, count=current)

You can then deduce the alignment region from the output. For example:
Start End NBases Ambigs Polymer NumSeqs
Minimum: 13858 21274 117 0 3 1
2.5%-tile: 13862 21284 125 0 3 198901
25%-tile: 13862 21287 125 0 3 1989008
Median: 13862 21287 125 0 4 3978015
75%-tile: 13862 21287 125 0 4 5967022
97.5%-tile: 13862 21293 125 0 5 7757129
Maximum: 13862 21296 125 0 8 7956029
Mean: 13861 21286 124 0 3

For this dataset you’d see that everything aligns between 13,000 and 22,000 relative to the reference file (which has 50,000 columns). You could make your custom reference file using these values:
pcr.seqs(fasta=silva.seed_v132.align, start=13000, end=22000, keepdots=F, processors=8)

I hope that is helpful. Good luck with your analysis!

pschloss · August 17, 2021, 5:04pm

Thanks @SJSalter - in case others are looking for where @1106351165 got the screenshot from, there are instructions on customizing the region for use with pcr.seqs available here…

system · August 27, 2021, 5:04pm

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Miseq SOP pcr.seqs Commands in mothur	13	17345	February 25, 2016
Silva Bacteria Aligning Position Theory behind mothur	4	6327	July 27, 2014
choose start and end of pcr.seqs command Commands in mothur	12	4992	December 7, 2015
Align.seqs - queries. Commands in mothur	4	5822	January 30, 2014
Commands in Miseq SOP Commands in mothur	1	1849	March 15, 2014

Execute pcr.seqs

Related topics