align.seqs using reference sequences of variable length

maasha · October 29, 2013, 12:25pm

I have used pcr.seqs with an oligo file and GreenGenes representative 16S rRNA OTUs as input (both aligned and unaligned). The resulting FASTA files contain sequences of variable length, because this amplicon have variable length. However, the align.seqs command must be supplied an aligned reference with constant sequence lengths. One way forward would be to locate the positions of the primers in the aligned reference. Are there any tools that will help me do that? Or what are my options?

westcott · October 29, 2013, 4:47pm

Could you post the mothur commands you ran and the version of mothur you are using?

maasha · October 29, 2013, 4:57pm

mothur v.1.32.0

I am following the MiSeq SOP. The relevant steps are:

pcr.seqs(fasta=gg_97_otus_4feb2011_aligned.fasta, oligos=jt_primers.txt, keepdots=F, processors=20, pdiffs=2)

and

pcr.seqs(fasta=gg_97_otus_4feb2011_aligned.fasta, oligos=jt_primers.txt, keepdots=F, processors=20, pdiffs=2)

I also tried unaligned versions of the above Green Genes files.

jt_primers.txt is in the format:

primer GTGCCAGCMGCCGCGGTAA GGACTACHVGGGTWTCTAAT JT11
primer GTGCCAGCMGCCGCGGTAA GGACTACHVGGGTWTCTAAT JT12
primer GTGCCAGCMGCCGCGGTAA GGACTACHVGGGTWTCTAAT JT13
primer GTGCCAGCMGCCGCGGTAA GGACTACHVGGGTWTCTAAT JT21
primer GTGCCAGCMGCCGCGGTAA GGACTACHVGGGTWTCTAAT JT22
primer GTGCCAGCMGCCGCGGTAA GGACTACHVGGGTWTCTAAT JT23

westcott · October 29, 2013, 5:25pm

Could you send me the fasta and oligos file to mothur.bugs@gmail.com?

maasha · October 29, 2013, 7:07pm

With the snippet of the oligo file and the the Green Genes files that are in the tarball linked below, I supposed my situation can be replicated, though I was expecting the “[ERROR]: template is not aligned, aborting.” was due to something I did wrong and not a bug:

http://greengenes.lbl.gov/Download/Sequence_Data/Fasta_data_files/Reference_OTUs_for_Pipelines/Caporaso_Reference_OTUs/gg_otus_4feb2011.tgz

maasha · October 31, 2013, 12:39pm

I tested the primers with Biopieces:

read_fasta -n 1000 -i gg_97_otus_4feb2011.fasta | pcr_seq -f GTGCCAGCMGCCGCGGTAA -R GGACTACHVGGGTWTCTAAT | grab -p PCR | count_records -x
RECORDS_COUNT: 824
---

Which confirms the order of the primers (82% of the sequence have the primer pair not allowing for mismatches or indels). Thus, the oligo file have the primers in correct order. pcr.seqs() do in fact give output from the oligo file in the *.pcr.fasta file when using the before-mentioned oligo file - swapping the primers results in empty *.pcr.fasta files.

However, the following align.seqs() step still goes [ERROR]: template is not aligned, aborting.

And yes, the *.pcr.fasta file is of uneven length:

read_fasta -i gg_97_otus_4feb2011_aligned.pcr.fasta | analyze_vals -k SEQ_LEN -x | write_tab -cpx
+---------+---------+-------+------+------+----------+---------+
| KEY     | TYPE    | COUNT | MIN  | MAX  | SUM      | MEAN    |
+---------+---------+-------+------+------+----------+---------+
| SEQ_LEN | Numeric | 31208 | 1794 | 1797 | 55987160 | 1794.00 |
+---------+---------+-------+------+------+----------+---------+

… because the amplicons are of different lengths. So should I use the primer positions to slice out the amplicon from the 16S alignment instead? And are there any Mothur tools to help me locate the correct positions?

westcott · November 1, 2013, 1:30pm

I will correct this issue in mothur for the next release. In the meantime, as a workaround setting keepdots=t would resolve the issue.

maasha · November 4, 2013, 1:49pm

Thank you. Changing to keepdots=t seems to be working.

Topic		Replies	Views
Alignment seqs of different lengths? Commands in mothur	4	4293	April 6, 2010
23s rRNA database alignment Commands in mothur	2	1919	June 1, 2015
pcr.seqs for greengenes Commands in mothur	1	1572	October 19, 2015
Align.seqs() [ERROR]: template is not aligned, aborting Commands in mothur	7	2052	February 10, 2019
Align.seqs with the pr2 database Commands in mothur	2	521	July 31, 2022

align.seqs using reference sequences of variable length

Related topics