align.seqs using reference sequences of variable length

I have used pcr.seqs with an oligo file and GreenGenes representative 16S rRNA OTUs as input (both aligned and unaligned). The resulting FASTA files contain sequences of variable length, because this amplicon have variable length. However, the align.seqs command must be supplied an aligned reference with constant sequence lengths. One way forward would be to locate the positions of the primers in the aligned reference. Are there any tools that will help me do that? Or what are my options?

Could you post the mothur commands you ran and the version of mothur you are using?

mothur v.1.32.0

I am following the MiSeq SOP. The relevant steps are:

pcr.seqs(fasta=gg_97_otus_4feb2011_aligned.fasta, oligos=jt_primers.txt, keepdots=F, processors=20, pdiffs=2)

and

pcr.seqs(fasta=gg_97_otus_4feb2011_aligned.fasta, oligos=jt_primers.txt, keepdots=F, processors=20, pdiffs=2)

I also tried unaligned versions of the above Green Genes files.

jt_primers.txt is in the format:

primer GTGCCAGCMGCCGCGGTAA GGACTACHVGGGTWTCTAAT JT11
primer GTGCCAGCMGCCGCGGTAA GGACTACHVGGGTWTCTAAT JT12
primer GTGCCAGCMGCCGCGGTAA GGACTACHVGGGTWTCTAAT JT13
primer GTGCCAGCMGCCGCGGTAA GGACTACHVGGGTWTCTAAT JT21
primer GTGCCAGCMGCCGCGGTAA GGACTACHVGGGTWTCTAAT JT22
primer GTGCCAGCMGCCGCGGTAA GGACTACHVGGGTWTCTAAT JT23

Could you send me the fasta and oligos file to mothur.bugs@gmail.com?

With the snippet of the oligo file and the the Green Genes files that are in the tarball linked below, I supposed my situation can be replicated, though I was expecting the “[ERROR]: template is not aligned, aborting.” was due to something I did wrong and not a bug:

http://greengenes.lbl.gov/Download/Sequence_Data/Fasta_data_files/Reference_OTUs_for_Pipelines/Caporaso_Reference_OTUs/gg_otus_4feb2011.tgz

I tested the primers with Biopieces:

read_fasta -n 1000 -i gg_97_otus_4feb2011.fasta | pcr_seq -f GTGCCAGCMGCCGCGGTAA -R GGACTACHVGGGTWTCTAAT | grab -p PCR | count_records -x
RECORDS_COUNT: 824
---

Which confirms the order of the primers (82% of the sequence have the primer pair not allowing for mismatches or indels). Thus, the oligo file have the primers in correct order. pcr.seqs() do in fact give output from the oligo file in the *.pcr.fasta file when using the before-mentioned oligo file - swapping the primers results in empty *.pcr.fasta files.

However, the following align.seqs() step still goes [ERROR]: template is not aligned, aborting.

And yes, the *.pcr.fasta file is of uneven length:

read_fasta -i gg_97_otus_4feb2011_aligned.pcr.fasta | analyze_vals -k SEQ_LEN -x | write_tab -cpx
+---------+---------+-------+------+------+----------+---------+
| KEY     | TYPE    | COUNT | MIN  | MAX  | SUM      | MEAN    |
+---------+---------+-------+------+------+----------+---------+
| SEQ_LEN | Numeric | 31208 | 1794 | 1797 | 55987160 | 1794.00 |
+---------+---------+-------+------+------+----------+---------+

… because the amplicons are of different lengths. So should I use the primer positions to slice out the amplicon from the 16S alignment instead? And are there any Mothur tools to help me locate the correct positions?

I will correct this issue in mothur for the next release. In the meantime, as a workaround setting keepdots=t would resolve the issue.

Thank you. Changing to keepdots=t seems to be working.