Too many gaps in alignment with template

Dear Patrick, dear mothur users,
I have two questions, one related to Illumina MiSeq read quality (paired-end reads) and one to the way mothur aligns sequences.

  1. I noticed that there are many fairly large deletions (often of 10 nucleotides) in a very conserved region after the V4 region of the 18S (I’m working with a group of protists). Has anyone else observed the same pattern? Has it already been reported?

  2. I use mothur to align my reads according to a template. The template doesn’t have any gaps in this region and I used a high penalty for opening gaps (gapopen=-5).
    In the alignment, I noticed that most often the deletions result in several gaps, like this:

T G A A G T A A T A T G A T T G A T A G G G
T G A A G _ _ _ _ _ _ _ _ _ _ _ A T A G G G -> one deletion of 11 nucleotides, should be one gap
T _ _ _ G _ A _ _ _ A _ G _ _ _ _ A T A G G G -> the flanking region before the gap is “spread” into the gap, resulting in 5 gaps!

Since the number of gaps can change the distance between otherwise identical (or closely related) sequences (the “calc” option of dist.seqs, with its default “onegap”), adding artificial gaps will increase the final number of OTUs. I don’t think the “nogap” option would be a solution, since there are other parts of the alignment where gaps actually have a meaning (differences in length in the variable helices).
Why does mothur tend to create that many gaps? Is there any option to correct this?
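
To make the concern concrete, here is a rough Python sketch of a "onegap"-style distance, where each run of consecutive gaps counts as a single difference and columns that are gaps in both sequences are skipped. This is a simplified illustration, not mothur's actual implementation (it ignores terminal-gap handling, for example). The function name is made up, and the "spread" read is a hypothetical 22-column version of the pattern shown above.

```python
def onegap_distance(a, b, gap="-"):
    """Simplified 'onegap'-style distance between two aligned sequences.

    Assumption for illustration only: a run of gaps in one sequence counts
    as one difference and one compared position; columns where both
    sequences have a gap are skipped entirely.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    diffs = 0
    length = 0
    run = None  # which sequence currently has an open gap run: "a", "b", or None
    for x, y in zip(a, b):
        x_gap, y_gap = x == gap, y == gap
        if x_gap and y_gap:
            continue  # shared gap: column is ignored
        if x_gap or y_gap:
            side = "a" if x_gap else "b"
            if side != run:  # a new gap run starts here
                diffs += 1
                length += 1
            run = side
        else:
            run = None
            length += 1
            if x != y:
                diffs += 1
    return diffs / length if length else 0.0


template = "TGAAGTAATATGATTGATAGGG"
one_gap  = "TGAAG-----------ATAGGG"  # deletion kept as a single gap
spread   = "T---G-A--A-G----ATAGGG"  # same read, flank spread into 5 gaps

d_one = onegap_distance(template, one_gap)
d_spread = onegap_distance(template, spread)
d_reads = onegap_distance(one_gap, spread)
```

Under these assumptions the same read ends up further from the template when its deletion is split into five gaps, and the two placements of the identical read are no longer at distance zero from each other.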
Waiting for your answer,
Anna Maria

I’m not 100% sure what you’re asking, but here are a couple of things that I think will help. (1) The extra gaps in the SILVA reference are largely structural and really don’t matter. (2) After aligning and screening the sequences, you should run filter.seqs(trump=., vertical=T), which will remove a lot of the structural gaps. (3) In dist.seqs, if the two sequences you are comparing have gaps in the same positions, those positions are ignored. The punchline is that I don’t think what you’re worried about is an issue.
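
As a rough illustration of point (2), the column filtering can be sketched in Python like this. It is only a toy model of the idea, not mothur's code: the function name and the example sequences are made up, and it assumes "vertical" drops columns that contain only gap characters ("-" or ".") while "trump=." drops any column in which at least one sequence has a ".".

```python
def filter_columns(seqs, trump=".", gap_chars="-."):
    """Toy version of column filtering on an alignment (list of equal-length strings).

    - trump: drop a column if ANY sequence has this character there.
    - vertical behaviour: drop a column if ALL sequences have a gap character there.
    """
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    keep = []
    for i in range(len(seqs[0])):
        column = [s[i] for s in seqs]
        if trump is not None and trump in column:
            continue  # trump character present in this column
        if all(c in gap_chars for c in column):
            continue  # column holds gaps only
        keep.append(i)
    return ["".join(s[i] for i in keep) for s in seqs]


# Hypothetical three-sequence alignment, for illustration only.
aligned = [
    "..AC--GT-A",
    ".TAC--G--A",
    "GTAC--GTCA",
]
filtered = filter_columns(aligned)
```

In this toy example the two leading columns are removed because of the "." characters and the two all-gap columns are removed as purely structural, so only columns that carry information for at least one sequence remain.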

Pat