Dear Patrick, dear mothur users,
I have two questions, one about Illumina MiSeq read quality (paired-end reads) and one about the way mothur aligns.
I noticed that there are many quite large deletions (often of 10 nucleotides) in a very conserved region after the V4 of the 18S (I’m working with a group of protists). Has anyone else observed the same pattern? Has it already been reported?
I use mothur to align my reads against a template. The template doesn’t have any gaps in this region, and I used a high penalty for opening gaps (gapopen=-5).
In the alignment, I noticed that most often the deletions result in several gaps, like this:
T G A A G T A A T A T G A T T G A T A G G G
T G A A G _ _ _ _ _ _ _ _ _ _ _ A T A G G G -> one deletion of 11 nucleotides, should be one gap
T _ _ _ G _ A _ _ _ A _ G _ _ _ _ A T A G G G -> the flanking region before the gap is “spread” into the gap, resulting in 5 gaps!
Since the number of gaps can change the distance between otherwise identical (or closely related) sequences (option “calc” with the default “onegap” in dist.seqs), adding artificial gaps will increase the number of final OTUs. I don’t think that using the “nogap” option would be a solution, since there are other parts of the alignment where gaps actually have a meaning (differences in length in the variable helices).
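To make the concern concrete, here is a minimal Python sketch of how I understand the three gap treatments. It is not mothur's code: `count_gaps` is a hypothetical helper, the counting rules are my assumption about what “onegap”, “eachgap” and “nogap” mean, and the two rows are toy rows modeled on the example above (written with `-` instead of `_` and trimmed to the same length).

```python
from itertools import groupby

# Characters treated as gaps in an aligned row (assumption for this sketch).
GAP_CHARS = set("-._")


def count_gaps(aligned_row, calc="onegap"):
    """Count gap penalties in one aligned row under a given gap treatment.

    Hypothetical helper, not part of mothur:
      onegap  -> every run of consecutive gap characters counts once
      eachgap -> every gap character counts
      nogap   -> gaps are ignored
    """
    # Length of each run of consecutive gap characters in the row.
    runs = [
        len(list(group))
        for is_gap, group in groupby(aligned_row, key=lambda c: c in GAP_CHARS)
        if is_gap
    ]
    if calc == "onegap":
        return len(runs)
    if calc == "eachgap":
        return sum(runs)
    if calc == "nogap":
        return 0
    raise ValueError(f"unknown calc option: {calc!r}")


# Toy rows: the same 11-nucleotide deletion, placed as one gap or spread out.
single_gap = "TGAAG-----------ATAGGG"
spread_gap = "T---G-A---A-G---ATAGGG"

for name, row in (("single", single_gap), ("spread", spread_gap)):
    counts = {calc: count_gaps(row, calc) for calc in ("onegap", "eachgap", "nogap")}
    print(name, counts)
```

Under these assumptions both rows carry the same number of gap characters, but the spread placement is penalized five times instead of once when runs are counted, which is the inflation I am worried about.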
Why does mothur tend to create so many gaps? Is there an option to correct this?
Waiting for your answer,