This is not exactly a bug in mothur, but I wanted to know if anyone has encountered this problem.
Our plan was to sequence the entire 16S gene from both ends and then join the forward and reverse reads into a full-length alignment, as is usual practice. However, about 2/3 of our sequences didn’t overlap in the middle, leaving gaps up to 600 bp long and averaging about 125 bp. To get around this problem, we aligned the left and right halves separately using the Infernal aligner and then merged the two directions, adding a single large gap to fill in the sequences that didn’t overlap.
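To make the merge step concrete, here is a minimal Python sketch of the idea, not our exact script. It assumes both halves were aligned against the same model so they share column coordinates, that gaps are written as `-` or `.`, and that the left read wins wherever the two reads do overlap. The function name `merge_halves` and that tie-breaking rule are made up for illustration.

```python
GAP_CHARS = set("-.")  # assumption: gaps are written as '-' or '.'

def merge_halves(left, right):
    """Merge two aligned reads that share the same alignment columns.

    Columns covered by neither read are filled with '-', which produces
    the single large gap in the middle of non-overlapping pairs.
    """
    if len(left) != len(right):
        raise ValueError("both halves must span the same alignment columns")
    merged = []
    for l, r in zip(left, right):
        if l not in GAP_CHARS:
            merged.append(l)    # left read covers this column (wins on overlap)
        elif r not in GAP_CHARS:
            merged.append(r)    # only the right read covers this column
        else:
            merged.append("-")  # neither read covers it: missing data
    return "".join(merged)

# Toy pair that does not overlap in the middle
merged = merge_halves("ACGT------", "------GGCA")
print(merged)
```

A real merge would probably want to resolve overlapping bases using quality scores rather than always trusting one direction.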
We didn’t know how this large, biologically meaningless gap in many of our sequences would affect tree building and OTU analysis in mothur (indeed, many chimera-checking programs gave nonsensical results). Therefore, for all of our sequences we trimmed out the positions in the middle that consist mainly of missing data and brought the two ends together. This leaves a final “full length” alignment of ~12,000 bp.
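The trimming step can be sketched roughly as follows in Python. This is only an illustration of the idea: the 50% cutoff, the gap characters, and the function names `columns_to_keep` and `trim_alignment` are assumptions, not the exact values or tools we used.

```python
def columns_to_keep(seqs, max_gap_fraction=0.5):
    """Return indices of columns where at most max_gap_fraction of the
    sequences have a gap ('-' or '.'). The 0.5 default is an assumption."""
    n = len(seqs)
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    keep = []
    for col in range(length):
        gaps = sum(1 for s in seqs if s[col] in "-.")
        if gaps / n <= max_gap_fraction:
            keep.append(col)
    return keep

def trim_alignment(seqs, max_gap_fraction=0.5):
    """Drop the mostly-missing columns from every sequence, which brings
    the two ends of each sequence together."""
    keep = columns_to_keep(seqs, max_gap_fraction)
    return ["".join(s[c] for c in keep) for s in seqs]

# Toy alignment: columns 4 and 5 are mostly missing data
alignment = ["ACGT--GGCA", "ACGTA-GGCA", "ACGT--GGCA"]
trimmed = trim_alignment(alignment)
print(trimmed)
```

Because the same columns are removed from every sequence, the remaining positions stay aligned with each other, but any real bases that fall in the removed columns are lost.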
Has anyone had to deal with this issue before, and if so, how did you overcome it?