I’m quite confused about how the align.seqs step in mothur works. The V4 region is only ~250 bases long. However, in the SOP, after alignment the start position is 1969 and the end position is 11551, even though there are only ~250 bases. Why does that happen? How was the SILVA database built, and how does mothur carry out the alignment step?
The SILVA reference alignment (like the greengenes one) has extra columns that contain only gap characters. Those columns don’t hold any data; they are there as padding in case novel sequence diversity is encountered. For example, the TM7 have an intron in the V1 region, so it’s good to have padding there to accommodate TM7 sequences. That is why the start and end positions in the SOP (1969 and 11551) are column coordinates in the padded alignment, not base counts: your ~250 V4 bases are spread across that span with gap characters in between. You can learn more about the algorithm in align.seqs by looking at these papers from my group…
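To make the coordinate point concrete, here is a minimal Python sketch (not mothur code) of how start, end, and base count can be read off an aligned sequence. The function name `alignment_span` and the toy sequence are made up for illustration, and it assumes that `-` and `.` are the gap characters in the alignment.

```python
# Characters treated as gaps; assumed here to be '-' and '.'.
GAP_CHARS = {"-", "."}

def alignment_span(aligned_seq):
    """Return (start, end, nbases) for one aligned sequence.

    start and end are 1-based alignment columns of the first and last
    real base; nbases is the number of non-gap characters.
    Returns None if the sequence contains only gaps.
    """
    positions = [
        col for col, ch in enumerate(aligned_seq, start=1)
        if ch not in GAP_CHARS
    ]
    if not positions:
        return None
    return positions[0], positions[-1], len(positions)

# Toy example: 6 bases spread over a 21-column alignment.
toy = "." * 5 + "AC--G-T" + "-" * 3 + "GA" + "." * 4
print(len(toy))             # 21 columns in total
print(alignment_span(toy))  # (6, 17, 6)
```

The toy sequence has only 6 bases, but its start and end are columns 6 and 17 because gap columns count toward the coordinates. The same thing happens at a larger scale in the SOP: ~250 bases sit between columns 1969 and 11551 of the padded reference alignment.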