Could someone point me to sources regarding building reference alignment libraries? I’m unsure as to how the gaps are determined, and what their purpose is, exactly. My advisor wishes me to translate what occurs in mothur to a formal mathematical language, but it’s terribly difficult to create well-defined objects when you don’t really know what they are.
The SILVA- and greengenes-based reference alignments were taken from SILVA and greengenes. The gaps are inserted to preserve positional homology and the secondary structure of the rRNA that the gene codes for.
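If it helps with the formalization, here is a rough sketch of one way to think about it, not a description of mothur's internals. It treats an alignment as a set of equal-length strings over the nucleotide alphabet plus gap symbols, where stripping the gaps recovers the raw sequence and each residue is mapped to a column. The sequences, the names `ungap` and `column_map`, and the choice of `-` and `.` as gap characters are all assumptions made up for illustration.

```python
# Toy model of a gapped alignment (illustrative only; not mothur code).
GAP_CHARS = {"-", "."}  # assumed gap symbols

def ungap(aligned):
    """Remove gap characters to recover the raw, unaligned sequence."""
    return "".join(c for c in aligned if c not in GAP_CHARS)

def column_map(aligned):
    """For each residue of the ungapped sequence, give its alignment column."""
    return [col for col, c in enumerate(aligned) if c not in GAP_CHARS]

# Made-up sequences, not taken from SILVA or greengenes.
alignment = {
    "seqA": "AC-GU-A",
    "seqB": "ACGGUCA",
    "seqC": "A--GU-A",
}

# Every row has the same length, so columns are well defined.
assert len({len(s) for s in alignment.values()}) == 1

for name, row in alignment.items():
    print(name, ungap(row), column_map(row))
```

In this toy case the G at index 2 of seqA and the G at index 1 of seqC both map to column 3, which is the sense in which gaps put homologous positions in the same column even though the raw sequences differ in length.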
Hope this gets you going…
In fact that’s perfect. I also have a question regarding reference sequence identifiers as per your (2009) paper: how are these chosen?