Hello all -
I am currently using mothur to compare a set of environmental 16S sequences to database reference sequences. I’ve come to the conclusion that different methods can make quite important differences here. I think that aligning to a proper reference alignment (e.g., one built w/ secondary structure information) is critical to this.
I’ve run mothur’s align.seqs and am really interested in the
SimBtwnQuery&Template column in the report results. The documentation says that this percentage “includes gaps” - however, I might be missing some key information on how it counts gaps occurring in the alignment comparison between the query sequence and the best hit template sequence -
- Does it count terminal gaps?
- Does it count a gap only once (comparable to ‘onegap’ in dist.seqs) or does it count each position of gaps (comparable to ‘eachgap’ in dist.seqs)?
Apologies if this information is already available somewhere and I have missed it.
Also wondering out of curiosity if anyone has any additional insight as to whether this column is trustworthy, or which of those options above are best for comparing environmental 16S sequences to databases. Personally I would choose to not count terminal gaps and count ‘onegap’ (each true indel is likely 1 evolutionary event), but I guess opinions can differ here.
And is there any way to approach this question with mothur besides the above? The other thing I was thinking of is pulling in all of the alignments of the best hits reported above with the environmental sequences, running dist.seqs and then pulling out just the relevant comparisons, but this approach seems super roundabout.