How does align.seq's SimBtwnQuery&Template handle gaps?

acc · April 22, 2019, 8:49pm

Hello all -
I am currently using mothur to compare a set of environmental 16S sequences to database reference sequences. I’ve come to the conclusion that different methods can make quite important differences here. I think that aligning to a proper reference alignment (e.g., one built w/ secondary structure information) is critical to this.
I’ve run mothur’s align.seqs and am really interested in the SimBtwnQuery&Template column in the report results. The documentation says that this percentage “includes gaps” - however, I might be missing some key information on how it counts gaps occurring in the alignment comparison between the query sequence and the best hit template sequence -

Does it count terminal gaps?
Does it count a gap only once (comparable to ‘onegap’ in dist.seqs) or does it count each position of gaps (comparable to ‘eachgap’ in dist.seqs)?

Apologies if this information is already available somewhere and I have missed it.

Also wondering out of curiosity if anyone has any additional insight as to whether this column is trustworthy, or which of those options above are best for comparing environmental 16S sequences to databases. Personally I would choose to not count terminal gaps and count ‘onegap’ (each true indel is likely 1 evolutionary event), but I guess opinions can differ here.

And is there any way to approach this question with mothur besides the above? The other thing I was thinking of is pulling in all of the alignments of the best hits reported above with the environmental sequences, running dist.seqs and then pulling out just the relevant comparisons, but this approach seems super roundabout.

Thanks!

acc · April 23, 2019, 2:05am

OK, I realized I could quickly test this questions with some simulated data:

No, terminal gaps are not counted in this similarity.
Gaps are counted according to their length (IE, the ‘eachgap’ method explained in dist.seqs).

I’ll keep this information out there in case someone else in the future needs it, or if there is anything else worth discussing from the above with the community.
Thanks!
Alex

pschloss · April 25, 2019, 12:37pm

This is also described with examples at the dist.seqs wiki page.

Pat

system · May 5, 2019, 12:37pm

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
too many gaps in alignment with template Commands in mothur	1	1017	January 28, 2016
Align.seqs SimBtwnQuery&Template Commands in mothur	1	450	November 6, 2018
alignment Commands in mothur	3	1421	May 31, 2016
23s rRNA database alignment Commands in mothur	2	1919	June 1, 2015
Filtered alignment, 0 mothur bugs	10	803	April 30, 2020

How does align.seq's SimBtwnQuery&Template handle gaps?

Related topics