question on classify.seqs using knn algorithm

pad · February 18, 2013, 7:52pm

Hello all,

I was wondering how distances are calculated with the “search=distance” method for finding k-nearest neighbors; are reported values based on pair-wise alignments after excluding terminal gaps (assuming template sequences are longer than the queries)?

Thanks much.
pad

westcott · February 18, 2013, 8:57pm

The distance reported is the distance calculated between the query sequence and its closest match in the template. For example:

SequenceA ATGCATGCATGC
SequenceB ACGC---CATCC

This alignment has two mismatches and one gap. The length of the shorter sequence is 10 nt, since the gap is counted as a single position. The distance is therefore 3/10, or 0.30. This is the distance calculation method employed by Sogin et al. (1995). The logic behind this penalty is that a gap represents an insertion, and a gap of any length likely represents a single insertion.
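
To make the arithmetic concrete, here is a minimal Python sketch of the one-gap counting described above. It is not mothur's source code: the function name `onegap_distance` and its `gap_chars` parameter are made up for illustration. It also assumes the two sequences are already aligned to the same length, and it treats terminal gaps like any other gap run, which may differ from what mothur actually does.

```python
def onegap_distance(a: str, b: str, gap_chars: str = "-.") -> float:
    """Distance between two aligned sequences, counting each gap run once.

    Hypothetical helper for illustration only, not mothur's implementation.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")

    diffs = 0        # mismatches plus gap runs
    length = 0       # compared positions; a whole gap run adds only 1
    gap_side = None  # which sequence ("a" or "b") the current gap run is in

    for x, y in zip(a, b):
        x_gap = x in gap_chars
        y_gap = y in gap_chars
        if x_gap and y_gap:
            # Gap in both sequences: skip the column (assumption).
            continue
        if x_gap or y_gap:
            side = "a" if x_gap else "b"
            if side != gap_side:
                # First column of a new gap run: count it as one position.
                diffs += 1
                length += 1
                gap_side = side
            continue
        # Both columns hold a base: any open gap run ends here.
        gap_side = None
        length += 1
        if x != y:
            diffs += 1

    if length == 0:
        raise ValueError("no comparable positions")
    return diffs / length


# The example from the post: 2 mismatches + 1 gap run over 10 positions.
print(onegap_distance("ATGCATGCATGC", "ACGC---CATCC"))  # 0.3
```

For the example pair, the denominator works out to 9 aligned bases plus 1 gap run, which matches the "10 nt" figure in the reply. How the denominator should be defined when both sequences contain gaps is an assumption of this sketch.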
