I’m interested in taxonomic classification of sequences. RDP is fine if you want genus-level classification, but sub-genus classification is harder (and a whole subject of its own). That said, RDP has a tool called SeqMatch that gives you a list of candidate templates that match your query. I don’t really like the output of SeqMatch though, and I’m trying to figure out how to get MOTHUR to do something similar. The align.seqs command probably does almost what I need: it selects the best template and scores the alignment. However, I have no way of knowing whether that template was really the single best match or just one of several equally good templates. Furthermore, I sometimes have low-quality positions in my sequences that I’d like to mask (with Ns, for instance) so they do not contribute to the scoring. Has anyone used MOTHUR in such a way?
The closest thing we have is probably the k-Nearest Neighbors option in classify.seqs. You could use an alignment-based (distance) search to get the most accurate match to what you want. Alternatively, you could use kmers; if I remember right, SeqMatch is kmer-based. By changing the value of “k” you can select how many sequences you want for the consensus. The distance-based approach will output the name of the closest sequence and the similarity between it and your sequence. To mask out the Ns, you could always use filter.seqs(trump=N), but that would remove each affected column from every sequence, not just from the one that has the N. Let us know what you think…
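To make the distance-based idea concrete, here is a minimal Python sketch of what such a search could look like outside of MOTHUR. It is not MOTHUR's implementation: the function names (masked_similarity, nearest_templates) and the toy sequences are made up for illustration, and it assumes the query and templates are already aligned to the same length. Positions where the query has an N are skipped rather than removed, so they do not count for or against any template, and templates that tie for the best score are reported together.

```python
GAP_CHARS = "-."


def masked_similarity(query, template, mask_chars="Nn"):
    """Fraction of identical positions between two aligned sequences.

    Positions where the query holds a masked character (N by default) are
    skipped, as are columns where both sequences contain a gap.
    """
    if len(query) != len(template):
        raise ValueError("sequences must be aligned to the same length")
    matches = 0
    compared = 0
    for q, t in zip(query, template):
        if q in mask_chars:
            continue  # low-quality position: does not contribute to the score
        if q in GAP_CHARS and t in GAP_CHARS:
            continue  # shared gap column carries no information
        compared += 1
        if q.upper() == t.upper():
            matches += 1
    return matches / compared if compared else 0.0


def nearest_templates(query, templates, k=3):
    """Return the k best (name, similarity) pairs and the names tied for best."""
    scored = sorted(
        ((name, masked_similarity(query, seq)) for name, seq in templates.items()),
        key=lambda item: (-item[1], item[0]),  # best score first, then by name
    )
    if not scored:
        return [], []
    best_score = scored[0][1]
    tied = [name for name, score in scored if score == best_score]
    return scored[:k], tied


if __name__ == "__main__":
    # Hypothetical toy data: position 5 of the query is masked with N.
    query = "ACGTNACGT"
    templates = {
        "template_A": "ACGTAACGT",
        "template_B": "ACGTCACGT",
        "template_C": "ACGAAACGT",
    }
    top, tied = nearest_templates(query, templates, k=2)
    print(top)
    print(tied)
```

In this toy example template_A and template_B differ only at the masked position, so both end up tied for the best match, which is exactly the situation that a single "best template" report would hide.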