I am cultivating new archaeal isolates, and some of them seem to be the same organism. I would like to check whether their 16S rRNA gene sequences share more than 97% identity, as a first indication of whether these microorganisms are the same or not.
I know how to cluster sequences by similarity using pre-established cutoffs. But I was wondering if there is a way to estimate the percentage of similarity between two 16S rRNA gene sequences with mothur.
dist.seqs will give you a distance matrix (distance = 1 - similarity). Alternatively, running classify.seqs with an alignment and the distance method should give you distances to the closest match in the database.
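To make the "1 - similarity" relationship concrete, here is a minimal Python sketch that turns pairwise distances into percent identity and flags pairs above 97%. It assumes the distances are in a three-column, whitespace-separated layout (sequence name, sequence name, distance); the helper name `percent_identity_from_dist` and the isolate names and values are made up for illustration and are not mothur output.

```python
def percent_identity_from_dist(lines):
    """Map (name_a, name_b) -> percent identity from 'name_a name_b distance' lines."""
    results = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        name_a, name_b, dist = parts[0], parts[1], float(parts[2])
        # distance = 1 - similarity, so similarity (%) = (1 - distance) * 100
        results[(name_a, name_b)] = (1.0 - dist) * 100.0
    return results


# Hypothetical example values, not real data
example = ["isolate1 isolate2 0.0213", "isolate1 isolate3 0.0450"]
identities = percent_identity_from_dist(example)
for pair, pid in identities.items():
    verdict = "more than 97%" if pid > 97.0 else "97% or less"
    print(pair[0], pair[1], f"{pid:.2f}%", verdict)
```

With a real file you would pass the open file handle instead of the `example` list.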
I will try that too. But what I am looking for is a way to calculate whether two of my own sequences share more than 97% identity. I need the number of matching nucleotides, not just the name of the closest match in a database.
If you use the distance-based approach with knn=1, you'll get the percent identity.
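If you want the actual count of matching nucleotides for two of your own sequences, a rough cross-check outside mothur could look like the sketch below. It assumes both sequences have already been aligned to the same length, skips columns where both sequences have a gap, and counts a gap against a base as a mismatch. That gap handling is a simplifying assumption and may not match how mothur scores gaps, so treat the result as an approximation. The function name `count_identity` and the toy sequences are invented for illustration.

```python
def count_identity(aln_a, aln_b, gap_chars="-."):
    """Return (matches, compared_columns, percent_identity) for two aligned sequences."""
    if len(aln_a) != len(aln_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = 0
    compared = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a in gap_chars and b in gap_chars:
            continue  # column is a gap in both sequences: not informative
        compared += 1
        if a == b:
            matches += 1  # identical bases; a gap against a base never matches
    if compared == 0:
        raise ValueError("no comparable columns")
    return matches, compared, 100.0 * matches / compared


# Toy aligned sequences, for illustration only
matches, compared, pid = count_identity("ACGT-ACGTA", "ACGTTACGCA")
print(f"{matches} of {compared} columns identical ({pid:.1f}% identity)")
```

For real 16S sequences you would read the two aligned sequences from your alignment file and compare the resulting percentage against the 97% threshold.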