"cluster" vs "classify.seqs"

I'm having some difficulty understanding “cluster” and “classify.seqs”.

“classify.seqs” lets users assign their sequences to the taxonomy outline of their choice. So I assume that “mothur” basically calculates the distance from each sequence to the taxonomy groups in the “.tax” file and decides which group each sequence belongs to.

On the other hand, there is “cluster”, which works on the distance matrix: it uses the distances to find sequences that are close together and groups them into clusters.
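As a rough illustration of the distance-matrix approach, here is a minimal sketch of furthest-neighbor (complete-linkage) clustering at a distance cutoff. Furthest neighbor is one of the clustering methods mothur's cluster command has offered, but this is a simplified stand-in written for illustration, not mothur's implementation; the function name and the input format (a square list-of-lists distance matrix) are assumptions.

```python
from itertools import combinations


def furthest_neighbor_otus(dist, cutoff):
    """Illustrative sketch (not mothur's code): group sequence indices into OTUs
    with furthest-neighbor (complete-linkage) clustering. Two clusters merge only
    if every cross-pair of members is within `cutoff`."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Furthest-neighbor distance: the largest pairwise distance between members.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        # Find the closest pair of clusters under furthest-neighbor linkage.
        (x, y), d = min(
            (((p, q), linkage(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > cutoff:
            break  # no remaining merge satisfies the cutoff
        # combinations() yields p < q, so x < y and deleting y leaves x in place.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return sorted(sorted(c) for c in clusters)


d = [
    [0.00, 0.01, 0.02, 0.10],
    [0.01, 0.00, 0.02, 0.12],
    [0.02, 0.02, 0.00, 0.11],
    [0.10, 0.12, 0.11, 0.00],
]
print(furthest_neighbor_otus(d, 0.03))  # [[0, 1, 2], [3]]
```

With a cutoff of 0.03 (a commonly used OTU threshold), furthest neighbor guarantees that every pair of sequences inside an OTU differs by at most 3%.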

So in principle, either we have a set of OTUs and would like to see which sequences belong to which OTU, or we don’t have OTUs and would like to identify potential OTUs.

Here are my questions:

  1. Is my understanding correct?
  2. What measure is used to calculate these distances?
  3. Is this distance matrix symmetric? It looks like a correlation matrix.

Thank you

  1. Is my understanding correct?

The taxonomic assignment uses the naive Bayesian classifier from Wang et al. that is used by the RDP. You should consult that paper for a description of the approach. The cluster command assigns sequences to bins based on their similarity to the other sequences in the database.
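For comparison, here is a toy sketch of the k-mer naive Bayesian idea, written from my reading of the Wang et al. description: it scores a query against each reference taxon using word (k-mer) frequencies rather than computing distances to the groups. The function names, the toy data and the choice to score each distinct query word once are illustrative assumptions; the paper uses 8-mers and adds a bootstrap confidence step that is omitted here, and this is not mothur's code.

```python
import math
from collections import Counter, defaultdict


def kmers(seq, k):
    """Set of all overlapping k-mers (words) in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(references, k=8):
    """references: list of (sequence, taxon) pairs (hypothetical input format)."""
    n_seqs = len(references)
    word_seq_counts = Counter()                # n(w): training sequences containing w
    taxon_word_counts = defaultdict(Counter)   # m(w): sequences in taxon G containing w
    taxon_sizes = Counter()                    # M: number of sequences in taxon G
    for seq, taxon in references:
        words = kmers(seq, k)
        word_seq_counts.update(words)
        taxon_word_counts[taxon].update(words)
        taxon_sizes[taxon] += 1
    # Word-specific prior P(w) = (n(w) + 0.5) / (N + 1)
    priors = {w: (c + 0.5) / (n_seqs + 1) for w, c in word_seq_counts.items()}
    return priors, taxon_word_counts, taxon_sizes, n_seqs


def classify(query, model, k=8):
    """Return the taxon with the highest log-probability for the query's words."""
    priors, taxon_word_counts, taxon_sizes, n_seqs = model
    best_taxon, best_score = None, -math.inf
    for taxon, size in taxon_sizes.items():
        score = 0.0
        for w in kmers(query, k):
            # Words never seen in training get the prior for a count of zero.
            p_w = priors.get(w, 0.5 / (n_seqs + 1))
            # P(w | G) = (m(w) + P(w)) / (M + 1); sum logs instead of multiplying.
            score += math.log((taxon_word_counts[taxon][w] + p_w) / (size + 1))
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon


refs = [("ACGTACGTAC", "A"), ("ACGTACGTAA", "A"),
        ("TTGGCCAATT", "B"), ("TTGGCCAATG", "B")]
model = train(refs, k=4)  # tiny k only because the toy sequences are short
print(classify("ACGTACGTAC", model, k=4))  # A
```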

  2. What measure is used to calculate these distances?
  3. Is this distance matrix symmetric? It looks like a correlation matrix.

Yes, the distance matrix is symmetric. You can learn more about how the distances are calculated on the dist.seqs wiki page.
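A minimal sketch of why the matrix comes out symmetric: the distance between two aligned sequences is computed column by column, and swapping the two sequences does not change the count. This simplified version just skips any column containing a gap; that gap rule and the function names are assumptions for illustration and do not reproduce the gap-handling options documented for dist.seqs. Unlike a correlation matrix, the diagonal is 0 (each sequence is identical to itself) and larger values mean less similar sequences.

```python
def pairwise_distance(a, b, gap_chars="-."):
    """Fraction of mismatched columns between two aligned sequences,
    skipping any column where either sequence has a gap (simplified rule)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in gap_chars or y in gap_chars:
            continue
        compared += 1
        if x != y:
            mismatches += 1
    # No comparable columns means the distance is undefined.
    return mismatches / compared if compared else float("nan")


def distance_matrix(seqs):
    """Square matrix of pairwise distances with zeros on the diagonal."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # The distance does not depend on argument order, so fill both halves.
            d[i][j] = d[j][i] = pairwise_distance(seqs[i], seqs[j])
    return d


for row in distance_matrix(["ACGT-ACGT", "ACGTTACGA", "TCGT-ACGT"]):
    print(row)
```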

Thank you.

I couldn’t find a direct link to the Wang method. I searched Google for “wang naive bayesian sequence” and found http://www.ncbi.nlm.nih.gov/pubmed/17586664. Is that the correct reference?

That’s the one!