Cluster(): explanation of the clustering methods


None of the links for any of the clustering methods have any text. If I'm not mistaken, I once read a page explaining the different methods. Is that something that could be put back fairly easily?


You can just google it. Here's some information that might be helpful:
Nearest neighbor:

Furthest neighbor:
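As a rough illustration (not mothur's code), nearest neighbor is single linkage: two clusters are as close as their closest pair of members. Furthest neighbor is complete linkage: two clusters are as far apart as their most distant pair. The naive Python sketch below keeps merging the closest pair of clusters until no pair is within the cutoff. The function names and the toy three-sequence matrix are made up for the example, and mothur's handling of ties, precision and sparse distance matrices will differ.

```python
from itertools import combinations

def linkage_distance(a, b, dist, method):
    """Distance between clusters a and b (sets of sequence indices)."""
    pair_dists = [dist[i][j] for i in a for j in b]
    if method == "nearest":   # single linkage: closest pair of members
        return min(pair_dists)
    if method == "furthest":  # complete linkage: most distant pair of members
        return max(pair_dists)
    raise ValueError(f"unknown method: {method}")

def cluster(dist, cutoff, method):
    """Greedy agglomerative clustering: merge the closest pair of clusters
    until no pair is within `cutoff`. Returns the clusters as sorted lists."""
    clusters = [{i} for i in range(len(dist))]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        (x, y), d = min(
            (((x, y), linkage_distance(clusters[x], clusters[y], dist, method))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > cutoff:
            break
        clusters[x] |= clusters[y]  # x < y, so deleting y leaves x in place
        del clusters[y]
    return [sorted(c) for c in clusters]

# Hypothetical distances: 0-1 and 1-2 are close, 0-2 is not.
dist = [
    [0.00, 0.02, 0.04],
    [0.02, 0.00, 0.02],
    [0.04, 0.02, 0.00],
]
print(cluster(dist, 0.03, "nearest"))   # [[0, 1, 2]]
print(cluster(dist, 0.03, "furthest"))  # [[0, 1], [2]]
```

Sequences 0 and 2 are only linked through sequence 1, so nearest neighbor chains all three into one OTU at 0.03, while furthest neighbor keeps sequence 2 apart because its distance to sequence 0 (0.04) is over the cutoff.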

However, I don't know how the average neighbor method is calculated in mothur. I hope someone else can explain the algorithm behind the average neighbor method used here.

It's also known as the UPGMA algorithm.
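In UPGMA, the distance between two clusters is the average of all pairwise distances between their members. A common way to compute this is to update the matrix after each merge with a size-weighted average of the two merged rows. The sketch below does that on a full matrix; the labels, the toy distances and the `upgma` function are hypothetical, and it builds the whole tree with no cutoff rather than reproducing mothur's implementation.

```python
def upgma(dist, labels):
    """Average-neighbor (UPGMA) clustering of a full distance matrix.

    Returns the merges in order as (cluster_a, cluster_b, distance) tuples.
    """
    # Distances between current clusters, keyed by cluster name.
    d = {a: {b: dist[i][j] for j, b in enumerate(labels) if j != i}
         for i, a in enumerate(labels)}
    size = {a: 1 for a in labels}  # number of sequences in each cluster
    merges = []
    while len(d) > 1:
        # Closest pair of clusters (x < y so each pair is considered once).
        a, b = min(((x, y) for x in d for y in d[x] if x < y),
                   key=lambda pair: d[pair[0]][pair[1]])
        merges.append((a, b, d[a][b]))
        new = f"({a},{b})"
        na, nb = size[a], size[b]
        # Size-weighted average of the two old rows = mean distance over
        # all sequence pairs between the new cluster and cluster k.
        new_row = {k: (na * d[a][k] + nb * d[b][k]) / (na + nb)
                   for k in d if k not in (a, b)}
        for gone in (a, b):
            del d[gone]
            del size[gone]
        for k in d:
            del d[k][a]
            del d[k][b]
            d[k][new] = new_row[k]
        d[new] = new_row
        size[new] = na + nb
    return merges

labels = ["A", "B", "C", "D"]
dist = [
    [0.00, 0.02, 0.06, 0.10],
    [0.02, 0.00, 0.04, 0.10],
    [0.06, 0.04, 0.00, 0.08],
    [0.10, 0.10, 0.08, 0.00],
]
for a, b, dd in upgma(dist, labels):
    print(f"merge {a} + {b} at {dd:.4f}")
# merge A + B at 0.0200
# merge (A,B) + C at 0.0500
# merge ((A,B),C) + D at 0.0933
```

The second merge happens at 0.05 because C's distances to A (0.06) and B (0.04) are averaged, whereas nearest neighbor would use 0.04 and furthest neighbor 0.06.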

Hi Pat, I've run two clustering analyses using different tools, mothur and the RDP tools, and even when I chose the same method ("furthest"), I got very different results. For example:
From mothur and from RDP, I got these cluster numbers:

cutoff   mothur   RDP
0.01     25455    10955
0.02     15008    6550
0.03     11038    4502

I wonder where this difference comes from. My guess is that mothur does something with the abundance information (via the name option) that RDP doesn't, since RDP doesn't have this option, right?

Yeah, I can't speak to the RDP pipeline; I don't use it because I'm not a fan of their data curation or alignment methods. The number after the cutoff is the size of the dominant OTU at that cutoff, so if you aren't feeding the abundance of each sequence back in, that number will be different.
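To see why the abundance matters for the size of the dominant OTU, here is a small Python sketch. It assumes the usual mothur names file layout (representative name, a tab, then a comma-separated list of all identical sequences, including the representative) and a made-up OTU assignment of the unique sequences; the sequence names and the helper functions are hypothetical.

```python
def read_names(lines):
    """Map each representative sequence to its number of identical reads.

    Assumes two tab-separated columns: representative name, then a
    comma-separated list of every identical sequence (including itself).
    """
    abundance = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rep, members = line.split("\t")
        abundance[rep] = len(members.split(","))
    return abundance

def dominant_otu_size(otus, abundance=None):
    """Size of the largest OTU, counted in unique sequences or, when an
    abundance map is given, in reads."""
    if abundance is None:
        return max(len(otu) for otu in otus)
    return max(sum(abundance[s] for s in otu) for otu in otus)

# Hypothetical names file contents and OTU assignment.
names_lines = [
    "seqA\tseqA,seqB,seqC,seqD",
    "seqE\tseqE",
    "seqF\tseqF,seqG",
]
otus = [["seqA"], ["seqE", "seqF"]]
abundance = read_names(names_lines)
print(dominant_otu_size(otus))             # 2: the OTU holding seqE and seqF
print(dominant_otu_size(otus, abundance))  # 4: seqA stands for 4 reads
```

Counted by unique sequences, the biggest OTU is the one holding seqE and seqF; once each unique sequence is weighted by its reads, seqA's OTU becomes the dominant one, so both the size and the identity of the dominant OTU change.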


The numbers I listed should be the actual numbers of OTUs: I took them from the second column of the .rabund file, not from the second column of the .sabund file, which is what you described above.
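The difference between the second columns of the two files can be sketched as follows. It assumes the layouts described in this thread: a .rabund line holds the label, the number of OTUs and then one abundance per OTU, while a .sabund line holds the label, the size of the largest OTU and then how many OTUs have size 1, 2, and so on. The example line and the helper names are hypothetical.

```python
def parse_rabund(line):
    """Split a .rabund line into its label and the list of OTU abundances."""
    fields = line.split()
    label, num_otus = fields[0], int(fields[1])  # column 2 = number of OTUs
    abundances = [int(x) for x in fields[2:]]
    if len(abundances) != num_otus:
        raise ValueError("OTU count does not match the number of abundances")
    return label, abundances

def rabund_to_sabund(label, abundances):
    """Build the matching .sabund line: label, size of the largest OTU, then
    the number of OTUs of size 1, 2, ..., up to that largest size."""
    largest = max(abundances)
    counts = [0] * largest
    for a in abundances:
        counts[a - 1] += 1
    return "\t".join([label, str(largest)] + [str(c) for c in counts])

label, abundances = parse_rabund("0.03\t3\t4\t1\t2")  # hypothetical .rabund line
sabund = rabund_to_sabund(label, abundances)
print(len(abundances))        # 3 -> .rabund column 2: number of OTUs
print(sabund.split("\t")[1])  # 4 -> .sabund column 2: size of the largest OTU
```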

Anyway, that doesn't change the conclusion that mothur uses more information and so does a better job.