I know you get questions about the cutoff changing with average linkage a lot, but the explanation below still doesn’t make sense to me.
“Let’s say you set the cutoff to 0.05. If one cell has a distance of 0.03 and the cell it is getting merged with has a distance above 0.05 then the cutoff is reset to 0.03, because it’s not possible to merge at a higher level and keep all the data.”
Average linkage with a cutoff of 0.05 is saying that the mean distance between all sequences in a cluster A and all sequences in cluster B must be higher than 0.05 for them to be considered two different clusters, right? And conversely, all sequences in cluster A are, on average, less than 0.05 apart.
So how does that connect to the explanation about the cutoff changing? Say two cells/sequences X and Y are 0.03 apart, so they are grouped into cluster A. Then there is another sequence Z that is 0.03 from X and 0.06 from Y. With furthest linkage, Z would go into another cluster B. But with average linkage, Z would go into cluster A because its average distance to X and Y is 0.045, which is below the 0.05 cutoff. That makes sense. But according to the explanation, the cutoff would then change from 0.05 to 0.03 because of the 0.03 distance between X and Z, so clusters A and B now only have to be more than 0.03 apart on average to be considered different? Then shouldn’t Z go into cluster B once the cutoff is lowered, because its average distance to X and Y (0.045) is no longer within the cutoff? It’s circular logic. Help please.
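To make the arithmetic in my example concrete, here is a minimal sketch in plain Python. It is only my own illustration of how I understand average and furthest linkage, not the program's actual clustering code; the names `dist`, `average_linkage`, and `furthest_linkage` are made up for this post, and the distances are the hypothetical ones from my X/Y/Z example.

```python
# Hypothetical pairwise distances from the X/Y/Z example in this post.
dist = {
    frozenset(("X", "Y")): 0.03,
    frozenset(("X", "Z")): 0.03,
    frozenset(("Y", "Z")): 0.06,
}

def pairwise(cluster_a, cluster_b):
    """All distances between members of cluster_a and members of cluster_b."""
    return [dist[frozenset((a, b))] for a in cluster_a for b in cluster_b]

def average_linkage(cluster_a, cluster_b):
    """Mean of all between-cluster pairwise distances."""
    d = pairwise(cluster_a, cluster_b)
    return sum(d) / len(d)

def furthest_linkage(cluster_a, cluster_b):
    """Largest between-cluster pairwise distance."""
    return max(pairwise(cluster_a, cluster_b))

cutoff = 0.05
A = ["X", "Y"]  # X and Y were already merged at 0.03
B = ["Z"]

avg = average_linkage(A, B)   # mean of 0.03 and 0.06
far = furthest_linkage(A, B)  # the larger of 0.03 and 0.06

print(f"average linkage:  {avg:.3f}, merge at cutoff {cutoff}? {avg <= cutoff}")
print(f"furthest linkage: {far:.3f}, merge at cutoff {cutoff}? {far <= cutoff}")
```

As I read it, Z joins cluster A under average linkage (0.045 is below 0.05) but not under furthest linkage (0.06 is above 0.05), which is why I don't see where the 0.03 reset comes from.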
Thanks in advance,
-Ashley