Average linkage cutoff

I know you get questions about the cutoff changing with average linkage a lot, but the explanation below still doesn’t make sense to me.

“Let’s say you set the cutoff to 0.05. If one cell has a distance of 0.03 and the cell it is getting merged with has a distance above 0.05 then the cutoff is reset to 0.03, because it’s not possible to merge at a higher level and keep all the data.”

Average linkage with a cutoff of 0.05 is saying that the mean distance between all sequences in a cluster A and all sequences in cluster B must be higher than 0.05 for them to be considered two different clusters, right? And conversely, all sequences in cluster A are, on average, less than 0.05 apart.

So how does that connect to the explanation about the cutoff changing? If you have two cells/sequences X and Y that are 0.03 apart, they are grouped into cluster A. Then you have another sequence Z that is 0.03 from X and 0.06 from Y. In furthest linkage, Z would go into another cluster B. But with average, Z would go into cluster A because it is averaging 0.045 from X and Y, and that is below the 0.05 cutoff. That makes sense. But then according to the explanation, the cutoff would change from 0.05 to 0.03 because of the distance between X and Z, so clusters A and B now only have to be over 0.03 apart on average to be considered different? So then shouldn’t Z go into cluster B if the cutoff is lowered, because it is then no longer within 0.03, on average, of X and Y? It’s circular logic. Help please.

Thanks in advance,
-Ashley


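The problem is that if you apply the 0.05 cutoff, the Z-Y distance is above the cutoff and gets dropped, so you don’t know that Z and Y are 0.06 apart. You only know that they are more than 0.05 apart. Without that distance you can’t calculate the average distance between Z and cluster A, so you can’t tell whether the merge belongs at 0.045 or somewhere else. That’s why the cutoff has to be reduced to 0.03: it is the highest level at which you can still merge and keep all the data. It might be easier to draw up a simple distance matrix and do the clustering by hand, with and without a cutoff. I’ve done this before with the Amazon dataset, but I can’t seem to find it currently.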
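Here is a rough sketch of that by-hand exercise using the X, Y, Z distances from the question. It is only an illustration, not the actual clustering code: the helper names `apply_cutoff` and `average_distance` are made up for this example, and it assumes that a distance exactly at the cutoff is kept while anything above it is treated as unknown.

```python
from itertools import product

# Pairwise distances from the example in the question.
full = {
    frozenset("XY"): 0.03,
    frozenset("XZ"): 0.03,
    frozenset("YZ"): 0.06,
}

def apply_cutoff(dist, cutoff):
    """Keep only distances at or below the cutoff (assumption: the rest are unknown)."""
    return {pair: d for pair, d in dist.items() if d <= cutoff}

def average_distance(dist, cluster_a, cluster_b):
    """Mean distance between two clusters, or None if any needed pair is missing."""
    values = []
    for a, b in product(cluster_a, cluster_b):
        d = dist.get(frozenset((a, b)))
        if d is None:
            return None  # cannot average without every pairwise distance
        values.append(d)
    return sum(values) / len(values)

# Without a cutoff, the average between {X, Y} and {Z} is about 0.045.
no_cutoff = average_distance(full, {"X", "Y"}, {"Z"})

# With the 0.05 cutoff, the Y-Z distance is gone, so the average is unknown.
with_cutoff = average_distance(apply_cutoff(full, 0.05), {"X", "Y"}, {"Z"})

print("no cutoff:  ", no_cutoff)
print("cutoff 0.05:", with_cutoff)
```

Without the cutoff the average can be calculated, and Z joins cluster A below 0.05. With the cutoff the second result is `None`, which is the situation described above: the only merges you can still be sure about are the ones at 0.03.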