Master Systematic Clustering: From Distance Matrix to Multi-Level Groupings

Systematic clustering, a widely used hierarchical clustering technique, builds a dendrogram by repeatedly merging the two closest clusters according to a distance matrix. Analysts can then read off groupings at any distance threshold, from a single cluster containing every point down to each point forming its own cluster.

Model Perspective

1 Systematic Clustering Method

Systematic clustering (hierarchical clustering) is one of the most commonly used methods in cluster analysis. Its advantage is that it presents multiple classification levels, from coarse to fine, which are typically visualized in a dendrogram.

Example:

Given seven points w1, w2, w3, w4, w5, w6, w7 in the plane (left figure), the clustering result can be shown as a dendrogram (right figure).

When the distance threshold is ..., the seven points form a single cluster.

When the distance threshold is ..., the data are divided into two clusters.

When the distance threshold is ..., the data are divided into three clusters.

When the distance threshold is ..., the data are divided into four clusters.

When the distance threshold is ..., the data are divided into six clusters.

When the distance threshold is less than ..., the data are divided into seven clusters, with each point forming its own cluster.
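A minimal sketch of how these threshold readings work, assuming the dendrogram is summarized by its platform (merge) heights: cutting at threshold t carries out every merge whose height is at most t, and each merge reduces the number of clusters by one. The function name and the six heights below are hypothetical, not read from the figure.

```python
def clusters_at_threshold(n_points, merge_heights, t):
    """Number of clusters left after cutting a dendrogram at distance t.

    Every merge with height <= t has been carried out, and each merge
    reduces the number of clusters by one.
    """
    merges_done = sum(1 for h in merge_heights if h <= t)
    return n_points - merges_done


# Hypothetical platform heights for seven points (six merges in total).
heights = [0.8, 1.2, 1.2, 2.5, 3.6, 5.0]

for t in [0.5, 1.0, 1.2, 2.0, 3.0, 4.0, 6.0]:
    print(t, clusters_at_threshold(7, heights, t))
# prints 7, 6, 4, 4, 3, 2, 1 clusters for these thresholds
```

With the tie at 1.2 in these made-up heights, the count drops from six straight to four, since two merges happen at the same height.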

2 Steps

(1) Compute the pairwise distances among the n sample points and store them in a distance matrix.

(2) Start with n clusters, each containing a single sample point; at this stage the platform height (the merge height in the dendrogram) is zero.

(3) Merge the two closest clusters into a new cluster, and record the distance between them as the platform height of this merge in the dendrogram.

(4) Recalculate the distances between the new cluster and all existing clusters; if only one cluster remains, proceed to step (5), otherwise return to step (3).

(5) Draw the dendrogram.

(6) Decide the desired number of clusters and assign points accordingly.
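The steps above can be sketched in Python as follows. This is a minimal, unoptimized illustration rather than the author's implementation: it assumes points are coordinate tuples, supports only single linkage (closest pair) and complete linkage (farthest pair) as the inter-cluster distance in step (4), and uses hypothetical helper names (hierarchical_clustering, cut). Step (5) is omitted; in practice scipy.cluster.hierarchy.linkage and scipy.cluster.hierarchy.dendrogram compute and draw the tree.

```python
import math


def hierarchical_clustering(points, linkage="single"):
    """Agglomerative clustering following steps (1) to (4).

    Returns the merges in order as (cluster_a, cluster_b, height) tuples,
    where each cluster is a frozenset of point indices and height is the
    platform height of that merge.
    """
    if linkage not in ("single", "complete"):
        raise ValueError("linkage must be 'single' or 'complete'")
    n = len(points)
    # Step (1): pairwise distance matrix between the sample points.
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

    def cluster_distance(a, b):
        # Step (4): distance between clusters, recomputed from point distances.
        pair_dists = [dist[i][j] for i in a for j in b]
        return min(pair_dists) if linkage == "single" else max(pair_dists)

    # Step (2): n singleton clusters, all at platform height zero.
    clusters = [frozenset([i]) for i in range(n)]
    merges = []
    # Steps (3) and (4): merge the closest pair until one cluster remains.
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        a, b = clusters[x], clusters[y]
        merges.append((a, b, d))
        del clusters[y]  # y > x, so delete y first to keep index x valid
        del clusters[x]
        clusters.append(a | b)
    return merges


def cut(n, merges, t):
    """Step (6): replay every merge with height <= t and return the clusters.

    Single and complete linkage produce non-decreasing merge heights,
    so the replay can stop at the first merge above t.
    """
    clusters = [frozenset([i]) for i in range(n)]
    for a, b, h in merges:
        if h > t:
            break
        clusters = [c for c in clusters if c != a and c != b] + [a | b]
    return clusters


# Hypothetical sample of five points in the plane.
points = [(0, 0), (0, 1), (5, 0), (5, 1.5), (11, 0)]
merges = hierarchical_clustering(points, linkage="single")
print([round(h, 2) for _, _, h in merges])  # [1.0, 1.5, 5.0, 6.0]
print(cut(len(points), merges, 2.0))  # three clusters: {4}, {0, 1}, {2, 3}
```

The search rescans every pair of clusters on each merge, so the work grows roughly cubically with n; the sketch is meant for reading the steps, not for large data sets.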

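If the distance between two clusters is defined as the shortest distance between a point in one cluster and a point in the other (single linkage), the procedure is called the nearest-neighbor method; defining it instead as the longest such distance (complete linkage) gives the farthest-neighbor method, which otherwise follows the same steps.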
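A small sketch of the two inter-cluster distances, assuming clusters are lists of coordinate tuples; the function names and the clusters A and B are hypothetical.

```python
import math
from itertools import product


def single_link(cluster_a, cluster_b):
    """Nearest-neighbor distance: the closest pair across the two clusters."""
    return min(math.dist(p, q) for p, q in product(cluster_a, cluster_b))


def complete_link(cluster_a, cluster_b):
    """Farthest-neighbor distance: the most distant pair across the two clusters."""
    return max(math.dist(p, q) for p, q in product(cluster_a, cluster_b))


# Hypothetical clusters in the plane.
A = [(0, 0), (0, 3)]
B = [(4, 0), (8, 0)]
print(single_link(A, B))    # 4.0
print(complete_link(A, B))  # about 8.544, i.e. sqrt(73)
```

Because the two rules can rank cluster pairs differently, single and complete linkage may merge different pairs in step (3) and produce different dendrograms from the same distance matrix.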

Written by Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands-On Guide to Competitions", and co-author of "Mathematical Modeling: Teaching Design and Cases".
