
Master K-means Clustering: How the Algorithm Finds Compact Groups

K-means is a classic distance‑based clustering algorithm that iteratively partitions data into k compact, well‑separated groups by minimizing the sum of squared errors. It starts from randomly initialized centroids and applies heuristic updates until convergence, which makes it a fundamental tool in AI and data analysis.

Model Perspective

Jun 4, 2022


1 K-means Algorithm

K-means is a simple, classic clustering algorithm that uses distance as its measure of similarity: the closer two objects are, the more similar they are assumed to be. Its goal is to produce compact, well‑separated clusters.
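As a small illustration of distance as a similarity measure, the sketch below compares three made-up 2-D points. It assumes Euclidean distance, which the article does not name explicitly but which matches the squared-error criterion used later; the points a, b, and c are hypothetical.

```python
import math

# Hypothetical 2-D samples, chosen only for illustration.
a = (1.0, 1.0)
b = (2.0, 2.0)
c = (8.0, 9.0)

# math.dist returns the Euclidean distance between two points (Python 3.8+).
d_ab = math.dist(a, b)
d_ac = math.dist(a, c)

print(f"distance a-b: {d_ab:.3f}")
print(f"distance a-c: {d_ac:.3f}")
# Under the "closer means more similar" assumption, b is more similar to a than c is.
print("b is more similar to a than c is:", d_ab < d_ac)
```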

It iteratively searches for a partition of the data into k clusters such that the total error, measured by the sum of squared distances between samples and their cluster centroids, is minimized.
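To make the "total error" concrete, here is a minimal sketch that computes the sum of squared distances for a given partition. The function name sse, the toy points, and the label encoding (one cluster index per sample) are assumptions made for this example, not part of the original article.

```python
def sse(points, centroids, labels):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    total = 0.0
    for point, label in zip(points, labels):
        centroid = centroids[label]
        total += sum((p - c) ** 2 for p, c in zip(point, centroid))
    return total


# Hypothetical data: two tight pairs of points, far apart from each other.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

# Partition A groups the nearby points together.
good = sse(points, [(0.0, 1.0), (10.0, 1.0)], [0, 0, 1, 1])

# Partition B pairs each point with a distant one.
bad = sse(points, [(5.0, 0.0), (5.0, 2.0)], [0, 1, 0, 1])

print("SSE of compact partition:", good)
print("SSE of stretched partition:", bad)
```

The compact partition yields the smaller error, which is the sense in which K-means prefers it.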

The k clusters have the following characteristics: each cluster is as compact as possible, and different clusters are as far apart as possible.

The basis of K‑means is the minimum squared error criterion. Expressed mathematically, the objective is to minimize the sum of squared errors between each point and the centroid of its assigned cluster, where the centroid is the mean vector of the points in that cluster.
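The source does not show the formula itself, so the following is a standard way to write this criterion rather than the author's own notation. The symbols are assumptions: x_i is a sample, C_j is the set of samples in cluster j, and \mu_j is that cluster's centroid.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{\lvert C_j \rvert} \sum_{x_i \in C_j} x_i
```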

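Finding the global minimum of this objective is NP‑hard in general, so K‑means uses a heuristic iterative method. The heuristic converges to a local minimum, and the result can depend on the initial centroids.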
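One way to get a feel for why exhaustive search is impractical is to count the candidate partitions. The sketch below counts the ways to split n samples into k non-empty clusters using the Stirling numbers of the second kind. It illustrates the size of the search space only and is not a proof of NP-hardness; the function name stirling2 is chosen for this example.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Number of ways to partition n labelled items into k non-empty groups."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    # Item n either joins one of the k existing groups or starts a new group.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


for n in (10, 20, 30):
    print(f"n={n}, k=3: {stirling2(n, 3)} possible partitions")
```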

2 Steps

Step 1: Randomly select k initial cluster centroids.

Step 2: For each sample i, assign it to the cluster whose centroid is nearest.

Step 3: For each cluster, recompute its centroid as the mean of the samples assigned to it.

Step 4: Repeat Steps 2 and 3 until convergence.

Convergence is reached when the distance between the newly computed centroids and the previous ones falls below a predefined threshold, indicating that centroid positions have stabilized.
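The procedure can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the referenced repository's implementation: the function name kmeans, the parameters tol, max_iter, and seed, the choice to draw initial centroids from the samples themselves, and the handling of empty clusters are all choices made for this example.

```python
import math
import random


def kmeans(points, k, tol=1e-6, max_iter=100, seed=0):
    """Cluster a list of equal-length tuples into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Random initialization: pick k distinct samples as the starting centroids.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment: each sample goes to the cluster with the nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update: each centroid becomes the mean of the samples assigned to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                mean = tuple(sum(coord) / len(members) for coord in zip(*members))
                new_centroids.append(mean)
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        # Convergence: stop once the largest centroid movement is below tol.
        shift = max(math.dist(old, new) for old, new in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, labels


# Hypothetical data: two visually separate groups of three points each.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(data, k=2)
print("centroids:", centroids)
print("labels:", labels)
```

Because the result depends on the random start, a common practice is to run the algorithm with several seeds and keep the partition with the smallest sum of squared errors.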

Reference

ThomsonRen GitHub https://github.com/ThomsonRen/mathmodels
