Unlock Hidden Patterns: A Deep Dive into Unsupervised Learning Techniques
This article introduces unsupervised learning, covering its motivation, Jensen's inequality, key clustering methods such as EM, k‑means, and hierarchical clustering, clustering evaluation metrics, and dimensionality‑reduction techniques like PCA and ICA, with clear explanations throughout.
Unsupervised Learning
1.1 Introduction
Motivation
The goal of unsupervised learning is to discover hidden patterns in data that lack labels.
Jensen's inequality: for a convex function f and a random variable X, the inequality f(E[X]) ≤ E[f(X)] holds, providing a fundamental bound used in many learning algorithms, notably in deriving the EM lower bound.
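As a quick numerical sanity check (not a proof), the inequality can be observed with the convex function f(x) = x² on an arbitrary random sample; the variable names here are purely illustrative:

```python
import numpy as np

# Numerical check of Jensen's inequality with the convex
# function f(x) = x^2 on a random sample X.
rng = np.random.default_rng(0)
X = rng.normal(loc=1.0, scale=2.0, size=100_000)

f_of_mean = np.mean(X) ** 2     # f(E[X]), roughly 1.0 here
mean_of_f = np.mean(X ** 2)     # E[f(X)], roughly 5.0 here

print(f_of_mean <= mean_of_f)   # True: f(E[X]) <= E[f(X)]
```

The gap E[f(X)] − f(E[X]) is exactly the variance of X for f(x) = x², which is why the two sides differ by about 4 in this run.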
1.2 Clustering
1.2.1 Expectation‑Maximization (EM)
Latent variables
Latent variables are hidden or unobservable variables that make estimation problems more difficult; they are typically denoted by z. In clustering, the most common setting treats the unknown cluster assignment of each data point as the latent variable z.
EM iteratively builds a lower bound on the likelihood (E‑step) and then maximizes this bound (M‑step) to obtain maximum‑likelihood estimates of the parameters.
E‑step: Estimate the posterior probability of each data point belonging to each cluster.
M‑step: Use the posterior probabilities as weights to re‑estimate the cluster parameters.
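The two steps above can be sketched for a two-component 1-D Gaussian mixture; this is a minimal illustration, and the helper name `em_gmm_1d` and the initialization scheme are choices made for the example, not part of the general algorithm:

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """Minimal EM for a two-component 1-D Gaussian mixture."""
    mu = np.array([x.min(), x.max()])   # spread out the initial means
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])           # mixing weights
    for _ in range(n_iter):
        # E-step: posterior probability (responsibility) of each
        # component for each data point
        dens = np.stack([
            pi[k] / (sigma[k] * np.sqrt(2 * np.pi))
            * np.exp(-0.5 * ((x - mu[k]) / sigma[k]) ** 2)
            for k in range(2)
        ])                              # shape (2, n)
        resp = dens / dens.sum(axis=0)
        # M-step: responsibility-weighted parameter re-estimates
        nk = resp.sum(axis=1)
        mu = (resp @ x) / nk
        sigma = np.sqrt((resp * (x - mu[:, None]) ** 2).sum(axis=1) / nk)
        pi = nk / len(x)
    return mu, sigma, pi

# two well-separated Gaussian blobs as toy data
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-3, 1, 500), rng.normal(4, 1, 500)])
mu, sigma, pi = em_gmm_1d(x)
```

On this toy data the estimated means land near the true component means of −3 and 4.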
1.2.2 k‑means clustering
Let x_i denote the i‑th point in a cluster and μ the cluster centroid.
Algorithm
The algorithm randomly initializes centroids and repeatedly assigns points to the nearest centroid, then updates centroids until convergence.
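That loop can be written out directly; the function name `kmeans` and the toy blobs below are made up for illustration:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means: assign each point to its nearest centroid,
    then recompute each centroid as the mean of its points."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assignment step: index of the nearest centroid per point
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # update step: centroid = mean of its assigned points
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):   # converged
            break
        centroids = new
    return centroids, labels

# two well-separated 2-D blobs as toy data
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
C, labels = kmeans(X, 2)
```

Note that plain k-means only finds a local optimum, so in practice it is usually run several times from different random initializations.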
1.2.3 Hierarchical clustering
Algorithm
Hierarchical clustering builds nested clusters by repeatedly merging or splitting groups, producing a dendrogram that represents the hierarchy.
Types
Different linkage criteria define the between‑cluster distance that each merge minimizes:
Ward linkage – merges the pair of clusters whose union gives the smallest increase in within‑cluster variance.
Average linkage – uses the average pairwise distance between points of the two clusters.
Complete linkage – uses the maximum pairwise distance between points of the two clusters.
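The agglomerative (bottom-up) variant can be sketched naively; the function name `agglomerative` is illustrative, and a real implementation (e.g. SciPy's `linkage`) would use a far more efficient merge strategy:

```python
import numpy as np

def agglomerative(X, k, link="complete"):
    """Naive agglomerative clustering: start with singleton clusters
    and repeatedly merge the closest pair until only k remain."""
    clusters = [[i] for i in range(len(X))]
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)  # pairwise distances
    while len(clusters) > k:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sub = D[np.ix_(clusters[a], clusters[b])]
                # linkage criterion = distance between the two clusters
                d = sub.max() if link == "complete" else sub.mean()
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        clusters[a] += clusters.pop(b)   # merge the closest pair
    return clusters

# two tight, well-separated blobs as toy data
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (10, 2)), rng.normal(4, 0.3, (10, 2))])
clusters = agglomerative(X, k=2)
```

Recording the order and distance of each merge, instead of stopping at k clusters, is exactly what yields the dendrogram.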
1.2.4 Clustering evaluation metrics
In unsupervised settings, evaluating model performance is challenging because ground‑truth labels are unavailable.
Silhouette coefficient
For a sample, let a be the mean distance to the other points in its own cluster and b the mean distance to the points in the nearest neighboring cluster; the silhouette coefficient is s = (b − a) / max(a, b). It lies in [−1, 1], with values near 1 indicating a well-placed sample in a compact, well-separated cluster.
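The definition translates almost directly into code; the helper name `silhouette` and the four-point toy example are made up for illustration:

```python
import numpy as np

def silhouette(i, labels, D):
    """Silhouette coefficient of sample i, given pairwise distances D."""
    same = (labels == labels[i])
    same[i] = False                       # exclude the sample itself
    a = D[i, same].mean()                 # mean intra-cluster distance
    b = min(D[i, labels == c].mean()      # mean distance to the
            for c in np.unique(labels)    # nearest other cluster
            if c != labels[i])
    return (b - a) / max(a, b)

pts = np.array([0.0, 0.1, 5.0, 5.1])      # two tight pairs on a line
labels = np.array([0, 0, 1, 1])
D = np.abs(pts[:, None] - pts[None, :])   # pairwise distances
s = silhouette(0, labels, D)              # close to 1: a = 0.1, b = 5.05
```

Averaging s over all samples gives the usual single-number score for a clustering.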
Calinski‑Harabasz index
Let K be the number of clusters, N the number of samples, and B and W the between‑cluster and within‑cluster dispersion matrices. The index is calculated as (trace(B)/(K−1)) / (trace(W)/(N−K)), where higher values indicate more distinct and compact clusters.
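Since only the traces of B and W are needed, they can be accumulated per cluster without forming the full matrices; the helper name `calinski_harabasz` and the toy data are illustrative:

```python
import numpy as np

def calinski_harabasz(X, labels):
    """CH index = (tr(B)/(K-1)) / (tr(W)/(N-K))."""
    N, K = len(X), len(np.unique(labels))
    overall = X.mean(axis=0)
    tr_b = tr_w = 0.0
    for c in np.unique(labels):
        Xc = X[labels == c]
        mu = Xc.mean(axis=0)
        tr_b += len(Xc) * ((mu - overall) ** 2).sum()  # between-cluster
        tr_w += ((Xc - mu) ** 2).sum()                 # within-cluster
    return (tr_b / (K - 1)) / (tr_w / (N - K))

X = np.array([[0.0], [1.0], [10.0], [11.0]])   # two tight pairs
labels = np.array([0, 0, 1, 1])
ch = calinski_harabasz(X, labels)              # large: clusters far apart
```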
1.3 Dimension reduction
1.3.1 Principal Component Analysis (PCA)
PCA is a statistical technique that transforms a set of possibly correlated variables into a set of linearly uncorrelated variables called principal components via an orthogonal transformation.
Eigenvalues and eigenvectors
Given a matrix A, a vector v satisfying Av = λv is an eigenvector of A with eigenvalue λ. If A is symmetric, it can be diagonalized by an orthogonal matrix.
Note: The eigenvector associated with the largest eigenvalue is called the principal eigenvector.
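These facts are easy to verify numerically on a small symmetric matrix (the matrix here is arbitrary, chosen only for illustration):

```python
import numpy as np

# A symmetric matrix is diagonalized by an orthogonal matrix:
# A = Q diag(lam) Q^T, where Q's columns are the eigenvectors.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam, Q = np.linalg.eigh(A)   # eigh: eigendecomposition for symmetric matrices

print(lam)                                       # [1. 3.]
print(np.allclose(Q @ np.diag(lam) @ Q.T, A))    # True
# the principal eigenvector is the column of Q
# belonging to the largest eigenvalue
v = Q[:, np.argmax(lam)]
```

Note that `eigh` returns the eigenvalues in ascending order, so the principal eigenvector is the last column of Q.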
Algorithm
Steps for PCA:
Standardize the data so that each feature has mean 0 and standard deviation 1.
Compute the covariance matrix and obtain its eigenvalues and eigenvectors.
Select the top k eigenvectors corresponding to the largest eigenvalues.
Project the data onto the selected eigenvectors, yielding a lower‑dimensional representation with maximal variance.
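The four steps above can be sketched directly with an eigendecomposition of the covariance matrix; the helper name `pca` and the toy data are made up for the example:

```python
import numpy as np

def pca(X, k):
    """PCA via eigendecomposition of the covariance matrix."""
    # 1. standardize: mean 0, standard deviation 1 per feature
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. covariance matrix and its eigendecomposition
    C = np.cov(Z, rowvar=False)
    lam, V = np.linalg.eigh(C)        # eigenvalues in ascending order
    # 3. top-k eigenvectors, by descending eigenvalue
    V_k = V[:, np.argsort(lam)[::-1][:k]]
    # 4. project onto them
    return Z @ V_k

# 3 features, two of which are strongly correlated
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=200),
                     rng.normal(size=200)])
Y = pca(X, 2)
print(Y.shape)   # (200, 2)
```

By construction the first projected component carries at least as much variance as the second, which is the "maximal variance" property mentioned above.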
1.3.2 Independent Component Analysis (ICA)
ICA seeks to uncover latent source signals that have been linearly mixed.
Assumptions
We assume the observed data x is generated by mixing independent source signals s through a non‑singular mixing matrix A, i.e., x = As, where the components of s are statistically independent.
Bell and Sejnowski ICA algorithm
The algorithm finds an unmixing matrix W by maximizing the likelihood of the model. Using a chosen activation function, the log‑likelihood is derived, and a stochastic gradient ascent rule updates W for each training sample.
Reference: MLEveryday, https://github.com/MLEveryday/Machine-Learning-Cheatsheets
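With the sigmoid as the activation function, the per-sample update takes the form W ← W + α((1 − 2g(Wx))xᵀ + (Wᵀ)⁻¹). Below is a minimal sketch under that assumption; the function name `bell_sejnowski_ica`, the learning-rate and epoch choices, and the toy mixing matrix are made up for illustration, and separation quality on such toy sources is not guaranteed:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bell_sejnowski_ica(X, lr=0.005, n_epochs=10, seed=0):
    """Stochastic gradient ascent on the ICA log-likelihood with a
    sigmoid nonlinearity; X holds one mixed sample per row."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    W = np.eye(n)                      # unmixing matrix to be learned
    for _ in range(n_epochs):
        for i in rng.permutation(len(X)):
            x = X[i][:, None]          # one sample as a column vector
            # gradient of the per-sample log-likelihood w.r.t. W
            grad = (1 - 2 * sigmoid(W @ x)) @ x.T + np.linalg.inv(W.T)
            W += lr * grad             # ascent step
    return W

# mix two independent (non-Gaussian) sources: x = A s
rng = np.random.default_rng(1)
S = rng.uniform(-1, 1, size=(2000, 2))        # independent sources
A = np.array([[1.0, 0.5], [0.5, 1.0]])        # mixing matrix
X = S @ A.T
W = bell_sejnowski_ica(X)
S_hat = X @ W.T                               # attempted source recovery
```

A good recovery makes W·A close to a scaled permutation matrix; like all ICA methods, the sources are only identifiable up to scaling and reordering.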
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".