
Mastering the EM Algorithm: Theory, Math, and Python Implementation

This article explains the Expectation‑Maximization (EM) algorithm, detailing its iterative E‑step and M‑step processes, mathematical formulation, and practical Python implementation for estimating parameters of mixed linear regression models, while highlighting convergence considerations and common pitfalls.


EM Algorithm

The EM (Expectation-Maximization) algorithm is an iterative optimization method for parameter estimation in probabilistic models with latent variables. It alternates between two steps to solve maximum likelihood or maximum a posteriori estimation problems:

E step (Expectation): Given the current parameter estimates, compute the posterior distribution of each latent variable, i.e., its distribution conditioned on the observed data and the current parameters. This posterior defines the expected complete-data log-likelihood, which is what gives the step its name.

M step (Maximization): Maximize the expected log‑likelihood weighted by the posterior probabilities obtained in the E step to obtain new parameter estimates.

Repeat these two steps until convergence, i.e., until the change in the parameter estimates (or in the log-likelihood) falls below a chosen tolerance, and take the final values as the estimates.

The EM algorithm is widely used for parameter estimation in Gaussian mixture models, hidden Markov models, factor analysis, latent semantic analysis, and other probabilistic models.
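As a concrete illustration of the E/M alternation (a minimal sketch added here, not from the original article), consider the simplest such model, a two-component one-dimensional Gaussian mixture. Each E step computes the responsibility of component 1 for every point; each M step re-fits the weighted means, standard deviations, and mixing weight:

```python
import numpy as np
from scipy.stats import norm

def em_gmm_1d(x, mu1, mu2, s1, s2, p1, max_iter=200, tol=1e-8):
    """EM for a two-component 1-D Gaussian mixture (illustrative sketch)."""
    prev_ll = -np.inf
    for _ in range(max_iter):
        # E step: responsibility of component 1 for each point
        a = p1 * norm.pdf(x, mu1, s1)
        b = (1 - p1) * norm.pdf(x, mu2, s2)
        r = a / (a + b)
        # M step: weighted means, standard deviations, and mixing weight
        mu1 = np.sum(r * x) / np.sum(r)
        mu2 = np.sum((1 - r) * x) / np.sum(1 - r)
        s1 = np.sqrt(np.sum(r * (x - mu1) ** 2) / np.sum(r))
        s2 = np.sqrt(np.sum((1 - r) * (x - mu2) ** 2) / np.sum(1 - r))
        p1 = np.mean(r)
        # Stop when the observed-data log-likelihood no longer improves
        ll = np.sum(np.log(a + b))
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return mu1, mu2, s1, s2, p1

# Synthetic data: 30% from N(-2, 1), 70% from N(3, 1)
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 700)])
print(em_gmm_1d(x, mu1=-1.0, mu2=1.0, s1=1.0, s2=1.0, p1=0.5))
```

With well-separated components, the estimates converge to roughly (-2, 3, 1, 1, 0.3), matching how the data were generated.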

Mathematical Formulation

Assume we have a set of observed data {x_i}, each a d‑dimensional vector, and we wish to describe its distribution with a probabilistic model parameterized by θ. The likelihood function is L(θ), and our goal is to find the maximum‑likelihood estimate of θ.

When the likelihood is complex or lacks a closed‑form solution, the EM algorithm can be employed.

The algorithm consists of an E step and an M step. In the E step we compute the posterior distribution of the latent variables z given the observed data X and current parameters θ. In the M step we update the parameter estimates using this posterior distribution.

By repeatedly performing the E and M steps, the parameter estimates are iteratively refined until they converge to the maximum‑likelihood estimate.
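The quantity maximized in the M step can be stated precisely. This is the standard formulation, consistent with the steps described above: writing θ^(t) for the current estimate, the E step forms the expected complete-data log-likelihood (the Q-function) under the posterior of z, and the M step maximizes it over θ:

```latex
Q(\theta \mid \theta^{(t)})
  = \mathbb{E}_{z \sim p(z \mid X, \theta^{(t)})}\!\left[\log p(X, z \mid \theta)\right],
\qquad
\theta^{(t+1)} = \arg\max_{\theta} \; Q(\theta \mid \theta^{(t)}).
```

Each such iteration can be shown never to decrease the observed-data likelihood, which is why the procedure converges (at least to a local maximum).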

Python Implementation

We apply the EM algorithm to the following problem: a set of input‑output data is collected, but it is unknown which of two sub‑models each sample originates from.

Model 1: y = k1·x + b1 + ε1, where ε1 ~ N(0, σ_v²).

Model 2: y = k2·x + b2 + ε2, where ε2 ~ N(0, σ_w²).

Each sample comes from Model 1 with probability p1 and from Model 2 with probability 1 − p1, so each observation has the mixture density p(y | x) = p1·N(y; k1·x + b1, σ_v²) + (1 − p1)·N(y; k2·x + b2, σ_w²). The goal is to estimate the parameters (k1, b1, k2, b2), the noise scales (σ_v, σ_w), and the mixing probability p1 using EM.

The algorithm proceeds as follows:

Initialize model parameters (including the mixing probability p1).

E step: For each data point, compute the posterior probability (responsibility) that it comes from Model 1 versus Model 2, given the current parameters and the mixing probability.

M step: Update the model parameters using the responsibilities computed in the E step.

Repeat the E and M steps until convergence or a maximum number of iterations is reached.

The EM algorithm outputs the estimated parameters for both models and the mixing probability. Note that EM is sensitive to initialization and may converge to a local optimum; multiple runs and variance estimation may be required.

<code>import numpy as np
from scipy.stats import norm

# Mean functions of the two sub-models.  The noise does not appear here:
# it enters through the Gaussian likelihood evaluated in the E step.
def model1_mean(x, k, b):
    return k * x + b

def model2_mean(x, k, b):
    return k * x + b

# EM algorithm for a mixture of two linear regressions
def em_algorithm(x, y, k1, b1, k2, b2, p1, sigma_v, sigma_w,
                 max_iter=100, tol=1e-8):
    n = len(x)
    w = np.zeros((n, 2))  # responsibilities
    prev_ll = -np.inf
    for _ in range(max_iter):
        # E step: posterior probability that each point belongs to each model,
        # weighting the component densities by the mixing probabilities
        lik1 = p1 * norm.pdf(y, loc=model1_mean(x, k1, b1), scale=sigma_v)
        lik2 = (1 - p1) * norm.pdf(y, loc=model2_mean(x, k2, b2), scale=sigma_w)
        total = lik1 + lik2
        w[:, 0] = lik1 / total
        w[:, 1] = 1 - w[:, 0]

        # M step: weighted least squares (slope and intercept jointly) for
        # each line, then the noise scales and the mixing probability
        params = []
        for j in range(2):
            wj = w[:, j]
            xbar = np.sum(wj * x) / np.sum(wj)
            ybar = np.sum(wj * y) / np.sum(wj)
            k = np.sum(wj * (x - xbar) * (y - ybar)) / np.sum(wj * (x - xbar) ** 2)
            b = ybar - k * xbar
            sigma = np.sqrt(np.sum(wj * (y - k * x - b) ** 2) / np.sum(wj))
            params.append((k, b, sigma))
        (k1, b1, sigma_v), (k2, b2, sigma_w) = params
        p1 = np.mean(w[:, 0])

        # Stop when the observed-data log-likelihood no longer improves
        ll = np.sum(np.log(total))
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return k1, b1, k2, b2, p1, sigma_v, sigma_w

# Initial values
k1, b1 = 0.5, 1.0
k2, b2 = 1.5, 2.0
p1 = 0.5
sigma_v, sigma_w = 0.1, 0.2

# Generate synthetic data: each point is drawn from one of the two lines
n = 100
x = np.random.uniform(low=0, high=10, size=n)
y = np.zeros(n)
for i in range(n):
    if np.random.random() < 0.5:  # true mixing probability p1 = 0.5
        y[i] = 0.5 * x[i] + 1 + np.random.normal(scale=0.1)
    else:
        y[i] = 1.5 * x[i] + 2 + np.random.normal(scale=0.2)

# Run EM algorithm
k1, b1, k2, b2, p1, sigma_v, sigma_w = em_algorithm(
    x, y, k1, b1, k2, b2, p1, sigma_v, sigma_w, max_iter=10000)
</code>

Because the data are generated randomly, the exact output varies from run to run. Starting from the initial values above, the estimates should land close to the true parameters (k1 ≈ 0.5, b1 ≈ 1, k2 ≈ 1.5, b2 ≈ 2, p1 ≈ 0.5, σ_v ≈ 0.1, σ_w ≈ 0.2); note that the two components may come back with their labels swapped, since the labeling is arbitrary.
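As noted earlier, EM can converge to a local optimum. A common safeguard, sketched here on a simple two-component Gaussian mixture for brevity (this wrapper is an addition, not part of the original article), is to run EM from several random initializations and keep the run with the highest final log-likelihood:

```python
import numpy as np
from scipy.stats import norm

def em_once(x, mu1, mu2, s1, s2, p1, max_iter=200):
    """One EM run for a two-component 1-D Gaussian mixture.

    Returns the fitted parameters and the final observed-data log-likelihood.
    """
    ll = -np.inf
    for _ in range(max_iter):
        # E step: responsibilities
        a = p1 * norm.pdf(x, mu1, s1)
        b = (1 - p1) * norm.pdf(x, mu2, s2)
        r = a / (a + b)
        # M step: weighted parameter updates (small floor guards against
        # a variance collapsing to zero on a degenerate restart)
        mu1 = np.sum(r * x) / np.sum(r)
        mu2 = np.sum((1 - r) * x) / np.sum(1 - r)
        s1 = np.sqrt(np.sum(r * (x - mu1) ** 2) / np.sum(r)) + 1e-6
        s2 = np.sqrt(np.sum((1 - r) * (x - mu2) ** 2) / np.sum(1 - r)) + 1e-6
        p1 = np.mean(r)
        ll = np.sum(np.log(a + b))
    return (mu1, mu2, s1, s2, p1), ll

# Synthetic data: half from N(0, 1), half from N(5, 1)
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 500), rng.normal(5, 1, 500)])

# Multi-start: several random initializations, keep the best log-likelihood
best_params, best_ll = None, -np.inf
for _ in range(10):
    init = rng.uniform(x.min(), x.max(), size=2)
    params, ll = em_once(x, init[0], init[1], 1.0, 1.0, 0.5)
    if ll > best_ll:
        best_params, best_ll = params, ll
print(best_params, best_ll)
```

The same pattern applies directly to the mixed-regression `em_algorithm` above: draw random starting values for the slopes and intercepts, run EM from each, and keep the run with the highest log-likelihood.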

References

Flashman's Right Hand: EM Algorithm Theory and Python Implementation, Zhihu, https://zhuanlan.zhihu.com/p/302515142

Tags: Machine Learning, Python, EM algorithm, parameter estimation, expectation maximization
Written by Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands-On Guide to Competitions", and co-author of "Mathematical Modeling: Teaching Design and Cases".
