Understanding Probabilistic Graphical Models: Bayesian & Markov Networks Explained

This article introduces probabilistic graphical models, explains the differences between Bayesian and Markov networks, derives their joint probability distributions, and details the principles and graphical representations of naive Bayes and maximum entropy models with illustrative equations and diagrams.

Hulu Beijing

Scenario Description

Probabilistic Graphical Models (PGMs) are divided into two main types: Bayesian Networks, which use a directed graph structure, and Markov Networks, which use an undirected graph structure. PGMs model dependencies between entities; directed edges capture one‑way dependence while undirected edges capture mutual dependence. Common PGMs include naive Bayes, maximum entropy, hidden Markov models, conditional random fields, and topic models, and they are widely applied in machine‑learning tasks.

Figure 1: Bayesian and Markov networks

Problem Description

Write the joint probability distribution of the Bayesian network shown in Figure 1.

Write the joint probability distribution of the Markov network shown in Figure 1.

Explain the principle of the naive Bayes model and give its PGM representation.

Explain the principle of the maximum entropy model and give its PGM representation.

Answer and Analysis

1. Joint Distribution of the Bayesian Network

In the Bayesian network, nodes B and C are conditionally independent given node A, which gives the first factorisation identity:

P(B, C \mid A) = P(B \mid A)\,P(C \mid A)

Similarly, given nodes B and C, nodes A and D become conditionally independent, i.e. P(D \mid A, B, C) = P(D \mid B, C). Combining the two identities yields the full joint distribution:

P(A, B, C, D) = P(A)\,P(B, C, D \mid A) = P(A)\,P(B, C \mid A)\,P(D \mid B, C)

P(A, B, C, D) = P(A)\,P(B \mid A)\,P(C \mid A)\,P(D \mid B, C)
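To make the factorisation concrete, here is a minimal Python sketch (not from the original article) that evaluates the joint probability from conditional probability tables; every numeric value below is an illustrative assumption, not data from Figure 1:

```python
# Evaluating P(A,B,C,D) = P(A) P(B|A) P(C|A) P(D|B,C) with made-up CPTs.
from itertools import product

P_A = {True: 0.6, False: 0.4}                      # P(A)
P_B_given_A = {True: {True: 0.7, False: 0.3},      # P(B | A)
               False: {True: 0.2, False: 0.8}}
P_C_given_A = {True: {True: 0.5, False: 0.5},      # P(C | A)
               False: {True: 0.1, False: 0.9}}
P_D_given_BC = {(True, True): {True: 0.9, False: 0.1},    # P(D | B, C)
                (True, False): {True: 0.6, False: 0.4},
                (False, True): {True: 0.4, False: 0.6},
                (False, False): {True: 0.05, False: 0.95}}

def joint(a, b, c, d):
    """P(A=a, B=b, C=c, D=d) via the directed factorisation."""
    return P_A[a] * P_B_given_A[a][b] * P_C_given_A[a][c] * P_D_given_BC[(b, c)][d]

# Sanity check: the joint must sum to 1 over all 16 configurations.
assert abs(sum(joint(*v) for v in product([True, False], repeat=4)) - 1.0) < 1e-9
print(joint(True, True, False, True))
```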

2. Joint Distribution of the Markov Network

For a Markov network, the joint distribution is defined as a normalised product of potential functions over cliques:

P(x) = \frac{1}{Z} \prod_{Q \in \mathcal{C}} \psi_Q(x_Q), \qquad Z = \sum_{x} \prod_{Q \in \mathcal{C}} \psi_Q(x_Q)

where \mathcal{C} is the set of cliques, x_Q denotes the variables in clique Q, \psi_Q is a non-negative potential function, and Z is the normalising partition function.

In the network of Figure 1 the maximal cliques are (A,B), (A,C), (B,D) and (C,D). The joint distribution can therefore be expressed as the product of potential functions over these cliques:

P(A, B, C, D) = \frac{1}{Z}\,\psi_1(A, B)\,\psi_2(A, C)\,\psi_3(B, D)\,\psi_4(C, D)

where Z sums the product of the four potentials over all configurations of (A, B, C, D).
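The role of the partition function Z is easy to see in code. The following sketch uses assumed pairwise potentials (nothing specified in Figure 1) and normalises the product of clique potentials over all configurations:

```python
# Undirected factorisation P(A,B,C,D) = (1/Z) ψ1(A,B) ψ2(A,C) ψ3(B,D) ψ4(C,D).
from itertools import product

def psi(u, v):
    # Illustrative pairwise potential: favours agreeing neighbours.
    return 2.0 if u == v else 1.0

def unnormalised(a, b, c, d):
    return psi(a, b) * psi(a, c) * psi(b, d) * psi(c, d)

# Partition function Z sums the potential product over all configurations.
Z = sum(unnormalised(*v) for v in product([0, 1], repeat=4))

def joint(a, b, c, d):
    return unnormalised(a, b, c, d) / Z

print(joint(0, 0, 0, 0))
```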

3. Naive Bayes Model

Naive Bayes predicts the posterior probability P(y_i \mid x) that a sample x = (x_1, \dots, x_n) belongs to class y_i. Applying Bayes' theorem, and assuming the features are conditionally independent given the class, the posterior factorises into a product of per-feature likelihoods:

P(y_i \mid x) = \frac{P(y_i)\,P(x \mid y_i)}{P(x)}

P(y_i \mid x) \propto P(y_i) \prod_{j=1}^{n} P(x_j \mid y_i)

The graphical representation is a simple directed graph where the class node points to all feature nodes, illustrating the conditional independence assumption.
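As a concrete illustration, the sketch below applies the naive Bayes rule to binary features. The priors and likelihoods are illustrative assumptions; in practice they would be estimated from training counts, typically with smoothing:

```python
# Naive Bayes prediction rule over three binary features (toy numbers).
priors = {"spam": 0.3, "ham": 0.7}
likelihoods = {  # P(x_j = 1 | y) for each feature j
    "spam": [0.8, 0.6, 0.4],
    "ham":  [0.1, 0.3, 0.5],
}

def posterior(x):
    """Return P(y | x) for a binary feature vector x."""
    scores = {}
    for y, prior in priors.items():
        p = prior
        for xj, pj in zip(x, likelihoods[y]):
            p *= pj if xj else (1.0 - pj)   # P(x_j | y) under independence
        scores[y] = p
    total = sum(scores.values())            # normalise by P(x)
    return {y: s / total for y, s in scores.items()}

print(posterior([1, 0, 1]))  # e.g. {'spam': 0.61, 'ham': 0.39}
```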

4. Maximum Entropy Model

Entropy measures uncertainty; the maximum‑entropy principle selects the distribution with the largest entropy among those satisfying given constraints. For a discrete variable X with distribution P(X), entropy is defined as:

H(P) = -\sum_{x} P(x) \log P(x)

When X follows a uniform distribution, its entropy attains its maximum value \log |X|, where |X| is the number of values X can take. The conditional entropy of P(Y \mid X) is defined similarly:

H(P) = -\sum_{x, y} \tilde{P}(x)\,P(y \mid x) \log P(y \mid x)

where \tilde{P}(x) is the empirical distribution of x in the training data.
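A quick numerical check of the claim that the uniform distribution maximises entropy (the distributions below are arbitrary examples):

```python
# Among distributions over four outcomes, uniform has the largest entropy.
import math

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))  # log 4 ≈ 1.386, the maximum
print(entropy([0.7, 0.1, 0.1, 0.1]))      # ≈ 0.940, strictly smaller
print(entropy([1.0, 0.0, 0.0, 0.0]))      # 0.0, deterministic outcome
```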

The maximum-entropy model chooses, among all conditional distributions that satisfy the feature-expectation constraints derived from the training data, the one that maximises this conditional entropy. Solving the constrained optimisation yields an exponential form whose weights w are then fitted by maximum likelihood:

P_w(y \mid x) = \frac{1}{Z_w(x)} \exp\left( \sum_{i} w_i f_i(x, y) \right), \qquad Z_w(x) = \sum_{y} \exp\left( \sum_{i} w_i f_i(x, y) \right)

where f_i(x, y) are feature functions and w_i their weights.
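Computationally, P_w(y \mid x) is a softmax over weighted feature scores. The sketch below evaluates the exponential form and its normaliser Z_w(x); the feature functions, labels, and weights are made-up assumptions for illustration:

```python
# Exponential form P_w(y|x) = exp(sum_i w_i f_i(x,y)) / Z_w(x).
import math

def features(x, y):
    # f_i(x, y): toy binary features coupling input tokens with the label.
    return [1.0 if (tok, y) in {("rain", "wet"), ("sun", "dry")} else 0.0
            for tok in x]

weights = [1.5, 1.5]  # one assumed weight w_i per feature slot

def p_w(y, x, labels=("wet", "dry")):
    """P_w(y | x) computed by normalising over all candidate labels."""
    def score(lbl):
        return math.exp(sum(w * f for w, f in zip(weights, features(x, lbl))))
    return score(y) / sum(score(lbl) for lbl in labels)  # divide by Z_w(x)

print(p_w("wet", ["rain"]))  # ≈ 0.82: the matching feature raises the score
```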

From the PGM perspective, this corresponds to a Markov network whose potential function is an exponential of the weighted features, with the variables forming a maximal clique as illustrated below:

Figure: clique representation in the maximum entropy model