Understanding Probabilistic Graphical Models: Bayesian & Markov Networks Explained

This article introduces probabilistic graphical models, explains the differences between Bayesian and Markov networks, derives their joint probability distributions, and details the principles and graphical representations of naive Bayes and maximum entropy models with illustrative equations and diagrams.

Hulu Beijing

Scenario Description

Probabilistic Graphical Models (PGMs) are divided into two main types: Bayesian Networks, which use a directed graph structure, and Markov Networks, which use an undirected graph structure. PGMs model dependencies between entities; directed edges capture one‑way dependence while undirected edges capture mutual dependence. Common PGMs include naive Bayes, maximum entropy, hidden Markov models, conditional random fields, and topic models, and they are widely applied in machine‑learning tasks.

Figure 1: Bayesian and Markov networks

Problem Description

Write the joint probability distribution of the Bayesian network shown in Figure 1.

Write the joint probability distribution of the Markov network shown in Figure 1.

Explain the principle of the naive Bayes model and give its PGM representation.

Explain the principle of the maximum entropy model and give its PGM representation.

Answer and Analysis

1. Joint Distribution of the Bayesian Network

In the Bayesian network, nodes B and C are conditionally independent given node A, which gives the first factorisation identity:

P(B, C \mid A) = P(B \mid A)\,P(C \mid A)

Similarly, given nodes B and C, nodes A and D become conditionally independent, i.e. P(D \mid A, B, C) = P(D \mid B, C). Combining the two identities yields the full joint distribution:

P(A, B, C, D) = P(A)\,P(B, C, D \mid A) = P(A)\,P(B, C \mid A)\,P(D \mid B, C)

P(A, B, C, D) = P(A)\,P(B \mid A)\,P(C \mid A)\,P(D \mid B, C)
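To make the factorisation concrete, here is a minimal Python sketch (not from the original article) that evaluates the joint probability from conditional probability tables; every numeric value below is an illustrative assumption, not data from Figure 1:

```python
# Evaluating P(A,B,C,D) = P(A) P(B|A) P(C|A) P(D|B,C) with made-up CPTs.
from itertools import product

P_A = {True: 0.6, False: 0.4}                      # P(A)
P_B_given_A = {True: {True: 0.7, False: 0.3},      # P(B | A)
               False: {True: 0.2, False: 0.8}}
P_C_given_A = {True: {True: 0.5, False: 0.5},      # P(C | A)
               False: {True: 0.1, False: 0.9}}
P_D_given_BC = {(True, True): {True: 0.9, False: 0.1},    # P(D | B, C)
                (True, False): {True: 0.6, False: 0.4},
                (False, True): {True: 0.4, False: 0.6},
                (False, False): {True: 0.05, False: 0.95}}

def joint(a, b, c, d):
    """P(A=a, B=b, C=c, D=d) via the directed factorisation."""
    return P_A[a] * P_B_given_A[a][b] * P_C_given_A[a][c] * P_D_given_BC[(b, c)][d]

# Sanity check: the joint must sum to 1 over all 16 configurations.
assert abs(sum(joint(*v) for v in product([True, False], repeat=4)) - 1.0) < 1e-9
print(joint(True, True, False, True))
```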

2. Joint Distribution of the Markov Network

For a Markov network, the joint distribution is defined as a normalised product of potential functions over cliques:

P(x) = \frac{1}{Z} \prod_{Q \in \mathcal{C}} \psi_Q(x_Q), \qquad Z = \sum_{x} \prod_{Q \in \mathcal{C}} \psi_Q(x_Q)

where \mathcal{C} is the set of cliques, x_Q denotes the variables in clique Q, \psi_Q is a non-negative potential function, and Z is the normalising partition function.

In the network of Figure 1 the maximal cliques are (A,B), (A,C), (B,D) and (C,D). The joint distribution can therefore be expressed as the product of potential functions over these cliques:

P(A, B, C, D) = \frac{1}{Z}\,\psi_1(A, B)\,\psi_2(A, C)\,\psi_3(B, D)\,\psi_4(C, D)

where Z sums the product of the four potentials over all configurations of (A, B, C, D).
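The role of the partition function Z is easy to see in code. The following sketch uses assumed pairwise potentials (nothing specified in Figure 1) and normalises the product of clique potentials over all configurations:

```python
# Undirected factorisation P(A,B,C,D) = (1/Z) ψ1(A,B) ψ2(A,C) ψ3(B,D) ψ4(C,D).
from itertools import product

def psi(u, v):
    # Illustrative pairwise potential: favours agreeing neighbours.
    return 2.0 if u == v else 1.0

def unnormalised(a, b, c, d):
    return psi(a, b) * psi(a, c) * psi(b, d) * psi(c, d)

# Partition function Z sums the potential product over all configurations.
Z = sum(unnormalised(*v) for v in product([0, 1], repeat=4))

def joint(a, b, c, d):
    return unnormalised(a, b, c, d) / Z

print(joint(0, 0, 0, 0))
```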

3. Naive Bayes Model

Naive Bayes predicts the posterior probability P(y_i \mid x) that a sample x = (x_1, \dots, x_n) belongs to class y_i. Applying Bayes' theorem, and assuming the features are conditionally independent given the class, the posterior factorises into a product of per-feature likelihoods:

P(y_i \mid x) = \frac{P(y_i)\,P(x \mid y_i)}{P(x)}

P(y_i \mid x) \propto P(y_i) \prod_{j=1}^{n} P(x_j \mid y_i)

The graphical representation is a simple directed graph where the class node points to all feature nodes, illustrating the conditional independence assumption.
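As a concrete illustration, the sketch below applies the naive Bayes rule to binary features. The priors and likelihoods are illustrative assumptions; in practice they would be estimated from training counts, typically with smoothing:

```python
# Naive Bayes prediction rule over three binary features (toy numbers).
priors = {"spam": 0.3, "ham": 0.7}
likelihoods = {  # P(x_j = 1 | y) for each feature j
    "spam": [0.8, 0.6, 0.4],
    "ham":  [0.1, 0.3, 0.5],
}

def posterior(x):
    """Return P(y | x) for a binary feature vector x."""
    scores = {}
    for y, prior in priors.items():
        p = prior
        for xj, pj in zip(x, likelihoods[y]):
            p *= pj if xj else (1.0 - pj)   # P(x_j | y) under independence
        scores[y] = p
    total = sum(scores.values())            # normalise by P(x)
    return {y: s / total for y, s in scores.items()}

print(posterior([1, 0, 1]))  # e.g. {'spam': 0.61, 'ham': 0.39}
```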

4. Maximum Entropy Model

Entropy measures uncertainty; the maximum‑entropy principle selects the distribution with the largest entropy among those satisfying given constraints. For a discrete variable X with distribution P(X), entropy is defined as:

H(P) = -\sum_{x} P(x) \log P(x)

When X follows a uniform distribution, its entropy attains its maximum value \log |X|, where |X| is the number of values X can take. The conditional entropy of P(Y \mid X) is defined similarly:

H(P) = -\sum_{x, y} \tilde{P}(x)\,P(y \mid x) \log P(y \mid x)

where \tilde{P}(x) is the empirical distribution of x in the training data.
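A quick numerical check of the claim that the uniform distribution maximises entropy (the distributions below are arbitrary examples):

```python
# Among distributions over four outcomes, uniform has the largest entropy.
import math

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))  # log 4 ≈ 1.386, the maximum
print(entropy([0.7, 0.1, 0.1, 0.1]))      # ≈ 0.940, strictly smaller
print(entropy([1.0, 0.0, 0.0, 0.0]))      # 0.0, deterministic outcome
```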

The maximum-entropy model chooses, among all conditional distributions that satisfy the feature-expectation constraints derived from the training data, the one that maximises this conditional entropy. Solving the constrained optimisation yields an exponential form whose weights w are then fitted by maximum likelihood:

P_w(y \mid x) = \frac{1}{Z_w(x)} \exp\left( \sum_{i} w_i f_i(x, y) \right), \qquad Z_w(x) = \sum_{y} \exp\left( \sum_{i} w_i f_i(x, y) \right)

where f_i(x, y) are feature functions and w_i their weights.
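Computationally, P_w(y \mid x) is a softmax over weighted feature scores. The sketch below evaluates the exponential form and its normaliser Z_w(x); the feature functions, labels, and weights are made-up assumptions for illustration:

```python
# Exponential form P_w(y|x) = exp(sum_i w_i f_i(x,y)) / Z_w(x).
import math

def features(x, y):
    # f_i(x, y): toy binary features coupling input tokens with the label.
    return [1.0 if (tok, y) in {("rain", "wet"), ("sun", "dry")} else 0.0
            for tok in x]

weights = [1.5, 1.5]  # one assumed weight w_i per feature slot

def p_w(y, x, labels=("wet", "dry")):
    """P_w(y | x) computed by normalising over all candidate labels."""
    def score(lbl):
        return math.exp(sum(w * f for w, f in zip(weights, features(x, lbl))))
    return score(y) / sum(score(lbl) for lbl in labels)  # divide by Z_w(x)

print(p_w("wet", ["rain"]))  # ≈ 0.82: the matching feature raises the score
```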

From the PGM perspective, this corresponds to a Markov network whose potential function is an exponential of the weighted features, with the variables forming a maximal clique as illustrated below:

Figure: clique representation in the maximum entropy model