Understanding Probabilistic Graphical Models: Bayesian & Markov Networks Explained
This article introduces probabilistic graphical models, explains the differences between Bayesian and Markov networks, derives their joint probability distributions, and details the principles and graphical representations of naive Bayes and maximum entropy models with illustrative equations and diagrams.
Scenario Description
Probabilistic Graphical Models (PGMs) are divided into two main types: Bayesian Networks, which use a directed graph structure, and Markov Networks, which use an undirected graph structure. PGMs model dependencies between entities; directed edges capture one‑way dependence while undirected edges capture mutual dependence. Common PGMs include naive Bayes, maximum entropy, hidden Markov models, conditional random fields, and topic models, and they are widely applied in machine‑learning tasks.
Problem Description
Write the joint probability distribution of the Bayesian network shown in Figure 1.
Write the joint probability distribution of the Markov network shown in Figure 1.
Explain the principle of the naive Bayes model and give its PGM representation.
Explain the principle of the maximum entropy model and give its PGM representation.
Answer and Analysis
1. Joint Distribution of the Bayesian Network
In the Bayesian network, given node A, nodes B and C are conditionally independent, so the joint distribution can be factorised as:
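$$
P(A, B, C) = P(A)\, P(B \mid A)\, P(C \mid A)
$$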
Similarly, given nodes B and C, nodes A and D become conditionally independent, leading to the full joint distribution:
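$$
P(A, B, C, D) = P(A)\, P(B \mid A)\, P(C \mid A)\, P(D \mid B, C)
$$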
2. Joint Distribution of the Markov Network
For a Markov network, the joint distribution is defined as a product of potential functions over cliques:
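$$
P(x) = \frac{1}{Z} \prod_{Q \in \mathcal{C}} \psi_Q(x_Q), \qquad Z = \sum_{x} \prod_{Q \in \mathcal{C}} \psi_Q(x_Q),
$$

where $\mathcal{C}$ is the set of cliques, $x_Q$ is the restriction of $x$ to the variables in clique $Q$, $\psi_Q$ is a non-negative potential function, and $Z$ is the partition function that normalises the product.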
In the network of Figure 1, the maximal cliques are $(A,B)$, $(A,C)$, $(B,D)$, and $(C,D)$. The joint distribution can therefore be expressed as the product of potential functions over these cliques:
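$$
P(A, B, C, D) = \frac{1}{Z}\, \psi_1(A, B)\, \psi_2(A, C)\, \psi_3(B, D)\, \psi_4(C, D)
$$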
3. Naive Bayes Model
Naive Bayes predicts the posterior probability $P(y_i \mid x)$ that a sample $x$ belongs to class $y_i$. Under the naive assumption that the features are conditionally independent given the class, the posterior can be written as a product of individual likelihoods:
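$$
P(y_i \mid x) \propto P(y_i) \prod_{j=1}^{n} P(x_j \mid y_i),
$$

where $x = (x_1, \ldots, x_n)$ are the features and each likelihood $P(x_j \mid y_i)$ is estimated separately per feature.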
The graphical representation is a simple directed graph where the class node points to all feature nodes, illustrating the conditional independence assumption.
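To make the factorisation concrete, here is a minimal Python sketch, assuming integer-coded categorical features; the function names and the Laplace-smoothing parameter `alpha` are illustrative choices, not details from the original article.

```python
import numpy as np

def fit_naive_bayes(X, y, alpha=1.0):
    """Estimate the class priors P(y_i) and the per-feature likelihoods
    P(x_j | y_i) for integer-coded categorical features, with Laplace
    smoothing (alpha) to avoid zero probabilities."""
    classes = np.unique(y)
    n_values = X.max(axis=0) + 1                  # distinct values per feature
    priors = {c: np.mean(y == c) for c in classes}
    likelihoods = {}
    for c in classes:
        Xc = X[y == c]
        likelihoods[c] = [
            (np.bincount(Xc[:, j], minlength=n_values[j]) + alpha)
            / (len(Xc) + alpha * n_values[j])
            for j in range(X.shape[1])
        ]
    return classes, priors, likelihoods

def predict(x, classes, priors, likelihoods):
    """Score each class by log P(y_i) + sum_j log P(x_j | y_i) and return
    the highest-scoring class."""
    scores = {
        c: np.log(priors[c])
           + sum(np.log(likelihoods[c][j][x[j]]) for j in range(len(x)))
        for c in classes
    }
    return max(scores, key=scores.get)
```

The scoring works in log space, a standard design choice that avoids floating-point underflow from multiplying many small likelihoods.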
4. Maximum Entropy Model
Entropy measures uncertainty; the maximum‑entropy principle selects the distribution with the largest entropy among those satisfying given constraints. For a discrete variable X with distribution P(X), entropy is defined as:
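$$
H(P) = -\sum_{x} P(x) \log P(x)
$$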
When $X$ follows a uniform distribution, its entropy is maximal. The conditional entropy of $P(Y \mid X)$ is defined similarly:
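$$
H(P(Y \mid X)) = -\sum_{x, y} \tilde{P}(x)\, P(y \mid x) \log P(y \mid x),
$$

where $\tilde{P}(x)$ denotes the empirical distribution of $x$ in the training data.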
Solving the constrained entropy-maximisation problem yields an exponential-family model $P_w(y \mid x)$ whose weights $w$ are fitted to the training data; its form resembles an exponential-family Markov network in which $x$ and $y$ form a maximal clique:
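$$
P_w(y \mid x) = \frac{1}{Z_w(x)} \exp\Big(\sum_{i} w_i f_i(x, y)\Big), \qquad Z_w(x) = \sum_{y} \exp\Big(\sum_{i} w_i f_i(x, y)\Big),
$$

where each $f_i(x, y)$ is a feature function with weight $w_i$ and $Z_w(x)$ is the normalising factor.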
From the PGM perspective, this corresponds to a Markov network whose potential function is the exponential of the weighted features, with $x$ and $y$ together forming a maximal clique.
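As a sketch of how such a model can be trained, the following numpy code performs gradient ascent on the average training log-likelihood of $P_w(y \mid x)$; the dense array layout for the joint features $f(x, y)$, the learning rate, and the epoch count are assumptions made for illustration.

```python
import numpy as np

def fit_maxent(F, y, lr=0.1, epochs=200):
    """Fit P_w(y|x) = exp(w . f(x,y)) / Z_w(x) by gradient ascent on the
    log-likelihood. F has shape (n_samples, n_classes, n_features) and
    holds the joint feature vector f(x, y) for every candidate label y;
    y holds the observed class index of each sample."""
    n, _, d = F.shape
    w = np.zeros(d)
    for _ in range(epochs):
        scores = F @ w                               # w . f(x, y), shape (n, n_classes)
        scores -= scores.max(axis=1, keepdims=True)  # shift for numerical stability
        p = np.exp(scores)
        p /= p.sum(axis=1, keepdims=True)            # P_w(y | x) for each sample
        # Gradient: observed feature values minus model-expected feature values.
        observed = F[np.arange(n), y].mean(axis=0)
        expected = (p[:, :, None] * F).sum(axis=1).mean(axis=0)
        w += lr * (observed - expected)
    return w
```

At the optimum the gradient vanishes, i.e. the model's expected feature values match the empirical ones, which is precisely the constraint imposed by the maximum-entropy principle.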