How Poisson Hidden Markov Models Enable Count‑Based Time‑Series Regression
This article explains how mixing a Poisson process with a discrete k‑state hidden Markov model creates a Poisson HMM that captures autocorrelation in integer‑valued time‑series, detailing the model formulation, prediction via expectation over states, and parameter estimation using MLE or EM.
The article introduces a Poisson Hidden Markov Model (Poisson HMM) that combines two stochastic processes—a Poisson process for count data and a discrete k‑state Markov chain—to model integer‑valued time‑series such as daily click counts.
Poisson component
Observed values y_t are decomposed as y_t = μ̂_t + ε_t, the sum of a predicted mean and a residual. Note that for count data the residual is not the zero‑mean normal N(0,σ²) of ordinary least squares: the count variable y_t is assumed to follow a Poisson distribution with mean μ_t, so the conditional variance equals the conditional mean, and each time step is allowed its own mean.
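As a quick illustration (not from the original article), the sketch below draws counts from Poisson distributions whose means differ by time step; the μ_t values are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical time-varying Poisson means, one per time step.
mu = np.array([3.0, 5.5, 2.1, 8.0, 4.2])

# Observed counts y_t ~ Poisson(mu_t): non-negative integers whose
# conditional variance equals the conditional mean mu_t.
y = rng.poisson(mu)
print(y)
```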
The regression matrix X (size n × (m+1)) includes an intercept column, and the fitted coefficient vector β̂ (size (m+1) × 1) yields the exponential linear predictor:

μ̂_t = exp(x_t · β̂)

The dot product x_t · β̂ is exponentiated to guarantee a non‑negative mean, the key requirement for modeling integer counts.
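A minimal sketch of the exponential link, with hypothetical data and coefficients (in practice β̂ would come from the fitting procedure described later):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 2

# Regression matrix X of size n x (m+1): intercept column plus m regressors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, m))])

# Hypothetical fitted coefficient vector beta_hat of size (m+1,).
beta_hat = np.array([1.2, 0.4, -0.3])

# Exponential link: mu_hat_t = exp(x_t . beta_hat) is non-negative for all t.
mu_hat = np.exp(X @ beta_hat)
```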
Hidden Markov component
A k‑state discrete Markov chain governs the hidden state j ∈ {1,…,k}. The Poisson mean is indexed by the hidden state, so each state carries its own coefficient vector β̂_j and state‑specific mean μ̂_t_j:

μ̂_t_j = exp(x_t · β̂_j),  j = 1,…,k
The transition matrix P (whose element p_ij is the probability of moving from state i to state j) and the state probability vector π_t evolve the hidden state over time via the recursion π_{t+1} = π_t · P. The probability of being in state j at time t is π_tj.
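A minimal sketch of the recursion π_{t+1} = π_t · P for a hypothetical 2‑state chain:

```python
import numpy as np

# Hypothetical 2-state transition matrix; each row sums to 1.
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])

# Start in state 1 with certainty, then propagate the distribution forward.
pi_t = np.array([1.0, 0.0])
for t in range(5):
    pi_t = pi_t @ P      # pi_{t+1} = pi_t . P
    print(t + 1, pi_t)   # drifts toward the chain's stationary distribution
```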
Because the true state is unknown, the model predicts a single mean μ̂_t by taking the expectation over all possible states:

μ̂_t = Σ_{j=1…k} π_tj · μ̂_t_j = Σ_{j=1…k} π_tj · exp(x_t · β̂_j)
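Continuing the sketch, the single predicted mean is a probability-weighted average of the state-specific means (all numbers below are placeholders):

```python
import numpy as np

x_t = np.array([1.0, 0.5, -1.2])        # one row of X, intercept included

# Hypothetical state-specific coefficient vectors beta_hat_j, one row per state.
beta_hat = np.array([[1.2, 0.4, -0.3],
                     [0.2, 0.1,  0.6]])

pi_t = np.array([0.75, 0.25])           # P(hidden state = j at time t)

mu_tj = np.exp(beta_hat @ x_t)          # state-specific means mu_hat_t_j
mu_t = pi_t @ mu_tj                     # expectation over the hidden states
```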
Training and parameter estimation
Training the Poisson HMM involves estimating the state‑specific coefficient matrix β̂_s (one coefficient vector per state) and the transition matrix P. Maximum likelihood estimation (MLE) or the Expectation‑Maximization (EM) algorithm maximizes the joint likelihood of the observed series y:

L(β̂, P; y) = Σ over state paths (j_1,…,j_n) of [ π_1j_1 · ∏_{t=2…n} p_{j_{t−1} j_t} ] · ∏_{t=1…n} Pois(y_t; μ̂_t_j_t)

where Pois(y; μ) = e^(−μ) · μ^y / y! is the Poisson probability mass function. The sum ranges over all kⁿ possible state sequences, which is why efficient recursions such as the forward algorithm are used in practice.
The log‑likelihood is differentiated with respect to each transition probability p_ij and each state‑specific coefficient β̂_q_j (the q‑th coefficient for state j). Setting the derivatives to zero yields a system of equations with no closed‑form solution; in practice the log‑likelihood is maximized numerically, for instance with Newton‑Raphson, or with derivative‑free methods such as Nelder‑Mead or Powell's algorithm.
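The article gives no code, but the following sketch shows the quantity such an optimizer evaluates. It computes the log-likelihood with the standard forward recursion, which costs O(n·k²) instead of enumerating all kⁿ state paths; all names are illustrative, and numpy/scipy are assumed.

```python
import numpy as np
from scipy.special import gammaln

def poisson_hmm_loglik(y, X, beta, P, pi0):
    """Forward-algorithm log-likelihood of a k-state Poisson HMM.

    y: (n,) counts; X: (n, m+1) regressors incl. intercept;
    beta: (k, m+1) state-specific coefficients; P: (k, k) transition matrix;
    pi0: (k,) initial state distribution.
    """
    mu = np.exp(X @ beta.T)                       # (n, k) state-specific means
    # Poisson log-pmf per step and state: y*log(mu) - mu - log(y!)
    logpmf = y[:, None] * np.log(mu) - mu - gammaln(y + 1)[:, None]

    alpha = pi0 * np.exp(logpmf[0])               # forward variables at t = 0
    loglik = 0.0
    for t in range(1, len(y)):
        c = alpha.sum()                           # rescale to avoid underflow
        loglik += np.log(c)
        alpha = (alpha / c) @ P * np.exp(logpmf[t])
    return loglik + np.log(alpha.sum())
```

Rescaling α at every step keeps the recursion inside floating-point range; summing the log scale factors recovers the exact log-likelihood, which a derivative-free optimizer can then maximize directly.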
During optimization, each row of P must sum to 1 and each element must stay within [0,1]. To enforce these constraints, a proxy matrix Q (size k × k) is introduced: the q_ij are optimized without constraints and then normalized to valid probabilities, for example row‑wise via

p_ij = exp(q_ij) / Σ_{l=1…k} exp(q_il)

so that every row of P is a probability distribution.
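Continuing the sketch above (and assuming a softmax-style normalization, since the exact Q-to-P mapping is not shown here), the proxy matrix lets scipy's Nelder‑Mead search an unconstrained space while P stays row‑stochastic:

```python
import numpy as np
from scipy.optimize import minimize

def q_to_p(Q):
    """Map an unconstrained k x k proxy matrix Q to a valid transition matrix:
    every p_ij lies in [0, 1] and every row sums to 1 (row-wise softmax)."""
    Qs = Q - Q.max(axis=1, keepdims=True)     # shift rows for numerical stability
    expQ = np.exp(Qs)
    return expQ / expQ.sum(axis=1, keepdims=True)

def negloglik(theta, y, X, k):
    m1 = X.shape[1]
    beta = theta[:k * m1].reshape(k, m1)      # state-specific coefficients
    P = q_to_p(theta[k * m1:].reshape(k, k))  # proxy Q -> valid transition matrix
    pi0 = np.full(k, 1.0 / k)                 # fixed uniform initial distribution
    return -poisson_hmm_loglik(y, X, beta, P, pi0)  # from the sketch above

# Hypothetical synthetic data, just to exercise the optimizer.
rng = np.random.default_rng(7)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
states = rng.integers(0, k, size=n)           # crude stand-in for a Markov path
true_beta = np.array([[0.3, 0.2], [1.5, -0.1]])
y = rng.poisson(np.exp((X * true_beta[states]).sum(axis=1)))

theta0 = np.zeros(k * X.shape[1] + k * k)
res = minimize(negloglik, theta0, args=(y, X, k),
               method="Nelder-Mead", options={"maxiter": 20000})
P_hat = q_to_p(res.x[k * X.shape[1]:].reshape(k, k))
```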
