
Modeling Chinese Word Segmentation with Hidden Markov Models

This article explains how Hidden Markov Models can be used to model Chinese word segmentation, covering the underlying Markov process, model parameters, basic HMM problems, and both supervised and unsupervised training methods.

Hulu Beijing

Scene Description

Sequence labeling assigns a label to each element in a sequence and is applied in many NLP tasks such as Chinese word segmentation, POS tagging, semantic role labeling, NER, and speech recognition.

Problem Description

Describe how to model Chinese word segmentation with a Hidden Markov Model (HMM) and how to train the model given a corpus.

Answer and Analysis

Background: An HMM is a classic generative model that assumes a hidden Markov chain generates observable sequences. It is widely used for sequence labeling in NLP and speech.

In a Markov process, the state at time t_n depends only on the state at the previous time t_{n-1}. Extending this, an HMM introduces hidden states x_i that are not directly observable; each hidden state emits an observable output y_i. The model parameters are the transition probabilities between hidden states, the emission probabilities from hidden states to observations, the state space of x, the observation space of y, and the initial state distribution.

Example: imagine three gourds (hidden states) each containing good or bad medicine (observations). We randomly pick a gourd, draw a medicine, record its type, then possibly transition to another gourd. The hidden state sequence is the gourd identity; the observation sequence is the medicine type.

Using an HMM, the hidden state space is {gourd1, gourd2, gourd3} and the observation space is {good, bad}. The initial distribution reflects the random first pick, transition probabilities model moving between gourds, and emission probabilities model the chance of drawing good or bad medicine from each gourd.
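As a concrete sketch, the gourd model's parameters can be written out directly. The probability values below are illustrative assumptions, not numbers given in the text:

```python
import numpy as np

# Hidden states: the three gourds; observations: good or bad medicine.
states = ["gourd1", "gourd2", "gourd3"]
observations = ["good", "bad"]

# Initial state distribution pi (assumed uniform random first pick).
pi = np.array([1 / 3, 1 / 3, 1 / 3])

# Transition matrix A: A[i, j] = P(next gourd is j | current gourd is i).
A = np.array([
    [0.5, 0.3, 0.2],
    [0.2, 0.5, 0.3],
    [0.3, 0.2, 0.5],
])

# Emission matrix B: B[i, k] = P(observation k | gourd i).
B = np.array([
    [0.8, 0.2],  # gourd1 mostly holds good medicine (assumed)
    [0.5, 0.5],
    [0.1, 0.9],  # gourd3 mostly holds bad medicine (assumed)
])

# Each row of A and B is a probability distribution, so rows sum to 1.
assert np.allclose(A.sum(axis=1), 1.0) and np.allclose(B.sum(axis=1), 1.0)
```

Any set of row-stochastic matrices of these shapes, plus a valid pi, fully specifies an HMM over this state and observation space.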

HMMs involve three fundamental problems:

Probability computation: given model parameters, compute the probability of an observation sequence Y (solved by the forward algorithm, or equivalently the backward algorithm).

Decoding: given parameters and Y, find the most likely hidden state sequence X (solved by Viterbi algorithm).

Learning: given Y, estimate parameters that maximize its probability (solved by Baum‑Welch/EM algorithm).
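Decoding is the problem that word segmentation ultimately relies on. A minimal Viterbi sketch in log space (the two-state toy parameters at the bottom are hypothetical, chosen so the expected path is obvious):

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely hidden-state path for a sequence of observation indices.

    pi: (N,) initial distribution; A: (N, N) transitions;
    B: (N, M) emissions; obs: list of observation indices.
    Log probabilities avoid numerical underflow on long sequences.
    """
    N, T = len(pi), len(obs)
    log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    delta = np.zeros((T, N))            # best log-prob of a path ending in each state
    psi = np.zeros((T, N), dtype=int)   # backpointers
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A  # scores[from, to]
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    # Backtrack from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

# Toy check: state 0 strongly emits observation 0, state 1 emits observation 1.
pi = np.array([0.5, 0.5])
A = np.array([[0.8, 0.2], [0.2, 0.8]])
B = np.array([[0.9, 0.1], [0.1, 0.9]])
print(viterbi(pi, A, B, [0, 0, 1, 1]))  # → [0, 0, 1, 1]
```

The dynamic program runs in O(T·N²) time, which is what makes decoding tractable compared with enumerating all N^T paths.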

Applying this to Chinese word segmentation, each character is an observation. We label characters with B (begin), E (end), M (middle), S (single). The hidden state space is {B, E, M, S}. Transition constraints can be encoded (e.g., B/M can be followed only by M/E, S/E only by B/S). The observation space consists of all Chinese characters in the corpus.
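The tag constraints above can be written down as an allowed-successor table (disallowed pairs get transition probability 0), and a decoded B/M/E/S sequence maps back to words deterministically. A small sketch (the helper name and example sentence are illustrative):

```python
TAGS = ["B", "M", "E", "S"]

# Allowed successors: B/M can be followed only by M/E; E/S only by B/S.
ALLOWED = {
    "B": {"M", "E"},
    "M": {"M", "E"},
    "E": {"B", "S"},
    "S": {"B", "S"},
}

def tags_to_words(chars, tags):
    """Cut a character sequence into words according to its B/M/E/S tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word always ends on E or S
            words.append(current)
            current = ""
    if current:                # tolerate a truncated final word
        words.append(current)
    return words

print(tags_to_words(list("我爱北京"), ["S", "S", "B", "E"]))  # → ['我', '爱', '北京']
```

In practice the constraints are imposed by zeroing the forbidden entries of the transition matrix before (or instead of) estimating them, so Viterbi can never produce an illegal tag sequence.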

Training can be supervised—using a labeled corpus to count transitions and emissions for maximum‑likelihood estimates—or unsupervised—applying Baum‑Welch to learn parameters from raw text.
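The supervised case reduces to counting and normalizing. A minimal sketch over a tiny hand-tagged corpus (the two sentences below are made up for illustration):

```python
from collections import Counter, defaultdict

# Tiny pre-tagged corpus: each sentence is a list of (character, BEMS-tag) pairs.
# Illustrative data, not from a real corpus.
corpus = [
    [("我", "S"), ("爱", "S"), ("北", "B"), ("京", "E")],
    [("北", "B"), ("京", "E"), ("很", "S"), ("大", "S")],
]

trans = defaultdict(Counter)  # trans[prev_tag][next_tag]
emit = defaultdict(Counter)   # emit[tag][char]
init = Counter()              # first tag of each sentence

for sent in corpus:
    init[sent[0][1]] += 1
    for ch, tag in sent:
        emit[tag][ch] += 1
    for (_, prev), (_, nxt) in zip(sent, sent[1:]):
        trans[prev][nxt] += 1

def normalize(counter):
    total = sum(counter.values())
    return {k: v / total for k, v in counter.items()}

# Maximum-likelihood estimates of the HMM parameters.
pi_hat = normalize(init)
A_hat = {tag: normalize(c) for tag, c in trans.items()}
B_hat = {tag: normalize(c) for tag, c in emit.items()}

print(A_hat["B"])  # → {'E': 1.0}: in this toy corpus, B is always followed by E
```

With raw (unlabeled) text instead, the same parameters would be fitted iteratively with Baum-Welch, treating the tags as latent variables; real systems also smooth these counts so unseen characters do not get zero emission probability.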

Machine Learning · Natural Language Processing · Chinese Word Segmentation · Sequence Labeling · Hidden Markov Model
Written by Hulu Beijing

Follow Hulu's official WeChat account for the latest company updates and recruitment information.