
Integrating Lexical Knowledge and Handling Nested Entities in Chinese Named Entity Recognition

This article reviews recent advances in Chinese NER, examining lexical‑knowledge integration methods such as Lattice LSTM, FLAT, and graph‑based networks, and discusses approaches for nested entity recognition including Pyramid, MRC‑based frameworks, TPLinker and span‑based models, highlighting their strengths and trade‑offs.

DataFunTalk

Benefiting from BERT, encoder‑CRF architectures achieve solid performance on Chinese NER, but this work focuses on two research directions—lexical knowledge integration and nested entity handling—rather than pure metric improvement.

Integrating Lexical Knowledge

Chinese NER Using Lattice LSTM

Lattice LSTM incorporates word-level cells into a character-based LSTM so that matched lexicon words can sharpen entity-boundary detection. Compared with a standard character-level cell, the word-level cell drops the output gate: word positions emit no labels, so a word cell only produces a cell state, which is merged into the cell state of the character where the word ends. Formulas (10-15) in the original paper define this WordLSTMCell.
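The gateless-output word cell can be sketched as follows. This is a toy scalar version of the paper's Eqs. (10-15), with hypothetical weight names, just to make the missing output gate concrete:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def word_lstm_cell(x_w, h_b, c_b, W):
    """Toy scalar WordLSTMCell (cf. Eqs. 10-15 of the Lattice LSTM paper).

    Unlike a standard LSTM cell there is NO output gate: a word cell
    produces only a cell state c_w, later merged into the character-level
    cell state at the word's last character, so no hidden output is needed.
    x_w: word embedding; h_b, c_b: hidden/cell state at the word's first
    character; W: dict of scalar weights (illustrative names, not the
    paper's notation).
    """
    i = sigmoid(W["wi"] * x_w + W["hi"] * h_b + W["bi"])   # input gate
    f = sigmoid(W["wf"] * x_w + W["hf"] * h_b + W["bf"])   # forget gate
    g = math.tanh(W["wg"] * x_w + W["hg"] * h_b + W["bg"]) # candidate
    c_w = f * c_b + i * g  # word cell state; no output gate, no h output
    return c_w
```

With all weights zero, both gates sit at 0.5 and the candidate is 0, so the cell simply halves the incoming character cell state.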

FLAT: Chinese NER Using Flat‑Lattice Transformer

FLAT addresses two limitations of Lattice LSTM, its slow non-parallel computation and its incompatibility with BERT, by flattening the lattice: matched lexicon words are appended to the character sequence as extra tokens, and each token's position is encoded as a (head, tail) span. Four relative-distance matrices (head-head, head-tail, tail-head, tail-tail) then let self-attention model interactions between arbitrary character and word spans.

FLAT supports batch parallelism, composes naturally with BERT, and achieves strong results on standard Chinese NER benchmarks, at the cost of higher GPU memory usage from attending over the lengthened lattice sequence.
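The four relative-position matrices above are just pairwise differences between span endpoints. A minimal sketch (the real model feeds these integer distances through learned embeddings and a fusion layer):

```python
def flat_relative_positions(spans):
    """Compute FLAT's four relative-distance matrices between lattice
    tokens. Each token is a (head, tail) span: a character has
    head == tail, a matched word covers several characters.
    Returns d_hh, d_ht, d_th, d_tt as nested lists of integers.
    Sketch of the position scheme only, not the full attention model.
    """
    n = len(spans)
    d_hh = [[spans[i][0] - spans[j][0] for j in range(n)] for i in range(n)]
    d_ht = [[spans[i][0] - spans[j][1] for j in range(n)] for i in range(n)]
    d_th = [[spans[i][1] - spans[j][0] for j in range(n)] for i in range(n)]
    d_tt = [[spans[i][1] - spans[j][1] for j in range(n)] for i in range(n)]
    return d_hh, d_ht, d_th, d_tt

# Two characters plus one matched word spanning both of them:
chars_and_word = [(0, 0), (1, 1), (0, 1)]
d_hh, d_ht, d_th, d_tt = flat_relative_positions(chars_and_word)
```

Because a word token shares its head with its first character and its tail with its last, the four matrices together disambiguate containment, adjacency, and overlap between any two spans.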

Leverage Lexical Knowledge for Chinese NER via Collaborative Graph Network

The Collaborative Graph Network (CGN) avoids Lattice LSTM's information loss by building three word-character graphs: a C-graph (Containing: each word connects to every character it contains, capturing word boundaries), a T-graph (Transition: words and characters connect to their nearest contextual neighbors), and an L-graph (Lattice: words connect to their boundary characters, mirroring the lattice structure). Graph-attention layers encode each graph, a fusion layer combines the three views, and a CRF decodes the labels; the graph formulation also parallelizes better than recurrent lattice models.
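The C-graph construction can be sketched as a simple adjacency matrix over character and word nodes. A minimal illustration, assuming words are given as (start, end) spans over the character sequence (end exclusive):

```python
def build_c_graph(chars, words):
    """Build the Containing graph (C-graph) adjacency used in CGN.

    Nodes are the characters followed by the matched lexicon words;
    a word node connects to every character it contains, so word
    boundaries become explicit graph structure. `words` is a list of
    (start, end, text) spans over `chars`, end exclusive.
    Sketch of the graph construction step only.
    """
    n = len(chars) + len(words)
    adj = [[0] * n for _ in range(n)]
    for w_idx, (s, e, _) in enumerate(words):
        w_node = len(chars) + w_idx
        for c in range(s, e):
            adj[w_node][c] = adj[c][w_node] = 1  # word <-> contained char
    return adj

# "南京市" with the matched word "南京" covering characters 0-1:
adj = build_c_graph(list("南京市"), [(0, 2, "南京")])
```

The T-graph and L-graph are built analogously with different edge rules, and each graph gets its own graph-attention encoder before fusion.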

Nested Entity Problem

Pyramid: A Layered Model for Nested Named Entity Recognition

Pyramid stacks decoding layers in which layer l predicts entities spanning exactly l tokens, so overlapping entities of different lengths are handled by different layers and no spurious entities arise at the wrong layer; an inverse pyramid additionally feeds higher-layer (longer-span) information back down to lower layers.
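The layer-to-span correspondence can be made concrete with a small enumeration sketch. This only shows which candidate spans each layer is responsible for; in the real model, layer l is computed by convolving layer l-1 rather than enumerated directly:

```python
def pyramid_layers(tokens, max_len):
    """Enumerate the candidate spans each pyramid layer scores.

    Layer l (1-indexed) holds exactly the spans of length l, so two
    nested entities of different lengths live on different layers and
    never compete for the same prediction slot. Spans are (start, end)
    with end exclusive. Sketch of the layering idea only.
    """
    layers = []
    for l in range(1, max_len + 1):
        layers.append([(i, i + l) for i in range(len(tokens) - l + 1)])
    return layers
```

For a 4-token sentence with max_len=2, layer 1 scores the four unigram spans and layer 2 the three bigram spans; a nested pair like (0, 1) inside (0, 2) is resolved across layers rather than within one.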

A Unified MRC Framework for Named Entity Recognition

This approach reformulates NER as a machine‑reading‑comprehension task: the model predicts start and end positions for each entity type, allowing spans to overlap and thus handling nesting. Multiple entity types are queried separately, and a matching module filters valid span‑type pairs.
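Span decoding under the MRC formulation can be sketched as follows. This is a simplified version: the paper trains a separate matching classifier over start-end pairs, while here both endpoint probabilities are simply thresholded (an assumption made to keep the sketch self-contained):

```python
def extract_spans(start_probs, end_probs, threshold=0.5, max_len=10):
    """Decode entity spans for ONE entity-type query, MRC-style.

    Any predicted start may pair with any later end, so decoded spans
    can overlap or nest freely; running one query per entity type then
    covers multi-type nesting. Simplified sketch: the real framework
    scores each (start, end) pair with a learned matching module
    instead of this threshold rule. Returns (start, end) inclusive.
    """
    starts = [i for i, p in enumerate(start_probs) if p > threshold]
    ends = [i for i, p in enumerate(end_probs) if p > threshold]
    return [(s, e) for s in starts for e in ends if s <= e < s + max_len]
```

Note how two starts and two ends can yield three valid spans, including one containing another: exactly the overlap a single BIO tagger cannot express.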

TPLinker: Single‑stage Joint Extraction of Entities and Relations Through Token Pair Linking

TPLinker treats every token pair (head ≤ tail) as a cell in the upper triangle of an N×N matrix and classifies the cells directly: one matrix tags entity spans head-to-tail, and for each relation type two further matrices link subject and object heads and subject and object tails. This single-stage "handshaking" scheme supports nested entities naturally, since nested spans simply occupy different cells.
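The entity-tagging matrix can be sketched directly; the per-relation head-linking and tail-linking matrices follow the same pattern with different cell semantics:

```python
def entity_pair_matrix(tokens, entities):
    """Build a TPLinker-style entity tagging matrix.

    Each (head, tail) token pair with head <= tail owns one cell in the
    upper triangle of an N x N matrix; cell = 1 marks a gold entity
    span (tail inclusive). Nested entities light up different cells,
    so no layering or span pruning is needed. Sketch of the tagging
    scheme only, not the trained classifier.
    """
    n = len(tokens)
    m = [[0] * n for _ in range(n)]
    for h, t in entities:
        assert h <= t, "only the upper triangle is used"
        m[h][t] = 1
    return m

# "北京大学" with nested entities: chars 0-3 (the university)
# and chars 0-1 (the city) marked in the same matrix:
m = entity_pair_matrix(list("北京大学"), [(0, 3), (0, 1)])
```

In practice the upper triangle is flattened into one long sequence so a single tagger scores all pairs in one pass, which is where the redundancy reduction comes from.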

Span‑based Joint Entity and Relation Extraction with Transformer Pre‑training (spERT)

spERT enumerates all possible n‑gram spans after BERT encoding, adds width embeddings to the span classifier, and filters non‑entities; relation classification concatenates span representations and applies a sigmoid per relation type.
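The span-enumeration step with its width feature can be sketched as below. The real model concatenates a learned width embedding to max-pooled BERT features plus a context vector; here each candidate just carries its width index for that lookup:

```python
def enumerate_spans(n_tokens, max_width):
    """spERT-style candidate span generation.

    All n-grams up to max_width are enumerated; each span carries its
    width so a learned width embedding can be concatenated to the
    pooled token features before the entity classifier filters
    non-entities. Spans are (start, end, width), end exclusive.
    Sketch of the enumeration step only.
    """
    spans = []
    for start in range(n_tokens):
        for width in range(1, max_width + 1):
            end = start + width
            if end <= n_tokens:
                spans.append((start, end, width))
    return spans
```

Exhaustive enumeration is what lets spERT handle nesting for free (a span and its sub-span are both candidates), at the cost of a strong negative-sampling scheme during training to keep the classifier tractable.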

Overall, lexical‑knowledge fusion (Lattice LSTM, FLAT, CGN) and span‑based nested entity strategies (Pyramid, MRC, TPLinker, spERT) represent complementary directions for improving Chinese NER, offering insights that may transfer to other NLP tasks.

Tags: transformer, MRC, graph network, Chinese NER, Lattice LSTM, lexical knowledge, nested entities
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
