
Named Entity Recognition in O2O Search: Background, Technical Choices, and Practice

Meituan's O2O search uses a hybrid NER system that combines high-precision domain dictionaries with BERT-based model prediction, with a CRF-based scorer choosing between the two outputs. The dictionaries are mined offline from multiple data sources. Online inference is accelerated with operator fusion, batching, and mixed precision, and accuracy is further improved with a lattice LSTM, two-stage knowledge-enhanced NER, and weak supervision. The system delivers millisecond-level latency and over 90% recall.

Meituan Technology Team

Jul 23, 2020

Named Entity Recognition (NER) is a fundamental tool for information extraction, question answering, syntactic analysis, machine translation, and Semantic Web metadata annotation. In the context of Meituan’s O2O (Online‑to‑Offline) search, NER serves as the core signal for Deep Query Understanding (DQU), directly affecting search recall, user intent identification, and entity linking.

1. Background

In O2O search, each merchant (point of interest, POI) is described by several loosely related text fields, such as name, address, and category. Matching a query against a single inverted index built over all fields produces a large number of false recalls. A structured recall strategy is adopted instead: an entity recognized in the query (e.g., a merchant name) is matched only against the corresponding field, which substantially improves relevance.
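The difference between full-field and structured recall can be illustrated with a small sketch. The POI records, the field names, and the whitespace tokenization below are illustrative assumptions, not Meituan's actual schema or index implementation.

```python
from collections import defaultdict

# Toy POI records; names and values are made up for illustration.
POIS = [
    {"id": 1, "name": "haidilao hotpot", "address": "wangjing street", "category": "hotpot"},
    {"id": 2, "name": "wangjing noodle house", "address": "chaoyang road", "category": "noodles"},
]

FIELDS = ("name", "address", "category")


def build_field_index(pois):
    """Build one inverted index per field: field -> token -> set of POI ids."""
    index = defaultdict(lambda: defaultdict(set))
    for poi in pois:
        for field in FIELDS:
            for token in poi[field].split():
                index[field][token].add(poi["id"])
    return index


def full_field_recall(index, token):
    """Naive recall: a hit in any field counts."""
    hits = set()
    for field_index in index.values():
        hits |= field_index.get(token, set())
    return hits


def structured_recall(index, token, entity_type):
    """Structured recall: look only in the field that matches the entity type."""
    return set(index[entity_type].get(token, set()))


index = build_field_index(POIS)
naive_hits = full_field_recall(index, "wangjing")
address_hits = structured_recall(index, "wangjing", "address")
```

Here the naive lookup for "wangjing" returns both POIs, because the token appears in one merchant's address and in another merchant's name. Once NER has tagged the token as an address, the structured lookup returns only the POI located there.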

The O2O NER task has three distinctive challenges:

Rapid growth of new entities and slang (e.g., “牵肠挂肚”, “吸猫”).

Strong domain relevance – the same phrase may refer to a POI, a product, or a generic concept.

Stringent latency requirements – NER must finish within milliseconds.

2. Technical Selection

The production solution follows a "dictionary matching + model prediction" framework.

Why dictionary matching?

Short, head‑traffic queries (merchant name, category, address) can be covered with >90% accuracy using a lightweight dictionary.

Domain‑specific dictionaries guarantee high precision for business‑specific entities.

New business scenarios can be supported simply by adding new term lists.

Dictionary lookup is extremely fast, satisfying the most latency‑sensitive cases.

Why model prediction?

Long‑tail and out‑of‑vocabulary (OOV) queries cannot be covered by a static dictionary.

Disambiguation (e.g., “黄鹤楼” could be a scenic spot, a merchant, or a cigarette brand) requires contextual understanding.

The two outputs are merged by a CRF‑based scorer: when the dictionary yields no result or its score is significantly lower than the model’s, the model output is used; otherwise the dictionary result is kept.
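The selection rule described above can be sketched as follows. This is a minimal sketch that assumes both candidates have already been scored on a comparable scale (for example, CRF log-scores); the function name and the `margin` threshold, which stands in for "significantly lower", are hypothetical.

```python
from typing import List, Optional, Tuple

Entity = Tuple[str, str]  # (text span, entity type)


def merge_outputs(
    dict_result: Optional[List[Entity]],
    dict_score: float,
    model_result: List[Entity],
    model_score: float,
    margin: float = 2.0,  # hypothetical threshold for "significantly lower"
) -> List[Entity]:
    """Choose between the dictionary output and the model output."""
    # No dictionary match at all: fall back to the model.
    if not dict_result:
        return model_result
    # The dictionary score trails the model score by more than the margin.
    if model_score - dict_score > margin:
        return model_result
    # Otherwise trust the high-precision dictionary.
    return dict_result


dict_out = [("黄鹤楼", "scenic_spot")]
model_out = [("黄鹤楼", "merchant")]
kept = merge_outputs(dict_out, -1.0, model_out, -0.5)
overridden = merge_outputs(dict_out, -6.0, model_out, -0.5)
fallback = merge_outputs(None, 0.0, model_out, -0.5)
```

Defaulting to the dictionary unless the model wins by a clear margin reflects the precision argument made for dictionaries earlier in this section.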

3. Offline Entity Dictionary Construction

Offline mining leverages multi‑source data: structured POI information, encyclopedia entries, search logs, and user‑generated content (UGC). A three‑step pipeline is employed:

Candidate sequence extraction: frequent n-grams are harvested from UGC.

Distantly supervised labeling: candidate n-grams that also appear in an existing entity dictionary become positive examples, and randomly sampled candidates serve as negatives. Four statistical features (frequency, tightness, informativeness, completeness) are computed for each candidate.

Deep semantic scoring: a BERT-based phrase quality scorer is trained on the weakly labeled data, optionally refined with search-log signals and bootstrapping.

The resulting dictionary achieves ~92% online NER recall for head‑ and mid‑traffic queries.
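The first two steps of the pipeline above can be sketched in a few lines. This is a toy illustration under stated assumptions: the input is already tokenized (for Chinese text the tokens could be single characters), the `min_count` threshold is hypothetical, and negatives are drawn from candidates outside the dictionary. The four statistical features and the BERT scorer are not reproduced here.

```python
import random
from collections import Counter


def extract_candidates(sentences, max_n=3, min_count=2):
    """Count all n-grams up to max_n and keep the frequent ones."""
    counts = Counter()
    for tokens in sentences:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}


def distant_label(candidates, dictionary, num_negatives, seed=0):
    """Positives: candidates found in the dictionary. Negatives: a random sample of the rest."""
    positives = sorted(g for g in candidates if " ".join(g) in dictionary)
    pool = sorted(g for g in candidates if " ".join(g) not in dictionary)
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    negatives = rng.sample(pool, min(num_negatives, len(pool)))
    return positives, negatives


ugc = [
    ["spicy", "hot", "pot"],
    ["hot", "pot", "buffet"],
    ["spicy", "hot", "pot"],
]
candidates = extract_candidates(ugc)
positives, negatives = distant_label(candidates, {"hot pot"}, num_negatives=2)
```

Random negatives are noisy, since some of them may be valid phrases that the dictionary simply lacks; this is one reason a learned scorer is trained on top of the weak labels rather than using them directly.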

4. Online Dictionary Matching

Online matching first applies bidirectional maximum matching to the query. A second stage then uses a CRF-based segmentation model to correct boundary errors and applies pattern-based repairs. This two-stage approach improves both precision and recall for short queries.
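Bidirectional maximum matching itself is a classic dictionary segmentation technique, sketched below. The tie-breaking rule (fewer tokens, then fewer single-character tokens, then the backward result) is a common heuristic assumed here; the source does not say which rule the production system uses.

```python
def forward_max_match(text, vocab, max_len):
    """Greedy longest match, scanning left to right."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, vocab, max_len):
    """Greedy longest match, scanning right to left."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                j -= size
                break
    return tokens[::-1]


def bidirectional_max_match(text, vocab):
    """Run both directions and keep the segmentation with the lower cost."""
    max_len = max((len(w) for w in vocab), default=1)
    fwd = forward_max_match(text, vocab, max_len)
    bwd = backward_max_match(text, vocab, max_len)

    def cost(tokens):
        singles = sum(1 for t in tokens if len(t) == 1)
        return (len(tokens), singles)

    # Ties go to the backward result.
    return fwd if cost(fwd) < cost(bwd) else bwd


vocab = {"研究", "研究生", "生命", "起源"}
segments = bidirectional_max_match("研究生命起源", vocab)
```

On this textbook example the forward pass greedily takes "研究生" and strands "命" as a single character, while the backward pass yields "研究 / 生命 / 起源", which the cost function prefers. Cases where both directions are wrong are what a second-stage correction model is meant to catch.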

5. Model Online Prediction

The backbone model is BERT, which is also used in a BERT+LR cascade. To meet latency constraints, three acceleration techniques are applied:

Operator fusion (e.g., FasterTransformer) reduces kernel launch overhead, yielding 1.4×‑2× speed‑up.

Batching merges multiple requests into a single GPU batch, achieving sub‑6 ms latency at 1300 QPS.

Mixed‑precision (FP16/FP32) accelerates inference while preserving numerical stability.
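Of the three techniques above, batching is the easiest to illustrate without a GPU. The sketch below only shows the data-side work: grouping pending requests and padding them to a common length so that one forward pass can serve them all. The function names, the batch size, and the pad id are assumptions; a production server would also bound how long a request may wait for its batch to fill.

```python
def make_batches(requests, max_batch_size):
    """Split pending requests into consecutive groups of at most max_batch_size."""
    return [
        requests[i:i + max_batch_size]
        for i in range(0, len(requests), max_batch_size)
    ]


def pad_batch(token_id_lists, pad_id=0):
    """Pad every sequence to the longest one and build the attention mask."""
    width = max(len(ids) for ids in token_id_lists)
    input_ids = [ids + [pad_id] * (width - len(ids)) for ids in token_id_lists]
    attention_mask = [
        [1] * len(ids) + [0] * (width - len(ids)) for ids in token_id_lists
    ]
    return input_ids, attention_mask


# Token ids are made up; a real system would get them from the BERT tokenizer.
pending = [[101, 7, 8, 102], [101, 9, 102], [101, 4, 5, 6, 102]]
batches = make_batches(pending, max_batch_size=2)
input_ids, attention_mask = pad_batch(batches[0])
```

Search queries are short, so padding waste inside a batch stays small, which is part of why batching suits this workload.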

Model distillation is also employed: a lightweight student network approximates the BERT teacher, preserving accuracy while cutting inference cost by an order of magnitude.
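A common way to train such a student is the soft-target loss from Hinton et al. (2015), sketched below for a single token's label logits in plain Python. The source does not state which distillation objective is used, so the temperature, the mixing weight `alpha`, and the function names are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, gold_index,
                      temperature=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy on the gold label."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(
        t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0
    ) * temperature ** 2
    hard = -math.log(softmax(student_logits)[gold_index])
    return alpha * soft + (1 - alpha) * hard


teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]
far_student = [0.5, 4.0, 1.0]
loss_close = distillation_loss(close_student, teacher, gold_index=0)
loss_far = distillation_loss(far_student, teacher, gold_index=0)
```

The soft term passes on the teacher's relative confidence across labels, which carries more information per example than the gold label alone. Multiplying by the squared temperature keeps the gradient magnitudes of the two terms comparable.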

6. Knowledge‑Enhanced NER

Two methods inject external knowledge:

Lattice-LSTM with search-log features: phrase vectors derived from query-document matching are fed into a lattice LSTM, boosting accuracy by ~0.5%.

Two-stage NER with entity dictionary: the first stage (BERT) predicts boundaries; the second stage (IDCNN) incorporates dictionary embeddings for label classification, yielding a 1% gain over pure BERT-NER.
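The hand-off between the two stages can be sketched as follows. This is only an illustration of the data flow: the B/I/O boundary tags, the entity type list, and the multi-hot dictionary feature are assumptions that stand in for the BERT boundary model and the learned dictionary embeddings, neither of which is reproduced here.

```python
ENTITY_TYPES = ["merchant", "category", "address"]  # illustrative type inventory


def boundary_tags_to_spans(tags):
    """Convert per-character B/I/O boundary tags into (start, end) spans."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O":
            if start is not None:
                spans.append((start, i))
            start = None
        elif start is None:
            # An "I" with no preceding "B" opens a span of its own.
            start = i
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def dictionary_features(text, spans, type_dict):
    """For each span, a multi-hot vector of the dictionary types it belongs to."""
    features = []
    for start, end in spans:
        types = type_dict.get(text[start:end], set())
        features.append([1 if t in types else 0 for t in ENTITY_TYPES])
    return features


text = "abcde"
tags = ["B", "I", "O", "B", "I"]
type_dict = {"ab": {"merchant", "address"}}
spans = boundary_tags_to_spans(tags)
features = dictionary_features(text, spans, type_dict)
```

Keeping dictionary knowledge as an input feature rather than baking it into the model weights means that a dictionary update can take effect without retraining the boundary model.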

7. Weak Supervision

To alleviate the scarcity of manually annotated data, a weak‑supervision pipeline is built:

Train an initial BERT model on a small labeled set (Model A).

Use Model A to label a massive entity dictionary.

Correct the predictions by aligning them with high‑precision dictionary entries, selecting the most probable correction via probability‑ratio scoring.

Fine‑tune Model A on the combined strong + weak data, which consistently outperforms training on strong data alone.
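The correction step in the pipeline above can be sketched as follows. The source names probability-ratio scoring but does not give the formula, so the rule below is one plausible reading and entirely an assumption: accept the dictionary-consistent label if its probability is within a factor `min_ratio` of the model's top label, and otherwise drop the sample.

```python
def correct_label(model_probs, dict_types, min_ratio=0.2):
    """Reconcile Model A's prediction with the types the dictionary allows.

    model_probs: entity type -> probability predicted by Model A
    dict_types:  set of types the high-precision dictionary lists for the entry
    Returns the corrected type, or None if the sample should be discarded.
    """
    top_type = max(model_probs, key=model_probs.get)
    # Model and dictionary already agree.
    if top_type in dict_types:
        return top_type
    allowed = {t: p for t, p in model_probs.items() if t in dict_types}
    if not allowed:
        return None
    best = max(allowed, key=allowed.get)
    # Accept the correction only if the model finds it reasonably likely.
    if allowed[best] / model_probs[top_type] >= min_ratio:
        return best
    return None


corrected = correct_label(
    {"merchant": 0.5, "dish": 0.4, "address": 0.1}, {"dish"}
)
rejected = correct_label(
    {"merchant": 0.9, "dish": 0.05, "address": 0.05}, {"dish"}
)
```

Discarding samples where model and dictionary disagree strongly trades data volume for label quality, which matters because the weak data is later mixed with the manually labeled set.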

8. Summary and Outlook

This article described the characteristics of NER in O2O search, the dictionary-plus-model architecture, and practical techniques including model distillation, operator fusion, mixed precision, a knowledge-enhanced lattice LSTM, and weak supervision. Future work includes handling unseen entities, improving disambiguation, and deeper domain adaptation, and the team welcomes collaboration with the research community.
