
Query Rewriting in Meituan Search: Iterative Directions and System Design

This article presents Meituan’s end‑to‑end query‑rewriting system. The system combines large‑scale corpus mining, BERT‑based semantic discrimination, and several online services (dictionary, SMT/NMT, and vector retrieval) to bridge semantic gaps and boost recall. It now serves roughly 73% of Meituan App search traffic at up to 60 k QPS. Future work targets deeper vector methods and generative rewriting.

Meituan Technology Team

Feb 17, 2022


This article introduces query rewriting in Meituan's search scenario. Query rewriting expands a user query with alternative expressions that better match how documents are written, which improves recall in text‑based Boolean retrieval systems. The article outlines the motivation, the challenges, and the overall impact of query rewriting on the search experience.

1. Introduction explains that mismatches between user queries and document texts cause severe recall loss. Query rewriting (also called Query Expansion) generates alternative terms that are semantically related to the original query, helping users retrieve more relevant merchants, products, and services.
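
Boolean retrieval shows the recall gap clearly, because a merchant is returned only if it contains every query term. The sketch below uses hypothetical merchants and a hand‑written rewrite. It assumes each rewrite is issued as an additional AND query and that its hits are unioned with those of the original query. It is an illustration only, not Meituan's retrieval code.

```python
# Minimal sketch (hypothetical data): query rewriting for a Boolean AND retrieval engine.
# Each query is a list of terms; a document is recalled only if it contains ALL terms.
# The rewrite list is assumed to come from an upstream rewriting service.


def boolean_and(docs: dict[str, set[str]], terms: list[str]) -> set[str]:
    """Return IDs of documents that contain every term of the query."""
    required = set(terms)
    return {doc_id for doc_id, doc_terms in docs.items() if required <= doc_terms}


def recall_with_rewrites(docs, query, rewrites):
    """Union the hits of the original query with the hits of each rewritten query."""
    hits = boolean_and(docs, query)
    for rewrite in rewrites:
        hits |= boolean_and(docs, rewrite)
    return sorted(hits)


# Toy merchant index: merchant ID -> terms in its title/description.
docs = {
    "m1": {"strawberry", "garden", "farm"},
    "m2": {"strawberry", "cake"},
    "m3": {"pick", "strawberry", "tour"},
}
query = ["pick", "strawberry"]
rewrites = [["strawberry", "garden"]]  # e.g. "pick strawberries" -> "strawberry garden"

print(sorted(boolean_and(docs, query)))             # ['m3']
print(recall_with_rewrites(docs, query, rewrites))  # ['m1', 'm3']
```

The literal query cannot reach merchant m1, but the rewrite recalls it. The same union also explains why rewrite precision matters: a bad rewrite adds irrelevant hits just as easily.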

2. Background and Challenges describes four types of semantic gaps in Meituan search: semantic expansion (synonyms, hyponyms, case variations), user‑merchant expression gaps, scenario expansion (e.g., “pick strawberries” → “strawberry garden”), and other recall issues such as missing characters or time‑sensitive concepts. It also highlights the unique difficulty of incorporating a third “regional” constraint, the diversity of local services, and the need for high‑precision, low‑latency rewriting across many business lines.

3. Technical Choices covers the end‑to‑end framework:

3.1 Raw Corpus Mining – Methods include search‑log co‑click mining, session‑based mining, word alignment, merchant SEO extraction, graph‑based mining (SimRank++, GNN), and semantic vector mining (word2vec, Doc2Vec, session‑based embeddings). FastText, LSH, DSSM, and XGBoost are used for candidate filtering.
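
Co‑click mining rests on a simple intuition: when users of two different queries click the same merchants, the two queries probably express the same need. The sketch below assumes a log of (query, clicked merchant) pairs. It scores each query pair with Jaccard similarity over the sets of clicked merchants. The thresholds `min_jaccard` and `min_merchants` are hypothetical, and the source does not state which similarity measure Meituan uses.

```python
from collections import defaultdict
from itertools import combinations


def co_click_candidates(click_log, min_jaccard=0.3, min_merchants=2):
    """Mine candidate rewrite pairs from (query, merchant) click records.

    Two queries become a candidate pair when the Jaccard similarity of their
    clicked-merchant sets reaches min_jaccard. Queries with fewer than
    min_merchants distinct clicked merchants are skipped as too sparse.
    """
    clicked = defaultdict(set)
    for query, merchant in click_log:
        clicked[query].add(merchant)

    candidates = []
    # All-pairs comparison is fine for a toy log; at scale, pairs would be
    # generated only for queries that share at least one merchant.
    for q1, q2 in combinations(sorted(clicked), 2):
        a, b = clicked[q1], clicked[q2]
        if len(a) < min_merchants or len(b) < min_merchants:
            continue
        score = len(a & b) / len(a | b)
        if score >= min_jaccard:
            candidates.append((q1, q2, round(score, 3)))
    return sorted(candidates, key=lambda c: c[2], reverse=True)


# Hypothetical click log.
click_log = [
    ("pick strawberries", "farm_a"),
    ("pick strawberries", "farm_b"),
    ("pick strawberries", "farm_c"),
    ("strawberry garden", "farm_a"),
    ("strawberry garden", "farm_b"),
    ("strawberry cake", "bakery_x"),
    ("strawberry cake", "farm_c"),
]
print(co_click_candidates(click_log))
# [('pick strawberries', 'strawberry garden', 0.667)]
```

Pairs mined this way are only candidates. The filtering models listed above and the later semantic discrimination step are still needed to drop queries that are co‑clicked but not equivalent.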

3.2 Semantic Discrimination Models – BERT‑based sentence‑pair classifiers (MT‑BERT, NMT‑BERT co‑training) are fine‑tuned with semi‑supervised and hard‑negative data to achieve >94% accuracy and mitigate semantic drift.
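
Hard negatives are pairs that look alike on the surface but differ in meaning. Training on them helps a sentence‑pair discriminator resist semantic drift. The sketch below shows one possible way to select them. It assumes a pool of candidates that have already been judged invalid for the query, and it uses character‑bigram Jaccard similarity to measure surface overlap. Both choices are assumptions, because the source does not describe the exact selection procedure. The resulting (query, candidate, label) triples are the kind of input a BERT sentence‑pair classifier is fine‑tuned on.

```python
def char_bigrams(text):
    """Character bigrams of the text with spaces removed."""
    s = text.replace(" ", "")
    return {s[i:i + 2] for i in range(len(s) - 1)}


def lexical_sim(a, b):
    """Jaccard similarity of character-bigram sets: a cheap surface-overlap score."""
    x, y = char_bigrams(a), char_bigrams(b)
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def build_training_pairs(query, positives, invalid_pool, k=2):
    """Label positives 1; pick the k invalid candidates most similar to the query as hard negatives (label 0).

    invalid_pool is assumed to hold candidates already judged NOT to be valid rewrites.
    Ties in similarity are broken alphabetically so the output is deterministic.
    """
    pos = set(positives)
    negatives = sorted(
        (c for c in invalid_pool if c not in pos and c != query),
        key=lambda c: (-lexical_sim(query, c), c),
    )[:k]
    return [(query, p, 1) for p in positives] + [(query, n, 0) for n in negatives]


pairs = build_training_pairs(
    "pick strawberries",
    positives=["strawberry garden"],
    invalid_pool=["strawberry cake", "pick up laundry", "hot pot"],
    k=2,
)
for pair in pairs:
    print(pair)
```

"Strawberry cake" is chosen as a negative ahead of "hot pot" because it shares many characters with the query. A model that relied on surface overlap alone would wrongly accept it.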

3.3 Online Services – Four solutions: high‑precision dictionary rewriting, statistical machine translation (SMT) + XGBoost ranking, neural machine translation (NMT) with reinforcement learning, and vector‑based retrieval using dual‑tower models and ANN (Faiss/Antler). Each module is described with architecture diagrams, training pipelines, and deployment details.
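
In the vector‑based solution, a query tower and a candidate tower map text into a shared embedding space. Candidate vectors are indexed offline, and the query vector is matched online by nearest‑neighbor search. The sketch below makes two simplifications. It replaces both trained towers with hypothetical hand‑written vectors. It also replaces the ANN engine with exact brute‑force cosine search, which is what a flat inner‑product index computes over L2‑normalized vectors (for example, Faiss `IndexFlatIP`). Approximate indexes give up some of that exactness in exchange for speed.

```python
import heapq
import math


def l2_normalize(vec):
    """Scale a vector to unit length so that inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


class ExactCosineIndex:
    """Brute-force stand-in for an ANN index: scores every stored vector."""

    def __init__(self):
        self.ids = []
        self.vectors = []

    def add(self, item_id, vector):
        # Candidate-tower outputs are normalized once, at indexing time.
        self.ids.append(item_id)
        self.vectors.append(l2_normalize(vector))

    def search(self, query_vector, k):
        """Return the k (id, cosine) pairs with the highest similarity, best first."""
        q = l2_normalize(query_vector)
        scored = (
            (sum(a * b for a, b in zip(q, v)), item_id)
            for item_id, v in zip(self.ids, self.vectors)
        )
        return [(item_id, score) for score, item_id in heapq.nlargest(k, scored)]


# Hypothetical candidate-tower embeddings for rewrite candidates.
index = ExactCosineIndex()
index.add("strawberry garden", [0.9, 0.1, 0.0])
index.add("strawberry picking farm", [0.8, 0.2, 0.1])
index.add("strawberry cake", [0.2, 0.9, 0.1])
index.add("hot pot", [0.0, 0.1, 0.9])

# Hypothetical query-tower embedding for "pick strawberries".
for item_id, score in index.search([1.0, 0.1, 0.0], k=2):
    print(item_id, round(score, 3))
```

Because candidate vectors do not depend on the query, the expensive candidate tower runs offline. Only the query tower and the index lookup sit on the online path.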

4. Summary and Outlook reports that query rewriting now accounts for ~73% of Meituan App search traffic and ~67% of Dianping App search traffic, handling up to 60 k QPS. Future work includes deeper vector retrieval research, richer semantic discrimination (including multimodal signals), advanced generative rewriting (e.g., SeqGAN), and finer‑grained lexical relation graphs for the life‑service domain.

5. Authors: Yang Jian, Zong Yu, Xie Rui, Wu Wei (Meituan Search & NLP team).


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

Meituan Technology Team

Over 10,000 engineers powering China’s leading lifestyle services e‑commerce platform. Supporting hundreds of millions of consumers, millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.
