
Pretraining Techniques for Search Advertising Relevance at Meituan

Meituan improves search‑ad relevance with pre‑trained BERT models enhanced by data augmentation, multi‑task learning, keyword extraction, and two‑stage knowledge distillation. The resulting lightweight distilled model, fused with traditional relevance signals, raises CTR and NDCG and lowers Badcase@5 while preserving revenue.

Meituan Technology Team

Dec 2, 2021

Search advertising on Meituan’s platform not only drives revenue but also directly impacts user experience. Irrelevant ads degrade the ecosystem, so improving ad relevance is essential. This article presents a comprehensive solution that leverages pre‑training techniques, data augmentation, multi‑task learning, and model compression to enhance relevance scoring.

Problem and Challenges – Defining relevance for search ads is non‑trivial because ads are displayed in a native style indistinguishable from organic results. Traditional metrics such as CTR are insufficient; a Badcase@5 metric (the proportion of irrelevant ads in the top‑5 positions) is introduced. Challenges include diverse POI (point‑of‑interest) texts containing long product descriptions, sparse query signals, and the need to balance relevance with monetization.
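As a rough illustration of the Badcase@5 metric described above, the sketch below computes the share of irrelevant ads among the top‑5 ad slots. The function name, the boolean label format, and the choice to pool slots across queries (rather than average per query) are assumptions; the source only defines the metric as a proportion.

```python
from typing import List


def badcase_at_k(ad_labels_per_query: List[List[bool]], k: int = 5) -> float:
    """Share of irrelevant ads among the top-k ad slots, pooled over queries.

    Each inner list holds one query's ads in display order; True marks an ad
    that a reviewer judged irrelevant (a bad case). Pooling is an assumption.
    """
    shown = 0
    bad = 0
    for labels in ad_labels_per_query:
        top = labels[:k]          # only the first k positions count
        shown += len(top)
        bad += sum(top)           # True counts as 1
    return bad / shown if shown else 0.0
```

For example, a query with one irrelevant ad in its top five and a second query with two relevant ads gives one bad case out of seven displayed slots.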

Industry and Meituan Solutions – Both Google and Bing have applied BERT to query‑document matching. Meituan adopts similar ideas and draws on published methods from the research community, such as PROP, B‑PROP, DeFormer, RocketQA, and SimCSE.

Algorithm Exploration

Data Augmentation: Positive samples are weighted by click confidence; negative samples are stratified into global random, first‑level category, and third‑level category groups. Sampling smoothing and distribution alignment are applied to ensure balanced training data.
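The stratified negative sampling described above might look like the following sketch. It covers negatives only. The `Poi` fields, the group names, and the exact membership rule for each group (for instance, whether first‑level negatives exclude the positive's third‑level category) are assumptions made for illustration, not details from the source.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Poi:
    poi_id: str
    cat1: str  # first-level category
    cat3: str  # third-level category


def sample_negatives(positive: Poi, pool: List[Poi], per_group: int,
                     rng: random.Random) -> Dict[str, List[Poi]]:
    """Draw negatives from three strata of increasing difficulty."""
    others = [p for p in pool if p.poi_id != positive.poi_id]
    groups = {
        # Easy: any other POI on the platform.
        "global_random": others,
        # Harder: same first-level category, different third-level category.
        "same_cat1": [p for p in others
                      if p.cat1 == positive.cat1 and p.cat3 != positive.cat3],
        # Hardest: same third-level category as the clicked POI.
        "same_cat3": [p for p in others if p.cat3 == positive.cat3],
    }
    return {name: rng.sample(cands, min(per_group, len(cands)))
            for name, cands in groups.items()}
```

Capping each group at `per_group` is one simple way to keep the strata balanced; the smoothing and distribution alignment mentioned in the text would sit on top of this.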

Keyword Extraction: NER and term weighting filter noisy address and branch names out of queries. POI texts, which average more than 240 characters, are reduced to concise keyword sets of fewer than 240 characters.
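A minimal sketch of this filtering step, assuming an upstream NER tagger and term‑weighting model have already produced `(term, tag, weight)` triples. The tag names `ADDRESS` and `BRANCH`, the weights, and the `max_terms` cutoff are hypothetical.

```python
from typing import List, Tuple

NOISY_TAGS = {"ADDRESS", "BRANCH"}  # hypothetical NER tag names


def extract_keywords(tagged_terms: List[Tuple[str, str, float]],
                     max_terms: int = 3) -> List[str]:
    """Drop address/branch terms, then keep the highest-weighted terms.

    tagged_terms holds (term, ner_tag, term_weight) in original text order.
    The surviving terms are returned in that same order.
    """
    kept = [(term, weight) for term, tag, weight in tagged_terms
            if tag not in NOISY_TAGS]
    top = sorted(kept, key=lambda item: item[1], reverse=True)[:max_terms]
    chosen = {term for term, _ in top}
    return [term for term, _ in kept if term in chosen]
```

Keeping the original term order, rather than sorting by weight, preserves whatever word‑order signal the downstream model can use.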

Model Optimization: Multi‑task learning shares a BERT backbone across business verticals while using separate classifiers per vertical. Category information is added as a third segment in BERT’s input, and the NSP pre‑training task is replaced by a click‑prediction objective.
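To make the three‑segment input concrete, the sketch below lays out tokens and segment ids for a query, a POI text, and a category. The token lists are assumed to be pre‑tokenized, and the function name is hypothetical. Note that a stock BERT checkpoint only has two segment embeddings, so a third segment id implies extending that embedding table.

```python
from typing import List, Tuple


def build_input(query_tokens: List[str], poi_tokens: List[str],
                category_tokens: List[str]) -> Tuple[List[str], List[int]]:
    """Lay out [CLS] query [SEP] poi [SEP] category [SEP] with segment ids 0/1/2."""
    tokens = ["[CLS]"] + query_tokens + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    for segment, part in ((1, poi_tokens), (2, category_tokens)):
        piece = part + ["[SEP]"]
        tokens += piece
        segment_ids += [segment] * len(piece)
    return tokens, segment_ids
```

In the multi‑task setup, the encoded `[CLS]` vector from the shared backbone would then be routed to the classifier head for the ad's business vertical.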

Application Practice

Model Compression: Two‑stage knowledge distillation (a general stage on large unlabeled data, then a task stage on labeled ad data) produces a lightweight 6‑layer, 384‑dimensional model (MT‑BERT‑Medium). Further distillation yields a Siamese, twin‑tower variant (Siamese‑MT‑BERT‑Medium) for efficient online inference.
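The source does not spell out the distillation objective. A common choice, shown here purely as an assumption, is the temperature‑scaled soft‑target loss: a KL term pulling the student toward the teacher's softened distribution, blended with ordinary cross‑entropy on the hard label. The `temperature` and `alpha` values are illustrative.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                      # subtract max for stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      label: int, temperature: float = 2.0,
                      alpha: float = 0.5) -> float:
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE(student, label)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(p_teacher, p_student) if t > 0)
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce
```

In a two‑stage setup, the general stage would typically rely on the soft‑target term alone (no labels), while the task stage can use both terms on labeled ad data.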

Relevance Service Chain Optimization: Textual relevance (string overlap, BM25), category relevance, and model scores are fused via LR/GBDT. Thresholds are set per business line to filter low‑quality ads, and re‑ranking incorporates relevance scores alongside CTR and bid factors.
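The LR variant of this fusion, followed by per‑business‑line threshold filtering, can be sketched as follows. All weights, the bias, the thresholds, and the business‑line names are hypothetical placeholders, and the features are assumed to be normalized to the range 0 to 1 before fusion.

```python
import math
from typing import Dict, List

# Hypothetical LR parameters; in practice these would be learned from labels.
WEIGHTS = {"overlap": 1.0, "bm25": 0.5, "category": 1.5, "model": 3.0}
BIAS = -3.0
# Hypothetical relevance thresholds, one per business line.
THRESHOLDS = {"food": 0.5, "beauty": 0.6}


def fused_relevance(features: Dict[str, float]) -> float:
    """Logistic-regression fusion of text, category, and model signals."""
    z = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def filter_ads(ads: List[dict], business_line: str) -> List[dict]:
    """Keep only ads whose fused relevance clears the line's threshold."""
    threshold = THRESHOLDS[business_line]
    return [ad for ad in ads if fused_relevance(ad["features"]) >= threshold]
```

Ads that survive the filter would then carry their fused score into re‑ranking next to CTR and bid; the source does not give that ranking formula, so it is left out here.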

Deployment: High‑frequency query‑POI pairs are scored offline and cached, giving more than 90% coverage. Long‑tail queries are scored online by the distilled twin‑tower model, served via TF‑Serving with FasterTransformer acceleration (FP16, 5.5× speed‑up).
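The offline‑cache‑plus‑online‑fallback pattern can be sketched as below. The class name, the in‑memory dictionary standing in for the cache, and the callable standing in for the served model are all assumptions; the real system would use a key‑value store and a remote model call.

```python
from typing import Callable, Dict, Tuple


class RelevanceScorer:
    """Serve cached offline scores first; fall back to the online model."""

    def __init__(self, offline_scores: Dict[Tuple[str, str], float],
                 online_model: Callable[[str, str], float]) -> None:
        self.offline_scores = offline_scores
        self.online_model = online_model
        self.hits = 0
        self.misses = 0

    def score(self, query: str, poi_id: str) -> float:
        key = (query, poi_id)
        if key in self.offline_scores:
            self.hits += 1                     # high-frequency pair
            return self.offline_scores[key]
        self.misses += 1                       # long-tail pair
        return self.online_model(query, poi_id)

    def coverage(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Tracking hits and misses makes it easy to check whether cache coverage actually stays at the level the deployment assumes.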

Online Effects – In A/B tests, the optimized pipeline improves CTR by 1.0%, reduces Badcase@5 by 2.2 percentage points, and raises NDCG by 2.0 percentage points while maintaining revenue. Case studies show that irrelevant ads such as “baby photography” and “anti‑aging clinic” are successfully filtered.
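For readers unfamiliar with NDCG, the sketch below shows one standard formulation with exponential gain and a log2 position discount. The source does not state which gain function or cutoff was used, so the default `k=5` and the gain form are assumptions.

```python
import math
from typing import List


def dcg(gains: List[float], k: int) -> float:
    """Discounted cumulative gain over the first k positions."""
    return sum((2 ** g - 1) / math.log2(i + 2)
               for i, g in enumerate(gains[:k]))


def ndcg(gains: List[float], k: int = 5) -> float:
    """DCG of the shown order divided by DCG of the ideal order."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0
```

Here `gains` holds the graded relevance labels of the displayed ads in position order; a list already sorted from most to least relevant scores 1.0.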

Conclusion and Outlook – Pre‑training, multi‑task learning, and knowledge distillation together deliver a relevance system that balances user experience and monetization. Future work includes automated threshold search, richer feature extraction for POI texts, and joint optimization with auxiliary tasks like entity recognition.
