Meituan-Dianping DSP Advertising Coarse Ranking Mechanisms and Scenario‑Based Targeting
Meituan‑Dianping’s DSP coarse‑ranking filters large ad candidate sets by scoring ads with user‑profile, weather, and keyword scenario models—using frequent‑itemset mining, AdaBoost, and TF/IDF—then aggregates these scores via a linear‑regression model to select high‑relevance ads for fine‑ranking, boosting click‑through and conversion rates.
Preface
In Meituan-Dianping's DSP advertising system, an ad passes through coarse ranking, fine ranking, bidding, and anti-fraud stages before exposure. Because the candidate set returned by recall is large, fine ranking (CTR estimation) cannot score the whole set at once for performance reasons. A coarse-ranking stage therefore runs first and selects a high-relevance subset to hand to fine ranking.
Coarse‑Ranking Mechanism Overview
The coarse‑ranking framework sorts a subset of recalled ads, truncates the result, and passes the truncated candidate set to the fine‑ranking module.
Coarse ranking aims to find ads with high relevance to the traffic, using effective conversion (click‑through, phone calls, appointments, purchases) as the goal. Features include media, user profile, historical behavior, as well as external factors such as LBS and real‑time weather.
Different traffic types have different feature coverage (rich user profiles vs. strong media signals). Accordingly, different coarse‑ranking strategies are applied.
User‑Profile Based Coarse Ranking
User interests strongly affect ad conversion. User profiles (tags) are derived from on‑site behaviors (browsing, clicking, purchasing, reviewing, favoriting) and enriched with mined tags, custom product tags, and ADX data. The tag hierarchy contains five major categories: interest tags aligned with merchant categories, natural attributes (age, gender, city), social attributes (occupation, marital status, education), psychological/cognitive attributes (consumption level, fashion preference), and custom tags.
Offline, the profile is generated daily: data from Hive is merged, standardized to device‑level tags (IMEI/IDFA), and loaded into Redis for online use.
For user-directed coarse ranking, frequent-itemset mining is used instead of a learned model because the rule-based approach offers strong interpretability and easy manual intervention. Click logs are sampled, conversion actions are up-sampled (e.g., a phone call counts as two clicks), and Spark MLlib is used to mine frequent itemsets linking user tags with merchant categories. Itemsets containing only tags or only categories are filtered out, and at most two user tags are kept per itemset to ensure coverage, since itemsets with more tags match fewer users.
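The mining step can be illustrated with a minimal sketch. It counts tag-category itemsets by brute-force enumeration rather than with Spark MLlib, and the function name, the tuple layout of the log records, and the `min_support` threshold are all assumptions made for illustration, not details from the original system.

```python
from collections import Counter
from itertools import combinations

def mine_tag_category_itemsets(transactions, min_support=2, max_tags=2):
    """Count itemsets made of 1..max_tags user tags plus one merchant category.

    Each transaction is (tags, category, weight). The weight models
    up-sampling, e.g. 1 for a click and 2 for a phone call.
    """
    support = Counter()
    for tags, category, weight in transactions:
        unique_tags = sorted(set(tags))
        for size in range(1, max_tags + 1):
            for tag_combo in combinations(unique_tags, size):
                # Every counted itemset has at least one tag and exactly one
                # category, so tag-only and category-only itemsets never appear.
                support[(tag_combo, category)] += weight
    return {item: count for item, count in support.items() if count >= min_support}

# Hypothetical sampled logs: (user tags, clicked merchant category, weight)
logs = [
    (["female", "age_25_34"], "beauty", 1),
    (["female", "age_25_34"], "beauty", 2),  # phone call, counted as two clicks
    (["male", "age_25_34"], "fitness", 1),
    (["female"], "fitness", 1),
]
rules = mine_tag_category_itemsets(logs, min_support=2)
```

Only the three beauty itemsets reach the support threshold here; at production scale an FP-growth implementation would replace the nested enumeration.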
The scoring formula (shown as an image in the original and not reproduced here) assigns a weight to each tag-category pair; higher scores indicate stronger relevance. After offline scoring, engineers manually remove unreasonable itemsets, store the final rules in MySQL, and load them into the retrieval engine daily.
During online serving, the retrieval side loads the offline results into memory. For each ad request, the user’s profile is fetched from Redis, candidate ads are recalled, and the offline scores are applied to rank and truncate the list. A/B traffic split is used to evaluate conversion lift.
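A rough sketch of the online step follows, assuming the rules are held in memory as a dictionary keyed by (tag tuple, category). Taking the maximum over matching rules is an assumption; the source does not say how several matching rules are combined. All names are hypothetical.

```python
def coarse_rank(user_tags, candidate_ads, rule_scores, top_k):
    """Rank ads by their best-matching tag-category rule and keep the top_k."""
    tag_set = set(user_tags)
    scored = []
    for ad_id, category in candidate_ads:
        score = 0.0  # ads with no matching rule keep a zero score
        for (rule_tags, rule_category), rule_score in rule_scores.items():
            if rule_category == category and tag_set.issuperset(rule_tags):
                score = max(score, rule_score)
        scored.append((score, ad_id))
    # Python's sort is stable, so ties keep the original recall order.
    scored.sort(key=lambda pair: -pair[0])
    return [ad_id for _, ad_id in scored[:top_k]]

# Hypothetical rules loaded from the offline job
rule_scores = {
    (("female",), "beauty"): 0.8,
    (("age_25_34", "female"), "beauty"): 0.9,
    (("male",), "fitness"): 0.7,
}
profile = ["female", "age_25_34"]          # would be fetched from Redis
recalled = [("ad1", "fitness"), ("ad2", "beauty"), ("ad3", "food")]
shortlist = coarse_rank(profile, recalled, rule_scores, top_k=2)
```

A real retrieval engine would index rules by category instead of scanning all of them per ad; the linear scan keeps the sketch short.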
Weather‑Scenario Based Coarse Ranking
Real‑time weather influences user behavior (e.g., swimming pool ads in hot weather). Two types of weather data are used: historical data for offline modeling and current hourly forecasts for online scoring. Data is refreshed hourly to balance freshness and latency.
Offline, an AdaBoost (Gentle AdaBoost) model with decision‑stump weak learners is trained per industry (e.g., food, sports) using conversion as the target. Features include temperature, humidity, precipitation, snow, and weather condition codes (one‑hot encoded). The model is stored in Tair as JSON, keyed by industry.
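For readers unfamiliar with Gentle AdaBoost, the following is a minimal pure-Python sketch of the training loop with regression stumps, following the standard formulation: fit a stump by weighted least squares, add it to the ensemble, then reweight samples by exp(-y·f(x)). The labels in {-1, +1}, the two-feature layout, and the JSON field names are assumptions for illustration, not the production schema.

```python
import json
import math

def fit_stump(X, y, w):
    """Weighted least-squares stump: predict `left` if x[feature] <= threshold, else `right`."""
    best_err, best_stump = None, None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left_idx = [i for i, row in enumerate(X) if row[j] <= t]
            right_idx = [i for i, row in enumerate(X) if row[j] > t]
            lw = sum(w[i] for i in left_idx)
            rw = sum(w[i] for i in right_idx)
            # Each side predicts the weighted mean of its labels.
            left = sum(w[i] * y[i] for i in left_idx) / lw if lw > 0 else 0.0
            right = sum(w[i] * y[i] for i in right_idx) / rw if rw > 0 else 0.0
            err = sum(w[i] * (y[i] - left) ** 2 for i in left_idx)
            err += sum(w[i] * (y[i] - right) ** 2 for i in right_idx)
            if best_err is None or err < best_err:
                best_err = err
                best_stump = {"feature": j, "threshold": t, "left": left, "right": right}
    return best_stump

def stump_predict(stump, row):
    return stump["left"] if row[stump["feature"]] <= stump["threshold"] else stump["right"]

def train_gentle_adaboost(X, y, rounds):
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        model.append(stump)
        # Gentle AdaBoost update: w_i <- w_i * exp(-y_i * f_m(x_i)), then renormalize.
        w = [wi * math.exp(-yi * stump_predict(stump, row)) for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def ensemble_score(model, row):
    return sum(stump_predict(stump, row) for stump in model)

# Hypothetical samples for one industry: [temperature in C, precipitation flag]
X = [[30.0, 0.0], [33.0, 0.0], [12.0, 1.0], [8.0, 1.0]]
y = [1, 1, -1, -1]  # +1 = converted, -1 = did not convert
model = train_gentle_adaboost(X, y, rounds=3)
model_json = json.dumps(model)  # the form in which a model could be stored per industry
```

Because each stump is just four numbers, the serialized model stays small, which fits the description of storing one JSON document per industry.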
Online, the retrieval engine reads the appropriate model from Tair, obtains the current hour’s weather for the user’s city, and computes a score for each ad. Three optimization layers are applied: model caching per industry, score caching per city‑hour‑industry (cleared each hour), and a configurable iteration‑budget that limits the number of weak‑learner evaluations per request.
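The three optimizations might be combined as in the sketch below. The `load_model_json` callable is a hypothetical stand-in for a Tair read, and the stump field names and class layout are assumptions; only the caching and budget ideas come from the text.

```python
import json

class WeatherScorer:
    def __init__(self, load_model_json, max_stumps):
        self._load = load_model_json    # hypothetical stand-in for a Tair lookup
        self._max_stumps = max_stumps   # iteration budget per request
        self._models = {}               # industry -> list of stumps
        self._scores = {}               # (city, hour, industry) -> score
        self._hour = None

    def score(self, industry, city, hour, features):
        if hour != self._hour:          # a new hour makes cached scores stale
            self._scores.clear()
            self._hour = hour
        key = (city, hour, industry)
        if key not in self._scores:
            if industry not in self._models:
                self._models[industry] = json.loads(self._load(industry))
            stumps = self._models[industry][: self._max_stumps]
            self._scores[key] = sum(
                s["left"] if features[s["feature"]] <= s["threshold"] else s["right"]
                for s in stumps
            )
        return self._scores[key]

calls = []

def fake_tair_get(industry):
    """Records each read so the model cache can be observed."""
    calls.append(industry)
    return json.dumps([
        {"feature": 0, "threshold": 25.0, "left": -0.5, "right": 0.5},
        {"feature": 1, "threshold": 0.0, "left": 0.25, "right": -0.25},
        {"feature": 0, "threshold": 30.0, "left": -0.125, "right": 0.125},
    ])

scorer = WeatherScorer(fake_tair_get, max_stumps=2)
first = scorer.score("sports", "beijing", 14, [32.0, 0.0])   # hot and dry
second = scorer.score("sports", "beijing", 14, [32.0, 0.0])  # served from the score cache
next_hour = scorer.score("sports", "beijing", 15, [20.0, 1.0])
```

With a budget of two, the third stump is never evaluated, and the model is read from storage only once across all three calls.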
Keyword‑Scenario Based Coarse Ranking
User search queries reflect short‑term intent. A TF/IDF offline model maps queries to merchant categories. Click logs are used to compute TF/IDF scores for (query, category) pairs, and the top‑N categories per query are stored in Tair.
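One plausible way to compute these scores is sketched below, treating each query as a "document" and each category as a "term". The source does not give the exact TF and IDF definitions, so the formulas here (click share for TF, smoothed log ratio for IDF) and all names are assumptions.

```python
import math
from collections import Counter, defaultdict

def top_categories_by_tfidf(click_logs, top_n):
    """click_logs is a list of (query, clicked merchant category) pairs."""
    pair_clicks = Counter(click_logs)                  # (query, category) -> clicks
    query_clicks = Counter(query for query, _ in click_logs)
    queries_per_category = defaultdict(set)
    for query, category in pair_clicks:
        queries_per_category[category].add(query)
    n_queries = len(query_clicks)

    scored = defaultdict(list)
    for (query, category), clicks in pair_clicks.items():
        tf = clicks / query_clicks[query]              # share of the query's clicks
        # Categories clicked from many different queries are less distinctive.
        idf = math.log(n_queries / len(queries_per_category[category])) + 1.0
        scored[query].append((category, tf * idf))
    return {
        query: sorted(pairs, key=lambda pair: -pair[1])[:top_n]
        for query, pairs in scored.items()
    }

# Hypothetical click logs
click_logs = [
    ("hot pot", "food"), ("hot pot", "food"), ("hot pot", "entertainment"),
    ("swim", "sports"), ("swim", "entertainment"),
    ("gym", "sports"),
]
top = top_categories_by_tfidf(click_logs, top_n=1)
```

Here "hot pot" maps to food because food both dominates its clicks and is clicked from no other query.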
Real-time streaming (Kafka + Spark Streaming) processes user search events in 5-minute windows. New query-category scores are merged with existing user-level scores in Tair, applying exponential decay (half-life ≈ 72 h) to older weights. The decay formula is w' = w · e^{-α·Δt}, where Δt is the time elapsed since the weight was last updated and α = ln 2 / T for half-life T, so a weight halves after every 72 h.
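The decay-and-merge step can be sketched as follows. The decay itself follows the formula in the text; merging by adding the new window's scores to the decayed ones is an assumption, as are the function and variable names.

```python
import math

HALF_LIFE_HOURS = 72.0
ALPHA = math.log(2) / HALF_LIFE_HOURS  # alpha derived from the half-life

def merge_scores(stored, new_scores, hours_since_update):
    """Decay stored category weights, then add scores from the latest window."""
    decay = math.exp(-ALPHA * hours_since_update)
    merged = {category: weight * decay for category, weight in stored.items()}
    for category, weight in new_scores.items():
        merged[category] = merged.get(category, 0.0) + weight
    return merged

# A weight of 1.0 last updated 72 hours ago decays to about 0.5.
merged = merge_scores({"food": 1.0}, {"sports": 0.5}, hours_since_update=72.0)
```

At retrieval time, a candidate ad's score would then simply be the merged weight for its category, with zero for categories the user has not searched.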
During ad retrieval, the user’s decayed query‑category scores are fetched from Tair and combined with the candidate ad’s category to produce a final score.
Aggregated Targeting
After obtaining scores from user‑profile, weather, and keyword scenarios, a linear‑regression (LR) model per industry combines them into a unified score. Features x₀…xₙ are the normalized scenario scores (0‑1). The model vector θ is trained offline on click‑through data and applied online to rank ads.
Cold‑start uses default weights; after sufficient data, the trained θ replaces them.
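A small sketch of the aggregation step follows, assuming the feature vector holds the three normalized scenario scores in the order (user profile, weather, keyword) and that the unified score is the dot product with θ. The default weights and the example θ values are invented for illustration.

```python
# Hypothetical cold-start weights for (user profile, weather, keyword)
DEFAULT_THETA = [0.5, 0.25, 0.25]

def aggregate(scores, theta=None):
    """Combine normalized scenario scores (each in 0..1) into one score."""
    theta = DEFAULT_THETA if theta is None else theta
    return sum(t * s for t, s in zip(theta, scores))

def rank_ads(ad_scores, theta=None):
    """ad_scores maps ad id -> [profile, weather, keyword] scores."""
    return sorted(ad_scores, key=lambda ad_id: -aggregate(ad_scores[ad_id], theta))

ad_scores = {
    "ad1": [0.8, 0.4, 0.0],
    "ad2": [0.2, 0.4, 1.0],
}
cold_start_order = rank_ads(ad_scores)                    # default weights
trained_order = rank_ads(ad_scores, [0.2, 0.2, 0.6])      # hypothetical trained theta
```

The two orderings differ: default weights favour the profile-heavy ad, while a θ that weights keywords more strongly promotes the ad matching the user's recent searches.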
Conclusion
Scenario-aware coarse ranking scores each ad with dedicated models for separate traffic signals (user profile, weather, search keywords), then aggregates those scores with an LR model. A/B tests show significant improvements in click-through and conversion rates. Future work includes enriching scenario features, adding new contextual signals, and exploring alternative modeling techniques.
References
Friedman J, Hastie T, Tibshirani R. (2000). Additive Logistic Regression: A Statistical View of Boosting. Annals of Statistics, 28(2), 337-407.
Meituan Technology Team
