How Meituan Built a Scalable AI‑Powered Recommendation Engine
This article details Meituan's end‑to‑end recommendation system, covering its four‑layer architecture, data sources, candidate‑generation strategies, fusion methods, and both linear and non‑linear re‑ranking models, while highlighting practical optimizations like AB testing and online learning.
This article describes the construction and optimization of Meituan's recommendation system, which consists of four layers (data, trigger, fusion‑filter, and ranking) implemented with HBase, Hive, Storm, Spark, and machine‑learning techniques. Two key improvements are candidate‑set fusion and the introduction of a re‑ranking model.
Framework
The system is divided into a data layer (log cleaning and storage), a trigger layer (generating candidate sets from user history, real‑time behavior, and location), a fusion‑filter layer (merging and filtering candidates based on coverage, precision, and product/operation rules), and a ranking layer (applying machine‑learning models for final ordering). AB testing is supported by decoupling the trigger and ranking layers, allowing independent experiments.
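One way to picture the decoupling is to assign each user to an experiment bucket separately for each layer. The sketch below is illustrative only: the hashing scheme, the `EXPERIMENTS` table, and the variant names are assumptions, not details from Meituan's system.

```python
import hashlib

def bucket(user_id: str, layer: str, n_buckets: int = 100) -> int:
    """Map a user to a stable bucket in [0, n_buckets), salted by layer name."""
    digest = hashlib.md5(f"{layer}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets

def assign(user_id: str, experiments: dict[str, list[tuple[str, int]]]) -> dict[str, str]:
    """Pick one variant per layer.

    experiments maps layer -> [(variant_name, exclusive_upper_bucket), ...],
    with upper bounds in ascending order.
    """
    chosen = {}
    for layer, variants in experiments.items():
        b = bucket(user_id, layer)
        for name, upper in variants:
            if b < upper:
                chosen[layer] = name
                break
    return chosen

# Hypothetical experiment table: a 50/50 split in each layer.
EXPERIMENTS = {
    "trigger": [("cf_only", 50), ("cf_plus_location", 100)],
    "ranking": [("lr_ftrl", 50), ("additive_groves", 100)],
}

print(assign("user-42", EXPERIMENTS))
```

Because the layer name salts the hash, a user's trigger bucket is independent of their ranking bucket, so a trigger experiment and a ranking experiment can run at the same time without confounding each other.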
Data Application
Meituan leverages the massive user‑behavior data generated by its fast‑growing O2O platform. Different data types (active behavior, negative feedback, user profiles, UGC) feed both the candidate‑generation algorithms and the features of the re‑ranking model, and each type signals user intent with a different strength.
Trigger Strategies
1. Collaborative Filtering – Basic CF is enhanced by removing noisy data (spam, fraud), selecting an appropriate training window with time decay, and combining user‑based and item‑based approaches. Similarity is measured with a log‑likelihood ratio computed over co‑occurrence counts: LLR = 2 × (matrixEntropy – rowEntropy – columnEntropy).
2. Location‑Based – Region‑level hot deals and region‑level most‑purchased deals are extracted for each area. When a new user request arrives, the lists for the user's real‑time, work, and home locations are combined with weights.
3. Query‑Based – Historical searches without conversion are weighted, and query‑deal weights are computed; the top‑N weighted items are recommended on subsequent requests.
4. Graph‑Based – A bipartite user‑deal graph with SimRank similarity propagates relevance beyond two‑hop relationships.
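The log‑likelihood ratio from strategy 1 can be sketched for a 2×2 co‑occurrence table. The article does not define the entropy terms, so this sketch assumes a signed, unnormalized form, the sum of k·log(k/N), which is never positive. Under that convention the formula as written yields a non‑negative score. With the usual non‑negative entropy the sign of the bracket would flip. The function names and count names (`k11` and so on) are illustrative.

```python
import math

def signed_entropy(*counts):
    """Sum of k * log(k / N) over the counts (assumed convention; always <= 0)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(k * math.log(k / total) for k in counts if k > 0)

def log_likelihood_ratio(k11, k12, k21, k22):
    """LLR similarity from a 2x2 co-occurrence table.

    k11: users who interacted with both items
    k12: users who interacted with item A only
    k21: users who interacted with item B only
    k22: users who interacted with neither
    """
    matrix_entropy = signed_entropy(k11, k12, k21, k22)
    row_entropy = signed_entropy(k11 + k12, k21 + k22)
    column_entropy = signed_entropy(k11 + k21, k12 + k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (matrix_entropy - row_entropy - column_entropy))

# Independent items score about 0; perfectly co-occurring items score high.
print(log_likelihood_ratio(5, 5, 5, 5))
print(log_likelihood_ratio(10, 0, 0, 10))
```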
Sub‑Strategy Fusion
Four fusion methods are available: weighted, hierarchical, modulation, and filtering. Meituan combines modulation and hierarchical fusion: each strategy is assigned a share of the candidate set based on its historical performance, better‑performing strategies are triggered first, and lower‑ranked strategies fill the remaining slots when the candidate volume is insufficient.
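A minimal sketch of combining proportional quotas with a priority fallback follows. The function name `fuse`, the strategy names, and the proportions are hypothetical. The strategies are assumed to be listed in priority order, best historical performance first.

```python
def fuse(candidates, proportions, k):
    """Merge per-strategy candidate lists into at most k unique deals.

    candidates:  strategy -> ranked deal ids, inserted in priority order
    proportions: strategy -> share of the k slots (assumed to sum to about 1)
    """
    result, seen = [], set()

    def take(deals, limit):
        taken = 0
        for deal in deals:
            if taken >= limit or len(result) >= k:
                break
            if deal not in seen:
                seen.add(deal)
                result.append(deal)
                taken += 1

    # Modulation: each strategy contributes up to its quota.
    for name, deals in candidates.items():
        take(deals, round(k * proportions.get(name, 0.0)))

    # Hierarchical fallback: fill leftover slots in priority order.
    for deals in candidates.values():
        take(deals, k)
    return result

candidates = {
    "cf": ["a", "b", "c", "d"],
    "location": ["b", "e"],
    "query": ["f"],
}
proportions = {"cf": 0.5, "location": 0.3, "query": 0.2}
print(fuse(candidates, proportions, 6))  # ['a', 'b', 'c', 'e', 'f', 'd']
```

In this example the location strategy cannot fill its quota because one of its deals was already taken, so the last slot falls back to the next unused deal from the highest‑priority strategy.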
Candidate Re‑Ranking
Machine‑learning models replace simple heuristic ordering. Both non‑linear tree‑based models (Additive Groves) and linear models (Logistic Regression with online FTRL updates) are employed. Non‑linear models capture feature interactions without extensive preprocessing, while linear models benefit from fast training and online learning.
For the linear model, feature vectors are stored online in HBase, Storm parses logs in real time, and FTRL updates the model weights before the model is deployed.
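To make the weight update concrete, here is a compact sketch of per‑coordinate FTRL‑Proximal for logistic regression, following the commonly published form of the algorithm. It is not Meituan's implementation: the class name, hyperparameter values, and feature names are assumptions, and the HBase and Storm plumbing is left out.

```python
import math
from collections import defaultdict

class FTRLProximal:
    """Logistic regression trained online with per-coordinate FTRL-Proximal."""

    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = defaultdict(float)  # accumulated adjusted gradients
        self.n = defaultdict(float)  # accumulated squared gradients

    def _weight(self, i):
        z = self.z.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0  # L1 regularization keeps weak features at exactly zero
        sign = 1.0 if z > 0 else -1.0
        denom = (self.beta + math.sqrt(self.n.get(i, 0.0))) / self.alpha + self.l2
        return -(z - sign * self.l1) / denom

    def predict(self, x):
        """x is a sparse feature dict: feature name -> value."""
        s = sum(self._weight(i) * v for i, v in x.items())
        s = max(min(s, 35.0), -35.0)  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-s))

    def update(self, x, y):
        """Consume one labeled example (y is 0 or 1); return the prediction."""
        p = self.predict(x)
        for i, v in x.items():
            g = (p - y) * v  # gradient of the log loss for coordinate i
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * self._weight(i)
            self.n[i] += g * g
        return p

model = FTRLProximal()
for _ in range(200):
    model.update({"clicked_before": 1.0}, 1)
    model.update({"far_away": 1.0}, 0)
print(model.predict({"clicked_before": 1.0}), model.predict({"far_away": 1.0}))
```

Each update touches only the features present in the example, which is what makes this style of learner practical on a stream of parsed log events.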
Data Processing
Sampling: Negative samples are down‑sampled due to click‑through imbalance.
Negative Examples: Implicit negatives are filtered using skip‑above logic; explicit negatives come from user deletions.
Denoising: Fraudulent behavior is removed before training.
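The sampling and skip‑above steps can be sketched as follows. This assumes the common reading of skip‑above, where only unclicked items shown above the last clicked position count as negatives; the function names and the sampling rate are illustrative.

```python
import random

def skip_above(impressions, clicked):
    """Label one impression list using skip-above.

    impressions: deal ids in display order
    clicked:     set of clicked deal ids
    Items below the last click are dropped, since the user may not have
    seen them. A list with no clicks yields no samples.
    """
    last = max((i for i, d in enumerate(impressions) if d in clicked), default=-1)
    return [(d, 1 if d in clicked else 0) for d in impressions[: last + 1]]

def downsample_negatives(samples, rate, rng):
    """Keep every positive and each negative with probability `rate`."""
    return [(d, y) for d, y in samples if y == 1 or rng.random() < rate]

labeled = skip_above(["a", "b", "c", "d", "e"], {"c"})
print(labeled)  # [('a', 0), ('b', 0), ('c', 1)]
print(downsample_negatives(labeled, rate=0.5, rng=random.Random(0)))
```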
Features
Deal attributes (price, discount, sales, rating, category, CTR).
User attributes (level, demographics, client type).
Cross features (user‑deal interactions such as clicks, favorites, purchases).
Distance features (real‑time and historical geographic distances to POIs).
Non‑linear models consume raw features directly, while linear models require bucketization and normalization to the [0,1] range.
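For the linear‑model preprocessing, a small sketch of both transformations is shown below. The price range and bucket boundaries are made‑up values for illustration, and the article does not say which features get which treatment.

```python
import bisect

def min_max(value, lo, hi):
    """Scale value into [0, 1], clipping anything outside [lo, hi]."""
    if hi <= lo:
        return 0.0
    return min(max((value - lo) / (hi - lo), 0.0), 1.0)

def bucketize(value, boundaries):
    """One-hot encode value into len(boundaries) + 1 buckets.

    boundaries must be sorted in ascending order; a value equal to a
    boundary falls into the bucket above it.
    """
    idx = bisect.bisect_right(boundaries, value)
    vec = [0.0] * (len(boundaries) + 1)
    vec[idx] = 1.0
    return vec

print(min_max(50, 0, 200))            # 0.25
print(bucketize(35, [20, 50, 100]))   # [0.0, 1.0, 0.0, 0.0]
```

Bucketizing lets a linear model learn a separate weight for each value range, which recovers some of the non‑linearity that tree‑based models capture directly from raw features.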
Summary
Building on rich behavioral data and iterating on the algorithms, Meituan made two major improvements: candidate‑set fusion, which improves coverage, diversity, and precision, and a re‑ranking model, which optimizes the final ordering of deals.
21CTO