Inside Meituan’s Recommendation Engine: From Data to Real‑Time Ranking
This article outlines Meituan’s end‑to‑end recommendation system: its data layer, candidate‑generation triggers, fusion strategies, and machine‑learning ranking models. It covers collaborative filtering, location‑based, query‑based, and graph‑based triggers, both linear and non‑linear ranking models, and practical concerns such as A/B testing, real‑time behavior handling, and fallback strategies.
Preface
Recommendation systems have long existed, but they have become a core module in many Internet companies only in recent years. With the rapid growth of information online, users face severe information overload. Search works when user intent is explicit, while recommendation is needed when intent is vague or implicit. Meituan, a fast‑growing O2O platform, leverages its massive user base and rich behavior data to build and continuously improve its recommendation system.
Architecture
The system is divided into four layers: data, trigger, fusion‑filter, and ranking. The data layer cleans raw logs and stores formatted data in various storage systems. The trigger layer generates candidate sets from user history, real‑time actions, and geographic location. The fusion‑filter layer merges candidates from different triggers to improve coverage and applies rule‑based filtering. The ranking layer re‑orders the candidates using machine‑learning models. Both the trigger and ranking layers change frequently, so each supports A/B testing, with traffic split orthogonally between the two layers so that an experiment in one does not bias results in the other.
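One common way to keep experiments in two layers independent is to hash the user ID with a layer‑specific salt, so that a user's bucket in the trigger layer carries no information about their bucket in the ranking layer. The sketch below only illustrates that idea; the salts, bucket count, and the function name bucket are assumptions, not details from Meituan's system.

```python
import hashlib

def bucket(user_id: str, layer_salt: str, num_buckets: int = 100) -> int:
    """Deterministically map a user to a bucket within one layer (hypothetical helper)."""
    digest = hashlib.md5(f"{layer_salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets

# The same user is bucketed independently in each layer because the salts differ.
user = "user-42"
trigger_bucket = bucket(user, "trigger-layer")
ranking_bucket = bucket(user, "ranking-layer")
```

Because the assignment is a pure function of the salt and user ID, a user stays in the same bucket across requests without any stored state.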
Data Utilization
Data is the foundation of algorithms and models. Meituan’s user‑generated data includes:
Active behavior: search, filter, click, favorite, order, payment, rating.
UGC: textual reviews, uploaded images.
Negative feedback: swipe‑left delete, cancel favorite, cancel order, refund, low rating.
User profile: demographics, Meituan DNA, category preferences, spending level, work and residence locations.
These signals are used for candidate generation, weighting in the ranking model, and as cross‑features for offline training and online prediction.
Trigger Strategies
With the data in place, trigger algorithms turn it into candidate sets.
1. Collaborative Filtering
Standard CF is employed, with enhancements such as noise removal (e.g., fraud, bots), appropriate training windows, and a hybrid of user‑based and item‑based approaches. Similarity is computed using a log‑likelihood ratio:
logLikelihoodRatio = 2 * (matrixEntropy - rowEntropy - columnEntropy)
where rowEntropy, columnEntropy, and matrixEntropy are entropy terms computed over the rows, the columns, and all four cells of the 2×2 co‑occurrence contingency table for the pair being compared.
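A minimal sketch of this score, assuming the entropies are count‑weighted (unnormalized) and that rowEntropy and columnEntropy are summed per row and per column of the table. Under that reading the expression above is non‑negative and equals zero when the two items co‑occur independently. If row and column entropies were instead taken over the marginal totals, the equivalent expression would be 2 * (rowEntropy + columnEntropy - matrixEntropy). The names k11, k12, k21, and k22 for the four cell counts are assumptions for illustration.

```python
import math

def entropy(*counts):
    """Count-weighted Shannon entropy: -sum(k * ln(k / total)); zero counts contribute 0."""
    total = sum(counts)
    return -sum(k * math.log(k / total) for k in counts if k > 0)

def log_likelihood_ratio(k11, k12, k21, k22):
    """k11: both items seen together, k12/k21: only one of them, k22: neither."""
    row_entropy = entropy(k11, k12) + entropy(k21, k22)
    column_entropy = entropy(k11, k21) + entropy(k12, k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (matrix_entropy - row_entropy - column_entropy))
```

For example, a table with counts (10, 10, 10, 10) shows no association and scores approximately 0, while (10, 0, 0, 10) shows perfect association and scores 40·ln 2.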
2. Location‑Based
Mobile devices provide dynamic location signals. Regional hot‑deal lists are built for each geographic cluster (e.g., business districts) and weighted according to the user’s current, work, or home location.
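The weighting step can be sketched as follows. This is only an illustration: the weight values, the heat score attached to each deal, and the function name location_candidates are all hypothetical.

```python
from collections import defaultdict

# Hypothetical weights: the current location counts most, then work, then home.
LOCATION_WEIGHTS = {"current": 1.0, "work": 0.6, "home": 0.4}

def location_candidates(user_regions, hot_deals_by_region, top_n=10):
    """user_regions: {location type: region id}
    hot_deals_by_region: {region id: [(deal id, heat score), ...]}"""
    scores = defaultdict(float)
    for loc_type, region in user_regions.items():
        weight = LOCATION_WEIGHTS.get(loc_type, 0.0)
        for deal_id, heat in hot_deals_by_region.get(region, []):
            scores[deal_id] += weight * heat
    # Highest score first; ties broken by deal id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]
```

A deal that is popular in more than one of the user's regions accumulates score from each, so it can outrank a deal that is hot in only one.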
3. Query‑Based
A user’s past searches that did not lead to a conversion still signal intent. Each such query is assigned a weight, and each query‑deal pair receives a score. At request time, the N deals with the highest weighted scores are returned.
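A minimal sketch of the request‑time step, assuming a deal's final score is the sum of query weight times query‑deal score over the user's unconverted queries. The combination rule and all names here are assumptions; the source does not specify how the two weights are combined.

```python
from collections import defaultdict

def query_candidates(query_weights, query_deal_scores, top_n=10):
    """query_weights: {query: weight}
    query_deal_scores: {query: {deal id: score of the query-deal pair}}"""
    totals = defaultdict(float)
    for query, q_weight in query_weights.items():
        for deal_id, pair_score in query_deal_scores.get(query, {}).items():
            totals[deal_id] += q_weight * pair_score
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [deal_id for deal_id, _ in ranked[:top_n]]
```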
4. Graph‑Based (SimRank)
Items and users are modeled as a bipartite graph; similarity propagates through the graph using SimRank. The similarity between two entities is derived from the similarity of their neighbors.
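The recursion can be sketched with a naive iterative version of bipartite SimRank: two users are similar if the items they interacted with are similar, and two items are similar if the users who interacted with them are similar. The decay constant 0.8 and the iteration count are assumptions, and this quadratic‑time version is for illustration only, not a production implementation.

```python
from itertools import product

def simrank_bipartite(edges, decay=0.8, iterations=5):
    """edges: iterable of (user, item) pairs. Returns (user_sim, item_sim) dicts keyed by pairs."""
    user_items, item_users = {}, {}
    for user, item in edges:
        user_items.setdefault(user, set()).add(item)
        item_users.setdefault(item, set()).add(user)

    def identity(nodes):
        # Every node starts out similar only to itself.
        return {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}

    def step(neighbors, other_sim):
        new = {}
        for a, b in product(neighbors, repeat=2):
            if a == b:
                new[(a, b)] = 1.0
                continue
            total = sum(other_sim[(x, y)] for x in neighbors[a] for y in neighbors[b])
            new[(a, b)] = decay * total / (len(neighbors[a]) * len(neighbors[b]))
        return new

    user_sim, item_sim = identity(user_items), identity(item_users)
    for _ in range(iterations):
        # Both sides are updated from the previous iteration's values.
        user_sim, item_sim = step(user_items, item_sim), step(item_users, user_sim)
    return user_sim, item_sim
```

Users with no path between them in the graph keep a similarity of 0, while users who share items gain similarity with each iteration.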
5. Real‑Time User Behavior
Real‑time browsing and favoriting are captured and used to adjust recommendations when a user returns, ensuring that prior intent influences the next exposure.
6. Fallback Strategies
Hot‑selling items (with time decay).
Highly rated items.
City‑specific items that satisfy basic constraints.
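For the hot‑selling list, time decay can be sketched as an exponentially weighted sales count. The source does not say which decay function is used; exponential decay with a 7‑day half‑life is an assumption here, as are the function name and the (day, quantity) input format.

```python
import math

def decayed_sales(sales_events, now, half_life_days=7.0):
    """sales_events: [(day index, quantity sold)]; each sale's contribution
    halves every `half_life_days` days."""
    rate = math.log(2) / half_life_days
    return sum(qty * math.exp(-rate * (now - day)) for day, qty in sales_events)

# Two deals with equal total sales: the one that sold more recently scores higher.
recent = decayed_sales([(28, 100)], now=30)
stale = decayed_sales([(2, 100)], now=30)
```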
Sub‑Strategy Fusion
To improve diversity and coverage, the candidate sets from multiple trigger algorithms are combined. Common fusion methods include weighted, hierarchical, and modulation (proportional) fusion. Meituan’s production system blends the latter two: each algorithm contributes a share of the candidates proportional to its historical performance, and lower‑ranked algorithms are invoked only when the candidate set is still too small.
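A minimal sketch of blending the two approaches, assuming each primary algorithm is given a fixed fraction of the result list (the modulation step) and a list of lower‑priority algorithms is consulted only if slots remain (the hierarchical step). The function name, quota values, and data layout are hypothetical.

```python
def fuse(candidates_by_algo, quotas, fallback_order, size):
    """candidates_by_algo: {algo: [deal ids, best first]}
    quotas: {algo: fraction of `size` that algo may fill}
    fallback_order: algos used only while the result is still short."""
    result, seen = [], set()

    def take(algo, limit):
        taken = 0
        for deal in candidates_by_algo.get(algo, []):
            if taken >= limit or len(result) >= size:
                break
            if deal not in seen:  # skip deals another algorithm already supplied
                seen.add(deal)
                result.append(deal)
                taken += 1

    for algo, share in quotas.items():
        take(algo, int(size * share))
    for algo in fallback_order:
        if len(result) >= size:
            break
        take(algo, size - len(result))
    return result
```

When primary algorithms overlap heavily or return few candidates, the fallback algorithms fill the gap, which is how coverage is preserved.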
Candidate Re‑Ranking
Simple heuristic ordering is insufficient; a machine‑learning ranking model aggregates many signals.
1. Models
Both non‑linear (Additive Groves tree ensembles) and linear (Logistic Regression with online FTRL updates) models are used. Non‑linear models capture complex feature interactions without extensive manual feature engineering, while linear models benefit from fast training/prediction and can be updated online.
Online pipeline: write feature vectors to HBase → Storm parses real‑time click/order logs and updates labels → FTRL updates model weights → new model is deployed.
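The FTRL step of this pipeline can be sketched as a per‑coordinate FTRL‑Proximal update for logistic regression over sparse features. The HBase and Storm stages are omitted, and the hyperparameters (alpha, beta, l1, l2) and the class name are assumptions rather than values from Meituan's system.

```python
import math
from collections import defaultdict

class FTRLProximal:
    """Logistic regression trained online with per-coordinate FTRL-Proximal."""

    def __init__(self, alpha=0.1, beta=1.0, l1=0.0, l2=0.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = defaultdict(float)  # accumulated adjusted gradients
        self.n = defaultdict(float)  # accumulated squared gradients

    def _weight(self, feature):
        z = self.z[feature]
        if abs(z) <= self.l1:
            return 0.0  # L1 regularization keeps this weight exactly zero
        sign = 1.0 if z > 0 else -1.0
        denom = (self.beta + math.sqrt(self.n[feature])) / self.alpha + self.l2
        return -(z - sign * self.l1) / denom

    def predict(self, x):
        """x: {feature name: value}. Returns the predicted click probability."""
        score = sum(self._weight(f) * v for f, v in x.items())
        score = max(min(score, 35.0), -35.0)  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, x, y):
        """Apply one labeled example (y is 0 or 1); returns the pre-update prediction."""
        p = self.predict(x)
        for f, v in x.items():
            g = (p - y) * v  # gradient of log loss for this coordinate
            w = self._weight(f)
            sigma = (math.sqrt(self.n[f] + g * g) - math.sqrt(self.n[f])) / self.alpha
            self.z[f] += g - sigma * w
            self.n[f] += g * g
        return p
```

In a streaming setup, each joined (feature vector, label) record would trigger one update call, so the weights track recent behavior without retraining from scratch.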
2. Data
Sampling: addresses the extreme class imbalance between clicks/orders (positive) and non‑clicks (negative).
Negative samples: defined carefully (e.g., skip‑above rule, explicit user deletions) to avoid bias.
Noise removal: filters out fraudulent behavior.
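The skip‑above rule can be sketched as follows, under one common reading: only items displayed at or above the last clicked position are assumed to have been seen, so unclicked items in that range become negatives and everything below is discarded. The function name and input format are hypothetical.

```python
def skip_above_samples(impressions, clicked):
    """impressions: deal ids in displayed order; clicked: set of clicked deal ids.
    Returns a list of (deal id, label) pairs, label 1 for a click and 0 otherwise."""
    click_positions = [i for i, deal in enumerate(impressions) if deal in clicked]
    if not click_positions:
        return []  # no click: we cannot tell how far down the user looked
    last_click = click_positions[-1]
    return [(deal, 1 if deal in clicked else 0) for deal in impressions[: last_click + 1]]
```

For an impression list a, b, c, d, e with a single click on c, this yields a and b as negatives, c as a positive, and drops d and e.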
3. Features
Deal‑level features: price, discount, sales, rating, category, CTR.
User‑level features: tier, demographics, client type.
User‑deal cross features: historical clicks, favorites, purchases.
Distance features: real‑time and historical geographic distances to POIs.
Non‑linear models consume raw features directly; linear models require bucketization and normalization.
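The preprocessing for linear models can be sketched as below: a continuous feature such as price is turned into a one‑hot bucket vector, and a bounded feature such as rating is rescaled to [0, 1]. The bucket boundaries, the choice of min‑max scaling, and the function names are assumptions for illustration.

```python
import bisect

def bucketize(value, boundaries):
    """One-hot encode `value` into len(boundaries) + 1 buckets; boundaries must be sorted."""
    one_hot = [0] * (len(boundaries) + 1)
    one_hot[bisect.bisect_right(boundaries, value)] = 1
    return one_hot

def min_max_normalize(value, lo, hi):
    """Scale `value` into [0, 1], clipping anything outside [lo, hi]."""
    if hi == lo:
        return 0.0
    return min(max((value - lo) / (hi - lo), 0.0), 1.0)

price_buckets = bucketize(35, [20, 50, 100])   # [0, 1, 0, 0]
rating_scaled = min_max_normalize(4.0, 1.0, 5.0)  # 0.75
```

Bucketizing lets a linear model learn a separate weight per price range, which approximates a non‑linear response that a single weight on the raw price could not capture.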
Conclusion
Data and algorithms have to work together: recommendation quality improves only when the two are tightly combined. Two key milestones for Meituan were merging candidate sets to boost coverage, diversity, and precision, and introducing a learning‑to‑rank model to order the enlarged candidate pool effectively.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
