Inside Meituan’s Recommendation Engine: From Data to Real‑Time Ranking

This article outlines Meituan’s end‑to‑end recommendation system: its data layer, candidate‑generation triggers (collaborative filtering, location‑based, query‑based, and graph‑based methods), fusion strategies, and machine‑learning ranking models, both linear and non‑linear. It also highlights practical measures such as A/B testing, real‑time behavior handling, and fallback strategies.

21CTO

Sep 8, 2015

Preface

Recommendation systems have long existed, but they have become a core module in many Internet companies only in recent years. With the rapid growth of information online, users face severe information overload. Search works when user intent is explicit, while recommendation is needed when intent is vague or implicit. Meituan, a fast‑growing O2O platform, leverages its massive user base and rich behavior data to build and continuously improve its recommendation system.

Architecture

The system can be divided into four layers: data layer, trigger layer, fusion‑filter layer, and ranking layer. The data layer cleans raw logs and stores the formatted data in various storage systems. The trigger layer generates candidate sets from user history, real‑time actions, and geographic location. The fusion‑filter layer merges candidates from the different triggers to improve coverage and applies rule‑based filtering. The ranking layer then re‑orders the candidates using machine‑learning models. The trigger and ranking layers change most often, so both support A/B testing, and experiments in the two layers are kept orthogonal so that their effects can be measured independently.
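One common way to keep experiments in two layers orthogonal is to hash the user ID with a per‑layer salt, so that a user's bucket in one layer says nothing about their bucket in the other. The sketch below illustrates that idea only; the function names, the 100‑bucket split, and the 50/50 variants are assumptions, not details from Meituan's system.

```python
import hashlib


def ab_bucket(user_id: str, layer: str, num_buckets: int = 100) -> int:
    """Map a user to a bucket for one experiment layer.

    Salting the hash with the layer name makes the assignment in one layer
    effectively independent of the assignment in another layer.
    """
    digest = hashlib.md5(f"{layer}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def assign_variants(user_id: str) -> dict:
    # Hypothetical 50/50 splits; real traffic shares would be configured.
    return {
        "trigger": "A" if ab_bucket(user_id, "trigger") < 50 else "B",
        "ranking": "A" if ab_bucket(user_id, "ranking") < 50 else "B",
    }
```

Because each layer uses its own salt, roughly half of the users in any trigger variant fall into each ranking variant, which is what lets the two experiments be analyzed separately.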

Data Utilization

Data is the foundation of algorithms and models. Meituan’s user‑generated data includes:

Active behavior: search, filter, click, favorite, order, payment, rating.

UGC: textual reviews, uploaded images.

Negative feedback: swipe‑left delete, cancel favorite, cancel order, refund, low rating.

User profile: demographics, Meituan DNA, category preferences, spending level, work and residence locations.

These signals are used for candidate generation, weighting in the ranking model, and as cross‑features for offline training and online prediction.

Trigger Strategies

With the data in place, the trigger layer applies several algorithms to turn it into candidate sets.

1. Collaborative Filtering

Standard CF is employed, with enhancements such as noise removal (e.g., fraud, bots), appropriate training windows, and a hybrid of user‑based and item‑based approaches. Similarity is computed using a log‑likelihood ratio:

logLikelihoodRatio = 2 * (matrixEntropy - rowEntropy - columnEntropy)

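where rowEntropy, columnEntropy, and matrixEntropy are entropy terms computed from the row sums, column sums, and cells of the 2×2 co‑occurrence contingency table. The sign depends on the convention: the formula as written is non‑negative when each term is the sum of k · log(k / N) over the counts k with total N. If the terms are instead taken as positive (unnormalized) Shannon entropies, the equivalent expression is 2 * (rowEntropy + columnEntropy - matrixEntropy).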
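A minimal sketch of this similarity for a 2×2 contingency table follows. It uses the positive, unnormalized Shannon entropy, so the difference is written as row + column - matrix, which equals the formula above when the entropy terms are taken with the opposite sign. The argument names k11 (both items seen together), k12, k21, and k22 (neither) are illustrative.

```python
import math


def x_log_x(x: float) -> float:
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts: float) -> float:
    """Unnormalized Shannon entropy: N * H, where N = sum(counts)."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11: float, k12: float, k21: float, k22: float) -> float:
    row_entropy = entropy(k11 + k12, k21 + k22)
    column_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Clamp at zero to absorb floating-point round-off for independent tables.
    return max(0.0, 2.0 * (row_entropy + column_entropy - matrix_entropy))
```

A table whose rows and columns are independent, such as (10, 10, 10, 10), scores zero, while a table with strong co‑occurrence scores high.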

2. Location‑Based

Mobile devices provide dynamic location signals. Regional hot‑deal lists are built for each geographic cluster (e.g., business districts) and weighted according to the user’s current, work, or home location.
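As a rough illustration, the per‑region hot lists can be merged with a weight for each kind of location signal. The weights, the data shapes, and the function name below are all hypothetical; the article does not say how the weighting is actually done.

```python
from collections import defaultdict

# Hypothetical weights for each kind of location signal.
LOCATION_WEIGHTS = {"current": 1.0, "work": 0.6, "home": 0.4}


def location_candidates(user_regions, hot_deals_by_region, top_n=10):
    """Merge regional hot lists for one user.

    user_regions: {"current": region_id, "work": region_id, "home": region_id}
    hot_deals_by_region: {region_id: [(deal_id, hotness), ...]}
    """
    scores = defaultdict(float)
    for kind, region in user_regions.items():
        weight = LOCATION_WEIGHTS.get(kind, 0.0)
        for deal_id, hotness in hot_deals_by_region.get(region, []):
            scores[deal_id] += weight * hotness
    # Sort by score descending, breaking ties by deal id for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [deal_id for deal_id, _ in ranked[:top_n]]
```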

3. Query‑Based

A user’s past queries that did not lead to a conversion are weighted, and each query‑deal pair receives a score. At request time, the top‑N deals by weighted score are returned.
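The request‑time step can be sketched as a weighted sum over the user's unconverted queries. How the query weights and the query‑deal scores are computed is not described in the article, so both are treated here as precomputed inputs with hypothetical shapes.

```python
import heapq
from collections import defaultdict


def query_candidates(query_weights, query_deal_scores, top_n=10):
    """Return the top-N (deal_id, score) pairs for one user.

    query_weights: {query: weight} for the user's unconverted queries.
    query_deal_scores: {query: {deal_id: score}}, assumed computed offline.
    """
    totals = defaultdict(float)
    for query, weight in query_weights.items():
        for deal_id, score in query_deal_scores.get(query, {}).items():
            totals[deal_id] += weight * score
    return heapq.nlargest(top_n, totals.items(), key=lambda kv: kv[1])
```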

4. Graph‑Based (SimRank)

Items and users are modeled as a bipartite graph; similarity propagates through the graph using SimRank. The similarity between two entities is derived from the similarity of their neighbors.
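The following is a small, unoptimized sketch of bipartite SimRank, in which user similarity is computed from item similarity and vice versa. The decay factor of 0.8 and the fixed iteration count are assumptions, and the quadratic pairwise loops are only practical for toy graphs; a production system would need pruning or approximation.

```python
from itertools import product


def bipartite_simrank(user_items, decay=0.8, iterations=5):
    """user_items: {user: set(items)}. Returns (user_sim, item_sim) dicts
    keyed by (node_a, node_b) pairs."""
    item_users = {}
    for user, items in user_items.items():
        for item in items:
            item_users.setdefault(item, set()).add(user)

    def identity(nodes):
        return {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}

    def update(neighbors, other_sim):
        new_sim = {}
        for a, b in product(neighbors, repeat=2):
            na, nb = neighbors[a], neighbors[b]
            if a == b:
                new_sim[(a, b)] = 1.0
            elif not na or not nb:
                new_sim[(a, b)] = 0.0  # a node without neighbors has no evidence
            else:
                total = sum(other_sim[(x, y)] for x in na for y in nb)
                new_sim[(a, b)] = decay * total / (len(na) * len(nb))
        return new_sim

    user_sim = identity(user_items)
    item_sim = identity(item_users)
    for _ in range(iterations):
        # Both updates read the scores from the previous iteration.
        user_sim, item_sim = update(user_items, item_sim), update(item_users, user_sim)
    return user_sim, item_sim
```

Two users who interacted with the same deals end up similar, and that similarity in turn raises the similarity of the deals they share with others.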

5. Real‑Time User Behavior

Real‑time browsing and favoriting are captured and used to adjust recommendations when a user returns, ensuring that prior intent influences the next exposure.

6. Fallback Strategies

Hot‑selling items (with time decay).

Highly rated items.

City‑specific items that satisfy basic constraints.
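For the hot‑selling fallback, one simple way to apply time decay is exponential decay with a half‑life. The article does not specify the decay function, so the 7‑day half‑life and the function names below are assumptions chosen for illustration.

```python
import math


def decayed_sales_score(sales_events, now, half_life_days=7.0):
    """sales_events: iterable of (timestamp_seconds, quantity)."""
    decay_rate = math.log(2) / (half_life_days * 86400)
    return sum(qty * math.exp(-decay_rate * (now - ts)) for ts, qty in sales_events)


def hot_fallback(sales_by_deal, now, top_n=10, half_life_days=7.0):
    """Rank deals by decayed sales; sales_by_deal: {deal_id: [(ts, qty), ...]}."""
    scored = {
        deal: decayed_sales_score(events, now, half_life_days)
        for deal, events in sales_by_deal.items()
    }
    return sorted(scored, key=lambda deal: (-scored[deal], deal))[:top_n]
```

With this choice, a sale from two half‑lives ago counts a quarter as much as a sale made now, so a recently popular deal can outrank one with larger but older sales.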

Sub‑Strategy Fusion

To improve diversity and coverage, the results of multiple trigger algorithms are combined. Common approaches are weighted, hierarchical, and modulation‑type fusion. Meituan’s production system blends modulation and hierarchical fusion: each algorithm is allotted a share of the candidate set based on its historical performance, better‑performing algorithms are used first, and lower‑ranked algorithms are invoked only when the candidate set is still too small.
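The combination of modulation and hierarchical fusion might look like the sketch below: primary algorithms fill the candidate set according to their quotas, and fallback lists are consulted in order only if slots remain. The quota values, the argument names, and the de‑duplication step are assumptions for illustration.

```python
def fuse_candidates(primary, fallback, quotas, size):
    """Fuse ranked candidate lists into one list of at most `size` deals.

    primary: {algo: ranked deal list}, mixed by quota (modulation).
    fallback: ordered list of ranked deal lists, used only if needed (hierarchical).
    quotas: {algo: share of the candidate set, between 0 and 1}.
    """
    fused, seen = [], set()

    def take(deals, limit):
        taken = 0
        for deal in deals:
            if len(fused) >= size or taken >= limit:
                break
            if deal not in seen:  # skip deals another algorithm already supplied
                seen.add(deal)
                fused.append(deal)
                taken += 1

    for algo, deals in primary.items():
        take(deals, round(quotas.get(algo, 0.0) * size))
    for deals in fallback:
        if len(fused) >= size:
            break
        take(deals, size - len(fused))
    return fused
```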

Candidate Re‑Ranking

Simple heuristic ordering is insufficient; a machine‑learning ranking model aggregates many signals.

1. Models

Both non‑linear (Additive Groves tree ensembles) and linear (Logistic Regression with online FTRL updates) models are used. Non‑linear models capture complex feature interactions without extensive manual feature engineering, while linear models benefit from fast training/prediction and can be updated online.

Online pipeline: write feature vectors to HBase → Storm parses real‑time click/order logs and updates labels → FTRL updates model weights → new model is deployed.
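The weight‑update step of that pipeline can be illustrated with a per‑coordinate FTRL‑Proximal logistic regression. This sketch covers only the model update; the HBase and Storm parts are not modeled, and the hyperparameter defaults and class name are assumptions rather than Meituan's settings.

```python
import math
from collections import defaultdict


class FTRLProximal:
    """Logistic regression trained online with per-coordinate FTRL-Proximal."""

    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = defaultdict(float)  # accumulated adjusted gradients
        self.n = defaultdict(float)  # accumulated squared gradients

    def _weight(self, i):
        z = self.z[i]
        if abs(z) <= self.l1:
            return 0.0  # L1 regularization keeps weak features at exactly zero
        sign = 1.0 if z > 0 else -1.0
        step = (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2
        return -(z - sign * self.l1) / step

    def predict(self, x):
        """x: sparse features as {feature_id: value}."""
        wx = sum(self._weight(i) * v for i, v in x.items())
        wx = max(min(wx, 35.0), -35.0)  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-wx))

    def update(self, x, y):
        """One online step on example (x, y) with y in {0, 1}."""
        p = self.predict(x)
        for i, v in x.items():
            g = (p - y) * v  # gradient of log loss for coordinate i
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * self._weight(i)
            self.n[i] += g * g
        return p
```

In a streaming setting, `update` would be called once per labeled feature vector as click and order labels arrive, and `predict` would serve ranking requests with the latest weights.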

2. Data

Sampling to address extreme class imbalance between clicks/orders (positive) and non‑clicks (negative).

Negative samples are carefully defined (e.g., skip‑above rule, explicit user deletions) to avoid bias.

Noise removal to filter fraudulent behavior.
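The skip‑above rule and negative downsampling can be sketched as follows. The interpretation used here is that unclicked deals shown above the last clicked deal count as negatives, on the assumption that the user scanned the list from the top; the keep rate and function names are hypothetical.

```python
import random


def skip_above_samples(impressions, clicked):
    """Label one displayed list.

    impressions: deal ids in display order; clicked: set of clicked deal ids.
    Deals below the last click are dropped because the user may not have seen them.
    """
    positions = [i for i, deal in enumerate(impressions) if deal in clicked]
    if not positions:
        return []  # no click: unclear which deals were actually viewed
    last = positions[-1]
    return [(deal, 1 if deal in clicked else 0) for deal in impressions[: last + 1]]


def downsample_negatives(samples, keep_rate, seed=0):
    """Keep every positive and a random fraction of negatives."""
    rng = random.Random(seed)
    return [(d, y) for d, y in samples if y == 1 or rng.random() < keep_rate]
```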

3. Features

Deal‑level features: price, discount, sales, rating, category, CTR.

User‑level features: tier, demographics, client type.

User‑deal cross features: historical clicks, favorites, purchases.

Distance features: real‑time and historical geographic distances to POIs.

Non‑linear models consume raw features directly; linear models require bucketization and normalization.
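To make the preprocessing for linear models concrete, here is a minimal sketch of bucketization into one‑hot features and min‑max normalization. The bucket boundaries in the usage note are invented for illustration, and the article does not say which normalization scheme is used.

```python
import bisect


def bucketize(value, boundaries):
    """One-hot encode `value` into len(boundaries) + 1 buckets.

    boundaries must be sorted; a value equal to a boundary falls in the bucket above it.
    """
    index = bisect.bisect_right(boundaries, value)
    one_hot = [0] * (len(boundaries) + 1)
    one_hot[index] = 1
    return one_hot


def min_max_normalize(values):
    """Scale values linearly into [0, 1]; constant inputs map to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

For example, with hypothetical price boundaries of 20, 50, and 100, a price of 35 falls into the second bucket, which lets a linear model learn a separate weight for each price range instead of a single slope.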

Conclusion

Data‑driven algorithms, when tightly coupled, significantly improve recommendation performance. Two key milestones for Meituan were merging candidate sets to boost coverage, diversity, and precision, and introducing a learning‑to‑rank model to order the enlarged candidate pool effectively.
