How Meituan Builds and Optimizes Its Recommendation System

This article explains Meituan's end‑to‑end recommendation system architecture, data processing pipeline, candidate generation strategies, model training and online ranking techniques, illustrating how data, algorithms, and real‑time signals are combined to improve relevance and conversion.

21CTO

Introduction

Recommendation systems have existed for a long time, but they only became a prominent module in internet companies in recent years. With the rapid growth of information online, users face severe information overload, making it difficult to find valuable content without assistance.

Two main approaches address overload: search, which relies on explicit user queries, and recommendation, which infers user intent from implicit behavior, especially in e‑commerce scenarios where users browse without a clear purchase goal.

Meituan, a fast‑growing O2O platform, leverages its massive user base and rich behavior data to develop and continuously improve its recommendation system.

Framework

From a framework perspective, the system consists of four layers: data layer, trigger layer, fusion‑filter layer, and ranking layer.

Data layer: collects raw logs, cleanses them, and stores formatted data in various storage systems for downstream algorithms.

Trigger layer: generates candidate sets based on user history, real‑time actions, and geographic location.

Fusion‑filter layer: merges candidates from different triggers to increase coverage and applies business rules to filter unsuitable items.

Ranking layer: re‑orders the candidate set using machine‑learning models.

Both the trigger and ranking layers support frequent A/B testing and are decoupled to allow independent experiments.
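
One way to keep trigger-layer and ranking-layer experiments independent is to hash each user into a bucket separately for each layer. The sketch below is only an illustration of that idea: the function names, the 100-bucket split, and the variant names are assumptions, not details from Meituan's system.

```python
import hashlib
from typing import Dict


def assign_bucket(user_id: str, layer: str, num_buckets: int = 100) -> int:
    """Map a user to a stable bucket in [0, num_buckets) for one layer."""
    # Salting the hash with the layer name decorrelates the buckets, so a
    # user's trigger-layer bucket says nothing about their ranking-layer bucket.
    digest = hashlib.md5(f"{layer}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def pick_variant(user_id: str, layer: str, variants: Dict[str, range]) -> str:
    """Return the name of the variant whose bucket range contains the user."""
    bucket = assign_bucket(user_id, layer)
    for name, buckets in variants.items():
        if bucket in buckets:
            return name
    return "control"  # hypothetical default when no range matches


# Hypothetical 50/50 split on the ranking layer.
ranking_variants = {"control": range(0, 50), "new_model": range(50, 100)}
variant = pick_variant("user_42", "ranking", ranking_variants)
```

Because the bucket depends only on the user ID and the layer name, a user always sees the same variant within a layer, while the two layers can run different experiments at the same time.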

Data Application

Data is the foundation of algorithms and models. Meituan collects massive user behavior data, which varies in value and intent strength.

Behavior categories:

Active behaviors: search, filter, click, favorite, order, payment, rating.

UGC: text reviews, uploaded images.

Negative feedback: swipe left, cancel favorite, cancel order, refund, low rating.

User profile: demographics, Meituan DNA, category preferences, spending level, work and residence locations.

Active behavior data feed offline candidate generation and supply weighted signals for training ranking models. Negative feedback helps filter or down‑weight items and provides valuable negative examples during model training. User profiles are used for weighting in candidate generation and as features in ranking models. Keywords extracted from UGC are used to tag deals for personalized display.

Trigger Strategies

Beyond data, algorithms turn signals into candidates.

1. Collaborative Filtering

Basic user‑based and item‑based collaborative filtering is enhanced with data cleaning (removing spam, fraud), appropriate training windows, and time decay. Similarity can be computed using log‑likelihood ratio:

logLikelihoodRatio = 2 * (matrixEntropy - rowEntropy - columnEntropy)

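where rowEntropy = entropy(k11, k12) + entropy(k21, k22), columnEntropy = entropy(k11, k21) + entropy(k12, k22), and matrixEntropy = entropy(k11, k12, k21, k22). In the usual contingency‑table reading for two items, k11 counts users who interacted with both items, k12 and k21 count users who interacted with only one of them, and k22 counts users who interacted with neither. For the formula to yield the log‑likelihood ratio, entropy(...) must be the unnormalized Shannon entropy of the counts, that is, -Σ k · ln(k / total).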
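
The formula can be written out directly. The sketch below assumes that entropy(...) is the unnormalized Shannon entropy of the counts and that k11 to k22 are the cells of a 2x2 co-occurrence table (both, first only, second only, neither). The function names are illustrative.

```python
import math


def entropy(*counts: int) -> float:
    """Unnormalized Shannon entropy: -sum(k * ln(k / total)), with 0 * ln(0) = 0."""
    total = sum(counts)
    return -sum(k * math.log(k / total) for k in counts if k > 0)


def log_likelihood_ratio(k11: int, k12: int, k21: int, k22: int) -> float:
    """Log-likelihood ratio of a 2x2 co-occurrence table."""
    row_entropy = entropy(k11, k12) + entropy(k21, k22)
    column_entropy = entropy(k11, k21) + entropy(k12, k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Rounding can push an independent table slightly below zero, so clamp.
    return max(0.0, 2.0 * (matrix_entropy - row_entropy - column_entropy))


# Items bought independently of each other score (near) zero ...
independent = log_likelihood_ratio(10, 10, 10, 10)
# ... while items that are almost always bought together score high.
associated = log_likelihood_ratio(100, 1, 1, 100)
```

A higher score means the co-occurrence is less likely to be due to chance, which makes the ratio more robust than raw co-occurrence counts for very popular items.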

2. Location‑Based

Geographic location influences user intent. The user's real‑time location, workplace, and residence are used to weight regional hot deals and to compute similarity via collaborative filtering.

3. Query‑Based

Even non‑converting searches indicate intent. We weight past queries and associated deals, then recommend top‑N items for a returning user.

Mine past queries without conversion and compute query weights.

Compute deal weights per query.

When the user returns, combine query and deal weights to select top recommendations.
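
The three steps above can be sketched as follows. The article does not say how the query and deal weights are combined, so the sketch assumes a simple sum of products. The data layout and function name are also hypothetical.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple


def recommend_from_queries(
    query_weights: Dict[str, float],
    deal_weights_by_query: Dict[str, Dict[str, float]],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Score each deal by summing query_weight * deal_weight over past queries."""
    scores: Dict[str, float] = defaultdict(float)
    for query, q_weight in query_weights.items():
        for deal, d_weight in deal_weights_by_query.get(query, {}).items():
            scores[deal] += q_weight * d_weight
    return heapq.nlargest(top_n, scores.items(), key=lambda kv: kv[1])


# Hypothetical weights for one returning user's non-converting queries.
query_weights = {"hotpot": 0.6, "cinema": 0.4}
deal_weights_by_query = {
    "hotpot": {"deal_1": 0.5, "deal_2": 0.3},
    "cinema": {"deal_2": 0.5, "deal_3": 0.2},
}
top = recommend_from_queries(query_weights, deal_weights_by_query, top_n=2)
```

Here deal_2 ranks first because it is associated with both past queries, even though it is not the strongest deal for either query alone.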

4. Graph‑Based

Using a bipartite user‑deal graph, SimRank measures similarity by propagating relationships across two‑hop connections.

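Let s(A,B) denote the similarity between users A and B (A ≠ B):
s(A,B) = C1 / (|O(A)| · |O(B)|) · Σ_i Σ_j s(O_i(A), O_j(B))
Let s(c,d) denote the similarity between items c and d (c ≠ d):
s(c,d) = C2 / (|I(c)| · |I(d)|) · Σ_i Σ_j s(I_i(c), I_j(d))
In the standard SimRank definition, O(A) is the set of deals user A has interacted with, I(c) is the set of users who have interacted with deal c, C1 and C2 are decay constants between 0 and 1, and every node has similarity 1 with itself.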

Similarity is computed via matrix iteration, then applied like collaborative filtering for online recommendation.
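
A minimal sketch of the iteration on a small bipartite graph is given below. It uses plain dictionaries rather than matrices for readability and a single decay constant for both sides. The decay value, iteration count, and function name are assumptions for illustration.

```python
from typing import Dict, Set, Tuple

Sim = Dict[Tuple[str, str], float]


def bipartite_simrank(user_deals: Dict[str, Set[str]], decay: float = 0.8,
                      iterations: int = 5) -> Tuple[Sim, Sim]:
    """Return (user_similarity, deal_similarity) after a fixed number of rounds."""
    users = sorted(user_deals)
    deals = sorted({d for ds in user_deals.values() for d in ds})
    deal_users = {d: {u for u in users if d in user_deals[u]} for d in deals}

    def identity(nodes) -> Sim:
        return {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}

    def step(nodes, neighbors, other_sim: Sim) -> Sim:
        new: Sim = {}
        for a in nodes:
            for b in nodes:
                na, nb = neighbors[a], neighbors[b]
                if a == b:
                    new[(a, b)] = 1.0  # a node is always fully similar to itself
                elif not na or not nb:
                    new[(a, b)] = 0.0
                else:
                    # Average similarity of the two nodes' neighbors, decayed.
                    total = sum(other_sim[(x, y)] for x in na for y in nb)
                    new[(a, b)] = decay * total / (len(na) * len(nb))
        return new

    user_sim, deal_sim = identity(users), identity(deals)
    for _ in range(iterations):
        # Both sides are updated from the previous round's values.
        user_sim, deal_sim = (step(users, user_deals, deal_sim),
                              step(deals, deal_users, user_sim))
    return user_sim, deal_sim


user_sim, deal_sim = bipartite_simrank(
    {"A": {"x", "y"}, "B": {"x", "y"}, "C": {"z"}})
```

In this toy graph, users A and B share both deals and end up similar, while C shares nothing with them and stays at zero. The all-pairs loops are quadratic in the number of nodes, so a production system would need a sparse or pruned variant.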

5. Real‑Time User Behavior

Real‑time actions such as browsing and favoriting are captured to adjust candidate scores, recognizing that upstream behaviors often indicate latent interest even without immediate conversion.
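
As a rough illustration, recent actions can be turned into a score boost that fades with time. The action weights, the 30-minute half-life, and the blending factor alpha below are all hypothetical. The article does not specify how scores are adjusted.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical weights: a favorite signals stronger interest than a view.
ACTION_WEIGHTS = {"view": 1.0, "favorite": 2.0}


def realtime_boost(events: Iterable[Tuple[str, str, float]], now: float,
                   half_life_s: float = 1800.0) -> Dict[str, float]:
    """Sum action weights per deal, halving each weight every half_life_s seconds."""
    boosts: Dict[str, float] = {}
    for deal_id, action, timestamp in events:
        age = max(0.0, now - timestamp)
        decay = 0.5 ** (age / half_life_s)
        weight = ACTION_WEIGHTS.get(action, 0.0)
        boosts[deal_id] = boosts.get(deal_id, 0.0) + weight * decay
    return boosts


def adjust_scores(base_scores: Dict[str, float], boosts: Dict[str, float],
                  alpha: float = 0.5) -> Dict[str, float]:
    """Add the scaled real-time boost to each candidate's base score."""
    return {d: s + alpha * boosts.get(d, 0.0) for d, s in base_scores.items()}


events = [("deal_1", "view", 1000.0), ("deal_1", "favorite", -800.0)]
boosts = realtime_boost(events, now=1000.0)
scores = adjust_scores({"deal_1": 1.0, "deal_2": 1.5}, boosts)
```

With these numbers, the fresh view and the half-decayed favorite together lift deal_1 above deal_2, even though its base score was lower.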

6. Supplementary Strategies

For new or sparse users, fallback candidates include hot‑selling items, highly rated items, and city‑specific deals.

Sub‑Strategy Fusion

To improve diversity and coverage, multiple trigger algorithms are combined using weighted, hierarchical, or filtering approaches. Meituan employs a hybrid of two fusion styles: modulation, in which each algorithm contributes a set proportion of the candidates, and tiered fusion, in which higher‑performing algorithms are given priority.
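
One possible reading of this hybrid is sketched below: each strategy first fills its proportional quota (modulation), and any remaining slots are then filled in priority order (tiered). The strategy names, shares, and the backfill rule are assumptions for illustration.

```python
from typing import Dict, List


def fuse(candidates_by_strategy: Dict[str, List[str]], shares: Dict[str, float],
         priority: List[str], n: int) -> List[str]:
    """Merge ranked candidate lists into at most n unique items."""
    fused: List[str] = []
    seen = set()

    def take(items: List[str], limit: int) -> None:
        taken = 0
        for item in items:
            if len(fused) >= n or taken >= limit:
                break
            if item not in seen:  # skip items another strategy already supplied
                seen.add(item)
                fused.append(item)
                taken += 1

    # Modulation: each strategy contributes roughly its share of the n slots.
    for name in priority:
        take(candidates_by_strategy.get(name, []), int(shares.get(name, 0.0) * n))
    # Tiered backfill: leftover slots go to strategies in priority order.
    for name in priority:
        take(candidates_by_strategy.get(name, []), n)
    return fused


candidates = {
    "cf": ["a", "b", "c", "d"],
    "location": ["b", "e"],
    "hot": ["f", "g", "h"],
}
shares = {"cf": 0.5, "location": 0.25, "hot": 0.25}
result = fuse(candidates, shares, priority=["cf", "location", "hot"], n=4)
```

Deduplicating during the merge keeps an item that several strategies agree on from taking more than one slot.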

Candidate Re‑Ranking

Initial candidate lists are further refined with machine‑learning ranking models that consider numerous features.

1. Models

Both non‑linear and linear models are used.

Non‑linear: Additive Groves (AG), an ensemble of decision trees combined via bagging, reduces over‑fitting and captures complex feature interactions.

Each grove is an additive sum of several trees: each tree is fitted to the residual left by the other trees in the grove, and the fitting is repeated until it converges.

Linear: Logistic Regression with online learning via Google's FTRL algorithm, updating model weights in real time from streaming click and order logs.
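
For reference, a compact per-coordinate FTRL-Proximal update for logistic regression might look like the sketch below. It follows the commonly published form of the algorithm. The hyperparameter values and feature names are placeholders, and nothing here reflects Meituan's actual implementation.

```python
import math
from collections import defaultdict
from typing import Dict


class FTRLProximal:
    """Online logistic regression over sparse features (name -> value)."""

    def __init__(self, alpha: float = 0.1, beta: float = 1.0,
                 l1: float = 0.1, l2: float = 1.0) -> None:
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z: Dict[str, float] = defaultdict(float)  # adjusted gradient sums
        self.n: Dict[str, float] = defaultdict(float)  # squared gradient sums

    def _weight(self, i: str) -> float:
        z = self.z[i]
        if abs(z) <= self.l1:
            return 0.0  # L1 keeps weakly supported weights at exactly zero
        sign = 1.0 if z > 0 else -1.0
        return -(z - sign * self.l1) / (
            (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2)

    def predict(self, x: Dict[str, float]) -> float:
        score = sum(self._weight(i) * v for i, v in x.items())
        score = max(-35.0, min(35.0, score))  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, x: Dict[str, float], y: int) -> float:
        """Consume one labeled event (y is 0 or 1); return the prior prediction."""
        weights = {i: self._weight(i) for i in x}
        p = self.predict(x)
        for i, v in x.items():
            g = (p - y) * v  # gradient of log loss for coordinate i
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * weights[i]
            self.n[i] += g * g
        return p


model = FTRLProximal()
for _ in range(200):  # stand-in for a stream of click logs
    model.update({"deal_popular": 1.0}, 1)
    model.update({"deal_unpopular": 1.0}, 0)
```

Each event touches only the coordinates present in its feature dictionary, which is what makes this style of update practical on high-dimensional sparse click data.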

2. Data

Sampling addresses class imbalance; negative samples are carefully defined to avoid treating unseen impressions as negatives. Noise such as fraud or spam is filtered out.
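
The article does not spell out how negatives are defined. One common heuristic, assumed here purely for illustration, is to treat only the unclicked items shown at or above the last clicked position as negatives, since the user probably saw them. Negatives are then downsampled to reduce class imbalance.

```python
import random
from typing import List, Optional, Tuple


def build_samples(impressions: List[Tuple[str, bool]], negative_rate: float = 1.0,
                  rng: Optional[random.Random] = None) -> List[Tuple[str, int]]:
    """Turn one displayed list (in display order) into (deal_id, label) samples."""
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    clicked_positions = [i for i, (_, clicked) in enumerate(impressions) if clicked]
    if not clicked_positions:
        return []  # no click, so no evidence of what the user actually saw
    last_click = clicked_positions[-1]
    samples: List[Tuple[str, int]] = []
    # Items below the last click may never have been seen, so they are skipped.
    for deal_id, clicked in impressions[: last_click + 1]:
        if clicked:
            samples.append((deal_id, 1))
        elif rng.random() < negative_rate:  # keep only a fraction of negatives
            samples.append((deal_id, 0))
    return samples


shown = [("d1", False), ("d2", True), ("d3", False), ("d4", True), ("d5", False)]
samples = build_samples(shown)
```

In this example d5 is dropped entirely: it was displayed below the last click, so there is no sign the user looked at it.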

3. Features

Features fall into four groups:

Deal‑level attributes: price, discount, sales, rating, category, CTR.

User‑level attributes: level, demographics, client type.

Cross features: user‑deal interactions like clicks, favorites, purchases.

Distance features: real‑time and frequent geographic distances between user and POI.

Non‑linear models use raw features directly, while linear models require bucketization and normalization.
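
The preprocessing for linear models can be illustrated with two small helpers: one that one-hot encodes a value by bucket, and one that rescales a value to the range 0 to 1. The bucket boundaries and value ranges below are made-up examples.

```python
import bisect
from typing import List


def bucketize(value: float, boundaries: List[float]) -> List[float]:
    """One-hot encode value by the interval of the sorted boundaries it falls in."""
    index = bisect.bisect_right(boundaries, value)
    return [1.0 if i == index else 0.0 for i in range(len(boundaries) + 1)]


def min_max_normalize(value: float, lo: float, hi: float) -> float:
    """Scale value into [0, 1], clipping anything outside [lo, hi]."""
    if hi <= lo:
        return 0.0
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


# A price of 35 falls in the second of four hypothetical price buckets.
price_features = bucketize(35.0, [20.0, 50.0, 100.0])
# A rating of 4.5 on a 1-to-5 scale becomes 0.875.
rating_feature = min_max_normalize(4.5, 1.0, 5.0)
```

Bucketizing lets a linear model learn a separate weight for each price range instead of forcing a single slope across the whole range, which is why tree-based models can skip this step.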

Conclusion

Grounded in rich data and refined with multiple algorithms, Meituan's recommendation system reached two key milestones: fusing candidate sets from multiple triggers to improve coverage, diversity, and precision, and introducing a re‑ranking model to order deals effectively once the candidate pool had grown.
