
Meituan O2O Search Ranking System: Online Architecture and Techniques

This article describes Meituan's online search ranking architecture for O2O services, covering data pipelines, feature loading, ranking service workflow, A/B testing, model choices, cold‑start handling, and position bias mitigation, all tailored for mobile‑centric personalized ranking.

By Architect

Meituan's vision is to connect consumers and merchants, and search ranking is essential for helping users quickly find suitable merchants and deals, thereby improving user experience and conversion rates.

Unlike traditional web search, 90% of Meituan transactions occur on mobile devices, which raises the demand for highly personalized ranking; for example, a hot‑pot restaurant in Beijing may be a good result for a local user but not for a user in another district. Rich mobile behavior data such as location, category preferences, and price sensitivity are leveraged to guide personalization.

To address the O2O characteristics, Meituan built a search ranking solution that yields a several‑percent improvement over rule‑based ranking. The solution has been abstracted into a generic O2O ranking framework that can be deployed to new products or sub‑domains within 1‑2 days, and it is already used in hot‑word suggestion, hotels, KTV, and other services.

Ranking System

The ranking system supports flexible A/B testing to enable rapid algorithm iteration and accurate effect tracking.

The system consists of three main modules: offline data processing, online service, and online data processing.

Offline Data Processing

Search logs (impressions, clicks, orders, payments) are stored in HDFS/Hive. Daily MapReduce jobs perform offline feature extraction, data cleaning & labeling, model training, metric reporting, and feature monitoring.

Offline feature mining produces features for deals/POIs, users, and queries.

Data cleaning removes spam and bot traffic; cleaned data are labeled for model training.

Effect reports generate offline metrics such as AUC and MAP.

Feature monitoring tracks coverage and distribution to detect anomalies.
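As an illustration of the effect reports mentioned above, offline AUC can be computed directly from labeled impression logs. This is a minimal sketch using the rank-sum (Mann-Whitney) formulation, not Meituan's actual reporting code; it assumes binary labels and no score ties.

```python
# Sketch: AUC from (label, score) pairs, as produced by offline effect
# reports. Binary labels assumed; ties in scores are not averaged.
def auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pairs = sorted(zip(scores, labels))
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        return 0.0
    # rank-sum (Mann-Whitney) formulation: sum the ranks of the positives
    rank_sum = 0.0
    for rank, (_, label) in enumerate(pairs, start=1):
        if label == 1:
            rank_sum += rank
    return (rank_sum - pos * (pos + 1) / 2) / (pos * neg)
```

A perfect ranker (every positive above every negative) yields 1.0; a reversed ranker yields 0.0.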

Online Data Processing

Real‑time logs are processed with Storm/Spark Streaming to generate real‑time features, reports, and monitoring data, which update the online ranking model.

Online Service (Rank Service)

Upon receiving a search request, Rank Service calls the recall service to obtain candidate POIs/Deals, assigns a ranking strategy/model based on A/B test configuration, and applies the strategy/model to rank the candidates.

L1 coarse ranking (fast): uses few features and simple models/rules.

L2 fine‑grained ranking (slower): re‑ranks the top N results from L1 using loaded features and the assigned model.

L3 business rule intervention: applies additional business rules or manual adjustments on top of L2.
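A minimal sketch of this three-stage cascade. The field names (`coarse_score`, `promoted`), the `fine_model` callable, and the promotion rule are hypothetical stand-ins; the real system's features and rules are richer.

```python
# Sketch of the L1 -> L2 -> L3 cascade described above.
# Field names and the business rule are illustrative, not Meituan's.

def rank(candidates, fine_model, top_n=100):
    # L1: cheap coarse ranking over all recalled candidates
    coarse = sorted(candidates, key=lambda c: c["coarse_score"], reverse=True)
    head, tail = coarse[:top_n], coarse[top_n:]
    # L2: the expensive model re-ranks only the top N
    head = sorted(head, key=fine_model, reverse=True)
    # L3: business rules adjust the final list (e.g., pin promoted deals first;
    # Python's stable sort keeps the L2 order among non-promoted items)
    head = sorted(head, key=lambda c: c.get("promoted", False), reverse=True)
    return head + tail
```

Only the top-N slice pays the cost of the fine model, which is what keeps the slower L2 stage within the latency budget.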

Rank Service logs impressions for further online/offline analysis.

A/B Testing

Traffic is split on the Rank Server side using the user UUID to assign buckets, each bucket corresponding to a ranking strategy. Invalid UUIDs are assigned a random bucket. A whitelist can force specific users into a given strategy.

Example A/B test configuration:

{
  "search": {
    "NumberOfBuckets": 100,
    "DefaultStrategy": "Base",
    "Segments": [
      {
        "BeginBucket": 0,
        "EndBucket": 24,
        "WhiteList": [123],
        "Strategy": "Algo-1"
      },
      {
        "BeginBucket": 25,
        "EndBucket": 49,
        "WhiteList": [],
        "Strategy": "Algo-2"
      }
    ]
  }
}
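The configuration above can be consumed by a bucket-assignment routine like the following sketch. The hash function (MD5 here) and the handling of invalid UUIDs are assumptions; the field names follow the JSON config.

```python
# Sketch of UUID-based traffic splitting against the configuration above.
# MD5 bucketing is an illustrative choice, not necessarily Meituan's.
import hashlib
import random

def assign_strategy(uuid, config):
    # Whitelist check first: forces specific users into a given strategy
    for seg in config["Segments"]:
        if uuid in seg["WhiteList"]:
            return seg["Strategy"]
    if not uuid:  # invalid UUID -> random bucket
        bucket = random.randrange(config["NumberOfBuckets"])
    else:
        digest = hashlib.md5(str(uuid).encode()).hexdigest()
        bucket = int(digest, 16) % config["NumberOfBuckets"]
    for seg in config["Segments"]:
        if seg["BeginBucket"] <= bucket <= seg["EndBucket"]:
            return seg["Strategy"]
    return config["DefaultStrategy"]
```

Because the bucket is a deterministic function of the UUID, a given user sees a consistent strategy across sessions, which is what makes the per-bucket metrics comparable.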

In addition to A/B testing, Interleaving is used for more sensitive comparison of ranking algorithms with smaller traffic.
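One common interleaving variant is team-draft interleaving (see the Chapelle et al. reference below): the two rankings take turns picking their best not-yet-shown result, and clicks are credited to whichever "team" contributed the clicked item. A simplified sketch, not the production implementation:

```python
# Sketch of team-draft interleaving: rankings A and B alternate picks,
# with a coin flip per round deciding who picks first.
import random

def team_draft_interleave(ranking_a, ranking_b):
    shown, teams = [], {}
    a, b = list(ranking_a), list(ranking_b)
    while a or b:
        first, second = ("A", a), ("B", b)
        if random.random() < 0.5:  # coin flip: who picks first this round
            first, second = second, first
        for team, ranking in (first, second):
            # skip items the other team already contributed
            while ranking and ranking[0] in teams:
                ranking.pop(0)
            if ranking:
                item = ranking.pop(0)
                teams[item] = team  # credit clicks on `item` to this team
                shown.append(item)
    return shown, teams
```

At serving time the interleaved list is shown to the user, and the algorithm whose team collects more clicks wins the comparison; this typically needs far less traffic than a bucket-level A/B test to reach significance.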

Feature Loading

Feature loading is a bottleneck for response time. The FeatureLoader module builds a dependency graph of features and loads them in parallel, reducing latency by about 20 ms on average.

FeatureLoader is implemented with Akka actors that schedule and execute feature extraction concurrently.
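The production FeatureLoader uses Akka actors; the same idea can be sketched in Python with a thread pool, executing each "layer" of the dependency graph in parallel once its prerequisites are loaded. The loader functions and dependency map below are hypothetical.

```python
# Sketch of dependency-aware parallel feature loading. Akka actors play
# this role in production; a thread pool stands in for them here.
from concurrent.futures import ThreadPoolExecutor

def load_features(loaders, deps):
    """loaders: name -> fn(loaded_dict); deps: name -> set of prerequisites."""
    loaded, pending = {}, set(loaders)
    with ThreadPoolExecutor() as pool:
        while pending:
            # every feature whose prerequisites are all loaded can run now
            ready = [n for n in pending if deps.get(n, set()).issubset(loaded)]
            if not ready:
                raise ValueError("cyclic feature dependency")
            futures = {n: pool.submit(loaders[n], dict(loaded)) for n in ready}
            for n, f in futures.items():
                loaded[n] = f.result()
            pending -= set(ready)
    return loaded
```

Independent features (e.g., user profile and POI geo lookups) load concurrently, so total latency approaches the longest dependency chain rather than the sum of all lookups.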

Features and Models

Since September 2013, Meituan has applied Learning‑to‑Rank (LTR) techniques, benefiting from accurate labeling of user clicks, orders, and payments.

Features

Features are derived from four dimensions: user, query, deal/POI, and context (time, entry point). Cross‑dimensional features such as user‑deal interaction, distance, and textual/semantic similarity are also used.

Models

Pointwise LTR is used: clicks, orders, and payments are treated as positive samples with increasing weights (a payment outweighs an order, which outweighs a click). Online models include:

Gradient Boosting Decision/Regression Trees (GBDT/GBRT) built with a Spark‑based tool, using a ternary tree structure to handle missing features and logistic‑likelihood loss for binary classification.

Logistic Regression (LR) with features partially constructed by GBDT and trained online using FTRL.
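The online half of this pairing can be sketched with FTRL-Proximal logistic regression (the McMahan et al. algorithm cited below), which the article pairs with GBDT-built features. Hyperparameters and the dict-based sparse representation are illustrative, not Meituan's.

```python
# Sketch of FTRL-Proximal logistic regression over sparse binary features
# (e.g., GBDT leaf indices). Per-coordinate state z, n as in McMahan et al.
import math

class FTRL:
    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z, self.n = {}, {}  # per-coordinate FTRL state

    def _weight(self, i):
        z = self.z.get(i, 0.0)
        if abs(z) <= self.l1:          # L1 sparsity: small z -> zero weight
            return 0.0
        n = self.n.get(i, 0.0)
        return -(z - math.copysign(self.l1, z)) / (
            (self.beta + math.sqrt(n)) / self.alpha + self.l2)

    def predict(self, x):              # x: iterable of active feature ids
        s = sum(self._weight(i) for i in x)
        return 1.0 / (1.0 + math.exp(-s))

    def update(self, x, y):            # y in {0, 1}
        p = self.predict(x)
        g = p - y                      # logistic-loss gradient per active feature
        for i in x:
            n, z = self.n.get(i, 0.0), self.z.get(i, 0.0)
            sigma = (math.sqrt(n + g * g) - math.sqrt(n)) / self.alpha
            self.z[i] = z + g - sigma * self._weight(i)
            self.n[i] = n + g * g
```

In the GBDT-plus-LR setup (the Facebook KDD '14 approach cited below), each tree's leaf index becomes one active feature id per example, and the FTRL learner updates on each labeled impression as it streams in.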

Model evaluation combines offline metrics (AUC, MAP) and online A/B testing.

Cold Start

Cold start occurs for new merchants, deals, or users lacking interaction data. Meituan mitigates this by adding textual relevance, category similarity, distance, and attribute features, and by employing an Explore‑&‑Exploit mechanism to give new items exposure and collect feedback.
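A simple form of such an Explore-&-Exploit mechanism: with small probability, boost one cold-start item into a visible slot of the ranked list so it can accumulate feedback. The epsilon value and slot position are illustrative; the article does not specify the exact policy.

```python
# Sketch of epsilon-greedy exploration for cold-start items: occasionally
# insert a new item into a visible position to collect click feedback.
import random

def inject_exploration(ranked, cold_items, epsilon=0.1, slot=3):
    result = list(ranked)
    if cold_items and random.random() < epsilon:
        item = random.choice(cold_items)
        result.insert(min(slot, len(result)), item)
    return result
```

Exploration trades a small amount of short-term ranking quality for the interaction data that lets new merchants and deals escape the cold-start state.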

Position Bias

Since results are displayed as a list on mobile, position heavily influences user behavior. Meituan accounts for position bias by using the Examination Model to adjust click‑through‑rate statistics.
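Under the examination hypothesis, a click requires that the user first examined the position, so an impression at position p should only count toward CTR in proportion to the examination probability of p. A sketch with illustrative probabilities (in practice these are estimated from click logs, per the Craswell et al. reference below):

```python
# Sketch of examination-model CTR correction: discount each impression by
# the probability the position was examined. Probabilities are illustrative.
EXAM_PROB = {1: 1.0, 2: 0.75, 3: 0.6, 4: 0.5, 5: 0.42}

def adjusted_ctr(impressions):
    """impressions: list of (position, clicked) pairs for one item."""
    clicks = sum(1 for _, clicked in impressions if clicked)
    examined = sum(EXAM_PROB.get(pos, 0.3) for pos, _ in impressions)
    return clicks / examined if examined else 0.0
```

Two items with the same raw CTR then get different adjusted scores: a click earned at position 5 says more about relevance than one earned at position 1, because far fewer users ever examined position 5.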

Conclusion

This article introduced the online components of Meituan's search ranking system, including architecture, algorithms, and key modules. Future articles will cover the offline processing pipeline. Continuous data and model exploration drive ongoing ranking improvements.

References

Learning To Rank. Wikipedia

Friedman, J. H. (2001). Greedy function approximation: a gradient boosting machine. Annals of statistics, 1189‑1232.

He, X., et al. (2014). Practical lessons from predicting clicks on ads at Facebook. KDD ’14.

McMahan, H. B., et al. (2013). Ad click prediction: a view from the trenches. KDD ’13.

Craswell, N., et al. (2008). An experimental comparison of click position‑bias models. WSDM ’08.

Cold Start. Wikipedia

Chapelle, O., et al. (2012). Large‑scale validation and analysis of interleaved search evaluation. TOIS, 30(1), 6.

Akka: http://akka.io

Radlinski, F., & Craswell, N. (2010). Comparing the sensitivity of information retrieval metrics. SIGIR ’10.
