Design and Implementation of an A/B Evaluation System for Meituan Delivery
This article describes how Meituan's delivery team built a comprehensive A/B testing evaluation platform: why a robust assessment framework was needed, how the platform is organized into three functional modules, which statistical methods make experiment design reliable, and the practical implementation details that enable data-driven operational decisions.
On May 6, 2019, Meituan launched the new brand "Meituan Delivery" with a vision to complete one hundred million trustworthy deliveries daily, becoming an essential infrastructure for daily life. Today, the service supports over 4 million merchants, 400 million users, and more than 700,000 active couriers across 2,800+ cities.
The article begins by explaining why an evaluation system is needed and then details the thoughts and practices of Meituan Delivery's technical team in building an A/B evaluation framework, including how to establish a complete metric system and a scientific assessment method.
Instant delivery hinges on three elements: efficiency, cost, and experience, all of which are improved through fine-grained strategy iteration. Decisions are no longer made arbitrarily; they rely on data-driven feedback that indicates current performance and potential growth.
A/B experiments serve as a powerful tool for such iteration. By defining multiple versions of a strategy, assigning them to comparable groups, and collecting experience and business data, the best version can be identified and adopted.
1. A/B Platform Overview
The platform consists of three modules that correspond to the three stages of the A/B lifecycle: experiment configuration management, traffic splitting and logging, and online analysis.
The workflow is illustrated as a closed loop: hypothesis → define success metrics → conduct A/B experiment → analyze and learn → release → formulate new hypothesis.
2. Why Emphasize Evaluation System Construction
Traditional A/B platforms use simple hash‑based traffic splitting, assuming independent and identically distributed traffic. In delivery scenarios, traffic involves users, couriers, and merchants, making requests interdependent and heavily influenced by offline factors. Therefore, Meituan adopts multiple splitting strategies, including layered models and AA grouping, to ensure statistically indistinguishable control and treatment groups.
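The layered model mentioned above can be sketched as salted hashing: each experiment layer hashes the entity ID with its own salt, so bucket assignments in different layers are statistically independent and experiments can share the same traffic. This is a minimal illustration of the general technique, not Meituan's actual implementation; all names are hypothetical.

```python
import hashlib

def bucket(entity_id: str, layer_salt: str, num_buckets: int = 100) -> int:
    """Deterministically map an entity to a bucket within one layer.

    Salting the hash with the layer name makes bucket assignments
    across layers effectively independent (orthogonal layers).
    """
    digest = hashlib.md5(f"{layer_salt}:{entity_id}".encode()).hexdigest()
    return int(digest, 16) % num_buckets

def assign(entity_id: str, layer_salt: str, treatment_buckets: range) -> str:
    """Assign an entity to treatment or control in a given layer."""
    return "treatment" if bucket(entity_id, layer_salt) in treatment_buckets else "control"

# The same courier can land in different groups in two independent layers.
group_dispatch = assign("courier_42", "dispatch_layer", range(0, 50))
group_pricing = assign("courier_42", "pricing_layer", range(0, 50))
```

Because the assignment is a pure function of (salt, ID), the same entity always sees the same variant within a layer, which keeps experience consistent across sessions.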
Two main problems arise when relying on experimenters to define custom metrics: (1) lack of objectivity and potential bias toward supporting their hypothesis, and (2) misalignment with business goals, making results hard to adopt.
3. Building the A/B Evaluation System
The system addresses two core issues: a comprehensive, authoritative metric hierarchy (P0/P1 governance metrics and P2 exploratory metrics) and a scientific evaluation method based on hypothesis testing.
3.1 Authoritative Metric System
Governance metrics must be registered, reviewed, and produced by an independent data team to ensure authority and consistency. Exploratory metrics (P2) prioritize flexibility and rapid implementation.
Data integration combines experiment configuration, business data, and coloring data to enable both high‑level traffic metrics (PV, UV, conversion) and deep exploration of strategy impact.
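To make the role of coloring data concrete, here is a small sketch of computing PV, UV, and conversion rate from request logs tagged ("colored") with their experiment group. The record schema and field names are illustrative, not Meituan's.

```python
from collections import defaultdict

# Hypothetical colored log records: each request carries the experiment
# group that served it, enabling per-group traffic metrics.
logs = [
    {"user_id": "u1", "group": "control", "converted": False},
    {"user_id": "u1", "group": "control", "converted": True},
    {"user_id": "u2", "group": "treatment", "converted": True},
    {"user_id": "u3", "group": "treatment", "converted": False},
]

def traffic_metrics(records):
    pv = defaultdict(int)       # page views: one per record
    uv = defaultdict(set)       # unique visitors per group
    conversions = defaultdict(int)
    for r in records:
        g = r["group"]
        pv[g] += 1
        uv[g].add(r["user_id"])
        conversions[g] += r["converted"]
    return {g: {"pv": pv[g], "uv": len(uv[g]), "cvr": conversions[g] / pv[g]}
            for g in pv}

metrics = traffic_metrics(logs)
```

Joining these per-group aggregates with experiment configuration and business data is what enables both the high-level traffic view and deeper drill-downs.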
3.2 Scientific Evaluation Method
Statistical hypothesis testing (including Z‑test, T‑test, and chi‑square) is used to verify experiment hypotheses. The process controls Type I error (false positive) as the primary concern, employing P‑values to decide whether to reject the null hypothesis.
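As a worked example of this process, the following sketch runs a standard two-proportion Z-test on conversion counts from a control and a treatment group, then compares the two-sided p-value against a significance level to decide whether to reject the null hypothesis. The numbers are made up for illustration.

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Z-test for a difference in conversion rates.

    Returns (z statistic, two-sided p-value) under the null
    hypothesis that the two rates are equal.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)   # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal CDF, via erf
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Control: 100 conversions / 1000 requests; treatment: 130 / 1000.
z, p = two_proportion_ztest(100, 1000, 130, 1000)
significant = p < 0.05   # reject H0 at alpha = 0.05
```

Controlling Type I error means fixing alpha before the experiment; the p-value is only compared against it afterward, never tuned to reach significance.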
AA grouping ensures that pre‑experiment traffic is split into control and treatment groups with no statistically significant differences, using dynamic programming to minimize metric variance.
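The article states the platform uses dynamic programming to minimize metric variance between the AA groups; as a simpler illustration of the same balancing goal (not Meituan's algorithm), this greedy sketch sorts entities by a pre-experiment metric and assigns each to whichever group currently has the smaller metric total.

```python
def greedy_aa_split(entities):
    """Balance two groups on a pre-experiment metric.

    entities: list of (entity_id, pre_experiment_metric) pairs.
    Returns (group_a, group_b, sum_a, sum_b).
    """
    group_a, group_b, sum_a, sum_b = [], [], 0.0, 0.0
    # Assign largest-metric entities first so later, smaller ones
    # can even out any remaining imbalance.
    for eid, metric in sorted(entities, key=lambda e: -e[1]):
        if sum_a <= sum_b:
            group_a.append(eid)
            sum_a += metric
        else:
            group_b.append(eid)
            sum_b += metric
    return group_a, group_b, sum_a, sum_b

a, b, sa, sb = greedy_aa_split([("c1", 10), ("c2", 9), ("c3", 6), ("c4", 5)])
```

After splitting, a hypothesis test on the pre-experiment metric should show no significant difference between the two groups; only then is one of them given the new strategy.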
Post‑experiment evaluation generates authoritative reports that are flexible (column‑to‑row transformation), convenient (drill‑down from experiment to entity level), and based on both governance and exploratory metrics.
4. Technical Implementation
The core of the architecture is a stable, flexible data retrieval service that bridges upstream applications and the metric system. Offline modeling and metadata management build the authoritative metric pool, while the retrieval service supplies metrics to various application services.
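One way to picture the retrieval service is as a registry of metric definitions that upstream applications query by name, with governance metrics (P0/P1) registered by the independent data team and exploratory (P2) metrics added ad hoc. This sketch is entirely hypothetical; the class and field names are illustrative, not Meituan's interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class MetricDef:
    name: str
    priority: str                       # "P0" | "P1" | "P2"
    compute: Callable[[dict], float]    # how to derive the value from raw data

class MetricService:
    """Toy metric registry standing in for the retrieval service."""

    def __init__(self):
        self._registry: Dict[str, MetricDef] = {}

    def register(self, metric: MetricDef) -> None:
        self._registry[metric.name] = metric

    def fetch(self, name: str, experiment_data: dict) -> float:
        return self._registry[name].compute(experiment_data)

svc = MetricService()
svc.register(MetricDef("on_time_rate", "P0",
                       lambda d: d["on_time"] / d["orders"]))
value = svc.fetch("on_time_rate", {"on_time": 95, "orders": 100})
```

Centralizing definitions this way is what keeps a metric's meaning identical across every report that consumes it.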
In summary, A/B testing has become the "gold standard" for evaluating new product strategies in many internet companies. In Meituan Delivery, it is widely used for dispatch, pricing, capacity optimization, and ETA prediction. Future work includes building auxiliary tools to recommend traffic scale based on metric sensitivity, ensuring statistically meaningful experiments.
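The traffic-scale recommendation mentioned as future work typically reduces to a standard sample-size calculation: given a metric's variance (its sensitivity) and the minimum effect worth detecting, compute the per-group sample size n = 2(z_{1-α/2} + z_{1-β})² σ² / δ². A minimal sketch, with illustrative numbers:

```python
import math
from statistics import NormalDist

def sample_size_per_group(sigma2: float, delta: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-group sample size to detect an effect of size delta.

    Uses the two-sample normal approximation:
    n = 2 * (z_{1-alpha/2} + z_{1-beta})^2 * sigma^2 / delta^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * sigma2 / delta ** 2
    return math.ceil(n)

# e.g. detect a 2-percentage-point lift on a ~10% conversion rate,
# whose per-observation variance is roughly p * (1 - p) = 0.09.
n = sample_size_per_group(sigma2=0.10 * 0.90, delta=0.02)
```

A more sensitive metric (smaller variance, or larger acceptable delta) needs less traffic, which is exactly the trade-off such a recommendation tool would surface to experimenters.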