
Design and Implementation of Meituan Delivery A/B Testing Platform and Evaluation System

The article details Meituan Delivery’s A/B testing platform and evaluation system, explaining its closed‑loop design, multi‑strategy traffic allocation with AA grouping, comprehensive metric hierarchy, statistical rigor, data integration, and implementation architecture, and outlines future tools for traffic‑volume recommendation.

Meituan Technology Team

On May 6, 2019, Meituan launched the "Meituan Delivery" brand with the vision of completing 100 million trustworthy deliveries per day and becoming an essential part of everyday infrastructure. This article describes the motivation, design, and practice of the A/B evaluation system built by the Meituan Delivery technical team.

Instant delivery relies on three elements: efficiency, cost, and experience. Iterating on strategies in a data-driven, fine-grained way depends on an A/B testing platform that provides a scientific, authoritative evaluation framework.

1. A/B Platform Overview

The platform follows a closed‑loop learning cycle: hypothesis → success metric definition → A/B experiment → analysis → release → new hypothesis. The lifecycle is divided into three stages (pre‑experiment, during experiment, post‑experiment) and three functional modules (experiment configuration, traffic splitting & logging, online analysis).

2. Why Emphasize Evaluation System Construction

2.1 The delivery scenario involves three parties (users, riders, merchants), and requests are not independent of one another, so traditional request-level A/B traffic splitting is unsuitable. Meituan instead adopts multi-strategy traffic allocation, uses AA grouping so that control and experiment groups are statistically indistinguishable before an experiment starts, and evaluates results against a comprehensive metric system.
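
One way to picture multi-strategy allocation is as a choice of splitting unit: instead of always hashing the individual request, the experiment hashes an order, a rider, or a region, so that dependent requests land in the same group. The sketch below is illustrative only. The strategy names, request fields, and hashing scheme are assumptions, not details of the Meituan platform.

```python
import hashlib

def assign_group(experiment_id: str, unit_key: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a splitting unit to 'control' or 'treatment'."""
    digest = hashlib.sha256(f"{experiment_id}:{unit_key}".encode("utf-8")).hexdigest()
    # Use the first 32 bits of the hash as a pseudo-uniform value in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "treatment" if bucket < treatment_share else "control"

# Hypothetical strategies: each one picks the field that acts as the splitting unit.
STRATEGIES = {
    "by_order": lambda request: request["order_id"],
    "by_rider": lambda request: request["rider_id"],
    "by_region": lambda request: request["region_id"],
}

def route(experiment_id: str, strategy: str, request: dict) -> str:
    """Assign a request to a group according to the experiment's splitting strategy."""
    unit_key = str(STRATEGIES[strategy](request))
    return assign_group(experiment_id, unit_key)
```

With `by_region`, every request from the same region receives the same group, which is the property a request-level split cannot provide when requests interact.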

2.2 Relying on custom metrics can lead to biased decisions. An authoritative evaluation system aligns business understanding and supports objective decision‑making.

3. Building the A/B Evaluation System

3.1 A complete metric hierarchy is required. Governance metrics (P0/P1) are strictly reviewed and produced by a dedicated data team, while exploratory metrics (P2) are more flexible and can be produced quickly by algorithm teams. Both kinds are unified in a single evaluation metric pool.
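
The two-tier hierarchy can be sketched as a small registry in which governance metrics must pass review before they enter the pool, while exploratory metrics are admitted directly. All class, field, and metric names here are hypothetical; the source does not describe the platform's actual data model.

```python
from dataclasses import dataclass
from enum import Enum

class Priority(Enum):
    P0 = 0  # governance, strictly reviewed
    P1 = 1  # governance, strictly reviewed
    P2 = 2  # exploratory, owned by algorithm teams

@dataclass(frozen=True)
class Metric:
    name: str
    priority: Priority
    owner: str
    reviewed: bool = False

class MetricPool:
    """A single pool holding both governance and exploratory metrics."""

    def __init__(self) -> None:
        self._metrics: dict[str, Metric] = {}

    def register(self, metric: Metric) -> None:
        if metric.name in self._metrics:
            raise ValueError(f"metric already registered: {metric.name}")
        is_governance = metric.priority in (Priority.P0, Priority.P1)
        if is_governance and not metric.reviewed:
            raise ValueError(f"governance metric needs review: {metric.name}")
        self._metrics[metric.name] = metric

    def governance(self) -> list[str]:
        return sorted(m.name for m in self._metrics.values() if m.priority != Priority.P2)

    def exploratory(self) -> list[str]:
        return sorted(m.name for m in self._metrics.values() if m.priority == Priority.P2)
```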

Data Integration

Experiment configuration data, business data, and “coloring” data (per‑traffic‑entity logs) are integrated into the data warehouse, enabling both high‑level KPI monitoring (PV, UV, conversion) and deep causal analysis.
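
As a rough illustration of why the three sources are integrated, the sketch below joins coloring rows (which entity saw which group) with business facts and aggregates a metric per group. The table layouts, field names, and values are invented for the example and stand in for warehouse tables.

```python
from collections import defaultdict

# Experiment configuration (hypothetical layout).
experiments = {"exp_1": {"name": "dispatch_v2", "groups": ["control", "treatment"]}}

# Coloring data: one row per traffic entity, recording the group it was served.
coloring = [
    {"experiment_id": "exp_1", "entity_id": "order_1", "group": "control"},
    {"experiment_id": "exp_1", "entity_id": "order_2", "group": "control"},
    {"experiment_id": "exp_1", "entity_id": "order_3", "group": "treatment"},
    {"experiment_id": "exp_1", "entity_id": "order_4", "group": "treatment"},
]

# Business data keyed by the same entity id.
business = {
    "order_1": {"delivery_minutes": 31.0},
    "order_2": {"delivery_minutes": 29.0},
    "order_3": {"delivery_minutes": 27.0},
    "order_4": {"delivery_minutes": 28.0},
}

def metric_by_group(experiment_id: str, metric: str) -> dict[str, float]:
    """Join coloring rows to business facts and average a metric per group."""
    if experiment_id not in experiments:
        raise KeyError(f"unknown experiment: {experiment_id}")
    values = defaultdict(list)
    for row in coloring:
        if row["experiment_id"] != experiment_id:
            continue
        fact = business.get(row["entity_id"])
        if fact is not None:  # skip entities with no business record
            values[row["group"]].append(fact[metric])
    return {group: sum(v) / len(v) for group, v in values.items()}
```

Because the join keeps the entity id, the same data supports both the group-level average and a drill-down to individual entities.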

Metadata Management & Model Configuration

Metadata management registers and reviews governance metrics, ensuring consistency across teams. Model configuration tools connect physical tables to the metric pool, supporting AA grouping, experiment definition, and metric calculation through input, operation, and application components.

3.2 Scientific Evaluation Methods

The article reviews hypothesis testing, Type I and Type II errors, the t-test, and the use of p-values. It explains how to control the false-positive (Type I error) rate, typically at α = 0.05, and how to interpret statistical significance in the context of delivery experiments.
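
A minimal sketch of the significance check described above, using only the Python standard library. It computes Welch's t statistic for two independent samples and, as a simplifying assumption, approximates the two-sided p-value with the standard normal distribution, which is reasonable only for large samples. An exact t-test would use the t distribution with Welch's degrees of freedom. The function name and return shape are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def welch_test(control, treatment, alpha=0.05):
    """Return (t_stat, p_value, significant) for a two-sided comparison of means."""
    # Standard error of the difference in means, allowing unequal variances.
    std_err = sqrt(variance(control) / len(control) + variance(treatment) / len(treatment))
    t_stat = (mean(treatment) - mean(control)) / std_err
    # Large-sample normal approximation to the two-sided p-value (an assumption).
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    # Rejecting only when p < alpha caps the Type I error rate at roughly alpha.
    return t_stat, p_value, p_value < alpha
```

A result with `p_value < alpha` means the observed difference would be unlikely if the strategy had no effect; it does not by itself say the effect is large enough to matter for the business.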

AA grouping is introduced to guarantee that control and experiment groups have no statistically significant differences on the chosen traffic‑characterizing metrics before the experiment starts.
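
One common way to obtain such groups is re-randomization: draw a random split, test the pre-experiment metrics, and redraw until no metric differs noticeably. The sketch below follows that idea and is not necessarily the procedure Meituan uses. The function names, the threshold of 1.96 (a two-sided 5% level under a normal approximation), and the retry limit are all assumptions.

```python
import random
from math import sqrt
from statistics import mean, variance

def standardized_diff(a, b):
    """Absolute difference in means divided by its standard error."""
    std_err = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / std_err if std_err > 0 else 0.0

def aa_split(features, threshold=1.96, max_tries=1000, seed=0):
    """Split entities into two groups that are balanced on every pre-experiment metric.

    features maps entity_id -> {metric_name: pre-experiment value}.
    """
    rng = random.Random(seed)
    ids = sorted(features)
    metric_names = sorted(next(iter(features.values())))
    half = len(ids) // 2
    for _ in range(max_tries):
        rng.shuffle(ids)
        group_a, group_b = ids[:half], ids[half:]
        # The split is accepted only if the worst metric is still below the threshold.
        worst = max(
            standardized_diff(
                [features[i][m] for i in group_a],
                [features[i][m] for i in group_b],
            )
            for m in metric_names
        )
        if worst < threshold:
            return group_a, group_b, worst
    raise RuntimeError("no balanced split found within max_tries")
```

A stricter threshold gives better balance at the cost of more redraws; each group needs at least two entities for the variance to be defined.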

Post‑experiment effect evaluation combines authoritative metrics, flexible reporting, and drill‑down from experiment level to traffic‑entity level, providing both high‑level insights and detailed analysis.

Technical Implementation

A stable, flexible data‑retrieval service bridges the metric pool and upper‑layer applications, supporting both AA grouping and AB effect analysis.

4. Summary and Outlook

A/B testing is the "gold standard" for evaluating new product strategies at internet companies. In Meituan Delivery, it is applied to dispatch, pricing, capacity optimization, ETA prediction, etc. Future work includes tooling to recommend required traffic volume based on metric sensitivity.
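
The traffic-volume recommendation mentioned as future work is not specified in the source. A standard starting point, shown here as an assumption rather than the planned tool, is the textbook sample-size formula for comparing two means, n = 2(z_{1-α/2} + z_{1-β})² σ² / δ² per group, where σ is the metric's standard deviation and δ is the smallest effect worth detecting.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(std_dev, min_detectable_effect, alpha=0.05, power=0.8):
    """Samples needed per group for a two-sided test on a difference in means."""
    if std_dev <= 0 or min_detectable_effect <= 0:
        raise ValueError("std_dev and min_detectable_effect must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # controls the Type I error rate
    z_beta = NormalDist().inv_cdf(power)           # controls the Type II error rate
    n = 2 * ((z_alpha + z_beta) * std_dev / min_detectable_effect) ** 2
    return ceil(n)
```

The formula makes the sensitivity trade-off explicit: halving the minimum detectable effect roughly quadruples the traffic required, and noisier metrics need proportionally more.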
