How Meituan Scaled Its Mobile Deal System for Mega‑Promotions: Traffic Modeling & Capacity Planning
This article details Meituan's technical approach to handling massive traffic spikes during large‑scale promotions. It covers the background of the O2O deal platform, traffic‑model construction, capacity‑budget calculation, the evolution to a micro‑service architecture, pressure‑test strategies, and the PTP performance‑testing environment used to validate system limits.
Introduction
Meituan‑Dianping hosts a monthly technical salon where engineers share practical experiences. The eleventh session, held in Shanghai, focused on mobile testing and capacity planning for large‑scale promotional events that generate billions of daily transactions and millions of orders.
Background
During major promotions, instantaneous traffic can surge to dozens of times the normal load, which puts heavy pressure on the deal system. Three concepts frame the problem: instantaneous traffic, hotspot deals, and the core path.
Instantaneous traffic : A short‑lived surge when a batch of zero‑price deals is released, causing a traffic peak up to 33× normal levels.
Hotspot deals : The zero‑price deals released at a specific time (e.g., 10 am).
Core path : Users click the “free grab” button, view the deal detail page, and proceed to order submission and payment.
Capacity Expansion Planning
To determine whether the system can handle the peak, Meituan first calculates the required expansion using the formula:
Required capacity = Peak traffic / Single‑machine capacity
Peak traffic is derived from operational data such as PV/UV forecasts, while single‑machine capacity is obtained from pressure‑test results for each service node.
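As a minimal sketch of the formula above, the following Python computes the machine count for a single service node. The function name and all numbers are hypothetical illustrations, not Meituan's figures; rounding up is an assumption, made because a fraction of a machine cannot be provisioned.

```python
import math


def required_machines(peak_qps: float, single_machine_qps: float) -> int:
    """Required capacity = peak traffic / single-machine capacity, rounded up."""
    if single_machine_qps <= 0:
        raise ValueError("single_machine_qps must be positive")
    return math.ceil(peak_qps / single_machine_qps)


# Hypothetical inputs: 1,000 QPS normal load, a 33x surge, 800 QPS per machine.
normal_qps = 1_000
peak_qps = normal_qps * 33
print(required_machines(peak_qps, 800))
```

The numerator would come from the traffic forecast and the denominator from a pressure test of the node in question.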
Deal System Architecture Evolution
The early architecture was a monolithic Web and Service layer with over 200 tightly coupled APIs. This made traffic budgeting simple but capacity estimation difficult due to high testing costs and lack of isolation.
Later, the system was refactored into micro‑services: separate services for pricing, inventory, attribute navigation, and split Web applications for detail, transaction, and personal‑center functions. This modularization enabled accurate traffic modeling and capacity analysis.
Traffic Model Construction
The traffic model focuses on the core path during a promotion. Users navigate from an H5 activity page to a native detail page, then to order submission and payment. The model quantifies API call frequencies at each layer (Web, service, cache, DB) based on observed call stacks and CAT monitoring data.
Key findings include:
The detail page generates three primary API calls (basic info, purchase instructions, merchant info) that dominate user decisions.
Non‑core modules (other deals, reviews, recommendations) can be disabled during peak load.
For each detail‑page PV, the underlying price and inventory services are invoked with a 1:7 ratio, yielding a detailed flow matrix that is used for capacity estimation.
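One way to picture the flow matrix is as a table of per‑PV fan‑out factors multiplied by the peak page‑view rate. The sketch below is illustrative only: the service names and the peak rate are hypothetical, and the factors 1 and 7 reflect just one possible reading of the 1:7 ratio, since the source does not publish the full matrix.

```python
# Hypothetical fan-out: calls made to each service per detail-page PV.
# These are placeholders, not measured values from the article.
CALLS_PER_PV = {
    "detail_web": 3.0,         # basic info, purchase instructions, merchant info
    "price_service": 1.0,      # assumed reading of the 1:7 ratio
    "inventory_service": 7.0,  # assumed reading of the 1:7 ratio
}


def per_service_qps(peak_pv_per_sec: float, calls_per_pv: dict[str, float]) -> dict[str, float]:
    """Project peak page views per second onto each service's expected QPS."""
    return {service: peak_pv_per_sec * factor for service, factor in calls_per_pv.items()}


# Hypothetical peak of 2,000 detail-page views per second.
flow = per_service_qps(2_000, CALLS_PER_PV)
for service, qps in flow.items():
    print(f"{service}: {qps:.0f} QPS")
```

Each resulting per‑service figure is a candidate numerator for the capacity formula of that service.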
System Capacity Evaluation
Capacity is assessed via pressure testing in three environments:
BETA : QA environment with full service deployment but hardware differs from production; only single‑machine capacity can be measured.
PPE : Pre‑production environment with partial data sync; its limitations are similar to those of BETA.
Online : The production environment, which allows cluster‑level testing but carries high risk.
Meituan introduced the PTP (Performance Test Platform) environment, a Docker‑based virtual machine pool that can spin up the target service and its dependencies on demand, providing consistent data, configurable CPU/memory, and automated cleanup.
Pressure‑Test Strategy
The goal of pre‑promotion pressure testing is to discover the maximum processing capability under mixed read/write workloads, not just functional correctness. Strategies include:
Online incremental loading: gradually reduce the cluster size so that each remaining node carries a larger share of live traffic.
Online TCPCopy to mirror live traffic onto a test node.
Offline testing in PTP with scripted JMeter scenarios.
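The first strategy works because removing nodes from rotation raises the load on those that remain. A rough sketch of that arithmetic follows, assuming the load balancer spreads traffic evenly; the function name and the numbers are hypothetical.

```python
def per_node_qps_as_cluster_shrinks(
    total_qps: float, start_nodes: int, min_nodes: int
) -> list[tuple[int, float]]:
    """Return (node_count, per-node QPS) for each step as nodes are removed.

    Assumes traffic is spread evenly across the remaining nodes.
    """
    if not 1 <= min_nodes <= start_nodes:
        raise ValueError("require 1 <= min_nodes <= start_nodes")
    return [(n, total_qps / n) for n in range(start_nodes, min_nodes - 1, -1)]


# Hypothetical cluster: 12,000 QPS of live traffic over 6 nodes, shrunk to 3.
steps = per_node_qps_as_cluster_shrinks(12_000, start_nodes=6, min_nodes=3)
for node_count, qps in steps:
    print(f"{node_count} nodes -> {qps:.0f} QPS per node")
```

In practice each step would be held while latency and error rates are watched, and the shrink would stop before users are affected.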
Test scenarios are designed to reflect real‑world request distributions, such as:
Proportion of single vs. batch queries for the basic‑info service.
Batch size of inventory‑query requests based on log statistics.
Category‑specific logic that triggers different cache/database paths.
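A scenario mix like the one above can be approximated by weighted sampling. The following is a sketch under assumed weights; the scenario names and proportions are hypothetical and would in practice be taken from log statistics.

```python
import random
from collections import Counter

# Hypothetical request mix; real weights would come from production logs.
SCENARIO_WEIGHTS = {
    "basic_info_single": 0.6,
    "basic_info_batch": 0.3,
    "inventory_batch": 0.1,
}


def build_request_plan(n_requests: int, weights: dict[str, float], seed: int = 42) -> list[str]:
    """Draw a reproducible sequence of scenario names according to the weights."""
    rng = random.Random(seed)  # fixed seed so repeated runs are comparable
    names = list(weights)
    return rng.choices(names, weights=[weights[name] for name in names], k=n_requests)


request_plan = build_request_plan(1_000, SCENARIO_WEIGHTS)
print(Counter(request_plan))
```

A sequence like this could drive a load generator so that the ratio of single to batch queries roughly matches what production sees.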
Test Data Construction
Three data sources are used:
Traffic replay : Capture live TCP traffic and replay it in the test environment.
Log replay : Use Nginx access logs, separating read and write operations.
Artificial data : Generate CSV files that mimic cache hit rates and data distribution.
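For the log‑replay source, separating reads from writes can be sketched as below. This assumes the default Nginx access‑log layout, where the request line appears in double quotes, and it treats GET and HEAD as reads and everything else as writes; both are assumptions, and the sample log lines are invented.

```python
import re

# Matches the quoted request line, e.g. "GET /deal/1001 HTTP/1.1".
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*"')
READ_METHODS = {"GET", "HEAD"}  # assumed classification of read operations


def split_reads_and_writes(lines):
    """Split access-log lines into (method, path) lists of reads and writes."""
    reads, writes = [], []
    for line in lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue  # skip lines that do not contain a request
        target = reads if match["method"] in READ_METHODS else writes
        target.append((match["method"], match["path"]))
    return reads, writes


# Invented sample lines for illustration.
SAMPLE_LOG = [
    '10.0.0.1 - - [01/Jan/2016:10:00:00 +0800] "GET /deal/1001 HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2016:10:00:01 +0800] "POST /order/submit HTTP/1.1" 200 128',
    "malformed line",
]
reads, writes = split_reads_and_writes(SAMPLE_LOG)
print(len(reads), "reads;", len(writes), "writes")
```

Keeping the two lists apart lets reads be replayed freely while writes are replayed only against data that is safe to modify.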
Result Collection and Expansion Formula
After executing the tests, Meituan aggregates per‑node peak traffic and single‑machine capacity to compute the required machine count for each node (peak traffic × safety margin / single‑machine capacity). This yields a concrete expansion plan that guides both horizontal scaling and resource provisioning for the upcoming promotion.
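The aggregation step can be sketched as a small per‑node table. Everything below is hypothetical: the node names, the measurements, and the 1.2 safety margin are placeholders, and subtracting the machines already deployed is an assumption about how the plan would be derived.

```python
import math
from dataclasses import dataclass


@dataclass
class NodeMeasurement:
    name: str
    peak_qps: float            # from the traffic model
    single_machine_qps: float  # from pressure testing
    current_machines: int      # machines already deployed


def machines_to_add(node: NodeMeasurement, safety_margin: float = 1.2) -> int:
    """Machines to add so the node covers peak traffic times the safety margin."""
    required = math.ceil(node.peak_qps * safety_margin / node.single_machine_qps)
    return max(0, required - node.current_machines)


# Hypothetical measurements for two service nodes.
nodes = [
    NodeMeasurement("detail_web", 6_000, 500, 10),
    NodeMeasurement("inventory_service", 14_000, 2_000, 12),
]
expansion_plan = {node.name: machines_to_add(node) for node in nodes}
print(expansion_plan)
```

A node whose current deployment already exceeds the requirement gets zero additions rather than a negative number.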
Conclusion
Before a major deal promotion, Meituan performs two critical steps: (1) a traffic‑model‑based budget to estimate the numerator of the expansion formula, and (2) a pressure‑test‑driven capacity assessment to determine the denominator. Together they ensure that core user experience remains smooth during massive traffic spikes.
Meituan Technology Team
Over 10,000 engineers powering China’s leading lifestyle services e‑commerce platform. Supporting hundreds of millions of consumers, millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.