Evolution and Scaling of Meituan's Food Delivery Order System
Meituan’s food‑delivery order platform evolved from a simple monolithic prototype in 2013 to a distributed, highly available service that now handles tens of millions of orders daily, using asynchronous processing, sharding, multi‑data‑center deployment, and automated operations to achieve sub‑150 ms transaction latency and 99.9999% uptime.
Since the first order in September 2013, Meituan Waimai has grown from a few daily orders to a platform handling 5 million orders per day, expanding from a single‑item service to a full‑category O2O food‑delivery platform.
The rapid increase in order volume and business complexity forced the order system to evolve from a monolithic module to a distributed, high‑performance, highly available architecture. This article outlines the major stages of that evolution, focusing on business characteristics, challenges, and technical solutions.
Food Delivery Order Business
The service requires real‑time processing: an order must be placed, paid, accepted by the merchant, delivered, and completed within about one hour. Order volume peaks during lunch and dinner, creating high load during those “meal times”.
Early Prototype
In the early stage, the goal was rapid validation. Order‑related functions were packaged into a shared JAR and used by various modules. The architecture was simple and flexible, suitable for low traffic and fast iteration.
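As order volume grew, the shared‑JAR approach showed its limits: every change to order logic had to be coordinated across the modules that embedded it, and a problem in one module's order code could affect the others.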
Independent Order System (2014)
When daily orders reached 100,000, the system was split into an independent order service accessed via RPC. The split followed three principles: business isolation, priority‑aligned services, and inclusion of both business logic and data.
The new architecture introduced three sub‑systems (transaction, query, asynchronous processing) and a dedicated data storage layer.
Isolation reduced inter‑service interference and supported faster iteration while maintaining stability.
Performance Optimizations
To handle millions of daily orders, several techniques were applied:
Asynchronous processing (thread pools or message queues) to off‑load non‑critical work such as push notifications and statistics.
Parallelization of independent calls (e.g., fetching store, menu, and user info concurrently) to shorten order latency.
Caching of pre‑computed statistics (e.g., first‑order discounts) to avoid costly real‑time calculations.
Examples include asynchronous PUSH delivery and parallel data fetching.
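The parallel fetch and the asynchronous push can be sketched together in plain Java. This is only an illustration under stated assumptions: the class `OrderPlacementSketch`, the `fetchStore`/`fetchMenu`/`fetchUser` stand-ins, and `sendPush` are hypothetical names, not part of Meituan's system, and a thread pool stands in for what could equally be a message queue.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: every name here is illustrative, not Meituan's code.
class OrderPlacementSketch {
    // Stand-ins for remote lookups; a real service would issue RPC calls.
    static String fetchStore(long storeId) { return "store-" + storeId; }
    static String fetchMenu(long storeId) { return "menu-" + storeId; }
    static String fetchUser(long userId) { return "user-" + userId; }

    // Placeholder for a call to a push gateway (non-critical work).
    static void sendPush(String order) { }

    static String placeOrder(long userId, long storeId, ExecutorService pool) {
        // Start the three independent lookups concurrently.
        CompletableFuture<String> store =
                CompletableFuture.supplyAsync(() -> fetchStore(storeId), pool);
        CompletableFuture<String> menu =
                CompletableFuture.supplyAsync(() -> fetchMenu(storeId), pool);
        CompletableFuture<String> user =
                CompletableFuture.supplyAsync(() -> fetchUser(userId), pool);

        // Wait for all three; the wait is roughly the slowest call, not the sum.
        CompletableFuture.allOf(store, menu, user).join();
        String order = user.join() + "|" + store.join() + "|" + menu.join();

        // Hand the push notification to the pool so it stays off the request path.
        CompletableFuture.runAsync(() -> sendPush(order), pool);
        return order;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            System.out.println(placeOrder(7L, 42L, pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

The caller gets its result as soon as the three lookups finish; a failure in the push path does not fail the order, which is the point of moving it off the critical path.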
Consistency Enhancements
Order transactions require eventual consistency. Techniques used include:
Retry with idempotence for operations such as refunds, ensuring eventual success while preventing duplicate actions.
Two‑phase commit (2PC) for multi‑resource operations like inventory reservation.
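Retry with idempotence can be illustrated with a small Java sketch. It assumes a refund is identified by a caller-supplied refund ID and that the set of applied IDs is durable (in practice this might be a unique key in the database); `RefundSketch` and its members are hypothetical names, and the lost response is simulated.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: names and failure injection are illustrative only.
class RefundSketch {
    // Refund IDs already applied; stands in for a durable unique constraint.
    private final Set<String> applied = ConcurrentHashMap.newKeySet();
    // Counts how many times money was actually moved.
    final AtomicInteger moneyMovements = new AtomicInteger();
    // Simulates a response that is lost after the first refund is applied.
    private boolean loseFirstResponse = true;

    // Idempotent: a given refundId moves money at most once.
    void refund(String refundId) {
        if (!applied.add(refundId)) {
            return; // duplicate request: already applied, report success
        }
        moneyMovements.incrementAndGet();
        if (loseFirstResponse) {
            loseFirstResponse = false;
            throw new IllegalStateException("response lost after refund applied");
        }
    }

    // Retry until success or attempts run out; safe because refund is idempotent.
    boolean refundWithRetry(String refundId, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                refund(refundId);
                return true;
            } catch (IllegalStateException e) {
                // A real system would back off or re-enqueue before retrying.
            }
        }
        return false;
    }

    public static void main(String[] args) {
        RefundSketch service = new RefundSketch();
        boolean ok = service.refundWithRetry("refund-1001", 3);
        System.out.println(ok + " " + service.moneyMovements.get());
    }
}
```

The first attempt applies the refund but appears to fail; the retry finds the ID already recorded and returns without moving money again, so the operation eventually succeeds exactly once.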
High Availability
Availability is achieved at three layers:
Storage layer – MySQL master‑slave clusters, Elasticsearch sharding.
Middleware layer – highly available open‑source components.
Service layer – stateless services behind load balancers, Hystrix for circuit‑breaking, thread‑pool isolation, timeout settings, and fallback logic.
Multi‑data‑center deployment and stateless design further eliminate single‑point failures.
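The timeout and fallback behavior at the service layer can be sketched with the standard library alone. This is not Hystrix's API and it omits circuit breaking and thread‑pool isolation; it only shows the idea of bounding a dependency call and degrading to a default. `FallbackSketch` and `callWithFallback` are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical sketch: plain-Java timeout plus fallback, not the Hystrix API.
class FallbackSketch {
    // Runs remoteCall on another thread and waits at most timeoutMs for it.
    // Returns the fallback value if the call is too slow or throws.
    static String callWithFallback(Supplier<String> remoteCall,
                                   long timeoutMs, String fallback) {
        CompletableFuture<String> future = CompletableFuture.supplyAsync(remoteCall);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            return fallback; // dependency slow or failing: degrade gracefully
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return fallback;
        }
    }

    public static void main(String[] args) {
        // A healthy dependency returns its live value.
        System.out.println(callWithFallback(() -> "live", 1000, "cached"));
        // A failing dependency yields the fallback instead of an error.
        System.out.println(callWithFallback(() -> {
            throw new IllegalStateException("dependency down");
        }, 1000, "cached"));
    }
}
```

A production circuit breaker would additionally track recent failures and stop calling the dependency for a while once a threshold is crossed.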
Scalability Improvements
To overcome vertical scaling limits, the system adopted sharding and partitioning:
Database sharding (horizontal partitioning) across multiple MySQL instances.
Table partitioning by order ID, user ID, and store ID, storing three copies of each order to balance write load and keep query latency low.
Query routing is handled by a lightweight middleware plugin (Spring + MyBatis) that dynamically switches data sources and table names based on annotations.
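The routing decision itself can be sketched as a pure function from an ID to a database and table name. The sketch below assumes modulo routing over 4 databases with 8 tables each and invented names such as `order_db_1.orders_by_user_05`; the source does not state the actual shard counts, hash function, or naming scheme, and the Spring/MyBatis annotation wiring is left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: shard counts and naming are assumptions for illustration.
class ShardRouterSketch {
    static final int DB_COUNT = 4;       // assumed number of MySQL instances
    static final int TABLES_PER_DB = 8;  // assumed tables per instance

    // dimension is "order", "user", or "store"; id is the matching ID.
    static String route(String dimension, long id) {
        // floorMod keeps the slot non-negative even for negative IDs.
        int slot = (int) Math.floorMod(id, (long) DB_COUNT * TABLES_PER_DB);
        int db = slot / TABLES_PER_DB;
        int table = slot % TABLES_PER_DB;
        return String.format(Locale.ROOT, "order_db_%d.orders_by_%s_%02d",
                db, dimension, table);
    }

    // Each order is stored three times, once per query dimension.
    static List<String> writeTargets(long orderId, long userId, long storeId) {
        return Arrays.asList(
                route("order", orderId),
                route("user", userId),
                route("store", storeId));
    }

    public static void main(String[] args) {
        for (String target : writeTargets(1001L, 77L, 64L)) {
            System.out.println(target);
        }
    }
}
```

A query by user ID then touches only the one table that `route("user", userId)` names, which is what keeps lookups along each of the three dimensions fast at the cost of writing every order three times.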
For queries not covered by the three primary dimensions, Elasticsearch is used to provide full‑text search capabilities.
Operational Automation (Intelligent Ops)
Initially, operations relied on manual troubleshooting. As complexity grew, automated and intelligent measures were introduced:
Pre‑incident: regular online stress tests, periodic health checks, and full‑link logging for proactive issue detection.
During incident: real‑time monitoring dashboards for order and system metrics, and standardized SOPs for rapid response.
Post‑incident: root‑cause analysis, knowledge sharing, and continuous improvement of pre‑ and during‑incident measures.
These practices dramatically increased efficiency and reduced manpower costs.
Overall, the order system now processes tens of millions of orders per day with a 99.9999% availability target, achieving sub‑150 ms transaction latency (tp99) and sub‑40 ms query latency (tp99).
Meituan Technology Team
Over 10,000 engineers powering China’s leading lifestyle services e‑commerce platform. Supporting hundreds of millions of consumers, millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.