Meituan Instant Logistics: Evolution of Distributed System Architecture and Practices

The article details Meituan's five‑year journey in instant logistics, describing how distributed, high‑concurrency system architecture evolved through layered upgrades, microservices, fault‑tolerance mechanisms, AI‑driven optimization, and AIOps platforms to achieve scalability, low latency, high availability, and cost efficiency.

IT Architects Alliance

Background

Meituan's food delivery business has been growing for five years, and the company has been building out instant logistics for more than three, accumulating experience in distributed, high‑concurrency systems along the way. Two takeaways stand out. First, instant logistics tolerates almost no failures or high latency, so the architecture must be distributed, scalable, and disaster‑tolerant. Second, the system integrates AI heavily across pricing, ETA, dispatch, capacity planning, subsidies, accounting, voice interaction, LBS mining, operations, and monitoring to grow scale, preserve user experience, and reduce cost.

Four characteristics of the business drive these requirements:

The massive number of orders and riders leads to ultra‑large‑scale matching computations.

Holidays and severe weather cause traffic to surge to many times the normal level.

Logistics fulfillment links online to offline, so there is near‑zero tolerance for downtime or lost orders, and availability must be extremely high.

The business depends on real‑time, accurate data, which makes it highly sensitive to latency and anomalies.

Meituan Instant Logistics Architecture

The platform focuses on three core tasks: providing users with SLA guarantees such as ETA and delivery fee pricing; matching the most suitable rider under multi‑objective (cost, efficiency, experience) optimization; and offering riders decision‑support tools like intelligent voice, route recommendation, and store‑arrival reminders.
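
As a rough illustration of matching under several objectives, the sketch below collapses cost, efficiency, and experience into one weighted penalty per rider‑order pair and assigns greedily. The weights, field names, and the greedy strategy are all assumptions made for illustration; the article does not describe Meituan's actual dispatch algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One possible rider-order pairing with hypothetical per-objective penalties."""
    rider: str
    order: str
    cost: float      # e.g. extra payout for this pairing, lower is better
    minutes: float   # e.g. estimated delivery time, lower is better
    lateness: float  # e.g. expected minutes past the promised ETA, lower is better


# Hypothetical weights; a real system would tune or learn these.
WEIGHTS = {"cost": 1.0, "minutes": 0.5, "lateness": 2.0}


def score(c: Candidate) -> float:
    """Collapse the three objectives into a single penalty (lower is better)."""
    return (WEIGHTS["cost"] * c.cost
            + WEIGHTS["minutes"] * c.minutes
            + WEIGHTS["lateness"] * c.lateness)


def greedy_match(candidates: list[Candidate]) -> dict[str, str]:
    """Map each order to at most one rider, taking the lowest-penalty pairs first."""
    assignment: dict[str, str] = {}
    used_riders: set[str] = set()
    for c in sorted(candidates, key=score):
        if c.order in assignment or c.rider in used_riders:
            continue  # the order or the rider is already taken
        assignment[c.order] = c.rider
        used_riders.add(c.rider)
    return assignment
```

Greedy assignment is only a simple baseline: it is not guaranteed to minimize the total penalty, which is one reason large‑scale matching is computationally demanding.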

Behind these services lies Meituan's powerful technology stack, which has evolved from vertical services to layered services and finally to microservices, following the principle that good architecture emerges through evolution rather than premature design.

Distributed System Practices

A typical Meituan distributed system relies on shared components and services for partition scaling, disaster recovery, and monitoring. Front‑end traffic is load‑balanced by HLB. Services within a partition communicate through OCTO, which handles registration, discovery, load balancing, fault tolerance, and gray releases; message queues such as Kafka or RabbitMQ can also be used for communication. The storage layer accesses distributed databases through Zebra, monitoring is handled by CAT (Meituan's open‑source distributed monitoring system), caching uses Squirrel+Cellar, and task scheduling is performed by Crane.

Key challenges addressed include stateful cluster scalability, node hotspot issues, and uneven resource utilization.

First, the backend team transformed stateful nodes into stateless ones and leveraged parallel computation to let smaller nodes share the load, enabling rapid scaling.
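
The idea of stateless nodes sharing load in parallel can be sketched as follows. This is a minimal example under stated assumptions, not the team's implementation: the shard function, the placeholder computation, and the use of a thread pool to stand in for separate nodes are all hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def shard_of(key: str, shard_count: int) -> int:
    """Stable shard index. crc32 is used because Python's hash() is salted per process."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


def split_into_shards(order_ids: list[str], shard_count: int) -> list[list[str]]:
    """Distribute keys across shards so each worker gets an independent slice."""
    shards: list[list[str]] = [[] for _ in range(shard_count)]
    for order_id in order_ids:
        shards[shard_of(order_id, shard_count)].append(order_id)
    return shards


def process_shard(order_ids: list[str]) -> int:
    """Stateless worker: the result depends only on the input, not on node-local state."""
    return sum(len(order_id) for order_id in order_ids)  # placeholder computation


def run_parallel(order_ids: list[str], shard_count: int) -> int:
    """Fan the shards out to workers and combine the partial results."""
    shards = split_into_shards(order_ids, shard_count)
    with ThreadPoolExecutor(max_workers=shard_count) as pool:
        return sum(pool.map(process_shard, shards))
```

Because no worker keeps state between calls, changing `shard_count` changes only how the work is split, not the combined result, which is what makes adding nodes straightforward.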

Second, consistency between database and cache writes is ensured via Databus, a high‑availability, low‑latency, high‑concurrency change‑data‑capture system that streams binlog changes to ES, other DBs, or KV stores.
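
To make the change‑data‑capture flow concrete, here is a small sketch of a sink that applies row changes to a key‑value cache in log order. The event shape, class names, and position‑based deduplication are assumptions for illustration and do not reflect Databus's real interfaces.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical shape of one row change parsed from the binlog."""
    op: str               # "insert", "update", or "delete"
    key: str
    row: Optional[dict]   # new row image; None for deletes
    position: int         # monotonically increasing log position


class CacheSink:
    """Applies change events to an in-memory key-value store in log order."""

    def __init__(self) -> None:
        self.store: dict[str, dict] = {}
        self.applied_position = -1

    def apply(self, event: ChangeEvent) -> bool:
        """Apply one event; return False if it was already applied."""
        # Skipping old positions makes redelivery after a restart harmless.
        if event.position <= self.applied_position:
            return False
        if event.op == "delete":
            self.store.pop(event.key, None)
        elif event.op in ("insert", "update"):
            self.store[event.key] = dict(event.row or {})
        else:
            raise ValueError(f"unknown op: {event.op}")
        self.applied_position = event.position
        return True
```

Driving the cache from the database's change log, rather than writing to both from application code, means the cache converges on whatever the database committed, at the price of a short propagation delay.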

Third, high availability is maintained through pre‑emptive full‑link stress testing, periodic health checks, random fault drills, real‑time alerts, rapid fault localization, systematic rollback, throttling, circuit breaking, degradation, and fallback mechanisms.
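
Of the mechanisms listed above, circuit breaking with a fallback is easy to show in a few lines. The sketch below is a generic circuit breaker, not Meituan's implementation; the threshold, reset interval, and class name are assumptions.

```python
import time


class CircuitBreaker:
    """Stops calling a failing dependency for a while and serves a fallback instead."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after   # seconds to stay open before a trial call
        self.clock = clock               # injectable so tests can control time
        self.failures = 0
        self.opened_at = None            # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()        # open: degrade without touching func
            # Otherwise half-open: let one trial call through below.
        try:
            result = func()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures in a row, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback()
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each downstream request, for example `breaker.call(fetch_eta, lambda: default_eta)`, where `fetch_eta` and `default_eta` are hypothetical names for the real call and its degraded substitute.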

Rapid Single‑IDC Deployment & Disaster Recovery

After a single IDC failure, entry services detect the fault and automatically switch traffic; rapid scaling pre‑synchronizes data and deploys services before opening traffic. All data‑sync and traffic‑distribution services must support automatic fault detection and removal, and scaling is performed per IDC.
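
The two rules in this paragraph, removing a failed IDC from rotation automatically and opening traffic to a new IDC only after its data is synchronized, can be sketched as below. The class names, the round‑robin policy, and the IDC names in the usage note are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Idc:
    name: str
    healthy: bool = True
    data_synced: bool = True  # a newly added IDC would start as False


class EntryRouter:
    """Sends requests only to IDCs that are healthy and fully synchronized."""

    def __init__(self, idcs: list[Idc]) -> None:
        self.idcs = {idc.name: idc for idc in idcs}

    def add_idc(self, idc: Idc) -> None:
        self.idcs[idc.name] = idc

    def report_health(self, name: str, healthy: bool) -> None:
        """Called by a health checker; an unhealthy IDC stops receiving traffic."""
        self.idcs[name].healthy = healthy

    def mark_synced(self, name: str) -> None:
        """Called once data pre-synchronization has finished for this IDC."""
        self.idcs[name].data_synced = True

    def serving(self) -> list[str]:
        return sorted(n for n, i in self.idcs.items() if i.healthy and i.data_synced)

    def route(self, request_id: int) -> str:
        serving = self.serving()
        if not serving:
            raise RuntimeError("no IDC available")
        return serving[request_id % len(serving)]  # simple round-robin by request id
```

With two hypothetical IDCs `idc-a` and `idc-b`, calling `report_health("idc-a", False)` sends every subsequent request to `idc-b`, and an IDC added with `data_synced=False` receives nothing until `mark_synced` is called.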

Multi‑Center Attempts

Meituan groups multiple IDC partitions into virtual centers; services are deployed uniformly across a center, and capacity is expanded by adding new IDC units.

Cellularization Attempts

Compared with the multi‑center approach, cellularization offers finer‑grained disaster recovery and scaling at the partition level. Traffic is routed by region or city, and data synchronized across locations may lag. Disaster recovery at the SET level (a SET being one cell, or unit) fails traffic over quickly to another SET when a local or remote SET runs into problems.
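
City‑based routing with SET failover might look like the following sketch. The mapping tables, the ordered backup list, and all city and SET names are hypothetical; the article does not say how Meituan chooses a failover target.

```python
class SetRouter:
    """Routes a city's traffic to its home SET, or to a backup SET if the home is down."""

    def __init__(self, city_to_set: dict[str, str], backups: dict[str, list[str]]) -> None:
        self.city_to_set = city_to_set
        self.backups = backups          # ordered failover candidates per SET
        self.down: set[str] = set()

    def mark_down(self, set_name: str) -> None:
        self.down.add(set_name)

    def mark_up(self, set_name: str) -> None:
        self.down.discard(set_name)

    def route(self, city: str) -> str:
        primary = self.city_to_set[city]
        if primary not in self.down:
            return primary
        # Home SET is unavailable: try its backups in order.
        for candidate in self.backups.get(primary, []):
            if candidate not in self.down:
                return candidate
        raise RuntimeError(f"no SET available for {city}")
```

Because cross‑location synchronization may lag, a backup SET can briefly serve slightly stale data after a failover; the sketch does not attempt to model that.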

Core Intelligent Logistics Technologies and Platform Accumulation

The Machine Learning Platform provides an end‑to‑end environment for model training and algorithm deployment, addressing the challenges of diverse algorithm scenarios, repeated development, and inconsistent data quality between online and offline.

JARVIS is an AIOps platform aimed at stability, consolidating massive alarm sources, reducing duplicate alerts, and improving fault analysis efficiency compared to manual, experience‑based methods.
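
One simple way to reduce duplicate alerts is to group them by a fingerprint within a time window. The sketch below shows that idea only; the fingerprint fields, the 60‑second window, and the function name are assumptions, and nothing here describes how JARVIS actually works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    source: str      # which monitoring system raised it
    service: str
    kind: str        # e.g. "timeout" or "error_rate"
    timestamp: float


def deduplicate(alerts: list[Alert], window_seconds: float = 60.0) -> list[dict]:
    """Merge alerts with the same (service, kind) seen within a window of the first one."""
    groups: list[dict] = []
    latest_group: dict[tuple[str, str], dict] = {}
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        fingerprint = (alert.service, alert.kind)
        group = latest_group.get(fingerprint)
        if group is None or alert.timestamp - group["first_seen"] > window_seconds:
            # Start a new group when none exists or the window has passed.
            group = {"service": alert.service, "kind": alert.kind,
                     "first_seen": alert.timestamp, "count": 0, "sources": set()}
            latest_group[fingerprint] = group
            groups.append(group)
        group["count"] += 1
        group["sources"].add(alert.source)
    return groups
```

Each returned group stands for one notification, with `count` and `sources` recording how many raw alerts it absorbed and from where.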

Future Challenges

Future challenges include the bloating of microservices as business complexity grows, network amplification effects from minor latency in mesh‑structured clusters, rapid fault localization in complex topologies, and the shift from cluster‑level to unit‑level operations after cellularization, all of which demand advanced AIOps solutions.

Author Bio

Song Bin, senior technical expert at Meituan, has been involved in distributed system architecture and high‑concurrency stability for years. He leads the instant logistics backend team, overseeing scheduling, settlement, LBS, pricing, algorithm data platforms, and stability platforms, with recent focus on AIOps.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.