Meituan Instant Logistics: Distributed System Architecture, Practices, and Future Challenges

The article details Meituan’s five‑year evolution of its instant logistics platform, describing the distributed backend architecture, AI‑driven optimization, scalability and high‑availability practices, as well as future challenges in microservice complexity and operational automation.

IT Architects Alliance

Meituan has operated its instant logistics business for five years, including more than three years of exploring real‑time logistics, and has accumulated experience in building distributed, high‑concurrency systems along the way. Two takeaways stand out. First, the business has extremely low tolerance for failures and latency, which demands a distributed, scalable, fault‑tolerant architecture. Second, AI is used extensively to improve cost, efficiency, and user experience.

The system faces several technical challenges: the massive scale of orders and riders, which makes matching computationally expensive; traffic spikes during holidays or adverse weather that can reach dozens of times the normal load; near‑zero tolerance for downtime or lost orders; and strict requirements on the timeliness and accuracy of data.

The platform provides three core functions: SLA delivery (ETA calculation, pricing), optimal rider matching under cost‑efficiency‑experience constraints, and rider decision‑support (voice interaction, route recommendation, store‑arrival reminders). It relies on Meituan’s common components: front‑end traffic is load‑balanced by HLB; services communicate via OCTO (service registration, discovery, load balancing, fault tolerance, gray release); messaging uses Kafka or RabbitMQ; storage accesses distributed databases through Zebra; monitoring uses the open‑source CAT system; caching combines Squirrel and Cellar; and task scheduling is handled by Crane.

Practical issues include cluster scalability, especially for stateful services, and hotspot resources. Solutions involve converting stateful nodes to stateless, leveraging parallel computation, and using Databus to ensure strong consistency between databases and caches. High availability is achieved through full‑link stress testing, periodic health checks, chaos engineering, real‑time alarm monitoring, rapid fault localization, and systematic rollback and mitigation mechanisms.
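The database‑to‑cache synchronisation mentioned above can be illustrated with a small change‑log sketch. The source does not describe how Databus works internally, so everything here (the `Store`, `Cache`, and `ChangeEvent` names, the per‑key version counter, the in‑memory change log) is a hypothetical stand‑in rather than Meituan's implementation. It only shows the general idea of a cache consuming an ordered stream of database changes and dropping stale entries.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeEvent:
    key: str
    version: int  # assumed: per-key version that increases on every write


@dataclass
class Store:
    """Toy database with a change log (stand-in for DB + change capture)."""
    rows: dict = field(default_factory=dict)       # key -> (version, value)
    changelog: list = field(default_factory=list)  # ordered ChangeEvents

    def write(self, key, value):
        version = self.rows.get(key, (0, None))[0] + 1
        self.rows[key] = (version, value)
        self.changelog.append(ChangeEvent(key, version))


class Cache:
    def __init__(self, store):
        self.store = store
        self.entries = {}  # key -> (version, value)
        self.offset = 0    # next change-log position to consume

    def apply_changes(self):
        """Consume pending change events and evict entries they supersede."""
        for event in self.store.changelog[self.offset:]:
            cached = self.entries.get(event.key)
            if cached is not None and cached[0] < event.version:
                del self.entries[event.key]
        self.offset = len(self.store.changelog)

    def read(self, key):
        if key not in self.entries:
            # Read-through on a miss: load the current row from the store.
            self.entries[key] = self.store.rows[key]
        return self.entries[key][1]
```

In this toy version a read can still return an old value until `apply_changes()` runs, so the sketch by itself gives eventual rather than strong consistency; how the production pipeline closes that window is not stated in the source.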

For single‑IDC disaster recovery, the entry service detects failures and automatically switches traffic; rapid IDC scaling is supported by pre‑synchronised data and services, with traffic opened only after readiness. All data‑sync and traffic‑distribution services must support automatic fault detection and removal, enabling scaling by IDC.
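A minimal sketch of the entry‑service behaviour described above, assuming failures are detected by counting consecutive failed health probes. The `EntryRouter` class, its method names, and the threshold of three failures are illustrative assumptions, not details from the source. The sketch shows two rules from the text: traffic moves away from a failed IDC automatically, and a newly added IDC receives traffic only after it is marked ready.

```python
class EntryRouter:
    """Hypothetical entry service that picks an IDC for incoming traffic."""

    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold  # assumed value
        self.order = []     # IDCs in preference order
        self.failures = {}  # idc -> consecutive failed probes
        self.ready = {}     # idc -> True once data and services are synced

    def add_idc(self, idc, ready=False):
        self.order.append(idc)
        self.failures[idc] = 0
        self.ready[idc] = ready

    def mark_ready(self, idc):
        # Called once pre-synchronisation of data and services has finished.
        self.ready[idc] = True

    def report_probe(self, idc, ok):
        # A successful probe resets the counter; a failure increments it.
        self.failures[idc] = 0 if ok else self.failures[idc] + 1

    def is_healthy(self, idc):
        return self.ready[idc] and self.failures[idc] < self.failure_threshold

    def route(self):
        """Return the first ready, healthy IDC in preference order."""
        for idc in self.order:
            if self.is_healthy(idc):
                return idc
        raise RuntimeError("no healthy IDC available")
```

A real entry layer would also need to handle probe flapping and in‑flight requests during a switch; those concerns are left out to keep the example short.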

The multi‑centre approach groups several IDCs into a virtual centre to overcome the capacity limits of a single partition. Services are deployed uniformly across the centres, and new IDCs are added to a centre when its capacity is insufficient.

The unit‑based approach, in which each unit is called a SET, provides finer‑grained partition disaster recovery and scaling. Traffic is routed by region or city, cross‑region data latency has to be handled, and traffic must fail over to another SET when a local or remote SET encounters problems.
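Routing by city with SET failover can be sketched as a lookup followed by an ordered fallback list. The function name, the mapping structures, and the example SET and city names below are assumptions made for illustration; the source does not specify how routing tables or fallback order are defined.

```python
def route_to_set(city, city_to_set, healthy_sets, fallback):
    """Pick the SET that should serve a request from `city`.

    city_to_set:  dict mapping city -> primary SET name
    healthy_sets: set of SET names currently considered healthy
    fallback:     dict mapping SET name -> ordered list of backup SETs
    """
    primary = city_to_set[city]
    if primary in healthy_sets:
        return primary
    # Primary SET is down: try its backups in the configured order.
    for candidate in fallback.get(primary, []):
        if candidate in healthy_sets:
            return candidate
    raise LookupError(f"no healthy SET available for city {city!r}")


# Hypothetical configuration used only for this example.
CITY_TO_SET = {"beijing": "set-north", "shanghai": "set-east"}
FALLBACK = {"set-north": ["set-east"], "set-east": ["set-north"]}
```

With both SETs healthy, `route_to_set("beijing", CITY_TO_SET, {"set-north", "set-east"}, FALLBACK)` returns `"set-north"`; if `set-north` is removed from the healthy set, the same call returns `"set-east"`. Falling back to a remote SET is where the cross‑region data latency mentioned above becomes relevant, since the backup may not yet hold the latest data.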

Core intelligent‑logistics capabilities include a machine‑learning platform for end‑to‑end model training and deployment, and the JARVIS AIOps platform that stabilises operations by reducing alarm noise, accelerating fault diagnosis, and improving reliability.
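One simple form of alarm noise reduction is suppressing repeats of the same alarm within a time window. The source does not say which techniques JARVIS uses, so the sketch below is a generic illustration only; the function name, the tuple layout, and the 300‑second window are assumptions.

```python
def suppress_duplicates(alarms, window_seconds=300):
    """Drop alarms that repeat the same (service, kind) within the window.

    alarms: iterable of (timestamp_seconds, service, kind) tuples
    Returns the alarms that would still be emitted, in time order.
    """
    last_emitted = {}  # (service, kind) -> timestamp of last emitted alarm
    kept = []
    for timestamp, service, kind in sorted(alarms):
        key = (service, kind)
        previous = last_emitted.get(key)
        if previous is None or timestamp - previous >= window_seconds:
            kept.append((timestamp, service, kind))
            last_emitted[key] = timestamp
    return kept
```

The window is measured from the last emitted alarm rather than the last received one, so a continuously firing alarm is still re‑emitted once per window instead of being silenced indefinitely.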

Four future challenges are identified: the number of microservices keeps growing as business complexity increases; in mesh‑structured service clusters, even slight latency can be amplified across the network; rapid fault localisation is difficult within complex topologies; and operations are shifting from cluster level to unit level, which raises deployment and maintenance complexity.

Author: Song Bin, senior technical expert at Meituan, responsible for the backend of the instant logistics team, with extensive experience in distributed systems, high‑concurrency stability, and current focus on AIOps.

