How Meituan Built a High‑Performance Distributed Architecture for Instant Logistics

This article explains how Meituan's instant logistics platform evolved from a monolithic system to a highly available, scalable, and AI‑enhanced distributed architecture, detailing the technical challenges, architectural upgrades, fault‑tolerance strategies, and future scalability concerns.

Java High-Performance Architecture

Jul 15, 2022

Background

Meituan's food delivery business has been growing for five years, and its instant logistics business has been under development for more than three, which has given the team experience in building distributed, high‑concurrency systems. Two takeaways stand out. First, instant logistics has very little tolerance for faults or latency. Second, AI is integrated across pricing, ETA, dispatch, capacity planning, subsidies, accounting, voice interaction, LBS mining, operations, and monitoring to increase scale, improve the experience, and reduce cost.

The business poses four technical challenges:

The massive scale of orders and riders makes matching a huge computation problem.

Holidays and severe weather push traffic to many times its normal level.

Logistics fulfillment links online systems to offline delivery, with near‑zero tolerance for failures.

Real‑time, accurate data demands low latency and high reliability.

Meituan Instant Logistics Architecture

The platform focuses on three core tasks: providing SLA guarantees such as ETA and pricing, matching riders under multi‑objective optimization (cost, efficiency, experience), and supporting rider decision‑making with smart voice, route recommendation, and store arrival reminders.

These services rely on a robust distributed system that ensures high availability and concurrency.

Distributed System Practice

The typical distributed structure builds on Meituan's public components. Front‑end traffic is balanced by HLB. Services communicate either through OCTO, which provides registration, discovery, load balancing, fault tolerance, and gray releases, or through message queues such as Kafka and RabbitMQ. Zebra handles distributed database access, CAT handles monitoring, Squirrel and Cellar provide caching, and Crane handles scheduling.

Key challenges include scaling stateful clusters, handling node hotspots, and ensuring resource balance.

The solutions fall into three groups. First, stateful nodes are converted to stateless ones, and computation is spread across smaller nodes so that the cluster can scale quickly. Second, Databus is used for strong consistency between the database and the cache, streaming binlog changes to downstream systems. Third, high availability is enforced before, during, and after incidents: full‑link stress testing, periodic health checks, and random fault drills beforehand; real‑time alerts and rapid fault localization during an incident; and rollback, throttling, circuit breaking, and degradation mechanisms afterward.
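The circuit breaking and degradation mentioned above can be illustrated with a minimal sketch. This is not Meituan's or OCTO's implementation; the class name `SimpleCircuitBreaker`, the failure threshold, and the open duration are all hypothetical, and the caller passes the current time explicitly so the behavior is easy to follow.

```java
import java.util.function.Supplier;

/** Minimal circuit breaker: opens after N consecutive failures, then serves a fallback. */
class SimpleCircuitBreaker {
    private final int failureThreshold;   // hypothetical tuning knob
    private final long openMillis;        // how long to stay open before a trial call
    private int consecutiveFailures = 0;
    private long openedAt = -1;           // -1 means the circuit is closed

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /** Runs the primary call, or returns the degraded fallback while the circuit is open. */
    synchronized <T> T call(Supplier<T> primary, Supplier<T> fallback, long nowMillis) {
        if (isOpen(nowMillis)) {
            return fallback.get();        // open: skip the failing dependency entirely
        }
        try {
            T result = primary.get();     // closed, or a trial call after the open period
            consecutiveFailures = 0;
            openedAt = -1;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = nowMillis;     // trip (or re-trip) the breaker
            }
            return fallback.get();        // degrade instead of propagating the error
        }
    }

    synchronized boolean isOpen(long nowMillis) {
        return openedAt >= 0 && nowMillis - openedAt < openMillis;
    }
}
```

A caller might wrap an ETA lookup in `call(...)` with a fallback that returns a default estimate, so that a failing dependency degrades the answer rather than blocking the order flow.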

Single‑IDC Rapid Deployment & Disaster Recovery

When an IDC fails, the entry services detect the fault and switch traffic away automatically. When an IDC is added for rapid expansion, its data is synchronized and its services are deployed before traffic is opened to it. All data‑sync and traffic‑distribution services must therefore support automatic fault detection and removal, and scaling is performed per IDC.
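As a rough illustration of automatic fault removal and of opening traffic only after preparation, the sketch below tracks failed health probes per IDC and splits traffic evenly across the IDCs that remain healthy. The class `IdcTrafficSwitch`, the `maxMisses` threshold, and the even split are assumptions made for the example, not details from the original system.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Tracks IDC health and splits entry traffic evenly across healthy IDCs. */
class IdcTrafficSwitch {
    private final int maxMisses;  // hypothetical: consecutive failed probes before removal
    private final Map<String, Integer> misses = new LinkedHashMap<>();

    IdcTrafficSwitch(int maxMisses, String... idcs) {
        this.maxMisses = maxMisses;
        for (String idc : idcs) {
            misses.put(idc, 0);
        }
    }

    /** Records one health-probe result; a healthy probe resets the miss counter. */
    void report(String idc, boolean healthy) {
        misses.computeIfPresent(idc, (k, v) -> healthy ? 0 : v + 1);
    }

    /** Admits a new IDC only once its data is synced and its services are deployed. */
    void openTraffic(String idc, boolean dataSynced, boolean servicesDeployed) {
        if (dataSynced && servicesDeployed) {
            misses.putIfAbsent(idc, 0);
        }
    }

    /** Returns the traffic share (0.0 to 1.0) of every IDC still considered healthy. */
    Map<String, Double> weights() {
        Map<String, Double> out = new LinkedHashMap<>();
        long healthy = misses.values().stream().filter(m -> m < maxMisses).count();
        for (Map.Entry<String, Integer> e : misses.entrySet()) {
            if (e.getValue() < maxMisses) {
                out.put(e.getKey(), 1.0 / healthy);
            }
        }
        return out;
    }
}
```

Requiring several consecutive misses before removal is one common way to avoid switching traffic on a single dropped probe; a real deployment would also weigh capacity rather than splitting evenly.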

Multi‑Center Attempts

When a partition cannot scale, Meituan groups multiple IDC nodes into a virtual center, deploying services uniformly across the center; capacity is increased by adding new IDC nodes.

Unit‑Based Attempts

Unit‑based design improves partition disaster recovery and scaling. Traffic routing is based on region or city; data synchronization may experience latency across locations. SET disaster recovery ensures rapid failover to other SETs when local or remote SETs encounter issues.
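City‑based routing with SET failover can be sketched as follows. The `SetRouter` class, the SET names, and the ordered preference list are hypothetical; the sketch only shows the routing decision and ignores the cross‑location synchronization lag noted above, which a real failover would have to account for.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Routes each city to its home SET, failing over to backups when the home SET is down. */
class SetRouter {
    // city -> SETs in preference order (home SET first, then failover candidates)
    private final Map<String, List<String>> preference = new HashMap<>();
    private final Set<String> downSets = new HashSet<>();

    /** Registers a city's home SET first, followed by its failover candidates. */
    void register(String city, List<String> orderedSets) {
        preference.put(city, new ArrayList<>(orderedSets));
    }

    void markDown(String set) {
        downSets.add(set);
    }

    void markUp(String set) {
        downSets.remove(set);
    }

    /** Returns the first healthy SET for the city, or empty if none is available. */
    Optional<String> route(String city) {
        List<String> candidates = preference.getOrDefault(city, Collections.emptyList());
        return candidates.stream()
                .filter(s -> !downSets.contains(s))
                .findFirst();
    }
}
```

For example, after `register("beijing", Arrays.asList("set-north", "set-east"))`, requests for that city go to `set-north` until it is marked down, at which point they move to `set-east`.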

Core Intelligent Logistics Technology and Platform

The machine‑learning platform provides an end‑to‑end solution for model training and algorithm deployment, addressing duplicated development effort and inconsistent data quality between online and offline sources.

JARVIS, an AIOps platform focused on stability, consolidates noisy alerts, automates fault analysis, and improves response speed for distributed clusters.
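The article does not describe how JARVIS consolidates alerts. One simple approach, shown here purely as an assumption, is to group alerts that share a service and type within a fixed time window and report a count per group. The `AlertConsolidator` class and the `Alert` fields are hypothetical and are not JARVIS's schema.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Collapses repeated alerts with the same service and type inside a fixed time window. */
class AlertConsolidator {
    /** One raw alert; the fields are illustrative only. */
    static final class Alert {
        final String service;
        final String type;
        final long timestampMillis;

        Alert(String service, String type, long timestampMillis) {
            this.service = service;
            this.type = type;
            this.timestampMillis = timestampMillis;
        }
    }

    /** Returns one summary line per (service, type, window) group, with its alert count. */
    static List<String> consolidate(List<Alert> alerts, long windowMillis) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Alert a : alerts) {
            long bucket = a.timestampMillis / windowMillis;  // index of the fixed window
            String key = a.service + "/" + a.type + "@" + bucket;
            counts.merge(key, 1, Integer::sum);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.add(e.getKey() + " x" + e.getValue());
        }
        return out;
    }
}
```

Fixed windows keep the sketch short, but they split a burst that straddles a window boundary into two groups; a sliding window or topology‑aware grouping would handle that case better.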

Future Challenges

Future challenges include managing service sprawl as the number of microservices grows, keeping minor network latency from being amplified as it propagates through the system, localizing faults quickly in complex topologies, and moving operations from cluster‑level to unit‑level management.
