How Meituan Scaled Service Governance with OCTO Mesh: Architecture & Lessons
Meituan’s OCTO Mesh moves the company’s large‑scale service governance onto a Service Mesh architecture built from sidecar proxies, a custom control plane, and Meta Server‑driven routing. It targets four problems: limited multi‑language support, middleware coupling, costly heterogeneous integration, and scalability limits. This summary covers the design choices, health‑check strategy, and operational tooling.
Background and Motivation
Meituan’s OCTO is a standardized service governance platform that now covers about 90% of the company’s applications and handles over a trillion daily calls. While the system is mature, it faces several challenges: limited multi‑language support, tight coupling between middleware and business code, high integration cost for heterogeneous technologies, and decentralized governance decisions.
Why Service Mesh?
Adopting a Service Mesh allows each service instance to run a sidecar proxy that handles all inbound and outbound traffic. Core governance functions such as routing and rate limiting move to the sidecar and a centralized control plane, providing a language‑agnostic solution, decoupling middleware upgrades from business deployments, and simplifying integration of heterogeneous subsystems.
Technical Selection and Architecture Design
The selection was guided by four criteria: preserving existing standards, maintaining governance capabilities, supporting ultra‑large scale, and staying close to the open‑source community. The resulting architecture has four main parts:
Data plane built on a heavily customized Envoy filter.
Control plane developed in‑house (named Adcore), consisting of Pilot, Dispatcher, health‑check, node management, monitoring, and a Meta Server for service registration and discovery.
Sidecar (OCTO Proxy) deployed 1:1 with business processes, communicating with the local business process over UNIX domain sockets and with other nodes over TCP.
LEGO Agent manages proxy lifecycle and hot upgrades, reducing manual intervention.
Control Plane Details
Adcore Pilot acts as the brain, handling most governance logic and interacting directly with sidecars. Dispatcher serves as an access hub for auxiliary subsystems. Health checks are run centrally rather than as full‑mesh P2P probing, which avoids the roughly N² checks needed when every instance probes every other. The Meta Server uses consistent hashing to assign sidecars, and with them the routing data, to pilots, so each pilot holds only its own shard while failover and load balancing stay efficient.
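The sharding idea can be illustrated with a small consistent‑hash ring. This is a minimal sketch, not Meituan's implementation: the class name `HashRing`, the virtual‑node count, the MD5‑based hash, and the pilot and sidecar identifiers are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring (hash choice is an assumption)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical ring that assigns each sidecar to exactly one pilot."""

    def __init__(self, pilots, vnodes=100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (position, pilot)
        for pilot in pilots:
            self.add(pilot)

    def add(self, pilot):
        # Each pilot gets several virtual nodes so load spreads more evenly.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{pilot}#{i}"), pilot))

    def remove(self, pilot):
        self._ring = [entry for entry in self._ring if entry[1] != pilot]

    def owner(self, sidecar_id):
        if not self._ring:
            raise LookupError("no pilots registered")
        # First virtual node at or after the sidecar's position, wrapping around.
        idx = bisect.bisect_left(self._ring, (_hash(sidecar_id), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["pilot-a", "pilot-b", "pilot-c"])
sidecars = [f"sidecar-{i}" for i in range(1000)]

before = {s: ring.owner(s) for s in sidecars}
ring.remove("pilot-b")  # simulate a pilot failure
after = {s: ring.owner(s) for s in sidecars}

moved = [s for s in sidecars if before[s] != after[s]]
print(f"{len(moved)} of {len(sidecars)} sidecars changed pilot")
```

In this sketch only the sidecars that belonged to the removed pilot are reassigned; the rest keep their owner, which is the property that makes consistent hashing attractive for failover.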
Data Flow
Sidecars and pilots communicate via bidirectional streaming using an enhanced xDS protocol. Custom protocols deliver governance commands beyond routing, such as authentication and circuit breaking.
Key Design Analyses
Large‑Scale Mesh Capabilities
Control plane nodes are horizontally scalable; each pilot only holds data for its managed sidecars.
On network partitions, the system can absorb traffic spikes.
Hybrid health‑check combines centralized monitoring with selective P2P checks.
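To see why full‑mesh checking does not scale, the probe counts can be compared directly. The sketch below assumes the worst case, in which every instance probes every other instance once per round, against a central checker that probes each instance once; the instance counts are illustrative, not figures from Meituan.

```python
def full_mesh_probes(n: int) -> int:
    """Directed probes per round if each of n instances checks every other one."""
    return n * (n - 1)


def centralized_probes(n: int) -> int:
    """Probes per round if a central checker probes each instance once."""
    return n


for n in (100, 10_000):
    print(n, full_mesh_probes(n), centralized_probes(n))
```

A hybrid scheme sits between these two extremes: the central count stays linear, and P2P probes are added only where they are selectively enabled.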
Heterogeneous System Integration
A unified access center (Dispatcher) abstracts away diverse storage and pub/sub mechanisms of existing subsystems. Changes are pushed as lightweight notifications; pilots fetch full data on demand, keeping message queues small and avoiding version conflicts.
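The notify‑then‑fetch pattern described above can be sketched as follows. All names here (`Dispatcher.publish`, `Pilot.sync`, the key format, the version counter) are hypothetical and chosen for illustration; the source does not describe the actual interfaces.

```python
class Dispatcher:
    """Hypothetical access hub: stores full data, sends only change keys."""

    def __init__(self):
        self._store = {}        # key -> (version, payload)
        self._subscribers = []  # callbacks that receive just the changed key

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, key, payload):
        version = self._store.get(key, (0, None))[0] + 1
        self._store[key] = (version, payload)
        for notify in self._subscribers:
            notify(key)  # lightweight notification: no payload attached

    def fetch(self, key):
        return self._store[key]


class Pilot:
    """Hypothetical pilot that records dirty keys and pulls full data later."""

    def __init__(self, dispatcher):
        self._dispatcher = dispatcher
        self._dirty = set()  # repeated notifications for one key collapse here
        self.cache = {}
        dispatcher.subscribe(self._dirty.add)

    def sync(self):
        fetched = 0
        while self._dirty:
            key = self._dirty.pop()
            self.cache[key] = self._dispatcher.fetch(key)  # always the latest
            fetched += 1
        return fetched


dispatcher = Dispatcher()
pilot = Pilot(dispatcher)

# Two quick updates to the same key arrive before the pilot syncs.
dispatcher.publish("rate-limit/order-service", {"qps": 100})
dispatcher.publish("rate-limit/order-service", {"qps": 200})

fetched = pilot.sync()
print(fetched, pilot.cache["rate-limit/order-service"])
```

Because the notification carries only a key, two updates to the same key cost the pilot a single fetch, and that fetch returns the newest version rather than a stale intermediate one.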
Stability Guarantees
To mitigate the inherent complexity of a new mesh, Meituan built extensive fault isolation, automatic rollback, flexible availability controls, observability, and regression testing. A Mock‑Sidecar framework simulates sidecar behavior for control‑plane testing, allowing step‑wise YAML‑defined scenarios and parallel stress tests.
Operations System
The LEGO platform orchestrates proxy upgrades: operators specify target versions and scopes, resources are stored in a repository, and LEGO agents pull and launch new proxies, with built‑in polling to ensure successful rollouts.
Conclusion and Outlook
Key takeaways include the importance of standardization, performance, and ease of use; the necessity of aligning mesh adoption with existing containerization and governance stacks; and the value of a robust stability and operations framework. Future work will expand OCTO Mesh capabilities, broaden the traffic types it covers, and explore centralized governance that can make globally optimal decisions.
Meituan Technology Team
Over 10,000 engineers powering China’s leading lifestyle services e‑commerce platform. Supporting hundreds of millions of consumers, millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.
