Meituan Elastic Scaling System: Architecture, Challenges, and Business Enablement
Meituan's elastic scaling system evolved from Hulk 1.0 on OpenStack to Hulk 2.0 on Kubernetes, adding a microservice architecture, quota management, hybrid‑cloud resource pools, and automated scheduling. It delivers cost savings and keeps services available through holiday peaks, delivery spikes, anti‑scraping emergencies, and SaaS releases; future plans target stability, usability, and emerging technologies.
Introduction
Elastic scaling delivers business value by absorbing traffic spikes, saving cost, and automating capacity changes. Meituan consolidates scattered idle resources into one large resource pool and balances cost against performance through elastic scheduling and inventory control.
System Evolution
Meituan’s elastic scaling started with Hulk 1.0 (Docker‑based) built on OpenStack. In 2018 Hulk 2.0 replaced OpenStack with Kubernetes, introduced a self‑developed OS and container engine, and added a PaaS layer (Elastic Scaling 2.0) that solves version inconsistency, resource shortage, high maintenance cost, and low configuration flexibility.
1.0 Architecture
Components include User Portal (OCTO/TTT), Hulk‑ApiServer, Hulk‑Policy (logic with Zookeeper), Hulk data sources (OCTO, CAT, Falcon), and Scheduler (OpenStack‑based).
2.0 Architecture
Key upgrades: replace OpenStack with Kubernetes; split the monolith into microservices; build a service‑profile ("service portrait") data platform; add observability (Alarm, Scanner). New services: Engine, Metrics‑Server/Data, Resource‑Server.
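To make the decision step concrete, here is a minimal sketch of how a policy component could turn one metric sample into a target instance count. It uses the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current × observed / target), with a tolerance band). The source does not say that Hulk's Engine or Policy services work this way, so `ScalingPolicy`, `desired_instances`, and every field name are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # Hypothetical schema; the article does not describe Hulk's policy format.
    target_utilization: float  # desired average utilization, e.g. 0.5 for 50% CPU
    min_instances: int
    max_instances: int
    tolerance: float = 0.1     # dead band around the target to avoid flapping


def desired_instances(policy: ScalingPolicy,
                      current_instances: int,
                      observed_utilization: float) -> int:
    """Proportional scaling rule, clamped to the policy's bounds."""
    ratio = observed_utilization / policy.target_utilization
    # Inside the dead band: keep the current size.
    if abs(ratio - 1.0) <= policy.tolerance:
        return current_instances
    wanted = math.ceil(current_instances * ratio)
    return max(policy.min_instances, min(policy.max_instances, wanted))


policy = ScalingPolicy(target_utilization=0.5, min_instances=2, max_instances=20)
scale_out = desired_instances(policy, current_instances=10, observed_utilization=0.75)
```

With 10 instances running at 75% against a 50% target, the rule asks for 15 instances; a sample at 52% falls inside the tolerance band and leaves the size unchanged.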
Challenges & Solutions
Technical challenges: version mismatch, insufficient capacity at peak, high maintenance cost, and rigid configuration. Solutions: multi‑tenant quotas, inventory control, a hybrid‑cloud resource pool (steady plus emergency), and automated scheduling.
Resource Management
Each business line gets its own quota, and scaling carries a 99.9% success‑rate SLA. Resources are over‑committed under water‑level (utilization) monitoring, an emergency pool is built from public‑cloud VMs, and capacity is scaled in automatically after events.
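The interplay of quota, water level, and emergency pool can be sketched as follows. This is an illustration under stated assumptions, not Meituan's implementation: the `Pool` class, the integer `high_watermark_pct` threshold, and the "steady pool first, then emergency pool" ordering are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    # Hypothetical model of a resource pool; not taken from the source.
    name: str
    capacity: int                  # instances the pool can host
    used: int = 0
    high_watermark_pct: int = 100  # stop allocating above this utilization

    def allocatable(self) -> int:
        limit = self.capacity * self.high_watermark_pct // 100
        return max(0, limit - self.used)


def allocate(requested: int, quota_left: int, steady: Pool, emergency: Pool):
    """Return (from_steady, from_emergency, rejected) for one scale-out request."""
    granted = min(requested, quota_left)  # the business-line quota caps the request
    from_steady = min(granted, steady.allocatable())
    # Spill into the public-cloud emergency pool only when the steady pool is at its water level.
    from_emergency = min(granted - from_steady, emergency.allocatable())
    steady.used += from_steady
    emergency.used += from_emergency
    return from_steady, from_emergency, requested - from_steady - from_emergency


def release_after_event(emergency: Pool) -> int:
    """Scale in after the event: hand all emergency instances back."""
    released, emergency.used = emergency.used, 0
    return released


steady = Pool("steady", capacity=100, used=70, high_watermark_pct=80)
emergency = Pool("emergency", capacity=50)
result = allocate(requested=30, quota_left=25, steady=steady, emergency=emergency)
```

Here a request for 30 instances is cut to 25 by the quota, 10 fit under the steady pool's 80% water level, 15 spill into the emergency pool, and 5 are rejected.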
Promotion Strategy
Gradual rollout from internal pilots, data‑driven service selection, value quantification (burst handling, cost saving, automation), deep business engagement, technical training, and feedback loops.
Business Enablement Scenarios
Holiday scaling – cost reduction ~20%.
Daily peak scaling for delivery – 15% of machines become elastic.
Emergency resource guarantee for anti‑scraping services – >700 public‑cloud containers.
Service‑chain scaling for SaaS releases – single‑point configuration reduces manual effort.
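For the service‑chain scenario, one way to picture "single‑point configuration" is that an operator sets a target for the entry service only and the targets of its downstream services are derived from it. The sketch below assumes every service in the chain scales by the same factor as the entry service. That is a simplification chosen for illustration; the source does not describe how Hulk computes downstream targets, and `chain_targets`, `baseline`, and `downstream` are hypothetical names.

```python
import math
from collections import deque


def chain_targets(entry: str, entry_target: int,
                  baseline: dict, downstream: dict) -> dict:
    """Derive instance targets for every service reachable from the entry service."""
    factor = entry_target / baseline[entry]
    targets = {}
    queue = deque([entry])
    while queue:
        svc = queue.popleft()
        if svc in targets:  # already handled; also guards against cycles
            continue
        targets[svc] = math.ceil(baseline[svc] * factor)
        queue.extend(downstream.get(svc, []))
    return targets


# Hypothetical call chain: gateway -> order -> payment; "report" is not on the chain.
baseline = {"gateway": 10, "order": 20, "payment": 5, "report": 8}
downstream = {"gateway": ["order"], "order": ["payment"]}
targets = chain_targets("gateway", 15, baseline, downstream)
```

Raising the gateway from 10 to 15 instances yields targets of 30 for order and 8 for payment, while services off the chain are left alone. Real call fan‑out is rarely linear, so a production system would need per‑service ratios or measured load instead of one shared factor.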
Future Plans
Focus on stability (robustness, QoS), usability (pre‑run simulation, auto‑task recommendation), business solutions (link‑level scaling, zone‑specific scaling), and new tech exploration (Knative, KEDA).
Author
Tu Yang, Head of Elastic Strategy Team, Basic Infrastructure Department, Meituan.