Meituan Elastic Scaling System: Architecture, Challenges, and Business Enablement

Meituan's elastic scaling system evolved from Hulk 1.0 on OpenStack to Hulk 2.0 on Kubernetes. The 2.0 generation added a microservice architecture, quota management, hybrid‑cloud resource pools, and automated scheduling. Together these reduce cost and keep services available through holiday peaks, daily delivery spikes, anti‑scraping emergencies, and SaaS releases. Future plans target stability, usability, and emerging technologies.

Meituan Technology Team

Apr 1, 2021

Introduction

Elastic scaling delivers business value in three ways: absorbing traffic spikes, saving cost, and automating capacity operations. Meituan pools scattered idle resources into one large resource pool and balances cost against performance through elastic scheduling and inventory control.

System Evolution

Meituan’s elastic scaling started with Hulk 1.0, a Docker‑based platform built on OpenStack. In 2018, Hulk 2.0 replaced OpenStack with Kubernetes and introduced a self‑developed OS and container engine. It also added a PaaS layer (Elastic Scaling 2.0) that addresses four problems: version inconsistency, resource shortage, high maintenance cost, and low configuration flexibility.

1.0 Architecture

Components: User Portal (OCTO/TTT), Hulk‑ApiServer, Hulk‑Policy (policy logic, using ZooKeeper), Hulk data sources (OCTO, CAT, Falcon), and an OpenStack‑based Scheduler.

2.0 Architecture

Key upgrades: replace OpenStack with Kubernetes; split the monolith into microservices; build a service‑profile ("service portrait") data platform; add observability components (Alarm, Scanner). New services: Engine, Metrics‑Server/Data, Resource‑Server.
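To make the metrics‑driven part of this architecture concrete, here is a minimal sketch of a target‑tracking scaling decision, the kind of calculation an engine could run on metric data. The formula is the one documented for the Kubernetes Horizontal Pod Autoscaler; the source does not say Hulk uses it. The function name, the tolerance, and the replica bounds are assumptions made for illustration.

```python
import math


def desired_replicas(current: int, metric: float, target: float,
                     min_replicas: int = 1, max_replicas: int = 100,
                     tolerance: float = 0.1) -> int:
    """HPA-style target tracking: desired = ceil(current * metric / target).

    All parameters are illustrative; they are not taken from Hulk.
    """
    if current <= 0 or target <= 0:
        raise ValueError("current and target must be positive")
    ratio = metric / target
    # Leave the replica count alone when the metric is close to the
    # target, so small fluctuations do not cause repeated scaling.
    if abs(ratio - 1.0) <= tolerance:
        return current
    desired = math.ceil(current * ratio)
    # Clamp the result to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 10 instances at 90% CPU against a 60% target.
    print(desired_replicas(10, metric=90.0, target=60.0))
```

The tolerance band is a design choice: without it, a metric hovering near the target would trigger a scale‑out and scale‑in on alternating evaluations.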

Challenges & Solutions

Technical challenges: version mismatch, insufficient capacity at peak, high maintenance cost, and rigid configuration. Solutions: multi‑tenant quotas, inventory control, a hybrid‑cloud resource pool (a steady pool plus an emergency pool), and automated scheduling.
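The steady‑plus‑emergency pool idea can be sketched as a simple overflow rule: serve a request from the steady pool first and spill into the emergency pool only for the remainder. This is an assumed policy for illustration, not a description of Hulk's scheduler; `Pool` and `allocate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """A resource pool with a count of containers still available."""
    name: str
    free: int


def allocate(request: int, steady: Pool, emergency: Pool) -> dict:
    """Fill from the steady pool first, then overflow into the emergency pool.

    Returns how many containers came from each pool and how many
    could not be placed at all.
    """
    if request < 0:
        raise ValueError("request must be non-negative")
    from_steady = min(request, steady.free)
    from_emergency = min(request - from_steady, emergency.free)
    steady.free -= from_steady
    emergency.free -= from_emergency
    return {
        "steady": from_steady,
        "emergency": from_emergency,
        "unmet": request - from_steady - from_emergency,
    }


if __name__ == "__main__":
    steady = Pool("steady", free=100)
    emergency = Pool("emergency", free=50)
    print(allocate(120, steady, emergency))
```

Reporting the unmet remainder explicitly, rather than failing the whole request, lets a caller decide whether a partial scale‑out is acceptable.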

Resource Management

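Each business line has its own quota, and the SLA for scaling success is 99.9%. Resources are over‑committed, with water‑level (utilization) monitoring to catch shortages early. An emergency pool is built from public‑cloud VMs, and capacity is scaled back in automatically after events.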
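A minimal sketch of how a per‑line quota and a pool water level might combine at admission time. It assumes that over‑commitment means the quotas handed out can add up to more than the pool physically holds, so a request must pass both checks. The 0.8 alert threshold and every name here are hypothetical.

```python
def admit(line_used: int, line_quota: int,
          pool_used: int, pool_capacity: int,
          n: int, alert_level: float = 0.8) -> tuple:
    """Decide whether a request for n containers is admitted.

    Returns (admitted, alert), where alert is True when the pool's
    water level after the decision is at or above alert_level.
    """
    if pool_capacity <= 0:
        raise ValueError("pool_capacity must be positive")
    # Check 1: the business line stays within its own quota.
    within_quota = line_used + n <= line_quota
    # Check 2: the shared pool really has room. With over-committed
    # quotas this can fail even when check 1 passes.
    fits_pool = pool_used + n <= pool_capacity
    admitted = within_quota and fits_pool
    used_after = pool_used + (n if admitted else 0)
    water_level = used_after / pool_capacity
    return admitted, water_level >= alert_level


if __name__ == "__main__":
    # Admitted, but the pool crosses the alert level (850 / 1000).
    print(admit(line_used=0, line_quota=200,
                pool_used=700, pool_capacity=1000, n=150))
```

In a setup like this, the alert would be the signal to bring emergency capacity online before requests start failing the pool check.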

Promotion Strategy

Gradual rollout from internal pilots, data‑driven service selection, value quantification (burst handling, cost saving, automation), deep business engagement, technical training, and feedback loops.

Business Enablement Scenarios

Holiday scaling – cost reduction ~20%.

Daily peak scaling for delivery – 15% of machines become elastic.

Emergency resource guarantee for anti‑scraping services – >700 public‑cloud containers.

Service‑chain scaling for SaaS releases – single‑point configuration reduces manual effort.
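One way to read "single‑point configuration" for service‑chain scaling is that a scale factor set once on the entry service is applied to every service reachable from it on the call chain. The sketch below shows that reading only; the source does not describe the mechanism, and the call graph, service names, and `chain_targets` function are hypothetical.

```python
import math
from collections import deque


def chain_targets(entry: str, scale: float,
                  downstream: dict, replicas: dict) -> dict:
    """Apply one scale factor to every service reachable from entry.

    downstream maps a service to the services it calls; replicas maps
    a service to its current instance count.
    """
    seen = {entry}
    queue = deque([entry])
    targets = {}
    # Breadth-first walk of the call chain; the seen set guards
    # against cycles and shared dependencies being visited twice.
    while queue:
        svc = queue.popleft()
        targets[svc] = math.ceil(replicas[svc] * scale)
        for nxt in downstream.get(svc, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return targets


if __name__ == "__main__":
    graph = {"gateway": ["order", "menu"], "order": ["payment"]}
    counts = {"gateway": 4, "order": 6, "menu": 3, "payment": 2, "report": 5}
    # "report" is not on the chain, so it is left out of the result.
    print(chain_targets("gateway", 1.5, graph, counts))
```

A uniform factor is the simplest choice. A fuller version would weight each hop by its call amplification, but that needs data the source does not give.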

Future Plans

Focus on stability (robustness, QoS), usability (pre‑run simulation, auto‑task recommendation), business solutions (link‑level scaling, zone‑specific scaling), and new tech exploration (Knative, KEDA).

Author

Tu Yang, Head of Elastic Strategy Team, Basic Infrastructure Department, Meituan.
