How Meituan Cloud Scaled with Containers: Lessons from a Large‑Scale Migration
This article details Meituan Cloud's journey from virtual machines to Docker containers, explaining why the shift was needed, the technical challenges faced, the implementation steps, and the measurable performance and cost benefits achieved through containerization.
Background
Meituan Cloud (MOS) operates both a private cloud that runs all of Meituan's internal business services and a public‑cloud platform that offers IaaS/PaaS to external developers. The private cloud was fully virtualized by 2013, the same year the public cloud launched.
Motivation for Containers
Business traffic shows strong diurnal and event‑driven spikes (lunch hours, holidays, live streaming). To meet peak demand the team over‑provisioned VMs by 2–3×, which led to low average utilization and slow provisioning (minutes to boot a VM, minutes more to create a MySQL RDS instance). Containers provide sub‑second start‑up, fine‑grained resource control, and elastic scaling, directly addressing these bottlenecks.
Container Adoption Strategy
Goal: build a unified management platform for VMs and containers, enable one‑click container deployment, achieve second‑level scaling while preserving production stability.
Docker version and custom patches
The team adopted Docker 1.11 as a fixed baseline. On top of it they patched daemon crash bugs, added MosBridge to integrate Docker networking with Meituan's virtual network, configured a private registry mirror backed by object storage, and extended cgroup‑based resource limits so they can be changed at runtime.
Image layering and deployment workflow
Docker images are layered. Example: a Java‑Tomcat web application consists of four layers – base Linux, Java runtime, Tomcat, and the application WAR. Updating the app modifies only the top layer, reducing pull size and deployment time.
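A minimal Dockerfile sketch of such a layered image (the base‑image names and paths here are illustrative, not Meituan's actual registry contents):

```dockerfile
# Layers 1-2: a base image bundling Linux and the Java runtime
FROM openjdk:8-jre

# Layer 3: install Tomcat on top of the runtime
ADD apache-tomcat-8.5.0.tar.gz /opt/tomcat/

# Layer 4: the application WAR; only this layer changes per release,
# so a redeploy pulls just the WAR, not the layers beneath it
COPY app.war /opt/tomcat/webapps/ROOT.war

CMD ["/opt/tomcat/bin/catalina.sh", "run"]
```

Because registries cache layers by content, `docker push` and `docker pull` after an application update transfer only the data in the top layer.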
Ecosystem components
PLUS: automatically builds Docker images from code commits; developers push code, PLUS triggers a build and stores the image in the private registry.
HULK: provides elastic scaling; it monitors workload and creates or destroys containers based on cgroup‑defined limits.
Public‑cloud PaaS products (e.g., MySQL RDS) were re‑engineered to run as containers, eliminating VM‑level provisioning.
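The article does not publish HULK's scaling algorithm; the sketch below is a hypothetical illustration of the kind of per‑tick decision such a component makes, with made‑up thresholds and a simple doubling/halving policy:

```python
# Hypothetical HULK-style scaling decision; thresholds, bounds, and
# policy are illustrative assumptions, not Meituan's actual algorithm.

def scale_decision(cpu_utilization: float, replicas: int,
                   high: float = 0.7, low: float = 0.3,
                   min_replicas: int = 2, max_replicas: int = 100) -> int:
    """Return the desired replica count for one scaling tick.

    Scale out when average CPU utilization exceeds `high`, scale in
    when it falls below `low`, and clamp to the configured bounds.
    """
    if cpu_utilization > high:
        desired = replicas * 2           # aggressive scale-out for traffic spikes
    elif cpu_utilization < low:
        desired = max(replicas // 2, 1)  # gradual scale-in as load subsides
    else:
        desired = replicas               # within band: hold steady
    return max(min_replicas, min(desired, max_replicas))
```

With sub‑second container start‑up, a lunch‑hour spike that pushes utilization past the high watermark can be absorbed within one tick (e.g., `scale_decision(0.9, 10)` asks for 20 replicas), something minute‑scale VM provisioning could not do.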
Resource control
Containers use Linux cgroups to enforce CPU, memory, and I/O limits. Unlike with VMs, these limits can be changed on the fly without restarting the container (e.g., raising memory from 4 GiB to 8 GiB by writing to /sys/fs/cgroup/memory/.../memory.limit_in_bytes).
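A minimal sketch of that runtime adjustment, assuming cgroup v1 with the memory controller mounted under /sys/fs/cgroup/memory and Docker's default "docker" cgroup directory (the exact layout varies by Docker version and host configuration):

```python
# Sketch: change a running container's memory limit via cgroup v1.
# The cgroup root below is an assumption about the host's layout.

GIB = 1024 ** 3

def cgroup_memory_path(container_id: str,
                       root: str = "/sys/fs/cgroup/memory/docker") -> str:
    """Build the path to the container's cgroup v1 memory limit file."""
    return f"{root}/{container_id}/memory.limit_in_bytes"

def set_memory_limit(container_id: str, gib: int) -> None:
    """Raise (or lower) the limit in place; the container keeps running."""
    with open(cgroup_memory_path(container_id), "w") as f:
        f.write(str(gib * GIB))
```

For example, `set_memory_limit("<container-id>", 8)` would grow a container's limit from 4 GiB to 8 GiB with no restart, whereas a VM would typically need to be resized and rebooted.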
Performance and Cost Impact (Nov 2015 – Jun 2016)
Single‑node QPS increased by ~85 % (reduced virtualization overhead).
Average CPU/memory usage dropped by 30–60 % thanks to dynamic scaling.
MySQL instance provisioning time reduced from ~3 minutes (VM) to ~30 seconds (container).
Resource limits can be adjusted instantly via cgroups, removing the need for VM recreation.
Future Work
Planned directions include mixing online and offline workloads on the same cluster to improve overall utilization, extending container usage to the full development lifecycle (dev → test → prod), and enhancing resource‑aware scheduling algorithms.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.