
How Meituan Optimized Kubernetes at Scale: Lessons from HULK2.0

This article details Meituan‑Dianping's evolution from a custom Docker‑based cluster manager to HULK2.0, a platform built on open‑source Kubernetes. It describes the platform's architecture, operational practices, scheduler and Kubelet optimizations, and the resource‑management techniques that enable massive, cost‑effective scaling.

21CTO

Background

Meituan‑Dianping, a leading Chinese life‑service platform, experiences pronounced traffic peaks during holidays and promotions, demanding highly elastic and available clusters while keeping operational costs under control.

The article introduces the company's Kubernetes cluster management practices, covering the evolution of its internal scheduling system (HULK) and subsequent optimizations.

Meituan‑Dianping Cluster Management and Scheduling System

Starting in 2013, the company built a virtualization‑based resource delivery model, and in 2015 it launched the HULK system to drive containerization. By 2016, a self‑developed Docker‑based elastic scaling solution had improved delivery speed and reduced IT costs. In 2018 the platform migrated to Kubernetes, creating HULK2.0.

Architecture Overview

HULK2.0 decouples business layers from the underlying Kubernetes platform, exposing a unified HULK API that abstracts resource requests, while remaining compatible with native Kubernetes APIs.

Why Kubernetes?

Kubernetes offers a platform rather than a single solution, providing extensibility, mature ecosystem support, and flexible resource allocation, which aligns with Meituan‑Dianping’s need for rapid scaling and cost efficiency.

Cluster Operation Status

Scale: over 100,000 online instances across multiple regions.

Monitoring & alerts for applications, nodes, pods, and containers.

Automated health checks, daily host inspections, and resource visualizations.

Capacity planning using rule‑based and machine‑learning predictions.

Kubernetes Optimization and Refactoring

Kube‑Scheduler Performance Optimization

In a 3,000‑node cluster, the 1.6 scheduler took about 5 seconds to schedule each pod. Upgrading to a newer scheduler version removed that delay and improved scheduling performance by more than 400%.

Pre‑filter Abort Mechanism

Introducing an early‑exit strategy during the predicate phase stops evaluating a node once any pre‑filter condition fails, dramatically reducing computation and boosting scheduler throughput.

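This early‑exit behavior was contributed to the Kubernetes community and became the scheduler's default in version 1.10. The alwaysCheckAllPredicates policy option controls it: setting the option to true turns the short‑circuit off, so every predicate is evaluated for every node.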
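The early exit can be illustrated with a minimal sketch. It is not the scheduler's actual code: the predicate functions, the dictionary fields, and the always_check_all parameter (named after the alwaysCheckAllPredicates option) are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Pod = Dict[str, int]
Node = Dict[str, int]
Predicate = Callable[[Pod, Node], bool]


def fits_cpu(pod: Pod, node: Node) -> bool:
    """Hypothetical predicate: the node has enough free CPU."""
    return pod["cpu"] <= node["free_cpu"]


def fits_memory(pod: Pod, node: Node) -> bool:
    """Hypothetical predicate: the node has enough free memory."""
    return pod["mem"] <= node["free_mem"]


def node_passes(
    pod: Pod,
    node: Node,
    predicates: List[Predicate],
    always_check_all: bool = False,
) -> Tuple[bool, int]:
    """Return (fits, number of predicates evaluated) for one node.

    With always_check_all=False, evaluation stops at the first failing
    predicate, which is the early-exit behavior described above.
    """
    fits = True
    evaluated = 0
    for predicate in predicates:
        evaluated += 1
        if not predicate(pod, node):
            fits = False
            if not always_check_all:
                break  # early exit: this node is already ruled out
    return fits, evaluated
```

On a node that fails the first check, the early exit evaluates one predicate instead of the whole list. Across thousands of nodes and a longer predicate list, that is where the saving comes from.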

Local‑Optimal Scheduling

Instead of running an exhaustive BestFit search across all nodes, the platform selects a subset of nodes (e.g., 100) and chooses the highest‑scoring node within that subset. This yields placements of comparable quality with far less computation.
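A minimal sketch of the idea follows. It assumes the subset is formed by taking the first feasible nodes encountered, which the article does not specify. The function name, the is_feasible and score callbacks, and the subset_size parameter are all hypothetical.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

N = TypeVar("N")


def pick_local_optimum(
    nodes: Iterable[N],
    is_feasible: Callable[[N], bool],
    score: Callable[[N], float],
    subset_size: int = 100,
) -> Optional[N]:
    """Score only the first `subset_size` feasible nodes and return the best.

    Assumption: the subset is the first feasible nodes in iteration order.
    Returns None when no node is feasible.
    """
    candidates: List[N] = []
    for node in nodes:
        if is_feasible(node):
            candidates.append(node)
            if len(candidates) >= subset_size:
                break  # stop scanning: the subset is full
    if not candidates:
        return None
    return max(candidates, key=score)
```

The trade‑off is that the chosen node is the best within the subset, not necessarily the best in the whole cluster. In exchange, the cost of filtering and scoring no longer grows with cluster size once enough feasible nodes are found.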

Kubelet Refactoring

Risk Control

The team limited Kubelet’s autonomous eviction and restart behaviors, adding a reusable restart strategy that preserves container data across host reboots.

IP Retention

A custom CNI plugin reuses pod IPs after migration or host restart, improving stability.
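The article does not describe how the plugin is built. The sketch below only illustrates the general idea of keying an address on a stable pod identity, so that the same pod gets the same IP back. The StickyIPAllocator class and the namespace/name key format are hypothetical, and a real plugin would persist the mapping outside the process rather than in memory.

```python
import ipaddress
from typing import Dict, Iterator


class StickyIPAllocator:
    """In-memory illustration of IP retention keyed by pod identity."""

    def __init__(self, cidr: str) -> None:
        # iter() keeps this working whether hosts() yields lazily or returns a list
        self._free: Iterator = iter(ipaddress.ip_network(cidr).hosts())
        self._by_pod: Dict[str, str] = {}

    def allocate(self, pod_key: str) -> str:
        """Return the pod's previous IP if one is recorded, else a new one."""
        if pod_key in self._by_pod:
            return self._by_pod[pod_key]  # reuse after migration or restart
        try:
            ip = str(next(self._free))
        except StopIteration:
            raise RuntimeError("address pool exhausted") from None
        self._by_pod[pod_key] = ip
        return ip
```

For example, allocating for "default/web-0" twice returns the same address both times, while a different key receives the next free address in the range.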

Scalability Enhancements

Features added include NUMA binding, CPU share adjustments, CPUSet assignments, and extended container limits (ulimit, I/O, PID, swap).

In‑Place Application Upgrade

The team implemented a mechanism to modify a pod's specification (e.g., its image) in place, without recreating the pod. This avoids IP and hostname changes and reduces disruption.

Image Distribution Optimizations

Cross‑site synchronization for nearby image pulls.

Pre‑distribution of base images to all servers.

P2P image sharing to alleviate registry load.

Resource Management and Optimization

Key Techniques

Service profiling for CPU, memory, network, and I/O usage.

Affinity and anti‑affinity rules to co‑locate complementary workloads.

Scenario‑based priority (e.g., latency‑sensitive services).

Elastic scaling with rule‑based and ML‑driven policies.

Fine‑grained resource allocation (NUMA, CPUSet, etc.).
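As an illustration of a rule‑based scaling policy, the sketch below uses the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric). Whether HULK2.0 uses this exact rule is an assumption; the function name and the replica bounds are hypothetical.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 100,
) -> int:
    """Proportional scaling rule, clamped to [min_replicas, max_replicas]."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))
```

With 4 replicas at 90% average CPU and a 60% target, the rule asks for 6 replicas. An ML‑driven policy would replace the observed metric with a predicted one so that capacity is added ahead of the peak.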

Strategy Optimization

Affinity and anti‑affinity constraints.

Application priority levels for resource contention.

Dispersal across hosts, racks, zones for fault tolerance.

Isolation for exclusive workloads.

Special resource handling for GPU, SSD, NICs.
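The dispersal strategy can be sketched as a tie‑breaking rule over failure domains. This is a hedged illustration, not the platform's scoring function: the pick_most_dispersed name, the host/rack/zone keys, and the zone‑then‑rack‑then‑host ordering are assumptions.

```python
from collections import Counter
from typing import Dict, List

Location = Dict[str, str]  # expected keys: "host", "rack", "zone"


def pick_most_dispersed(candidates: List[Location], placed: List[Location]) -> Location:
    """Pick the candidate sharing the fewest failure domains with existing replicas.

    Zones are compared first, then racks, then hosts, so a candidate in an
    unused zone wins over one in an unused rack of an already used zone.
    Raises ValueError if `candidates` is empty.
    """
    zones = Counter(p["zone"] for p in placed)
    racks = Counter(p["rack"] for p in placed)
    hosts = Counter(p["host"] for p in placed)
    return min(
        candidates,
        key=lambda n: (zones[n["zone"]], racks[n["rack"]], hosts[n["host"]]),
    )
```

Counting replicas per domain, rather than only checking whether a domain is occupied, keeps the spread balanced once every zone already holds at least one replica.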

Online Cluster Optimization

NUMA binding to reduce the latency of memory access across NUMA nodes.

CPUSet grouping of complementary applications.

Staggered workload peaks based on service profiles.

Rescheduling to improve placement and reduce fragmentation.

Interference analysis using monitoring metrics.
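Staggering workload peaks can be illustrated with a small sketch that pairs a service with the co‑location partner whose usage profile yields the lowest combined peak. The profile representation (one usage value per time slot) and both function names are assumptions for illustration.

```python
from typing import Dict, Optional, Sequence


def combined_peak(a: Sequence[float], b: Sequence[float]) -> float:
    """Peak of the summed usage of two profiles sampled on the same time slots."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same time slots")
    return max(x + y for x, y in zip(a, b))


def best_partner(
    profile: Sequence[float], others: Dict[str, Sequence[float]]
) -> Optional[str]:
    """Name of the service whose profile gives the lowest combined peak."""
    if not others:
        return None
    return min(others, key=lambda name: combined_peak(profile, others[name]))
```

A service that peaks during the day pairs best with one that peaks at night: their summed usage stays close to either service's own peak, whereas two daytime services would need almost double the capacity.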

Conclusion

Meituan‑Dianping continues to explore mixed online‑offline deployments, intelligent scheduling aware of traffic and resource usage, and high‑performance, strongly isolated, secure container technologies.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
