How Alibaba Scaled Kubernetes for Double‑11: Practical Cloud‑Native Lessons
This article details Alibaba's four‑stage journey to migrate its core systems to Kubernetes, the operational challenges of legacy workflows, and the concrete cloud‑native transformations—final‑state deployment, self‑healing, and immutable infrastructure—that enabled a flawless Double‑11 sales event.
Background and Migration Stages
In 2019 Alibaba migrated 100% of its core systems to a cloud‑native architecture and successfully supported the massive Double‑11 shopping festival on it. The migration unfolded in four phases: (1) R&D and exploration in late 2017, (2) initial gray‑scale deployment in late 2018, carried out jointly with Ant Financial, (3) gray‑scale verification on cloud infrastructure in early 2019, and (4) large‑scale rollout after the 618 mid‑year shopping festival, culminating in all core applications running on Kubernetes.
Why Embrace Kubernetes?
Legacy operational practices—manual ticket‑driven container changes, fragmented workflows, and rigid scaling—could not meet the scale and reliability demands of Double‑11. Alibaba realized that Kubernetes should not be an end in itself but a lever to overhaul the entire operations model, eliminating inconsistencies and unlocking cloud elasticity.
Final‑State‑Oriented Transformation
Traditional PaaS workflows required a ticket for each container image update or deletion, which led to partial failures, manual retries, and conflicts between concurrent changes. Kubernetes workloads expose a declarative API instead: controllers continuously reconcile the actual pod state with the desired spec and keep retrying until the two match, so no external ticket has to track progress. The kubelet likewise retries pod launches on its own, which decouples retry logic from PaaS ticket status.
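The reconcile-until-converged idea can be illustrated with a minimal Python sketch. It is an in-memory simulation under stated assumptions, not Kubernetes code: DesiredState, ActualState, reconcile, converge, and make_flaky_starter are hypothetical names, and a real controller would replace pods gradually rather than deleting all outdated ones at once.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass(frozen=True)
class DesiredState:
    """What the user declares: which image, how many replicas."""
    image: str
    replicas: int


@dataclass
class ActualState:
    """Hypothetical in-memory stand-in for the pods that currently exist."""
    pods: Dict[str, str] = field(default_factory=dict)  # pod name -> image
    next_id: int = 0


StartPod = Callable[[str, str], bool]  # returns False when a launch fails


def reconcile(desired: DesiredState, actual: ActualState, start_pod: StartPod) -> bool:
    """Run one reconcile pass; return True once actual matches desired."""
    # Remove pods that run an outdated image (simplified: all at once).
    for name, image in list(actual.pods.items()):
        if image != desired.image:
            del actual.pods[name]
    # Remove surplus pods.
    while len(actual.pods) > desired.replicas:
        actual.pods.popitem()
    # Create missing pods. A failed launch is not an error to report to
    # anyone; the next pass simply tries again.
    while len(actual.pods) < desired.replicas:
        name = f"pod-{actual.next_id}"
        actual.next_id += 1
        if not start_pod(name, desired.image):
            return False
        actual.pods[name] = desired.image
    return True


def converge(desired: DesiredState, actual: ActualState,
             start_pod: StartPod, max_passes: int = 10) -> int:
    """Repeat reconcile passes until converged; return the number of passes."""
    for attempt in range(1, max_passes + 1):
        if reconcile(desired, actual, start_pod):
            return attempt
    raise RuntimeError("did not converge within max_passes")


def make_flaky_starter(failures: int) -> StartPod:
    """Build a starter that fails the first `failures` launches (for demos)."""
    remaining = {"n": failures}

    def start(name: str, image: str) -> bool:
        if remaining["n"] > 0:
            remaining["n"] -= 1
            return False
        return True

    return start
```

Calling converge(DesiredState("app:v2", 3), actual, make_flaky_starter(2)) on a state holding two app:v1 pods ends with three app:v2 pods even though two launches failed along the way. The caller only states the target; retries stay inside the loop, which is the property the ticket-based workflow lacked.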
Self‑Healing Capability
Previously, the container platform only provisioned resources, while the PaaS handled application start‑up and service discovery, which created a fragile coupling. Embedding start‑up commands and lifecycle hooks directly in pod specifications and exposing applications through Service objects lets Kubernetes unify resource provisioning, application launch, and discovery. A PodDisruptionBudget (PDB) then defines how many pods may safely be evicted at a time, enabling graceful self‑healing without precise capacity calculations.
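The way a disruption budget gates evictions can be sketched in a few lines of Python. This is a simplified model, assuming a budget expressed only as a minimum number of healthy pods and tracking healthy pods only; Budget, disruptions_allowed, and try_evict are hypothetical names, not Kubernetes API calls.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Budget:
    """Simplified disruption budget: at least this many pods stay healthy."""
    min_available: int


def disruptions_allowed(budget: Budget, healthy_count: int) -> int:
    """How many healthy pods may be evicted right now (never negative)."""
    return max(0, healthy_count - budget.min_available)


def try_evict(budget: Budget, healthy: Set[str], name: str) -> bool:
    """Evict `name` only if the budget still allows a disruption."""
    if name not in healthy:
        raise KeyError(f"unknown or unhealthy pod: {name}")
    if disruptions_allowed(budget, len(healthy)) < 1:
        return False  # refused; the caller should retry later
    healthy.remove(name)
    return True
```

With Budget(min_available=2) and three healthy pods, the first eviction succeeds and the second is refused until a replacement pod becomes healthy. The operator declares the floor once, and the system decides eviction by eviction whether it is safe to proceed.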
Immutable Infrastructure Transformation
Docker introduced immutable images, and Kubernetes reinforces the model with rolling updates that create new pods instead of mutating existing ones. Multi‑container pods separate the application from sidecar components (e.g., logging agents, service‑mesh proxies), which in principle allows each to be upgraded independently. The default rolling update, however, deletes and recreates the entire pod, so upgrading one container disrupts all the others. Alibaba therefore built workload controllers that take the place of the default Deployment and StatefulSet controllers and upgrade only the targeted container within a pod, in place. Additional controllers such as SidecarSet enable coordinated sidecar upgrades across applications.
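The difference between recreating a pod and upgrading one container in place can be shown with a toy model. The sketch below is an illustration under assumptions, not Alibaba's controller: Container, Pod, recreate_upgrade, and in_place_upgrade are hypothetical names, and the uid and restarts fields merely stand in for pod identity and container restarts.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Container:
    name: str
    image: str
    restarts: int = 0


@dataclass
class Pod:
    uid: int  # stands in for pod identity (name, IP, scheduling result)
    containers: Dict[str, Container]


def recreate_upgrade(pod: Pod, target: str, new_image: str, new_uid: int) -> Pod:
    """Recreate-style upgrade: a brand-new pod replaces the old one,
    so every container starts again, not only the targeted one."""
    if target not in pod.containers:
        raise KeyError(target)
    containers = {
        name: Container(name, new_image if name == target else c.image)
        for name, c in pod.containers.items()
    }
    return Pod(uid=new_uid, containers=containers)


def in_place_upgrade(pod: Pod, target: str, new_image: str) -> Pod:
    """In-place upgrade: only the targeted container changes image and
    restarts; the pod identity and the other containers are untouched."""
    container = pod.containers[target]
    if container.image != new_image:
        container.image = new_image
        container.restarts += 1
    return pod
```

Upgrading a log-agent sidecar with in_place_upgrade leaves the pod's uid and the app container as they were, whereas recreate_upgrade returns a pod with a new identity. Keeping the identity is what lets a sidecar change avoid rescheduling and restarting the application container.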
OpenKruise Open‑Source Project
The custom controllers were open‑sourced as the OpenKruise project. OpenKruise provides advanced workload controllers, richer release strategies (canary, blue‑green, phased rollouts), and the in‑place upgrade capability described above, encapsulating Alibaba’s years of deployment experience for the broader community.
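As a rough illustration of what a phased rollout involves, the following Python sketch turns a list of percentage steps into per-phase pod counts. It is a generic planning helper written for this article's context, not OpenKruise's API: plan_phases and batch_sizes are hypothetical names, and rounding up at each step is an assumption.

```python
from typing import List


def plan_phases(replicas: int, percentages: List[int]) -> List[int]:
    """Return the cumulative number of upgraded pods after each phase.

    Percentages must be strictly increasing and lie in (0, 100].
    Counts are rounded up so that a small first step still upgrades
    at least one pod when replicas > 0.
    """
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    targets: List[int] = []
    previous = 0
    for pct in percentages:
        if not previous < pct <= 100:
            raise ValueError("percentages must increase and stay within (0, 100]")
        targets.append((replicas * pct + 99) // 100)  # integer ceiling
        previous = pct
    return targets


def batch_sizes(targets: List[int]) -> List[int]:
    """Convert cumulative targets into the number of pods upgraded per phase."""
    sizes: List[int] = []
    done = 0
    for target in targets:
        sizes.append(target - done)
        done = target
    return sizes
```

For 10 replicas and steps of 10, 50, and 100 percent, plan_phases yields cumulative targets of 1, 5, and 10 pods, which batch_sizes turns into batches of 1, 4, and 5. A release can pause after each batch for verification before continuing.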
Conclusion and Future Work
Alibaba’s large‑scale Kubernetes deployment proved resilient under real‑world Double‑11 traffic, demonstrating that deep operational refactoring—final‑state deployment, self‑healing, and immutable infrastructure—is essential for cloud‑native success. Future efforts will focus on stateful‑application migration and end‑to‑end delivery‑pipeline cloud‑native transformation.
Alibaba Cloud Native
We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.
