How Alibaba Scaled Kubernetes for Double‑11: Practical Cloud‑Native Lessons

This article details Alibaba's four‑stage journey to migrate its core systems to Kubernetes, the operational challenges of legacy workflows, and the concrete cloud‑native transformations—final‑state deployment, self‑healing, and immutable infrastructure—that enabled a flawless Double‑11 sales event.

Alibaba Cloud Native

Background and Migration Stages

In 2019 Alibaba migrated 100% of its core systems to a cloud‑native architecture and ran the Double‑11 shopping festival on it. The migration unfolded in four phases: (1) R&D and exploration in late 2017, (2) an initial gray‑scale (limited‑traffic) deployment in late 2018, carried out jointly with Ant Financial, (3) gray‑scale verification on cloud infrastructure in early 2019, and (4) a large‑scale rollout after the 618 mid‑year sales event, culminating in all core applications running on Kubernetes.

Why Embrace Kubernetes?

Legacy operational practices—manual ticket‑driven container changes, fragmented workflows, and rigid scaling—could not meet the scale and reliability demands of Double‑11. Alibaba realized that Kubernetes should not be an end in itself but a lever to overhaul the entire operations model, eliminating inconsistencies and unlocking cloud elasticity.

Final‑State‑Oriented (Declarative) Transformation

Traditional PaaS workflows required a ticket for each container image update or deletion, which led to partial failures, manual retries, and conflicts between concurrent changes. Kubernetes workloads instead expose a declarative API: controllers continuously reconcile the actual pod state with the desired spec, so the system keeps converging on the final state without external ticketing. The kubelet also retries pod launches internally, which decouples retry logic from PaaS ticket status.
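The reconcile pattern described above can be sketched in a few lines of Python. This is a toy in‑memory model, not the Kubernetes controller code: `ToyCluster`, `reconcile`, and `run_until_converged` are hypothetical names introduced here, and pod state is reduced to a name‑to‑image mapping. The point is that a failed launch needs no ticket or manual retry, because the next pass simply tries again.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """In-memory stand-in for a cluster: pod name -> image (hypothetical, not a real API)."""
    pods: dict = field(default_factory=dict)
    fail_next_create: int = 0  # number of upcoming launches that will fail

    def create_pod(self, name, image):
        if self.fail_next_create > 0:
            self.fail_next_create -= 1
            return False  # launch failed; a later reconcile pass retries it
        self.pods[name] = image
        return True


def reconcile(cluster, desired):
    """Run one pass that moves actual state toward `desired` (pod name -> image).

    Returns True when actual state matches the desired state after this pass.
    """
    # Remove pods that are unwanted or that run a stale image.
    for name, image in list(cluster.pods.items()):
        if desired.get(name) != image:
            del cluster.pods[name]
    # Create whatever is missing; failures are left for the next pass.
    for name, image in desired.items():
        if name not in cluster.pods:
            cluster.create_pod(name, image)
    return cluster.pods == desired


def run_until_converged(cluster, desired, max_passes=10):
    """Repeat reconcile passes and return how many passes were needed."""
    for attempt in range(1, max_passes + 1):
        if reconcile(cluster, desired):
            return attempt
    raise RuntimeError("did not converge within max_passes")
```

Calling `run_until_converged` with a cluster whose first launches fail still ends in the desired state; the caller only ever states what it wants, never the steps to get there.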

Self‑Healing Capability

Previously, the container platform only provisioned resources while the PaaS handled application start‑up and service discovery, which made for a fragile coupling. By embedding start‑up commands and lifecycle hooks directly in pod specifications and exposing applications through Service objects, Kubernetes unifies resource provisioning, application launch, and discovery. A PodDisruptionBudget (PDB) limits how many pods of an application may be unavailable at once due to voluntary evictions, so pods can be evicted and rescheduled during self‑healing without operators calculating capacity precisely for each operation.
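As a rough illustration of how a disruption budget gates evictions, the sketch below computes how many pods may still be evicted. It is a simplified model that assumes integer budgets only (real PDBs also accept percentages); the function names are hypothetical, and the arithmetic mirrors the `currentHealthy`, `desiredHealthy`, and `disruptionsAllowed` fields that a PDB reports in its status.

```python
def disruptions_allowed(expected_pods, current_healthy,
                        min_available=None, max_unavailable=None):
    """Return how many more pods may be evicted without violating the budget.

    Exactly one of `min_available` or `max_unavailable` must be given,
    both as plain integers in this simplified model.
    """
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = min_available
    else:
        desired_healthy = max(expected_pods - max_unavailable, 0)
    # Never report a negative allowance when the app is already below budget.
    return max(current_healthy - desired_healthy, 0)


def can_evict(expected_pods, current_healthy, **budget):
    """True if one more voluntary eviction fits inside the budget."""
    return disruptions_allowed(expected_pods, current_healthy, **budget) > 0
```

With a budget in place, a node drain or self‑healing action only has to ask whether an eviction is currently allowed, rather than reasoning about the application's total capacity.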

Immutable Infrastructure Transformation

Docker introduced immutable images, and Kubernetes reinforced the idea with rolling updates that create new pods instead of mutating existing ones. Multi‑container pods in principle allow sidecar components (e.g., logging agents, service‑mesh proxies) to be upgraded independently of the application. However, the default rolling update replaces the entire pod, so upgrading one component restarts all the others as well. Alibaba built controllers that upgrade containers in place, modifying only the targeted container within a pod, and used them instead of the default Deployment and StatefulSet controllers. Additional controllers such as SidecarSet enable coordinated sidecar upgrades across applications.
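The difference between replacing a pod and upgrading one container in place can be shown with a toy model. This is only a sketch under simplifying assumptions: a pod is a plain dictionary with a made‑up `uid` and a per‑container restart counter, and `new_pod`, `recreate_upgrade`, and `in_place_upgrade` are hypothetical helpers, not OpenKruise or Kubernetes APIs.

```python
import copy
import itertools

_uid_counter = itertools.count(1)


def new_pod(containers):
    """Build a toy pod: a unique id plus image and restart count per container."""
    return {
        "uid": next(_uid_counter),
        "containers": {
            name: {"image": image, "restarts": 0}
            for name, image in containers.items()
        },
    }


def recreate_upgrade(pod, name, image):
    """Rolling-update style: build a replacement pod, so every container starts afresh."""
    images = {n: c["image"] for n, c in pod["containers"].items()}
    images[name] = image
    return new_pod(images)


def in_place_upgrade(pod, name, image):
    """Swap one container's image and restart only that container.

    The pod keeps its identity and the other containers are left untouched.
    """
    updated = copy.deepcopy(pod)
    target = updated["containers"][name]
    if target["image"] != image:
        target["image"] = image
        target["restarts"] += 1
    return updated
```

Upgrading a `log-agent` sidecar with `in_place_upgrade` leaves the pod's `uid` and the `app` container unchanged, whereas `recreate_upgrade` returns a pod with a new `uid`, which in a real cluster means the application container is stopped and started again too.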

OpenKruise Open‑Source Project

The custom controllers were open‑sourced as the OpenKruise project. OpenKruise provides advanced workload controllers, richer release strategies (canary, blue‑green, phased rollouts), and the in‑place upgrade capability described above, encapsulating Alibaba’s years of deployment experience for the broader community.
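A phased rollout of the kind mentioned above boils down to upgrading replicas in ordered batches, often with a small canary batch first. The helper below is a minimal sketch of that planning step only; `plan_batches` and the batch sizes are assumptions made for illustration and do not reflect how OpenKruise itself expresses release strategies.

```python
def plan_batches(replicas, batch_sizes):
    """Split replicas into consecutive rollout batches.

    `batch_sizes` gives the size of each leading batch (for example a
    one-pod canary); any replicas left over form one final batch.
    """
    if any(size <= 0 for size in batch_sizes):
        raise ValueError("batch sizes must be positive")
    batches, start = [], 0
    for size in batch_sizes:
        if start >= len(replicas):
            break  # fewer replicas than planned batches
        batches.append(replicas[start:start + size])
        start += size
    if start < len(replicas):
        batches.append(replicas[start:])
    return batches
```

For ten replicas and `batch_sizes=[1, 3]`, the plan is a one‑pod canary, then three pods, then the remaining six; an operator or controller would pause between batches to check health before continuing.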

Conclusion and Future Work

Alibaba’s large‑scale Kubernetes deployment proved resilient under real‑world Double‑11 traffic, demonstrating that deep operational refactoring—final‑state deployment, self‑healing, and immutable infrastructure—is essential for cloud‑native success. Future efforts will focus on stateful‑application migration and end‑to‑end delivery‑pipeline cloud‑native transformation.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
