How Alibaba’s OpenKruise Enhances Kubernetes Deployments for Massive Scale
The article recaps a SIG Cloud‑Provider‑Alibaba webinar, detailing Alibaba’s large‑scale cloud‑native deployment challenges and how the open‑source OpenKruise project introduces extended workloads like CloneSet, Advanced StatefulSet, SidecarSet, and more to improve deployment efficiency, stability, and flexibility on Kubernetes.
Background
Kubernetes native workloads (Deployment, StatefulSet) struggle with Alibaba-scale applications that may involve hundreds of thousands of Pods and millions of containers. Traditional rollouts cause long deployment times, network disruptions, and resource waste.
OpenKruise Extended Workloads
Alibaba open‑sourced a set of enhanced workloads under the OpenKruise project:
CloneSet: Replacement for Deployment with in‑place upgrades, configurable deletion, ordered rollout, and parallel/gray‑scale publishing.
Advanced StatefulSet: Drop‑in upgrade of StatefulSet adding in‑place upgrades, parallel rollout via maxUnavailable, and pause capabilities.
SidecarSet: Centralized sidecar definition and upgrade, decoupled from application workloads.
UnitedDeployment: Deploys an application across multiple zones using multiple subset workloads.
BroadcastJob: Runs a Job on every Node that matches a selector.
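As an illustration of the last item, a BroadcastJob manifest might look like the sketch below. The resource name, image, command, and node label are hypothetical; node matching is expressed through the pod template's nodeSelector, and the apiVersion is assumed to match the one used elsewhere in this article.

```yaml
# Sketch only: name, image, command, and node label are hypothetical.
apiVersion: apps.kruise.io/v1alpha1
kind: BroadcastJob
metadata:
  name: node-check
spec:
  template:
    spec:
      # Only nodes carrying this (hypothetical) label receive a pod.
      nodeSelector:
        node-pool: batch
      containers:
      - name: check
        image: busybox:1.36
        command: ["sh", "-c", "uname -a"]
      restartPolicy: Never
  completionPolicy:
    # The job is marked complete once its pods have finished running.
    type: Always
```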
Key Features
In‑Place Upgrade
CloneSet and Advanced StatefulSet support three upgrade strategies:
ReCreate: Deletes and recreates the pod in full (the default behavior).
InPlaceIfPossible: Upgrades in place when only image and metadata fields change; any other change falls back to recreation.
InPlaceOnly: Permits in‑place upgrades only, so only image and metadata changes are allowed.
In‑place upgrades keep the pod sandbox and other containers running, reducing rollout time by up to 80% in large‑scale environments.
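A minimal sketch of an Advanced StatefulSet that opts into in‑place upgrades is shown below. The names, labels, and image are hypothetical, and the apiVersion is assumed to match the one used in this article (newer Kruise releases may serve a different version). The InPlaceUpdateReady readiness gate and the Parallel pod management policy follow the requirements described in the OpenKruise documentation, so check them against the release you run.

```yaml
# Sketch only: names, labels, and image are hypothetical.
apiVersion: apps.kruise.io/v1alpha1
kind: StatefulSet
metadata:
  name: sample
spec:
  replicas: 3
  serviceName: sample            # a headless Service with this name is assumed to exist
  selector:
    matchLabels:
      app: sample
  template:
    metadata:
      labels:
        app: sample
    spec:
      # Lets the controller mark a pod not ready while its container is being swapped.
      readinessGates:
      - conditionType: InPlaceUpdateReady
      containers:
      - name: main
        image: nginx:alpine
  # maxUnavailable is only honored together with Parallel pod management.
  podManagementPolicy: Parallel
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      podUpdatePolicy: InPlaceIfPossible   # image-only changes restart the container, not the pod
      maxUnavailable: 2                    # at most two pods may be unavailable at once
```

With this spec, changing only the container image should upgrade pods in place, while a change to any other template field falls back to recreating them.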
Rolling + Batch Publishing
Advanced StatefulSet introduces maxUnavailable for parallel pod upgrades. CloneSet combines maxSurge, maxUnavailable, and a partition field for fine‑grained batch control. Example spec:
apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
# metadata, selector, and template omitted for brevity
spec:
  replicas: 5
  updateStrategy:
    type: InPlaceIfPossible
    maxSurge: 20%
    maxUnavailable: 0
    partition: 3

With five replicas, maxSurge: 20% allows one extra pod. The rollout performs in‑place upgrades while keeping five pods available, and partition: 3 retains three pods on the old version.
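A gray‑scale rollout typically continues by lowering partition one batch at a time. The fragment below is a sketch of the next step for the same example CloneSet; the batch sizes are an assumption chosen for illustration, not a recommendation from the webinar.

```yaml
# Fragment of the CloneSet spec above; only partition changes between batches.
spec:
  updateStrategy:
    type: InPlaceIfPossible
    maxSurge: 20%
    maxUnavailable: 0
    partition: 1   # was 3: two more pods move to the new version
    # Setting partition: 0 afterwards would finish the rollout.
```

Because partition counts the pods that stay on the old version, each reduction releases exactly that many additional pods for upgrade, which gives operators a pause point between batches.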
Configurable Rollout Order
Both CloneSet and Advanced StatefulSet allow custom rollout priority via:
Label‑based ordering (orderPriority).
Weight‑based selection (weightPriority).
Scatter strategy to distribute pods with specific labels across batches.
This is useful for stateful services such as ZooKeeper, where non‑leader nodes must be upgraded before the leader.
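The leader‑last pattern described above could be expressed with weightPriority roughly as follows. This is a sketch under assumptions: the CloneSet name, image, and the role label are hypothetical, and something outside Kruise is assumed to keep the role label up to date on each pod.

```yaml
# Sketch only: name, image, and the "role" label are hypothetical.
apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
metadata:
  name: quorum-sample
spec:
  replicas: 3
  selector:
    matchLabels:
      app: quorum-sample
  template:
    metadata:
      labels:
        app: quorum-sample
    spec:
      containers:
      - name: main
        image: nginx:alpine
  updateStrategy:
    type: InPlaceIfPossible
    maxUnavailable: 1
    priorityStrategy:
      weightPriority:
      # Pods matching a higher-weight term are upgraded first.
      - weight: 100
        matchSelector:
          matchLabels:
            role: follower
      - weight: 10
        matchSelector:
          matchLabels:
            role: leader
    # Alternative to priorityStrategy: spread pods labeled foo=bar across batches.
    # scatterStrategy:
    # - key: foo
    #   value: bar
```

The scatter strategy is left commented out because combining it with a priority strategy makes the resulting order harder to reason about.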
Sidecar Management
SidecarSet separates sidecar containers from application specs, enabling centralized injection, version upgrades, and scaling without modifying each workload.
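A minimal SidecarSet sketch is shown below. The sidecar name, image, command, label selector, and volume are hypothetical; the point is that the sidecar is declared once and matched to pods by label rather than being copied into each workload's template.

```yaml
# Sketch only: names, image, command, and labels are hypothetical.
apiVersion: apps.kruise.io/v1alpha1
kind: SidecarSet
metadata:
  name: log-agent
spec:
  # Newly created pods carrying this label get the sidecar injected.
  selector:
    matchLabels:
      app: sample
  containers:
  - name: log-agent
    image: busybox:1.36
    command: ["sh", "-c", "tail -F /var/log/app/app.log"]
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
  volumes:
  - name: app-logs
    emptyDir: {}
```

Upgrading the sidecar then means changing the image in this one object instead of editing every application workload.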
Open‑Source Availability
All workloads are available in the OpenKruise GitHub repository and are used internally for over a million containers across more than 100,000 applications. Future releases will include Advanced DaemonSet, HPA enhancements, and scheduler plugins.
Q&A Highlights
Scale: Single‑application pod counts can reach tens of thousands; rollout duration depends on batch size and may span weeks.
Resource Requests/Limits: Online services typically use a 1:1 request‑to‑limit ratio; batch jobs may set requests lower than limits (Kubernetes does not accept a request larger than its limit).
Version Upgrade Path: Kruise resources share a unified API version; upcoming upgrades will use conversion webhooks.
Go Client: Available via github.com/openkruise/kruise/pkg/client or via controller‑runtime with github.com/openkruise/kruise-api.
Kubernetes Cluster Upgrade: Alibaba employs a “Kube‑on‑Kube” architecture where a meta‑cluster manages thousands of tenant clusters, upgrading workloads similarly to normal application upgrades.
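To make the Resource Requests/Limits answer concrete, the sketch below shows a pod whose requests equal its limits, the 1:1 ratio mentioned for online services. The pod name, image, and sizes are hypothetical. In standard Kubernetes, a pod whose containers all set equal CPU and memory requests and limits is assigned the Guaranteed QoS class.

```yaml
# Sketch only: name, image, and resource sizes are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: online-service
spec:
  containers:
  - name: app
    image: nginx:alpine
    resources:
      # Requests equal limits (1:1), so the pod is not overcommitted.
      requests:
        cpu: "2"
        memory: 4Gi
      limits:
        cpu: "2"
        memory: 4Gi
```

A batch job that tolerates throttling could instead set requests below limits, which places it in the Burstable class and lets the scheduler pack nodes more densely.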
Alibaba Cloud Native
We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.
