
How Alibaba’s OpenKruise Enhances Kubernetes Deployments for Massive Scale

The article recaps a SIG Cloud‑Provider‑Alibaba webinar, detailing Alibaba’s large‑scale cloud‑native deployment challenges and how the open‑source OpenKruise project introduces extended workloads like CloneSet, Advanced StatefulSet, SidecarSet, and more to improve deployment efficiency, stability, and flexibility on Kubernetes.

Alibaba Cloud Native

Background

Kubernetes native workloads (Deployment, StatefulSet) struggle with Alibaba-scale applications that may involve hundreds of thousands of Pods and millions of containers. Traditional rollouts cause long deployment times, network disruptions, and resource waste.

OpenKruise Extended Workloads

Alibaba open‑sourced a set of enhanced workloads under the OpenKruise project:

CloneSet : Replacement for Deployment with in‑place upgrades, configurable deletion, ordered rollout, and parallel/gray‑scale publishing.

Advanced StatefulSet : Drop‑in upgrade of StatefulSet adding in‑place upgrades, parallel rollout via maxUnavailable, and pause capabilities.

SidecarSet : Centralized sidecar definition and upgrade, decoupled from application workloads.

UnitedDeployment : Deploys an application across multiple zones using multiple subset workloads.

BroadcastJob : Runs a Job on every Node that matches a selector.
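As a rough illustration of the last two workloads, the manifests below sketch a UnitedDeployment spread over two zones and a BroadcastJob restricted to labeled nodes. The resource names, images, and the `zone` and `disktype` node labels are placeholders invented for this example. Field names follow the `apps.kruise.io/v1alpha1` API and should be checked against the Kruise version actually installed.

```yaml
# UnitedDeployment: one template, one subset workload per zone.
apiVersion: apps.kruise.io/v1alpha1
kind: UnitedDeployment
metadata:
  name: sample-ud                # placeholder name
spec:
  replicas: 6                    # spread across the subsets below
  selector:
    matchLabels:
      app: sample
  template:
    statefulSetTemplate:         # each subset is created as a StatefulSet
      metadata:
        labels:
          app: sample
      spec:
        selector:
          matchLabels:
            app: sample
        template:
          metadata:
            labels:
              app: sample
          spec:
            containers:
            - name: nginx
              image: nginx:alpine
  topology:
    subsets:
    - name: subset-a
      nodeSelectorTerm:
        matchExpressions:
        - key: zone              # placeholder node label
          operator: In
          values: ["zone-a"]
    - name: subset-b
      nodeSelectorTerm:
        matchExpressions:
        - key: zone
          operator: In
          values: ["zone-b"]
---
# BroadcastJob: one pod per node that satisfies the pod's nodeSelector.
apiVersion: apps.kruise.io/v1alpha1
kind: BroadcastJob
metadata:
  name: sample-broadcast         # placeholder name
spec:
  template:
    spec:
      nodeSelector:
        disktype: ssd            # placeholder node label
      restartPolicy: Never
      containers:
      - name: task
        image: busybox
        command: ["sh", "-c", "echo done"]
  completionPolicy:
    type: Always                 # the job completes once every pod has finished
```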

Key Features

In‑Place Upgrade

CloneSet and Advanced StatefulSet support three upgrade strategies:

ReCreate : Deletes and recreates the pod on every update (the default behavior).

InPlaceIfPossible : Upgrades the pod in place when only image or metadata fields change; any other change falls back to recreation.

InPlaceOnly : Permits in‑place upgrades only; changes to fields other than image or metadata are rejected.

In‑place upgrades keep the pod sandbox and other containers running, reducing rollout time by up to 80% in large‑scale environments.
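For Advanced StatefulSet, the strategy is selected through `podUpdatePolicy` rather than `updateStrategy.type`. The manifest below is a minimal sketch, assuming the `apps.kruise.io/v1alpha1` API; the name, service name, and image are placeholders. The `InPlaceUpdateReady` readiness gate is what the Kruise documentation asks for so that a pod is taken out of service while its container is being swapped.

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: StatefulSet                  # the Kruise kind, not apps/v1 StatefulSet
metadata:
  name: sample                     # placeholder name
spec:
  replicas: 3
  serviceName: sample-headless     # placeholder headless Service
  selector:
    matchLabels:
      app: sample
  template:
    metadata:
      labels:
        app: sample
    spec:
      readinessGates:
      - conditionType: InPlaceUpdateReady   # pod is NotReady during the in-place swap
      containers:
      - name: main
        image: nginx:alpine        # changing only this field triggers an in-place upgrade
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      podUpdatePolicy: InPlaceIfPossible    # other template changes fall back to recreation
```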

Rolling + Batch Publishing

Advanced StatefulSet introduces maxUnavailable for parallel pod upgrades. CloneSet combines maxSurge, maxUnavailable, and a partition field for fine‑grained batch control. Example spec:

apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
spec:
  replicas: 5
  updateStrategy:
    type: InPlaceIfPossible
    maxSurge: 20%
    maxUnavailable: 0
    partition: 3

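With five replicas, maxSurge: 20% lets the controller create one extra pod, maxUnavailable: 0 keeps all five pods available throughout the in‑place upgrade, and partition: 3 holds three pods on the old version, so only two pods are updated in this batch.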
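The Advanced StatefulSet side of the same idea can be sketched as the fragment below. It shows only the rollout-related fields (selector and pod template are omitted), again assuming the `apps.kruise.io/v1alpha1` API. According to the Kruise documentation, `maxUnavailable` is only honored when `podManagementPolicy` is `Parallel`.

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: StatefulSet
spec:
  replicas: 5
  podManagementPolicy: Parallel    # required for maxUnavailable to take effect
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 2            # up to two pods may be upgraded at the same time
      partition: 3                 # only ordinals >= 3 (pods 3 and 4) move to the new version
      paused: false                # set to true to pause the rollout mid-batch
```

Lowering `partition` step by step releases the remaining pods in further batches.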

Configurable Rollout Order

Both CloneSet and Advanced StatefulSet allow custom rollout priority via:

Label‑based ordering (orderPriority).

Weight‑based selection (weightPriority).

Scatter strategy to distribute pods with specific labels across batches.

This is useful for stateful services such as ZooKeeper, where non‑leader nodes must be upgraded before the leader.
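A weight-based priority for the "followers before leader" case might look like the CloneSet fragment below. This is a sketch only: the `zk-role` label is hypothetical and would have to be maintained on the pods by something outside Kruise, and the selector and pod template are omitted. The second document shows the scatter strategy on its own, with a placeholder key and value.

```yaml
# Weight-based priority: pods matching a higher-weight selector are updated first.
apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
spec:
  updateStrategy:
    priorityStrategy:
      weightPriority:
      - weight: 100
        matchSelector:
          matchLabels:
            zk-role: follower      # hypothetical label: upgraded first
      - weight: 10
        matchSelector:
          matchLabels:
            zk-role: leader        # hypothetical label: upgraded last
---
# Scatter strategy: pods carrying this label are spread across the batches.
apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
spec:
  updateStrategy:
    scatterStrategy:
    - key: foo                     # placeholder label key
      value: bar                   # placeholder label value
```

An `orderPriority` list (entries of the form `orderedKey: <label-key>`) can be used in place of `weightPriority`, but the two are not meant to be set together.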

Sidecar Management

SidecarSet separates sidecar containers from application specs, enabling centralized injection, version upgrades, and scaling without modifying each workload.
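A minimal SidecarSet sketch, assuming the `apps.kruise.io/v1alpha1` API: the name, the `app: sample` selector, and the log-tailing container are placeholders chosen for illustration. Pods matching the selector receive the sidecar at creation time, and the application's own workload spec never mentions it.

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: SidecarSet
metadata:
  name: log-agent-sidecarset       # placeholder name
spec:
  selector:
    matchLabels:
      app: sample                  # pods with this label get the sidecar injected
  containers:
  - name: log-agent                # placeholder sidecar container
    image: busybox:1.36
    command: ["sh", "-c", "tail -F /var/log/app/app.log"]
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
  volumes:
  - name: app-logs                 # volume added to the pod along with the sidecar
    emptyDir: {}
```

Upgrading the sidecar then means editing the image in this one object rather than in every workload that uses it.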

Open‑Source Availability

All workloads are available in the OpenKruise GitHub repository and are used internally for over a million containers across more than 100,000 applications. Future releases will include Advanced DaemonSet, HPA enhancements, and scheduler plugins.

Q&A Highlights

Scale : Single‑application pod counts can reach tens of thousands; rollout duration depends on batch size and may span weeks.

Resource Requests/Limits : Online services typically use a 1:1 request‑to‑limit ratio; batch jobs may set requests lower than limits (request < limit), since Kubernetes does not accept a request that exceeds its limit.

Version Upgrade Path : Kruise resources share a unified API version; upcoming upgrades will use conversion webhooks.

Go Client : Available via github.com/openkruise/kruise/pkg/client or via controller‑runtime with github.com/openkruise/kruise-api.

Kubernetes Cluster Upgrade : Alibaba employs a “Kube‑on‑Kube” architecture in which a meta‑cluster manages thousands of tenant clusters, so a tenant cluster is upgraded in much the same way as a normal application.
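For the request-to-limit point in the Q&A, the two shapes can be sketched with plain Kubernetes Pods. Names, images, and sizes are placeholders. The first pod uses a 1:1 ratio, which gives it the Guaranteed QoS class; the second requests less than its limit, which makes it Burstable. Kubernetes validation rejects a request that is higher than the corresponding limit.

```yaml
# Online service: requests equal limits (1:1), QoS class Guaranteed.
apiVersion: v1
kind: Pod
metadata:
  name: online-service             # placeholder name
spec:
  containers:
  - name: app
    image: nginx:alpine
    resources:
      requests: {cpu: "2", memory: 4Gi}
      limits:   {cpu: "2", memory: 4Gi}
---
# Batch job: requests below limits, QoS class Burstable.
apiVersion: v1
kind: Pod
metadata:
  name: batch-task                 # placeholder name
spec:
  restartPolicy: Never
  containers:
  - name: task
    image: busybox
    command: ["sh", "-c", "echo done"]
    resources:
      requests: {cpu: "1", memory: 2Gi}
      limits:   {cpu: "4", memory: 4Gi}
```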

