What’s Next for OpenKruise? Exploring New Features and Risk‑Control Strategies

This article provides a comprehensive overview of OpenKruise, an open‑source cloud‑native application automation engine, detailing its core workloads, upcoming roadmap, risk‑control mechanisms, the new kruise‑daemon component, and the ControllerMesh design for scalable operator management.

Alibaba Cloud Native

Background

OpenKruise is an open‑source cloud‑native application automation engine maintained by Alibaba Cloud and hosted as a CNCF Sandbox project. It extends Kubernetes with a set of standard components that have been battle‑tested in Alibaba’s large‑scale production environments.

Feature Overview (Version Review)

CloneSet: Provides efficient, deterministic deployment with in‑place upgrades, configurable deletion, ordered and parallel/gray releases.

Advanced StatefulSet: Enhances native StatefulSet with in‑place upgrades, parallel rollout, and pause capabilities.

SidecarSet: Manages sidecar containers across Pods that match a selector.

UnitedDeployment: Deploys workloads across multiple availability zones using subsets.

BroadcastJob: Runs a Job on every Node that satisfies the selector.

Advanced DaemonSet: Adds gray‑scale rollout, node‑label selection, pause, and hot‑upgrade to native DaemonSet.

AdvancedCronJob: Extends CronJob to support Job or BroadcastJob templates.
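To make the in‑place upgrade idea used by CloneSet and Advanced StatefulSet concrete, here is a minimal sketch of the eligibility check a controller might perform. It assumes that only container image changes qualify for an in‑place upgrade and that any other Pod spec change requires recreating the Pod. The function name and the dict layout are hypothetical and are not OpenKruise's actual code.

```python
import copy


def can_update_in_place(old_pod_spec: dict, new_pod_spec: dict) -> bool:
    """Return True when the two Pod specs differ only in container images.

    Assumption for this sketch: image-only changes can be applied in place;
    everything else forces the Pod to be recreated.
    """

    def without_images(spec: dict) -> dict:
        # Copy the spec and drop every container's image field.
        stripped = copy.deepcopy(spec)
        for container in stripped.get("containers", []):
            container.pop("image", None)
        return stripped

    old_names = [c["name"] for c in old_pod_spec.get("containers", [])]
    new_names = [c["name"] for c in new_pod_spec.get("containers", [])]
    # Adding, removing, renaming, or reordering containers is never in place.
    if old_names != new_names:
        return False
    # With images blanked out, the rest of the spec must be identical.
    return without_images(old_pod_spec) == without_images(new_pod_spec)


old_spec = {"containers": [{"name": "app", "image": "nginx:1.19", "env": []}]}
image_bump = {"containers": [{"name": "app", "image": "nginx:1.20", "env": []}]}
env_change = {
    "containers": [
        {"name": "app", "image": "nginx:1.19", "env": [{"name": "A", "value": "1"}]}
    ]
}

print(can_update_in_place(old_spec, image_bump))
print(can_update_in_place(old_spec, env_change))
```

The first call describes an image bump that could be applied to the running Pod; the second changes an environment variable, so under this sketch's assumption the Pod would have to be recreated.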

Roadmap (Planning Overview)

In the first half of 2021, OpenKruise will focus on application risk control, operator runtime extensions (ControllerMesh), and node‑side daemon extensions (kruise‑daemon). An improved HPA and a no‑code controller are also planned but remain at an earlier stage.

1. Risk Control

Final‑state (declarative) automation is a double‑edged sword: because controllers act on declared intent at scale, a single mistaken operation can be amplified across the cluster. Typical failure scenarios include accidental deletion of CRDs, cascade deletion of workloads, mis‑configured rollout strategies, erroneous node taints, and bulk Pod deletions. OpenKruise proposes several safeguards:

Define a "prevent cascade delete" label that blocks deletion of CRDs or workloads while dependent resources exist.

Introduce a Pod deletion flow‑control policy that limits the number of Pods that can be removed within a configurable time window (e.g., per minute, hour, day).

Add a custom resource, PodUnavailableBudget (PUB). Unlike the native PodDisruptionBudget (PDB), which only guards evictions, PUB validates deletions, evictions, and in‑place upgrades against a defined unavailable‑Pod threshold.

Automatically generate PUB/PDB for each workload so users gain protection with minimal configuration changes.
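Two of the safeguards above come down to simple admission checks, which can be sketched as follows. This is an illustrative model only: `DeletionRateLimiter` and `pub_allows` are hypothetical names, the sliding‑window algorithm is one possible way to implement "N deletions per time window", and the real OpenKruise webhooks may work differently.

```python
from collections import deque


class DeletionRateLimiter:
    """Sliding-window limiter: at most `limit` deletions per `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self._timestamps = deque()  # times of admitted deletions, oldest first

    def allow(self, now: float) -> bool:
        # Forget deletions that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.limit:
            return False  # budget for this window is used up; reject
        self._timestamps.append(now)
        return True


def pub_allows(unavailable_pods: int, max_unavailable: int) -> bool:
    """Admit one more disruptive operation (delete, evict, in-place upgrade)
    only if the Pod it takes down still fits within the unavailable budget."""
    return unavailable_pods + 1 <= max_unavailable


# At most 2 Pod deletions per 60 seconds; requests arrive at t = 0, 10, 20, 61.
limiter = DeletionRateLimiter(limit=2, window_seconds=60)
decisions = [limiter.allow(t) for t in (0, 10, 20, 61)]
print(decisions)  # [True, True, False, True]

print(pub_allows(unavailable_pods=1, max_unavailable=2))  # True
print(pub_allows(unavailable_pods=2, max_unavailable=2))  # False
```

The third deletion is rejected because two deletions already fall inside the 60‑second window. The fourth is admitted because the deletion at t = 0 has aged out by t = 61.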

2. kruise‑daemon

The traditional OpenKruise controller runs centrally and cannot intervene on the node side. The upcoming kruise‑daemon component will run on each node, enabling:

Image pre‑warming: NodeImage and ImagePullJob resources let users preload images on specific nodes, accelerating subsequent in‑place upgrades.

Release acceleration: When images are pre‑warmed on the nodes that will host upgraded Pods, image pulling is largely removed from the critical path of a rollout.

Container restart support: Allows restarting a container without changing its image, though full start/stop ordering still relies on Kubelet.

Node‑level scheduling optimization: Adjusts cgroup settings to maximize resource utilization and meet SLOs; this feature is experimental for 2021.
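The image pre‑warming flow can be pictured as two steps: select the nodes that match a selector and do not yet have the image, then pull on them in bounded batches. The sketch below models only that planning logic. The node dictionaries, `nodes_needing_pull`, `pull_batches`, and the `parallelism` parameter are assumptions made for illustration and do not reflect the actual NodeImage or ImagePullJob schema.

```python
def nodes_needing_pull(nodes, image, match_labels):
    """Names of nodes that match the label selector and lack the image."""
    selected = []
    for node in nodes:
        labels = node["labels"]
        matches = all(labels.get(k) == v for k, v in match_labels.items())
        if matches and image not in node["images"]:
            selected.append(node["name"])
    return selected


def pull_batches(node_names, parallelism):
    """Split the target nodes into batches of at most `parallelism` nodes."""
    return [
        node_names[i:i + parallelism]
        for i in range(0, len(node_names), parallelism)
    ]


# Hypothetical cluster state: labels and locally cached images per node.
nodes = [
    {"name": "node-a", "labels": {"pool": "web"}, "images": {"app:v1"}},
    {"name": "node-b", "labels": {"pool": "web"}, "images": {"app:v1", "app:v2"}},
    {"name": "node-c", "labels": {"pool": "web"}, "images": set()},
    {"name": "node-d", "labels": {"pool": "batch"}, "images": set()},
    {"name": "node-e", "labels": {"pool": "web"}, "images": {"app:v1"}},
]

targets = nodes_needing_pull(nodes, image="app:v2", match_labels={"pool": "web"})
print(targets)                   # ['node-a', 'node-c', 'node-e']
print(pull_batches(targets, 2))  # [['node-a', 'node-c'], ['node-e']]
```

Here node-b is skipped because it already caches the image, and node-d is skipped because its labels do not match. Bounding the batch size is one way to keep a pre‑warm job from saturating the registry.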

3. ControllerMesh

As the number of controllers and operators grows, the usual single‑master deployment model (one active leader per operator) becomes a bottleneck: it cannot scale horizontally, and new versions cannot be rolled out gradually (gray release) or A/B tested. ControllerMesh proposes a sharding architecture that provides:

Horizontal scaling and graceful upgrades for Operators.

Fault injection, tenant isolation, and security hardening.

Unified observability: monitoring, tracing, and metrics.
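The article does not describe how ControllerMesh assigns work to shards, so the following is only a sketch of the general idea: each operator replica owns a stable subset of namespaces, which means several replicas can work at once. Hashing by namespace, `shard_for`, and `assign` are all assumptions chosen for illustration.

```python
import hashlib


def shard_for(namespace: str, shard_count: int) -> int:
    """Map a namespace to a shard index using a stable hash.

    hashlib is used instead of the built-in hash() so the mapping is the
    same across processes and restarts.
    """
    digest = hashlib.sha256(namespace.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def assign(namespaces, shard_count):
    """Group namespaces by the shard (operator replica) that owns them."""
    shards = {i: [] for i in range(shard_count)}
    for ns in namespaces:
        shards[shard_for(ns, shard_count)].append(ns)
    return shards


namespaces = ["default", "kube-system", "payments", "search", "staging"]
assignment = assign(namespaces, shard_count=3)
for shard, owned in assignment.items():
    print(shard, owned)
```

Every namespace lands in exactly one shard, so no two replicas reconcile the same objects. A gray rollout could then upgrade one shard's replica first. Simple modulo hashing reshuffles most namespaces when the shard count changes; consistent hashing or explicit assignment rules would avoid that.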

Conclusion

OpenKruise now offers a rich set of workloads covering most common deployment scenarios. With the 2021 roadmap, it aims to move beyond workloads to broader cloud‑native automation, adding risk‑control, daemon‑side extensions, and a scalable ControllerMesh. The project welcomes contributions from the cloud‑native community.

References

GitHub: https://github.com/openkruise/kruise

Official site: https://openkruise.io/

