How to Prevent Cascading Deletions and Keep Cloud‑Native Apps Stable with OpenKruise

This article explains the inherent security risks of cloud‑native Kubernetes deployments—such as workload, namespace, and CRD cascading deletions and concurrent pod updates—and presents practical OpenKruise‑based protection techniques like label‑driven cascade‑deletion blocking, pod‑deletion flow control, and automatic PUB/PDB generation to ensure runtime stability.

Alibaba Cloud Native

Cloud‑Native Application Security Risks

Because Kubernetes follows a declarative, end‑state model, it enables powerful automation but also amplifies accidental mis‑operations. Deleting a workload, namespace, or CRD can cascade and wipe out many Pods, while concurrent controllers may cause 100% pod unavailability during rolling updates.

1. Cascading Deletion Risks

Workload level: Deleting a Deployment, CloneSet, or StatefulSet without the orphan propagation policy also removes all of its Pods.

Namespace level: Deleting a Namespace removes every resource inside it, including all workloads and services.

CRD level: Removing a CRD that backs a workload (e.g., CloneSet) also deletes every custom resource of that kind. The Pods owned by those resources are then removed as well, which can wipe out Pods across the whole cluster.
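The three levels above differ only in how high up the ownership tree the deletion starts. The following is a minimal sketch of that idea, not Kubernetes code: the object names are hypothetical, and the CRD-to-custom-resource link is modeled as a plain parent relation even though the API server handles it differently from owner references.

```python
from collections import defaultdict

def cascade_delete(parents, root):
    """Return every object removed when `root` is deleted with cascading on.

    `parents` maps each object name to the name of its parent (or None).
    """
    children = defaultdict(list)
    for obj, parent in parents.items():
        if parent is not None:
            children[parent].append(obj)

    removed, stack = [], [root]
    while stack:
        current = stack.pop()
        removed.append(current)
        stack.extend(children[current])
    return sorted(removed)

# Hypothetical cluster: one CRD, two CloneSets of that kind, four Pods.
parents = {
    "crd/clonesets": None,
    "cloneset/web": "crd/clonesets",
    "cloneset/api": "crd/clonesets",
    "pod/web-0": "cloneset/web",
    "pod/web-1": "cloneset/web",
    "pod/api-0": "cloneset/api",
    "pod/api-1": "cloneset/api",
}

removed_by_workload = cascade_delete(parents, "cloneset/web")
removed_by_crd = cascade_delete(parents, "crd/clonesets")
print(len(removed_by_workload), len(removed_by_crd))
```

Deleting one CloneSet takes its own Pods with it, while deleting the CRD takes every CloneSet and every Pod in the model.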

2. Concurrent Pod Update / Eviction Risks

When multiple controllers (e.g., CloneSet and SidecarSet) act on the same Pods at the same time, each one honors only its own maxUnavailable setting. Because neither accounts for the Pods the other has already taken down, the combined unavailability can exceed what either setting intended and, in the worst case, cause a full service outage.
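A small worked example makes the arithmetic concrete. This is an illustrative sketch only: the controller names, the 50% limit, and the worst-case assumption that the two controllers pick disjoint halves of the Pods are all hypothetical.

```python
def combined_unavailable(actions):
    """Union of the Pods that all controllers take down at the same moment."""
    down = set()
    for targets in actions.values():
        down |= set(targets)
    return down

pods = [f"pod-{i}" for i in range(10)]
max_unavailable = 5  # each controller's own limit: 50% of 10 Pods

# Hypothetical worst case: the two controllers pick disjoint halves.
actions = {
    "cloneset-rollout": pods[:5],
    "sidecarset-upgrade": pods[5:],
}

# Each controller stays within its own budget...
within_own_budget = all(len(t) <= max_unavailable for t in actions.values())

# ...yet together they take down every Pod.
down = combined_unavailable(actions)
print(within_own_budget, f"{len(down)}/{len(pods)} Pods unavailable")
```

Neither controller breaks its own rule, which is why the protection has to sit at a level that sees all operations on the Pods rather than inside any single controller.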

OpenKruise Protection Practices

1. Prevent Cascading Deletion

Apply the label policy.kruise.io/disable-cascading-deletion: "true" to the CRDs, Namespaces, Deployments, and CloneSets you want to protect. When a labeled object is deleted, Kruise checks the label and rejects the deletion as long as dependent resources still exist.

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  labels:
    policy.kruise.io/disable-cascading-deletion: "true"
---
apiVersion: v1
kind: Namespace
metadata:
  labels:
    policy.kruise.io/disable-cascading-deletion: "true"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    policy.kruise.io/disable-cascading-deletion: "true"
---
apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
metadata:
  labels:
    policy.kruise.io/disable-cascading-deletion: "true"
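
The decision described above can be sketched as a single function. This is a hedged illustration, not Kruise's implementation: `allow_delete` is a hypothetical helper, and how dependents are counted (for example, remaining Pods of a workload or remaining custom resources of a CRD) is left to the caller as an assumption.

```python
PROTECTION_LABEL = "policy.kruise.io/disable-cascading-deletion"

def allow_delete(obj, dependents):
    """Decide whether a delete request should go through.

    `obj` is a dict shaped like a Kubernetes object. `dependents` is the
    number of resources that would be removed along with it; how that
    number is obtained is outside this sketch.
    """
    labels = obj.get("metadata", {}).get("labels") or {}
    protected = labels.get(PROTECTION_LABEL) == "true"
    if protected and dependents > 0:
        reason = f"{obj['kind']} still has {dependents} dependent resource(s)"
        return False, reason
    return True, ""

protected_ns = {
    "kind": "Namespace",
    "metadata": {"name": "prod", "labels": {PROTECTION_LABEL: "true"}},
}
plain_ns = {"kind": "Namespace", "metadata": {"name": "scratch"}}

print(allow_delete(protected_ns, dependents=12))  # blocked: Pods still exist
print(allow_delete(protected_ns, dependents=0))   # allowed: nothing left to cascade
print(allow_delete(plain_ns, dependents=12))      # allowed: label not set
```

Under this model an operator has to drain or scale down a protected object first, which turns one accidental command into two deliberate ones.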

2. Pod Deletion Flow Control & Circuit Breaker

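Define a PodDeletionFlowControl resource to cap the number of Pods that can be deleted within each time window, with an optional whitelist selector. The example below allows at most 100 deletions per 10 minutes, 500 per hour, and 5000 per 24 hours.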

apiVersion: policy.kruise.io/v1alpha1
kind: PodDeletionFlowControl
metadata:
  # ...
spec:
  limitRules:
  - interval: 10m
    limit: 100
  - interval: 1h
    limit: 500
  - interval: 24h
    limit: 5000
  whiteListSelector:
    matchExpressions:
    - key: xxx
      operator: In
      values:
      - foo
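
One way to picture how several limit rules combine is a sliding-window limiter in which a deletion is admitted only if every window still has room. The sketch below is an assumption about the semantics, not the OpenKruise controller: the class name is hypothetical, the windows are scaled down to seconds, and the whitelist selector is not modeled.

```python
from collections import deque

class PodDeletionLimiter:
    """Sliding-window limiter: a deletion passes only if every rule has room."""

    def __init__(self, rules):
        # rules: list of (window_seconds, limit) pairs
        self.rules = rules
        self.history = deque()  # timestamps of admitted deletions, oldest first

    def allow(self, now):
        longest = max(window for window, _ in self.rules)
        # Forget deletions that fell out of the longest window.
        while self.history and now - self.history[0] >= longest:
            self.history.popleft()
        for window, limit in self.rules:
            recent = sum(1 for t in self.history if now - t < window)
            if recent >= limit:
                return False  # circuit breaker: this window is exhausted
        self.history.append(now)
        return True

# Scaled-down rules for illustration: 2 per 10 s and 3 per 60 s.
limiter = PodDeletionLimiter([(10, 2), (60, 3)])
decisions = [limiter.allow(t) for t in (0, 1, 2, 11, 12, 61)]
print(decisions)
```

The short window absorbs bursts while the long window bounds sustained deletion, so a runaway controller is stopped by whichever limit it reaches first.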

3. Application‑Level Unavailability Protection (PUB)

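The native PodDisruptionBudget (PDB) only guards evictions that go through the Eviction API. OpenKruise introduces PodUnavailableBudget (PUB), which extends that protection to other operations that make a Pod unavailable: Pod deletions, in-place upgrades, sidecar injections, and container restarts. Example: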

apiVersion: policy.kruise.io/v1alpha1
kind: PodUnavailableBudget
spec:
  targetRef:
    apiVersion: apps.kruise.io/v1alpha1
    kind: CloneSet
    name: app-xxx
  maxUnavailable: 25%

With maxUnavailable: 25% on a CloneSet of 20 Pods, at most 5 Pods may be unavailable; any operation that would exceed this limit is rejected.
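
The budget check in that example can be written out as a few lines of arithmetic. This is a sketch under stated assumptions: `allowed_unavailable` and `admit` are hypothetical helpers, and rounding percentages up is an assumption borrowed from how the native PDB resolves them (it does not matter for 25% of 20, which is exactly 5).

```python
def allowed_unavailable(replicas, max_unavailable):
    """Resolve an integer or a percentage string into a Pod count."""
    if isinstance(max_unavailable, str) and max_unavailable.endswith("%"):
        percent = int(max_unavailable[:-1])
        # Assumption: round up; integer ceiling avoids float error.
        return (replicas * percent + 99) // 100
    return int(max_unavailable)

def admit(replicas, max_unavailable, currently_unavailable, requested=1):
    """Allow an operation only if it keeps unavailability within the budget."""
    budget = allowed_unavailable(replicas, max_unavailable)
    return currently_unavailable + requested <= budget

budget = allowed_unavailable(20, "25%")
print(budget)
print(admit(20, "25%", currently_unavailable=4))  # fifth unavailable Pod fits
print(admit(20, "25%", currently_unavailable=5))  # sixth would exceed the budget
```

Because every protected operation is checked against the same counter, it no longer matters which controller or user asks for the next Pod to go down.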

4. Automatic PUB/PDB Generation

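When a workload is created, a controller can automatically generate a matching PUB (or native PDB) based on the workload’s maxUnavailable strategy. Users enable this with annotations. In the example below, the generate-pub-maxUnavailable annotation (20%) sets the generated budget rather than the rolling-update value (25%):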

apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy-foo
  annotations:
    policy.kruise.io/generate-pub: "true"
    policy.kruise.io/generate-pub-maxUnavailable: "20%"
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 25%
---
# auto‑generated PUB
apiVersion: policy.kruise.io/v1alpha1
kind: PodUnavailableBudget
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deploy-foo
  maxUnavailable: 20%
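
The mapping from workload to generated PUB can be sketched as a pure function over the manifest. This is an illustration only: `generate_pub` is a hypothetical helper, and both the fallback to the rolling-update maxUnavailable when the annotation is absent and the choice to reuse the workload's name for the PUB are assumptions, not documented behavior.

```python
ENABLE = "policy.kruise.io/generate-pub"
OVERRIDE = "policy.kruise.io/generate-pub-maxUnavailable"

def generate_pub(workload):
    """Build a PodUnavailableBudget dict for an annotated workload, else None."""
    meta = workload["metadata"]
    annotations = meta.get("annotations") or {}
    if annotations.get(ENABLE) != "true":
        return None
    rolling = workload.get("spec", {}).get("strategy", {}).get("rollingUpdate", {})
    # Assumption: the annotation wins; otherwise reuse the rollout setting.
    max_unavailable = annotations.get(OVERRIDE, rolling.get("maxUnavailable"))
    return {
        "apiVersion": "policy.kruise.io/v1alpha1",
        "kind": "PodUnavailableBudget",
        "metadata": {"name": meta["name"]},  # assumed naming scheme
        "spec": {
            "targetRef": {
                "apiVersion": workload["apiVersion"],
                "kind": workload["kind"],
                "name": meta["name"],
            },
            "maxUnavailable": max_unavailable,
        },
    }

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {
        "name": "deploy-foo",
        "annotations": {ENABLE: "true", OVERRIDE: "20%"},
    },
    "spec": {"strategy": {"type": "RollingUpdate",
                          "rollingUpdate": {"maxUnavailable": "25%", "maxSurge": "25%"}}},
}

pub = generate_pub(deployment)
print(pub["spec"])
```

Fed the Deployment from the example above, the function yields a PUB that targets deploy-foo with maxUnavailable set to 20%.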

OpenKruise Overview

OpenKruise is an open‑source Kubernetes extension that provides enhanced workloads such as CloneSet, Advanced StatefulSet, SidecarSet, UnitedDeployment, BroadcastJob, Advanced DaemonSet, and AdvancedCronJob. It addresses native workload limitations by offering in‑place upgrades, fine‑grained rollout control, sidecar management, and multi‑zone deployment.

2021 Roadmap Highlights

Risk control features: cascade‑deletion protection, global pod‑deletion flow control, pod‑deletion/eviction/in‑place‑upgrade protection, automatic PUB/PDB generation.

Kruise-daemon: a node-level daemon, deployed as a DaemonSet, for image pre-warming, rollout acceleration, container restart, and scheduling optimizations.

ControllerMesh: a framework to manage multiple controllers in a cluster with traffic‑shaping and isolation.

OpenKruise has been accepted into the CNCF Sandbox and runs in many large-scale production environments (e.g., at Alibaba, Ctrip, OPPO, and Lyft).
