
Inject Real‑World Failures on KubeSphere Nodes with ChaosBlade Operator

This article explains how to deploy the ChaosBlade Operator on a KubeSphere cluster, define and run various chaos experiments—such as CPU overload, network latency, packet loss, and process kill/stop—against RadonDB MySQL containers, and verify the impact to improve system resilience and fault‑tolerance.

Qingyun Technology Community

What is Chaos Engineering?

Chaos Engineering is the discipline of running experiments on distributed systems to build confidence in their ability to withstand uncontrolled conditions in production. The practice originated with Netflix's Chaos Monkey project.

What is ChaosBlade Operator?

ChaosBlade is an open‑source chaos‑engineering tool from Alibaba that follows the chaos‑experiment model. The ChaosBlade Operator implements this model as a Kubernetes Custom Resource Definition (CRD), allowing experiments to be created, updated, and deleted using standard Kubernetes resources and tools such as kubectl or the ChaosBlade CLI.

Supported Experiment Scenarios

Inject CPU load on specific nodes.

Introduce network latency and packet loss on selected nodes.

Kill or stop processes (e.g., MySQL) on target nodes.

Deploying ChaosBlade Operator

Before running experiments, install the ChaosBlade Operator with Helm 3 into the chaosblade namespace (the --create-namespace flag creates the namespace if it does not exist yet):

helm install chaosblade-operator chaosblade-operator-1.2.0-v3.tgz --namespace chaosblade --create-namespace

After deployment, verify that the chaosblade-tool and chaosblade-operator pods are in Running state:

kubectl get pod -n chaosblade -o wide | grep chaosblade
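
The Running check above can also be scripted. The following is a minimal sketch, assuming the default column layout of kubectl get pod (status in the third column); all_running is a hypothetical helper name, not part of kubectl or ChaosBlade.

```shell
# all_running: read `kubectl get pod --no-headers` output on stdin and
# succeed only if at least one pod is listed and every pod is Running.
# (Hypothetical helper; assumes STATUS is the third column.)
all_running() {
  awk '{ n++ } $3 != "Running" { bad = 1 } END { exit (n == 0 || bad) }'
}

# Usage on a live cluster:
#   kubectl get pod -n chaosblade --no-headers | all_running && echo "operator ready"
```

Reading from stdin keeps the helper independent of the cluster, so it can be tried on saved output first.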

Test Environment

The experiments are performed on a KubeSphere cluster (8 CPU, 16 GB RAM, 500 GB disk, 4 nodes) with RadonDB MySQL containers deployed.

Experiment 1: CPU Load

Goal: Raise CPU usage on node worker-s001 to 80%.

apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: cpu-load
spec:
  experiments:
  - scope: node
    target: cpu
    action: fullload
    desc: "increase node cpu load by names"
    matchers:
    - name: names
      value:
      - "worker-s001"
    - name: cpu-percent
      value:
      - "80"
    - name: ip
      value:
      - "192.168.0.20"

Apply the experiment and verify the load with top on the target node.

kubectl apply -f cpu-load.yaml
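
As an alternative to reading top by eye, the busy percentage can be computed from two samples of /proc/stat on the target node. This is a rough sketch rather than part of ChaosBlade; cpu_busy_pct is a hypothetical helper, and it treats idle plus iowait as idle time.

```shell
# cpu_busy_pct: given two samples of the aggregate "cpu" line from
# /proc/stat, print the busy percentage over that interval.
# Fields after the label: user nice system idle iowait irq softirq steal
cpu_busy_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    { total = 0; for (i = 2; i <= 9; i++) total += $i; idle = $5 + $6 }
    NR == 1 { t0 = total; i0 = idle }
    NR == 2 { dt = total - t0
              printf "%d\n", ((dt > 0) ? 100 * (dt - (idle - i0)) / dt : 0) }'
}

# Usage on the target node (5-second window):
#   a=$(head -n 1 /proc/stat); sleep 5; b=$(head -n 1 /proc/stat)
#   cpu_busy_pct "$a" "$b"
```

While the experiment is active, the printed value should sit near the configured cpu-percent.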

Experiment 2: Network Latency

Goal: Add 3000 ms latency (±1000 ms) on node worker-s001.

apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: delay-node-network
spec:
  experiments:
  - scope: node
    target: network
    action: delay
    matchers:
    - name: names
      value:
      - "worker-s001"
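    - name: interface  # assumption: replace eth0 with the node's actual NIC name
      value:
      - "eth0"
    - name: time       # delay in milliseconds
      value:
      - "3000"
    - name: offset     # jitter in milliseconds
      value:
      - "1000"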

Apply and check the experiment status:

kubectl apply -f delay_node_network_by_names.yaml
kubectl get blade delay-node-network -o json
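
The JSON output is long, and usually only the experiment phase matters. The sketch below pulls it out with a plain text match, assuming the resource reports its state under a "phase" key in the status section and that kubectl pretty-prints one key per line; blade_phase is a hypothetical helper name.

```shell
# blade_phase: print the value of the first "phase" key found in
# pretty-printed JSON on stdin. A text match, not a JSON parser.
blade_phase() {
  sed -n 's/.*"phase": *"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage on a live cluster:
#   kubectl get blade delay-node-network -o json | blade_phase
```

Where a stricter approach is preferred, kubectl's own -o jsonpath output option can select the same field without text matching.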

Verify the increased latency using ping or telnet from the affected node.
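
To compare the result against the configured delay, the average round-trip time can be extracted from ping's summary line. A small sketch, assuming the usual "min/avg/max" summary format printed by ping on Linux; ping_avg_ms is a hypothetical helper name.

```shell
# ping_avg_ms: read ping output on stdin and print the average RTT in ms,
# taken from the summary line "rtt min/avg/max/mdev = a/b/c/d ms".
ping_avg_ms() {
  awk -F'/' '/min\/avg\/max/ { print $5 }'
}

# Usage from the affected node (replace <peer-ip> with a reachable address):
#   ping -c 5 <peer-ip> | ping_avg_ms
```

With the experiment active, the average should fall roughly between 2000 and 4000 ms, matching the 3000 ms delay and 1000 ms offset.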

Experiment 3: Network Packet Loss

Goal: Inject 100% packet loss on node worker-s001.

apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: loss-node-network
spec:
  experiments:
  - scope: node
    target: network
    action: loss
    matchers:
    - name: names
      value:
      - "worker-s001"
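    - name: interface  # assumption: replace eth0 with the node's actual NIC name
      value:
      - "eth0"
    - name: percent    # packet loss percentage
      value:
      - "100"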

Apply and verify that connections to the Guestbook service from the affected node fail, while connections from other nodes succeed.

kubectl apply -f loss_node_network_by_names.yaml
kubectl get blade loss-node-network -o json
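
Packet loss can also be measured directly with ping. The sketch below extracts the loss percentage from ping's summary line, assuming the common "N% packet loss" wording; ping_loss_pct is a hypothetical helper name.

```shell
# ping_loss_pct: read ping output on stdin and print the packet loss
# percentage from the summary line (for example "100" for total loss).
ping_loss_pct() {
  sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Usage from the affected node (replace <peer-ip> with a reachable address):
#   ping -c 5 <peer-ip> | ping_loss_pct
```

The value should match the configured percentage on the affected node and stay at 0 on the other nodes.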

Experiment 4: Kill MySQL Process

Goal: Kill the MySQL process on a specific node.

apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: kill-node-process
spec:
  experiments:
  - scope: node
    target: process
    action: kill
    matchers:
    - name: names
      value:
      - "worker-s001"
    - name: process
      value:
      - "mysqld"

Apply the experiment, then SSH into the node and check that the MySQL PID changes, indicating the process was killed and restarted.

kubectl apply -f kill_node_process_by_names.yaml
kubectl get blade kill-node-process -o json

Experiment 5: Stop MySQL Process

Goal: Suspend the MySQL process on a target node.

apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: stop-node-process
spec:
  experiments:
  - scope: node
    target: process
    action: stop
    matchers:
    - name: names
      value:
      - "worker-s001"
    - name: process
      value:
      - "mysqld"

Apply and verify that the process is suspended: unlike a kill, the PID stays the same, but the process state changes to stopped (T in ps output) and the service becomes unavailable from the affected node.

kubectl apply -f stop_node_process_by_names.yaml
kubectl get blade stop-node-process -o json
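
The stopped state can be confirmed on the node without scanning ps output. This is a minimal sketch for Linux nodes that reads the state letter from /proc; proc_state is a hypothetical helper name.

```shell
# proc_state: print the one-letter scheduler state of a PID, read from
# /proc/<pid>/stat (Linux only). "T" means stopped by a signal.
proc_state() {
  # The state is the first field after the closing ")" of the command name.
  sed 's/.*) //' "/proc/$1/stat" | cut -c1
}

# Usage on the target node:
#   proc_state "$(pgrep -o -x mysqld)"
```

A suspended mysqld should report T, and once the experiment is removed the state should return to S or R.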

Cleanup

After each test, delete the experiment resource with either of the following commands. Deleting the resource also reverts the injected fault:

kubectl delete -f <experiment>.yaml
kubectl delete blade <experiment-name>

Conclusion

Using ChaosBlade Operator on KubeSphere nodes enables simple yet powerful fault‑injection experiments. By combining different scenarios, teams can validate the stability and availability of Kubernetes clusters, quickly locate failure sources, and improve overall system resilience.
