Inject Real‑World Failures on KubeSphere Nodes with ChaosBlade Operator
This article explains how to deploy the ChaosBlade Operator on a KubeSphere cluster, define and run various chaos experiments—such as CPU overload, network latency, packet loss, and process kill/stop—against RadonDB MySQL containers, and verify the impact to improve system resilience and fault‑tolerance.
What is Chaos Engineering?
Chaos Engineering is the discipline of experimenting on distributed systems to build confidence in their ability to withstand uncontrolled conditions in production. It originated with Netflix's Chaos Monkey project.
What is ChaosBlade Operator?
ChaosBlade is an open‑source chaos‑engineering tool from Alibaba that follows the chaos‑experiment model. The ChaosBlade Operator implements this model as a Kubernetes Custom Resource Definition (CRD), allowing experiments to be created, updated, and deleted using standard Kubernetes resources and tools such as kubectl or the ChaosBlade CLI.
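Because every experiment in this article is a ChaosBlade resource with the same overall shape (scope node, a target, an action, and a list of matchers), the manifests can also be generated instead of copied and edited by hand. The POSIX shell function below is a minimal sketch under that assumption: render_node_blade is a hypothetical helper, not part of ChaosBlade, and it writes every matcher value as a single-item list, matching the list-valued matchers used in the manifests of this article.

```shell
# Hypothetical helper (not part of ChaosBlade): print a node-scoped
# ChaosBlade manifest.
# Usage: render_node_blade NAME TARGET ACTION NODE [MATCHER=VALUE ...]
render_node_blade() {
  local name=$1 target=$2 action=$3 node=$4
  shift 4
  # Fixed part of the manifest; the heredoc body stays at column 0 so the
  # YAML indentation is emitted exactly as written.
  cat <<EOF
apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: ${name}
spec:
  experiments:
  - scope: node
    target: ${target}
    action: ${action}
    matchers:
    - name: names
      value:
      - "${node}"
EOF
  # One matcher per MATCHER=VALUE argument, each value as a one-item list.
  for kv in "$@"; do
    printf '    - name: %s\n      value:\n      - "%s"\n' "${kv%%=*}" "${kv#*=}"
  done
}

# Example: render the CPU experiment and apply it straight from stdin.
# render_node_blade cpu-load cpu fullload worker-s001 cpu-percent=80 | kubectl apply -f -
```

Piping into kubectl apply -f - avoids keeping one YAML file per node and parameter combination; for anything beyond a few matchers, hand-written manifests remain easier to review.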
Supported Experiment Scenarios
Inject CPU load on specific nodes.
Introduce network latency and packet loss on selected nodes.
Kill or stop processes (e.g., MySQL) on target nodes.
Deploying ChaosBlade Operator
Before running experiments, install the ChaosBlade Operator via Helm:
helm install chaosblade-operator chaosblade-operator-1.2.0-v3.tgz --namespace chaosblade --create-namespace
After deployment, verify that the chaosblade-tool and chaosblade-operator pods are in the Running state:
kubectl get pod -n chaosblade -o wide | grep chaosblade
Test Environment
The experiments were run on a KubeSphere cluster (8 CPU, 16 GB RAM, 500 GB disk, 4 nodes) on which RadonDB MySQL containers are deployed.
Experiment 1: CPU Load
Goal: Raise CPU usage on node worker-s001 to 80%.
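To check the 80% target numerically rather than by watching top, one option is to sample the aggregate cpu line of /proc/stat twice on worker-s001 and compute the busy share of the elapsed CPU time. cpu_busy_pct is a hypothetical helper for this sketch; it counts idle plus iowait as idle time and sums the first eight counters (user through steal) as total time.

```shell
# Hypothetical helper: percentage of CPU time that was busy between two
# samples of the aggregate "cpu" line from /proc/stat.
cpu_busy_pct() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, " "); split(b, y, " ")
    # Fields 2..9: user nice system idle iowait irq softirq steal
    # (guest time is already counted inside user and nice, so it is skipped).
    total = 0
    for (i = 2; i <= 9; i++) total += y[i] - x[i]
    idle = (y[5] + y[6]) - (x[5] + x[6])   # idle + iowait
    if (total <= 0) exit 1
    printf "%d\n", 100 * (total - idle) / total
  }'
}

# On worker-s001, while the experiment is running:
# a=$(head -n 1 /proc/stat); sleep 5; b=$(head -n 1 /proc/stat)
# cpu_busy_pct "$a" "$b"    # expect a value near 80
```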
apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: cpu-load
spec:
  experiments:
  - scope: node
    target: cpu
    action: fullload
    desc: "increase node cpu load by names"
    matchers:
    - name: names
      value:
      - "worker-s001"
    - name: cpu-percent
      value:
      - "80"
    - name: ip
      value:
      - "192.168.0.20"
Apply the experiment, then verify the load with top on the target node:
kubectl apply -f cpu-load.yaml
Experiment 2: Network Latency
Goal: Add 3000 ms latency (±1000 ms) on node worker-s001.
apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: delay-node-network-by-names
spec:
  experiments:
  - scope: node
    target: network
    action: delay
    matchers:
    - name: names
      value:
      - "worker-s001"
    - name: interface   # the delay action needs a NIC; "eth0" is an assumption, use the node's interface
      value:
      - "eth0"
    - name: time        # delay in milliseconds
      value:
      - "3000"
    - name: offset      # jitter (+/-) in milliseconds
      value:
      - "1000"
Apply the experiment and check its status:
kubectl apply -f delay_node_network_by_names.yaml
kubectl get blade delay-node-network-by-names -o json
Verify the increased latency using ping or telnet from the affected node.
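A minimal sketch for making the ping check scriptable, assuming a Linux ping that prints an iputils- or BusyBox-style summary line. ping_avg_ms and delay_applied are hypothetical helper names; the 2000 ms threshold in the usage comment is an assumption derived from the 3000 ms delay minus the 1000 ms offset, and <peer-ip> stands for any address reachable from worker-s001.

```shell
# Hypothetical helper: print the average RTT in ms from ping's summary line,
# e.g. "rtt min/avg/max/mdev = 2100.5/3012.3/3950.1/600.2 ms" (iputils)
# or "round-trip min/avg/max = ..." (BusyBox). Fails if no summary is found.
ping_avg_ms() {
  awk -F' = ' '/min\/avg\/max/ { split($2, v, "/"); print v[2]; found = 1 }
               END { exit (found ? 0 : 1) }'
}

# Hypothetical helper: succeed if the average RTT ($1) is at least $2 ms.
delay_applied() {
  awk -v avg="$1" -v min="$2" 'BEGIN { exit ((avg + 0 >= min + 0) ? 0 : 1) }'
}

# From worker-s001 while the experiment runs (<peer-ip> is a placeholder):
# avg=$(ping -c 5 <peer-ip> | ping_avg_ms) && delay_applied "$avg" 2000 && echo "delay active: ${avg} ms"
```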
Experiment 3: Network Packet Loss
Goal: Inject 100% packet loss on node worker-s001.
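To measure the loss rather than eyeball it, the packet-loss percentage can be pulled out of ping's summary line; running the check once before applying the manifest gives a baseline to compare against. ping_loss_pct is a hypothetical helper, and <service-ip> is a placeholder for the address of the service under test.

```shell
# Hypothetical helper: print the packet-loss percentage from ping's summary,
# e.g. "5 packets transmitted, 0 received, 100% packet loss, time 4099ms".
ping_loss_pct() {
  awk '/packet loss/ {
         for (i = 1; i <= NF; i++)
           if ($i ~ /%$/) { v = $i; sub(/%$/, "", v); print v; found = 1 }
       }
       END { exit (found ? 0 : 1) }'
}

# Baseline before applying the manifest, then again while it runs
# (<service-ip> is a placeholder). With 100% loss, ping itself exits
# non-zero, but its summary line is still printed and parsed.
# ping -c 5 <service-ip> | ping_loss_pct
```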
apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: loss-node-network-by-names
spec:
  experiments:
  - scope: node
    target: network
    action: loss
    matchers:
    - name: names
      value:
      - "worker-s001"
    - name: interface   # "eth0" is an assumption, use the node's interface
      value:
      - "eth0"
    - name: percent     # packet-loss percentage
      value:
      - "100"
Apply the experiment, then verify that connections to the Guestbook service from the affected node fail while connections from other nodes succeed:
kubectl apply -f loss_node_network_by_names.yaml
kubectl get blade loss-node-network-by-names -o json
Experiment 4: Kill MySQL Process
Goal: Kill the MySQL (mysqld) process on node worker-s001.
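The PID comparison used to verify this experiment is easier to read if the PID is recorded before the manifest is applied. The sketch below assumes procps pgrep is available on the node and that a single mysqld runs there (with several MySQL pods on one node, -o picks only the oldest); mysqld_pid and pid_changed are hypothetical helper names. A new PID only appears if something, typically the kubelet restarting the container, brings mysqld back.

```shell
# Hypothetical helper: PID of the oldest process named exactly "mysqld"
# (or the name given as $1); prints nothing if none is running.
mysqld_pid() {
  pgrep -o -x "${1:-mysqld}"
}

# Hypothetical helper: compare PIDs recorded before and after the experiment.
# Returns 0 if the PID changed, 1 if unchanged, 2 if the process is gone.
pid_changed() {
  local before=$1 after=$2
  if [ -z "$after" ]; then
    echo "not running (was $before)"; return 2
  fi
  if [ "$before" != "$after" ]; then
    echo "restarted: $before -> $after"; return 0
  fi
  echo "unchanged: $before"; return 1
}

# On worker-s001:
#   before=$(mysqld_pid)
#   (apply kill_node_process_by_names.yaml, wait for the container to restart)
#   pid_changed "$before" "$(mysqld_pid)"
```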
apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: kill-node-process-by-names
spec:
  experiments:
  - scope: node
    target: process
    action: kill
    matchers:
    - name: names
      value:
      - "worker-s001"
    - name: process
      value:
      - "mysqld"
Apply the experiment, then SSH into the node and check that the mysqld PID has changed, which indicates that the process was killed and then restarted.
kubectl apply -f kill_node_process_by_names.yaml
kubectl get blade kill-node-process-by-names -o json
Experiment 5: Stop MySQL Process
Goal: Suspend the MySQL process on a target node.
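Assuming the stop action suspends mysqld with SIGSTOP rather than terminating it, the process keeps its PID and only its scheduler state changes, to T (stopped). The sketch below reads that state from /proc/<pid>/stat on the node; proc_state and is_stopped are hypothetical helper names, and the same information is available from ps -o stat= -p <pid>.

```shell
# Hypothetical helper: print the one-letter state of a process
# (R running, S sleeping, T stopped, ...) from /proc/<pid>/stat.
proc_state() {
  [ -r "/proc/$1/stat" ] || return 1
  # Drop "pid (comm) " first: comm may contain spaces or parentheses,
  # so match up to the last ") " before taking the next field.
  sed 's/^.*) //' "/proc/$1/stat" | cut -d' ' -f1
}

# Hypothetical helper: succeed if the process is stopped by a signal (state T).
is_stopped() {
  local s
  s=$(proc_state "$1") || return 1
  [ "$s" = "T" ]
}

# On worker-s001 while the experiment runs:
# pid=$(pgrep -o -x mysqld)
# is_stopped "$pid" && echo "mysqld ($pid) is suspended"
```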
apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: stop-node-process-by-names
spec:
  experiments:
  - scope: node
    target: process
    action: stop
    matchers:
    - name: names
      value:
      - "worker-s001"
    - name: process
      value:
      - "mysqld"
Apply the experiment, then verify that mysqld is suspended rather than killed: its PID stays the same, its process state changes to T (stopped), and the MySQL service on the affected node stops responding.
kubectl apply -f stop_node_process_by_names.yaml
kubectl get blade stop-node-process-by-names -o json
Cleanup
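After each test, delete the experiment resource, either through its manifest or by name. Deleting the ChaosBlade resource ends the experiment and, where the fault is reversible, restores the node (for example, removing the injected network delay or resuming a stopped process).
kubectl delete -f <experiment>.yaml
kubectl delete blade <experiment-name>
Conclusion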
Running the ChaosBlade Operator on a KubeSphere cluster makes node-level fault injection simple: each experiment is a ChaosBlade resource that is applied, inspected, and deleted with kubectl. By combining different scenarios, teams can validate the stability and availability of Kubernetes clusters, quickly locate failure sources, and improve overall system resilience.
Qingyun Technology Community
Official account of the Qingyun Technology Community, focusing on tech innovation, supporting developers, and sharing knowledge. Born to Learn and Share!
