Mastering Chaos Mesh: A Hands‑On Guide to Cloud‑Native Chaos Engineering

Chaos Mesh is an open-source, cloud-native chaos engineering platform for injecting faults into Kubernetes environments. This guide covers its visual dashboard, the fault types it supports, and step-by-step installation and experiment creation, to help teams uncover system weaknesses and improve resilience.

GrowingIO Tech Team

Dec 2, 2021

What is Chaos Testing?

Chaos testing is an experiment-driven way of studying how large-scale distributed systems behave under turbulent conditions. By deliberately injecting faults and observing the outcome, teams expose weaknesses early, learn where the system's resilience runs out, and build confidence that it can withstand real failures.

Chaos Mesh Overview

Chaos Mesh is an open‑source cloud‑native chaos engineering platform that provides rich fault simulation types and powerful scenario orchestration, with a visual dashboard for easy experiment design and monitoring.

Key Advantages

Proven core capability: Originated from TiDB’s testing platform.

Widely adopted: Used by companies such as Tencent and Meituan, and used in the testing systems of projects such as Apache APISIX and RabbitMQ.

Ease of use: Graphical UI and Kubernetes‑native operation.

Cloud‑native: Native support for Kubernetes.

Comprehensive fault scenarios: Covers most basic fault types in distributed testing.

Flexible experiment orchestration: Users can design multi‑step chaos workflows and add health checks.

High security: Multi‑layer security controls.

Active community: Chaos Mesh is an open-source project hosted by the CNCF.

Extensible: Easy to add new fault types and features.
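To make the orchestration point in the list above concrete, a multi-step experiment can be declared as a Workflow resource. The following is a minimal sketch under stated assumptions, not a manifest from the original article: the resource name, template names, deadlines, and the app=web-show target are all illustrative. It runs a CPU stress step, pauses, and then adds network delay, one after another.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: Workflow
metadata:
  name: serial-workflow-example      # hypothetical name
  namespace: default
spec:
  entry: run-in-order                # template where execution starts
  templates:
    - name: run-in-order
      templateType: Serial           # children run one after another
      deadline: 240s
      children:
        - stress-step
        - pause-step
        - delay-step
    - name: stress-step
      templateType: StressChaos
      deadline: 20s
      stressChaos:
        mode: one                    # pick one matching pod
        selector:
          labelSelectors:
            app: web-show            # assumed target label
        stressors:
          cpu:
            workers: 1
            load: 20
    - name: pause-step
      templateType: Suspend          # wait before the next fault
      deadline: 10s
    - name: delay-step
      templateType: NetworkChaos
      deadline: 20s
      networkChaos:
        action: delay
        mode: all
        selector:
          labelSelectors:
            app: web-show
        delay:
          latency: "90ms"            # illustrative value
```

Switching the entry template's templateType from Serial to Parallel would start all children at the same time instead.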

Architecture Overview

Chaos Mesh is built on Kubernetes CRDs. It consists of three main components:

Chaos Dashboard: Web UI for creating, managing, and observing experiments, with RBAC support.

Chaos Controller Manager: Core logic that schedules and manages experiments via various controllers (Workflow, Scheduler, fault‑specific controllers).

Chaos Daemon: Runs as a DaemonSet, with privileged permissions by default (this can be disabled), and injects faults into target pods by operating on their network devices, filesystems, and kernel.

The workflow proceeds from user actions in the Dashboard, which create or modify Chaos CRD resources, through the Kubernetes API server to the Controller Manager, and finally to the Daemon that injects the actual fault.

Fault Injection Types

Chaos Mesh categorizes faults into three groups:

Infrastructure faults: PodChaos, NetworkChaos, DNSChaos, HTTPChaos, StressChaos, IOChaos, TimeChaos, KernelChaos.

Platform faults: AWSChaos, GCPChaos.

Application‑level faults: JVMChaos.
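Each fault type listed above is a Kubernetes custom resource of its own kind. As a small illustration, here is a sketch of a PodChaos experiment that kills one matching pod. The experiment name and the app=web-show label are assumptions chosen for the example.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: pod-kill-example     # hypothetical name
  namespace: default
spec:
  action: pod-kill           # kill the selected pod
  mode: one                  # choose a single pod among those matched
  selector:
    namespaces:
      - default
    labelSelectors:
      app: web-show          # assumed target label
```

The other kinds follow the same overall shape: a selector that chooses targets, a mode that decides how many of them are affected, and fault-specific fields.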

Visualization and Security

The Chaos Dashboard provides a visual interface for experiment management and result inspection. Security is enforced via Kubernetes RBAC; users create Roles and ServiceAccounts, bind them, and obtain tokens to limit experiment permissions. Namespace annotations can further restrict chaos experiments.
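The RBAC setup described above can be sketched as three resources plus a namespace annotation. This is a minimal example under assumptions: the account, role, and binding names are hypothetical, and the permissions are scoped to the default namespace only.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: chaos-manager              # hypothetical account name
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: chaos-manager-role         # hypothetical role name
  namespace: default
rules:
  # Read access so the dashboard can list candidate targets.
  - apiGroups: [""]
    resources: ["pods", "namespaces"]
    verbs: ["get", "list", "watch"]
  # Full management of Chaos Mesh resources in this namespace.
  - apiGroups: ["chaos-mesh.org"]
    resources: ["*"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: chaos-manager-binding      # hypothetical binding name
  namespace: default
subjects:
  - kind: ServiceAccount
    name: chaos-manager
    namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: chaos-manager-role
---
# Marks the namespace as allowed for injection. This only has an effect
# when namespace filtering was enabled at install time.
apiVersion: v1
kind: Namespace
metadata:
  name: default
  annotations:
    chaos-mesh.org/inject: enabled
```

On recent Kubernetes versions a token for the account can be requested with kubectl create token chaos-manager -n default. Older clusters expose it through the ServiceAccount's Secret instead.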

Installation and Deployment

The example in this guide runs on a local Minikube cluster. Install Minikube (for example, with the VirtualBox driver), install a kubectl version that matches the cluster's Kubernetes version, and then deploy Chaos Mesh by following the official installation instructions.
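The article does not show the deployment itself. One officially documented route is the Helm chart, and the sketch below is a hypothetical values.yaml for it. It assumes the cluster's container runtime is Docker; the file name and the chosen settings are illustrative rather than taken from the original.

```yaml
# values.yaml: illustrative overrides for the chaos-mesh Helm chart
chaosDaemon:
  runtime: docker                        # assumed container runtime
  socketPath: /var/run/docker.sock       # runtime socket on each node
dashboard:
  securityMode: true                     # require an RBAC token to log in
controllerManager:
  enableFilterNamespace: true            # only inject into annotated namespaces
```

Assuming Helm 3 is installed, the file would be used roughly as follows: helm repo add chaos-mesh https://charts.chaos-mesh.org, then helm install chaos-mesh chaos-mesh/chaos-mesh -n chaos-mesh --create-namespace -f values.yaml. Running kubectl get pods -n chaos-mesh afterwards should list the controller manager, daemon, and dashboard pods.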

Creating Experiments

Via YAML

The example file network-delay.yaml defines a 12-second network latency fault that targets pods labeled app=web-show in the default namespace. Apply it with kubectl apply -f network-delay.yaml, then check the experiment's status and events with kubectl describe.
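The original manifest is not reproduced in the article, so the following is a reconstruction under assumptions. It reads "12-second" as the experiment's duration, and the 10ms latency value and the resource name are placeholders that are not from the source.

```yaml
# network-delay.yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: network-delay          # assumed name
  namespace: default
spec:
  action: delay                # inject network latency
  mode: all                    # apply to every matching pod
  selector:
    namespaces:
      - default
    labelSelectors:
      app: web-show
  delay:
    latency: "10ms"            # assumed latency, not stated in the article
  duration: "12s"              # assumed to be the "12-second" in the text
```

With this name, the experiment can be inspected with kubectl describe networkchaos network-delay -n default and removed with kubectl delete -f network-delay.yaml.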

Via Dashboard

To simulate CPU load, create a new experiment in the Dashboard, select "Stress Test", specify the worker count and load percentage, choose the target pods with a label selector, and submit. Chaos Mesh then launches a stress-ng-cpu process inside the target pods.
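The same CPU stress experiment can also be written as a manifest. The sketch below is a rough YAML equivalent of the Dashboard form. The name, worker count, load percentage, duration, and target label are placeholder values chosen for illustration.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: StressChaos
metadata:
  name: cpu-stress-example     # hypothetical name
  namespace: default
spec:
  mode: all                    # stress every matching pod
  selector:
    namespaces:
      - default
    labelSelectors:
      app: web-show            # assumed target label
  stressors:
    cpu:
      workers: 1               # number of stress workers
      load: 50                 # target load per worker, in percent
  duration: "60s"              # assumed duration
```

Keeping experiments as files like this makes them easy to review and version alongside application manifests, while the Dashboard remains convenient for one-off exploration.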

Conclusion

Chaos Mesh offers a systematic way to discover system fragilities through controlled fault injection, enabling teams to build more resilient, high‑availability distributed systems.
