Exploring ChaosBlade: Alibaba’s Open‑Source Chaos Engineering Platform for Cloud‑Native Environments
ChaosBlade is Alibaba's open‑source chaos engineering project, now in the CNCF Sandbox. It comprises the chaosblade experiment tool and the chaosblade‑box platform, which together simulate more than 200 fault scenarios across hosts, Kubernetes, and multi‑language applications. This article covers its core capabilities, automated deployment, extensible architecture, and enterprise adoption.
Project Overview
ChaosBlade is an open‑source chaos engineering project launched by Alibaba in 2019. It consists of the chaosblade experiment tool and the chaosblade‑box platform, aiming to help enterprises address high‑availability challenges in cloud‑native environments.
The experiment tool supports three system platforms, four programming languages, and more than 200 experiment scenarios with over 3,000 parameters, allowing fine‑grained control. The platform can host both its own tool and external tools such as LitmusChaos, and is already used by over 40 enterprises including Industrial and Commercial Bank of China, China Mobile, Xiaomi, and JD.com.
Core Capabilities
Rich experiment scenarios: coverage of basic resources (CPU, memory, network, disk, process, kernel, file), multi‑language services (Java, C++, Node.js, Go), and Kubernetes resources (containers, pods, nodes).
Diverse execution methods: experiments can be run from a graphical UI, the blade CLI, kubectl, or programmatic APIs.
Easy scenario extension: every scenario follows the same chaos experiment model and is implemented by a separate executor, so new scenarios are simple to add.
Automated deployment: the tool can be deployed automatically on hosts or clusters without manual steps.
Hosted open‑source tools: the platform can manage mainstream tools such as chaosblade and LitmusChaos.
Unified user interface: users operate experiments through a single UI regardless of the underlying tool.
Multi‑dimensional experiments: experiments span the host level, Kubernetes resources, and the application layer.
Cloud‑native integration: the platform is deployed via Helm, integrates with Prometheus monitoring, and is compatible with other cloud‑native components.
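To make the "one model, several execution methods" idea concrete, here is a minimal Python sketch that renders a single experiment description either as a blade CLI command or as a Kubernetes custom resource for kubectl. The `Experiment` class and its field names are hypothetical and exist only for illustration. The subcommand and flag names (`blade create cpu fullload`, `--cpu-percent`, `--timeout`) and the `chaosblade.io/v1alpha1` `ChaosBlade` resource layout follow the project's published examples as best understood here, so check them against `blade create cpu fullload --help` and the chaosblade‑operator documentation before relying on them.

```python
import json
import shlex
from dataclasses import dataclass, field


@dataclass
class Experiment:
    """Hypothetical stand-in for the experiment model: what to disturb, and how."""
    target: str                                # e.g. "cpu", "network"
    action: str                                # e.g. "fullload", "delay"
    scope: str = "host"                        # "host", or a Kubernetes scope such as "node"
    flags: dict = field(default_factory=dict)  # parameter name -> value

    def to_blade_argv(self) -> list:
        """Render a host-level `blade create` command as an argv list."""
        argv = ["blade", "create", self.target, self.action]
        for name, value in self.flags.items():
            argv += [f"--{name}", str(value)]
        return argv

    def to_custom_resource(self, name: str) -> dict:
        """Render a ChaosBlade custom resource (assumed layout) as a plain dict."""
        matchers = [{"name": k, "value": [str(v)]} for k, v in self.flags.items()]
        return {
            "apiVersion": "chaosblade.io/v1alpha1",
            "kind": "ChaosBlade",
            "metadata": {"name": name},
            "spec": {"experiments": [{
                "scope": self.scope,
                "target": self.target,
                "action": self.action,
                "matchers": matchers,
            }]},
        }


# Host experiment: load the CPU to 80% for 60 seconds.
host_cpu = Experiment("cpu", "fullload", flags={"cpu-percent": 80, "timeout": 60})
print(shlex.join(host_cpu.to_blade_argv()))

# Kubernetes experiment: the same fault aimed at a node called "node-1" (made-up name).
node_cpu = Experiment("cpu", "fullload", scope="node",
                      flags={"names": "node-1", "cpu-percent": 80})
print(json.dumps(node_cpu.to_custom_resource("cpu-load"), indent=2))
```

The JSON printed by the second call could be saved to a file and passed to `kubectl apply -f`, assuming the ChaosBlade operator and its custom resource definition are installed in the cluster.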
Architecture Design
The chaosblade‑box architecture (illustrated below) enables automated deployment of hosted tools, unified experiment modeling, and resource targeting through a target manager. Experiments are executed by invoking the appropriate chaos engine, and metrics are collected via Prometheus for future reporting.
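The flow described here (model the experiment once, resolve its targets, then hand it to whichever chaos engine is hosted) can be pictured as a small dispatch table. The following is a toy sketch and not chaosblade‑box code: the `ENGINES` registry, the `engine` decorator, `run_experiment`, and the host names are all hypothetical, and each engine function only returns a description string where a real one would call the underlying tool.

```python
from typing import Callable, Dict, List

# Hypothetical registry: hosted tool name -> function that runs one experiment on one target.
ENGINES: Dict[str, Callable[[dict, str], str]] = {}


def engine(name: str):
    """Decorator that registers a function as the engine for a hosted tool."""
    def register(fn):
        ENGINES[name] = fn
        return fn
    return register


@engine("chaosblade")
def run_with_chaosblade(experiment: dict, target: str) -> str:
    # A real engine would invoke the tool here; this one only describes the call.
    return f"chaosblade: {experiment['target']} {experiment['action']} on {target}"


@engine("litmuschaos")
def run_with_litmus(experiment: dict, target: str) -> str:
    return f"litmuschaos: {experiment['target']} {experiment['action']} on {target}"


def run_experiment(experiment: dict, tool: str, targets: List[str]) -> List[str]:
    """Dispatch one unified experiment description to the selected engine, once per target."""
    if tool not in ENGINES:
        raise ValueError(f"no engine registered for tool {tool!r}")
    return [ENGINES[tool](experiment, target) for target in targets]


cpu_load = {"target": "cpu", "action": "fullload"}
for line in run_experiment(cpu_load, "chaosblade", ["host-a", "host-b"]):
    print(line)
```

Keeping the experiment description independent of the engine is what lets a single UI drive different tools: supporting another tool means registering one more engine function, with no change to how experiments are modeled or targeted.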
Deployment instructions are available at https://github.com/chaosblade-io/chaosblade-box/releases.
Customer Cases
More than 40 enterprises have adopted ChaosBlade, including major Chinese firms such as Industrial and Commercial Bank of China, China Mobile, Xiaomi, and JD.com, using it to improve system resilience.
Future Roadmap
ChaosBlade will continue to build on cloud‑native foundations, delivering a multi‑cluster, multi‑environment, multi‑language chaos engineering platform. The project will expand experiment scenario coverage, support additional Kubernetes resources, standardize multi‑language scenario implementations, simplify deployment, host more tools, provide scenario recommendations, integrate business and system monitoring, and generate comprehensive experiment reports.
Community contributions are welcomed to advance the field of chaos engineering and help enterprises achieve highly available distributed systems.
Alibaba Cloud Native