
Boost Microservice Resilience with ChaosBlade and SkyWalking: A Hands‑On Guide

This article shows how to combine ChaosBlade fault injection with SkyWalking monitoring to improve the availability of distributed microservice systems. It covers tool installation, experiment design, step-by-step execution, and case studies with concrete commands and metrics.

Alibaba Cloud Native

Chaos engineering injects controlled faults into distributed systems to reveal weaknesses and improve reliability. This guide demonstrates using the open‑source tools ChaosBlade (fault injection) and Apache SkyWalking (observability) in a microservice demo.

Tool Overview

ChaosBlade provides a unified blade CLI for injecting faults such as CPU load, memory pressure, network loss, disk I/O, process termination, Java/C++ method delays, Docker container actions, and Kubernetes node disruptions. It is lightweight, non‑intrusive, and extensible.

SkyWalking is an APM system that offers distributed tracing, metrics, service topology, root‑cause analysis, and alerting for cloud‑native architectures.

Installation

# Download
wget https://chaosblade.oss-cn-hangzhou.aliyuncs.com/agent/github/0.9.0/chaosblade-0.9.0-linux-amd64.tar.gz
# Extract
tar -zxf chaosblade-0.9.0-linux-amd64.tar.gz
# Add to PATH (absolute path, so blade still resolves after changing directory)
export PATH="$PATH:$(pwd)/chaosblade-0.9.0"
# Verify
blade -h

Use blade -h to list commands and explore sub‑commands (e.g., blade create cpu fullload -h) for flags and examples.
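A typical session creates an experiment, keeps the ID that blade returns, and later revokes the experiment with blade destroy. The helpers below are a minimal sketch of that lifecycle. They assume that blade create prints a compact JSON object whose "result" field holds the experiment UID. The function names are hypothetical and not part of ChaosBlade.

```shell
# Hypothetical helpers; they assume `blade create` prints JSON such as
# {"code":200,"success":true,"result":"<uid>"} on success.

# Extract the "result" field (the experiment UID) from a blade JSON response.
blade_uid() {
  printf '%s\n' "$1" | sed -n 's/.*"result":"\([^"]*\)".*/\1/p'
}

# Start an experiment and print its UID; fails if the response carries no UID.
# The body runs in a subshell so its variables stay local.
start_experiment() (
  response=$(blade create "$@") || true
  uid=$(blade_uid "$response")
  [ -n "$uid" ] && printf '%s\n' "$uid"
)

# Revoke a running experiment by UID.
stop_experiment() {
  blade destroy "$1"
}
```

For example, uid=$(start_experiment cpu fullload) starts a CPU load experiment, and stop_experiment "$uid" ends it.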

Chaos Experiment Workflow

1. Define a chaos experiment plan.
2. Specify steady‑state metrics (e.g., average response time, P99 latency) in SkyWalking.
3. Formulate fault‑tolerance hypotheses (e.g., timeout settings, circuit‑breaker policies).
4. Execute the experiment with ChaosBlade.
5. Validate metrics after fault injection.
6. Record results, restore the system, and fix identified issues.
7. Automate continuous verification.
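The steady-state and validation steps above both come down to comparing measured values against agreed limits, which is also the part that is easiest to automate. The sketch below shows one way to express that check. It assumes the numbers are read from SkyWalking by hand or by some export of your own. The function names and argument order are hypothetical.

```shell
# Hypothetical helper: succeed when a measured value (ms) is at or below its limit.
# awk does the comparison so fractional values such as 15.4 work.
within_limit() {
  awk -v measured="$1" -v limit="$2" 'BEGIN { exit !(measured + 0 <= limit + 0) }'
}

# Compare measured average RT and P99 against the steady-state limits.
# Arguments: avg_ms p99_ms avg_limit_ms p99_limit_ms
steady_state_ok() {
  within_limit "$1" "$3" && within_limit "$2" "$4"
}
```

Running steady_state_ok before the injection guards against starting an experiment on a system that is already degraded. Running it again afterwards turns the hypothesis into a pass or fail result.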

Case Study 1 – Dubbo Cart Service Delay

The microservice demo includes frontend, cart, product, and order services, among others, built with Spring Boot, Nacos, MySQL, Redis, Lettuce, and Dubbo.

Generate load: ab -n 10000 -c 2 http://127.0.0.1:8083/cart

Steady‑state: average RT ≈ 15 ms, P99 ≤ 20 ms (observed in SkyWalking).
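The ab report itself contains a client-side mean and a percentile table, which can serve as a cross-check on the values seen in SkyWalking. The sketch below pulls both numbers out of saved ab output. It assumes the standard ab report layout, with a "Time per request: ... (mean)" line and a "99%" row. The function names are hypothetical.

```shell
# Print the mean time per request (ms) from `ab` output read on stdin.
# Matches the line ending in "(mean)", not the "across all concurrent requests" one.
ab_mean_ms() {
  awk '/^Time per request:/ && /\(mean\)[[:space:]]*$/ { print $4; exit }'
}

# Print the 99th-percentile latency (ms) from ab's percentile table on stdin.
ab_p99_ms() {
  awk '$1 == "99%" { print $2; exit }'
}
```

Save the report first, for example with ab -n 10000 -c 2 http://127.0.0.1:8083/cart > ab.txt, then run ab_mean_ms < ab.txt and ab_p99_ms < ab.txt. These are client-side figures, so small differences from the server-side SkyWalking numbers are expected.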

Hypothesis: a 2 s client timeout and a circuit‑breaker should prevent long‑lasting blocks.

Inject a 30 s delay into Dubbo method viewCart:

blade create dubbo delay --time 30000 \
  --service com.alibabacloud.hipstershop.cartserviceapi.service.CartService \
  --methodname viewCart --process frontend --consumer
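A 30 s delay on a live call path is easy to leave running by accident. One hedge is to let ChaosBlade revoke the fault itself. The sketch below wraps the injection command above and adds the common --timeout flag of blade create, given in seconds. The wrapper name and the 120 s default are illustrative assumptions, not values from the experiment.

```shell
# Hypothetical wrapper: the same Dubbo delay experiment, plus --timeout so
# ChaosBlade revokes the fault automatically after the given number of seconds.
# The body runs in a subshell so ttl_seconds stays local.
inject_cart_delay() (
  ttl_seconds="${1:-120}"   # arbitrary default, not taken from the article
  blade create dubbo delay --time 30000 \
    --service com.alibabacloud.hipstershop.cartserviceapi.service.CartService \
    --methodname viewCart --process frontend --consumer \
    --timeout "$ttl_seconds"
)
```

Calling inject_cart_delay 60 would keep the delay active for about one minute, which should be long enough to watch the metrics change in SkyWalking.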

SkyWalking shows average RT spiking to about 2000 ms, P99 rising to a similar level, and the /cart endpoint returning timeout errors.

Conclusion: the 2 s timeout works, capping RT at about 2000 ms instead of the injected 30 s, but no circuit breaker is configured, so the hypothesis holds only in part.
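The difference between "timeout only" and "timeout plus circuit breaker" shows up in the shape of the response times during the fault. With only a timeout, every request waits the full 2 s. With an open breaker, later requests would fail fast. The sketch below encodes that reasoning as a rough heuristic over response times sampled in order. The function name and the half-of-timeout cut-off are assumptions made for illustration.

```shell
# Hypothetical heuristic: given the client timeout (ms) followed by response
# times (ms) sampled in order during the fault, succeed if the LAST sample is
# below half the timeout, i.e. calls have started failing fast.
fails_fast() (
  timeout_ms="$1"
  shift
  [ "$#" -gt 0 ] || exit 1          # no samples: nothing to conclude
  for last in "$@"; do :; done      # leaves the final sample in $last
  awk -v last="$last" -v t="$timeout_ms" 'BEGIN { exit !(last + 0 < t / 2) }'
)
```

With samples such as 2003, 2001, 2002 against a 2000 ms timeout, the check fails, which matches the observation here that requests keep waiting for the full timeout.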

Case Study 2 – Network Loss on Nacos Registry

Simulate a registration‑center failure by injecting 100 % packet loss on Nacos port 8848:

blade create network loss --interface eth0 --percent 100 --local-port 8848

Metrics show that the cart service remains functional because it caches data locally and has only a weak dependency on the registry, which confirms the hypothesis.
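The same observation can be checked from a shell on the client side: during the fault the registry port should stop answering while /cart keeps returning HTTP 200. The sketch below uses curl for both probes. The function names and the nacos-host placeholder are hypothetical, and the probe should run from a machine other than the one where the packet loss is injected.

```shell
# Succeed if any HTTP response arrives from the URL within $2 seconds (default 3).
reachable() {
  curl -s -o /dev/null -m "${2:-3}" "$1" >/dev/null
}

# Succeed only if the URL answers with HTTP 200 within $2 seconds (default 3).
# The body runs in a subshell so the code variable stays local.
http_ok() (
  code=$(curl -s -o /dev/null -m "${2:-3}" -w '%{http_code}' "$1") || exit 1
  [ "$code" = "200" ]
)

# During the fault: the registry should be unreachable while the cart still works.
# Arguments: cart_url registry_url
weak_dependency_holds() {
  ! reachable "$2" 2 && http_ok "$1"
}
```

A call might look like weak_dependency_holds http://127.0.0.1:8083/cart http://nacos-host:8848/, where nacos-host stands in for the registry address.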

Mini‑Case – MySQL Slow‑SQL Injection

Delay SELECT statements on MySQL to test slow‑SQL alerts:

blade create mysql delay --time 10000 --sqltype select --port 3306

This adds a 10 s delay to SELECT queries on port 3306, allowing verification of alerting behavior.

Repository

ChaosBlade source code: https://github.com/chaosblade-io/chaosblade
