Simulating CPU and I/O Failures with Bash Scripts for Chaos Engineering
This article demonstrates how to create Bash scripts that fully saturate CPU and I/O resources, explains their role in fault injection within the Simian Army framework, and introduces the broader concepts and benefits of chaos engineering for building resilient distributed systems.
Fault Simulation
The two scripts below simulate CPU and I/O saturation on a Linux host. The first maxes out the CPU by running a CPU-intensive command in a loop; the second maxes out I/O with a continuous write loop.
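#!/bin/bash
# Write the worker script that keeps one core busy.
cat << EOF > /tmp/infiniteburn.sh
#!/bin/bash
# openssl speed is a CPU-bound benchmark; rerun it forever.
while true; do
    openssl speed
done
EOF
# Launch 32 background workers, enough for machines with up to 32 cores.
for i in {1..32}
do
    nohup /bin/bash /tmp/infiniteburn.sh &
done
The script creates /tmp/infiniteburn.sh, which runs openssl speed, a CPU-bound benchmark, in an infinite loop. Each copy keeps roughly one core busy, so the outer loop launches 32 copies in the background to saturate machines with up to 32 CPU cores.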
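The CPU script runs until its workers are killed by hand. A bounded variant might look like the minimal sketch below, which assumes GNU timeout and nproc are installed. The function names start_cpu_burn and stop_cpu_burn are hypothetical, and a plain shell busy loop stands in for openssl speed so the example has no OpenSSL dependency.

```shell
#!/bin/bash
# Sketch: a time-limited CPU burn. Function names and defaults are
# illustrative assumptions, not part of the Simian Army scripts.

BURN_PIDS=""

# start_cpu_burn WORKERS SECONDS
# Starts WORKERS busy loops; timeout(1) ends each one after SECONDS.
start_cpu_burn() {
    workers="${1:-$(nproc)}"   # default: one worker per online core
    seconds="${2:-60}"         # hard limit so the fault cannot run forever
    BURN_PIDS=""
    i=0
    while [ "$i" -lt "$workers" ]; do
        # A plain busy loop stands in for `openssl speed` here.
        timeout "$seconds" sh -c 'while :; do :; done' &
        BURN_PIDS="$BURN_PIDS $!"
        i=$((i + 1))
    done
}

# stop_cpu_burn: end any workers still running. Safe to call repeatedly.
stop_cpu_burn() {
    for pid in $BURN_PIDS; do
        kill "$pid" 2>/dev/null || true
        wait "$pid" 2>/dev/null || true
    done
    BURN_PIDS=""
}
```

For example, start_cpu_burn "$(nproc)" 300 would load every core for at most five minutes, and stop_cpu_burn ends the experiment early.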
To saturate I/O, a similar approach is used with the dd command:
#!/bin/bash
# Script for BurnIO Chaos Monkey
cat << EOF > /tmp/loopburnio.sh
#!/bin/bash
while true; do
dd if=/dev/urandom of=/burn bs=1M count=1024 iflag=fullblock
done
EOF
nohup /bin/bash /tmp/loopburnio.sh &
The script reads random data from /dev/urandom and writes it to /burn in 1 MiB blocks. Each dd run writes 1024 blocks (1 GiB), and the loop restarts dd as soon as it finishes, which keeps the disk under sustained write load.
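The I/O script writes to a fixed path in the root directory and never stops or cleans up. The sketch below is one possible bounded variant, assuming GNU dd. The function name burn_io, its arguments, and the burn.PID file name are hypothetical. It writes a fixed number of passes into a directory you choose, forces each pass to disk with conv=fsync, and removes the file afterwards.

```shell
#!/bin/bash
# Sketch: a bounded I/O burn. burn_io and its defaults are illustrative
# assumptions, not part of the Simian Army scripts.

# burn_io DIR PASSES SIZE_MIB
# Writes SIZE_MIB of random data to a scratch file in DIR, PASSES times.
burn_io() {
    dir="${1:-/tmp}"
    passes="${2:-10}"
    size_mib="${3:-1024}"
    target="$dir/burn.$$"      # per-process name avoids clobbering files
    n=0
    while [ "$n" -lt "$passes" ]; do
        # conv=fsync flushes the data to disk before dd exits, so the
        # page cache does not simply absorb the writes.
        dd if=/dev/urandom of="$target" bs=1M count="$size_mib" \
           iflag=fullblock conv=fsync 2>/dev/null || break
        n=$((n + 1))
    done
    rm -f "$target"            # leave no scratch file behind
    echo "$n"                  # number of completed passes
}
```

Calling burn_io /var/tmp 5 1024 would write 1 GiB five times under /var/tmp and print the number of passes that completed.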
Why Use These Scripts?
The two scripts are implementations of fault injection techniques from Netflix's Simian Army. While many are familiar with Chaos Monkey, which randomly terminates EC2 instances, Simian Army expands the concept to a whole suite of “monkeys” that inject various failures.
Chaos Gorilla – simulates an entire Availability Zone failure.
Chaos Kong – simulates a whole region outage.
Latency Monkey – adds artificial latency to REST calls.
Conformity Monkey – shuts down instances that violate best‑practice rules.
Doctor Monkey – isolates unhealthy instances based on health checks.
Janitor Monkey – reclaims unused resources.
Security Monkey – scans for security misconfigurations and expired certificates.
10‑18 Monkey – validates localization and internationalization settings.
Chaos Engineering
Chaos engineering runs controlled experiments on distributed systems to discover hidden weaknesses before they cause production outages. By deliberately injecting failures such as CPU or I/O saturation, network latency, or instance termination, engineers can verify that services remain resilient and recover automatically.
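The shape of such an experiment can be sketched in a few lines of shell: confirm the system is healthy, inject the fault, probe again while the fault is active, and always roll back. This is a minimal illustration rather than a real framework. The function run_experiment and its three command-string arguments are hypothetical hooks that you would supply yourself.

```shell
#!/bin/bash
# Sketch: the skeleton of a chaos experiment. run_experiment and its
# arguments are illustrative assumptions.

# run_experiment PROBE INJECT ROLLBACK
# Each argument is a shell command string. PROBE must exit 0 when the
# system is healthy.
run_experiment() {
    probe="$1"; inject="$2"; rollback="$3"

    # 1. Confirm the steady state before touching anything.
    if ! sh -c "$probe"; then
        echo "aborted: system unhealthy before injection"
        return 2
    fi

    # 2. Inject the fault, then probe again while it is active.
    sh -c "$inject"
    if sh -c "$probe"; then result="passed"; else result="failed"; fi

    # 3. Always roll back, whatever the outcome.
    sh -c "$rollback"

    echo "$result"
    [ "$result" = "passed" ]
}
```

In practice the probe might be a health-check request against the service under test, and the inject and rollback hooks might start and stop one of the burn scripts shown earlier in this article.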
Principles of Chaos: Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system's capability to withstand turbulent conditions in production.
This article starts with CPU and I/O stress scripts, then introduces the broader chaos engineering mindset. Future posts will cover network latency, packet loss, process hangs, and other failure modes, illustrating that fault injection is just one facet of a comprehensive chaos engineering practice.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
