How to Add Chaos Monkey to Spring Boot Microservices for Resilient Systems

This guide walks through integrating Codecentric's Chaos Monkey into a Spring Boot microservice ecosystem, covering dependency setup, configuration, actuator endpoints, Dockerized MySQL, performance testing with Gatling, and timeout tuning for Feign and Ribbon clients to simulate real‑world failures.

Programmer DD

1. Enable Chaos Monkey

Add the chaos-monkey-spring-boot dependency to your pom.xml and start the application with the chaos-monkey profile.

<dependency>
  <groupId>de.codecentric</groupId>
  <artifactId>chaos-monkey-spring-boot</artifactId>
  <version>2.0.0-SNAPSHOT</version>
</dependency>

Run the jar with:

$ java -jar target/order-service-1.0-SNAPSHOT.jar --spring.profiles.active=chaos-monkey

2. Sample System Architecture

The example consists of three microservices (order, product, customer), each running two instances, plus a discovery server. All services register with the discovery server and communicate via HTTP. The Chaos Monkey library is included in every service instance.

Sample system architecture diagram

3. Configuration

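In application.yml, enable the assaults and set their parameters. Latency is the default assault; this configuration draws each injected delay from the range 1000 ms to 10000 ms and also enables the exception and kill-application (app-killer) assaults. The level property controls how often calls are attacked (level: 8 means roughly every 8th request), and the watcher section restricts the attacks to beans annotated with @Repository and @RestController.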

chaos:
  monkey:
    assaults:
      level: 8
      latencyRangeStart: 1000
      latencyRangeEnd: 10000
      exceptionsActive: true
      killApplicationActive: true
    watcher:
      repository: true
      restController: true

Note that enabling latency and exceptions together prevents the kill‑application attack from occurring.
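To make level, latencyRangeStart and latencyRangeEnd concrete, here is a minimal, self-contained Java sketch that models the latency assault without Spring. It assumes that a watched call is attacked with probability 1/level (the documented meaning is "attack every level-th request") and that the delay is drawn uniformly from [latencyRangeStart, latencyRangeEnd). The class and method names are hypothetical and are not part of the Chaos Monkey API.

```java
import java.util.Random;

// Hypothetical model of the chaos.monkey.assaults settings; not the library's own code.
final class LatencyAssaultModel {
    private final int level;
    private final int latencyRangeStart;
    private final int latencyRangeEnd;
    private final Random random;

    LatencyAssaultModel(int level, int latencyRangeStart, int latencyRangeEnd, long seed) {
        if (level < 1) {
            throw new IllegalArgumentException("level must be at least 1");
        }
        if (latencyRangeEnd <= latencyRangeStart) {
            throw new IllegalArgumentException("latencyRangeEnd must be greater than latencyRangeStart");
        }
        this.level = level;
        this.latencyRangeStart = latencyRangeStart;
        this.latencyRangeEnd = latencyRangeEnd;
        this.random = new Random(seed); // seeded so runs are reproducible
    }

    // Assumption: each watched call is attacked with probability 1/level.
    boolean isAttacked() {
        return random.nextInt(level) == 0;
    }

    // Injected delay in milliseconds, uniform over [latencyRangeStart, latencyRangeEnd).
    int nextDelayMillis() {
        return latencyRangeStart + random.nextInt(latencyRangeEnd - latencyRangeStart);
    }

    public static void main(String[] args) {
        // level 8, delays from 1000 to 10000 ms, as in application.yml
        LatencyAssaultModel model = new LatencyAssaultModel(8, 1000, 10000, 42L);
        int calls = 100_000;
        int attacked = 0;
        long totalDelay = 0;
        for (int i = 0; i < calls; i++) {
            if (model.isAttacked()) {
                attacked++;
                totalDelay += model.nextDelayMillis();
            }
        }
        System.out.printf("attacked %d of %d calls (%.1f%%), mean injected delay %d ms%n",
                attacked, calls, 100.0 * attacked / calls, totalDelay / Math.max(1, attacked));
    }
}
```

With these settings the printed attack rate should be close to 12.5% and the mean delay close to 5500 ms; changing level or the latency range in the constructor call shows how each setting shifts the injected load.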

4. Enable Spring Boot Actuator Endpoint

Activate the Chaos Monkey actuator endpoint by setting management.endpoint.chaosmonkey.enabled=true and adding chaosmonkey to management.endpoints.web.exposure.include, alongside health and info.

management:
  endpoint:
    chaosmonkey:
      enabled: true
  endpoints:
    web:
      exposure:
        include: health,info,chaosmonkey
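Once the endpoint is exposed, each instance serves the Chaos Monkey endpoints under the default /actuator base path, for example GET /actuator/chaosmonkey for the current settings and GET /actuator/chaosmonkey/status for whether Chaos Monkey is running. The Java sketch below uses the JDK 11 java.net.http client to query the status of both order-service instances. It assumes the instances run on localhost with the default base path; ports 8081 and 8082 come from the order-service launch commands in section 5, and the class name is hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;

final class ChaosMonkeyStatusCheck {
    // Assumption: default Spring Boot actuator base path (/actuator).
    static URI statusUri(String host, int port) {
        return URI.create("http://" + host + ":" + port + "/actuator/chaosmonkey/status");
    }

    static HttpRequest statusRequest(String host, int port) {
        return HttpRequest.newBuilder(statusUri(host, port))
                .timeout(Duration.ofSeconds(2))
                .GET()
                .build();
    }

    public static void main(String[] args) throws InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        for (int port : List.of(8081, 8082)) {
            try {
                HttpResponse<String> response =
                        client.send(statusRequest("localhost", port), HttpResponse.BodyHandlers.ofString());
                System.out.println(port + " -> " + response.statusCode() + " " + response.body());
            } catch (IOException e) {
                // The instance is not running or the endpoint is not exposed.
                System.out.println(port + " -> unreachable: " + e.getMessage());
            }
        }
    }
}
```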

5. Run the Applications

Start a MySQL container for persistence:

$ docker run -d --name mysql -e MYSQL_DATABASE=chaos -e MYSQL_USER=chaos -e MYSQL_PASSWORD=chaos123 -e MYSQL_ROOT_PASSWORD=123456 -p 33306:3306 mysql

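Launch the discovery service and two instances of each microservice on different ports. Pass the port as a --server.port application argument (or put -Dserver.port=... before -jar): a -D option placed after the jar name is handed to the application as a plain argument, which Spring Boot does not treat as a property.

$ java -jar target/discovery-service-1.0-SNAPSHOT.jar
$ java -jar target/order-service-1.0-SNAPSHOT.jar --spring.profiles.active=chaos-monkey --server.port=8081
$ java -jar target/order-service-1.0-SNAPSHOT.jar --spring.profiles.active=chaos-monkey --server.port=8082
... (similar commands for product-service and customer-service)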
Running services diagram
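When several instances have to be restarted repeatedly, the launch commands can also be generated programmatically. The following Java sketch builds one ProcessBuilder per instance with --spring.profiles.active=chaos-monkey and --server.port, writing each instance's output to its own log file. Only the order-service jar name and ports 8081/8082 come from the commands above; the class, method and log-file names are hypothetical, and the jars and ports for the other services would need to match your build.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class InstanceLauncher {
    // Spring Boot reads --key=value application arguments as properties.
    static List<String> command(String jar, int port) {
        return List.of("java", "-jar", jar,
                "--spring.profiles.active=chaos-monkey",
                "--server.port=" + port);
    }

    static List<ProcessBuilder> builders(String jar, int... ports) {
        List<ProcessBuilder> result = new ArrayList<>();
        for (int port : ports) {
            result.add(new ProcessBuilder(command(jar, port))
                    .redirectErrorStream(true) // merge stderr into stdout
                    .redirectOutput(new File("instance-" + port + ".log")));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Starts only the two order-service instances; add the other services' jars as needed.
        for (ProcessBuilder pb : builders("target/order-service-1.0-SNAPSHOT.jar", 8081, 8082)) {
            Process process = pb.start();
            System.out.println("started pid " + process.pid() + ": " + String.join(" ", pb.command()));
        }
    }
}
```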

6. Performance Test with Gatling

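A Gatling simulation starts 20 users at once. Each user runs 500 iterations in which it POSTs a new order to /orders, pauses 5 ms, and then GETs that order by the id returned in the POST response, for 20,000 requests in total. The run is capped at 10 minutes.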

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._
import scala.util.Random

class ApiGatlingSimulationTest extends Simulation {
  val scn = scenario("AddAndFindOrders")
    .repeat(500, "n") {
      exec(http("AddOrder-API")
        .post("http://localhost:8090/order-service/orders")
        .asJson // sets Content-Type and Accept to application/json
        // a session function re-evaluates Random for every request
        .body(StringBody(session =>
          s"""{"productId":${Random.nextInt(20)},"customerId":${Random.nextInt(20)},"productsCount":1,"price":1000,"status":"NEW"}"""))
        .check(status.is(200), jsonPath("$.id").saveAs("orderId")))
      .pause(5.milliseconds)
      .exec(http("GetOrder-API")
        .get("http://localhost:8090/order-service/orders/${orderId}") // Gatling EL, not Scala interpolation
        .check(status.is(200)))
    }
  setUp(scn.inject(atOnceUsers(20))).maxDuration(10.minutes)
}

The test shows average response times and error rates, illustrating the impact of injected latency.

Gatling average response time chart
Gatling timeline chart
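Gatling computes these figures itself. As a minimal sketch of what the two headline numbers mean, the Java class below derives the mean response time over all requests and the error percentage from a list of request samples, counting a sample as failed when its check (here, status 200) did not pass. The Sample class and the sample values are hypothetical and are not results from the article's run.

```java
import java.util.List;

final class ResponseStats {
    // One request outcome: elapsed time and whether its check passed.
    static final class Sample {
        final long millis;
        final boolean ok;

        Sample(long millis, boolean ok) {
            this.millis = millis;
            this.ok = ok;
        }
    }

    // Mean response time over all samples, successful or not.
    static double meanMillis(List<Sample> samples) {
        return samples.stream().mapToLong(s -> s.millis).average().orElse(0.0);
    }

    // Percentage of samples whose check failed.
    static double errorPercent(List<Sample> samples) {
        if (samples.isEmpty()) {
            return 0.0;
        }
        long failed = samples.stream().filter(s -> !s.ok).count();
        return 100.0 * failed / samples.size();
    }

    public static void main(String[] args) {
        // Made-up samples for illustration only.
        List<Sample> samples = List.of(
                new Sample(40, true), new Sample(2300, true),
                new Sample(5003, false), new Sample(35, true));
        System.out.printf("mean=%.1f ms, errors=%.1f%%%n", meanMillis(samples), errorPercent(samples));
    }
}
```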

7. Feign and Ribbon Timeout Configuration

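Set the Feign and Ribbon connect and read timeouts to 5000 ms so that latency-attacked calls whose injected delay exceeds 5000 ms time out. Because the delays range from 1000 ms to 10000 ms, a little more than half of the attacked calls time out, giving a roughly 50/50 success-failure ratio among them. The Hystrix command timeout (15000 ms) is longer than the client timeouts, so the client timeouts fire first, and Hystrix fallbacks and the circuit breaker are disabled so that failed calls surface as errors instead of being replaced by fallback responses or short-circuited.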

feign:
  client:
    config:
      default:
        connectTimeout: 5000
        readTimeout: 5000
ribbon:
  ConnectTimeout: 5000
  ReadTimeout: 5000
hystrix:
  command:
    default:
      execution:
        isolation:
          thread:
            timeoutInMilliseconds: 15000
      fallback:
        enabled: false
      circuitBreaker:
        enabled: false
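The expected share of timeouts follows from the two settings. Assuming the injected delay is uniform over [latencyRangeStart, latencyRangeEnd), a latency-attacked call runs past a read timeout T with probability (end - T) / (end - start) = (10000 - 5000) / (10000 - 1000), about 0.56. The Java sketch below computes that fraction and cross-checks it with a seeded simulation. It is a simplification: it ignores connection setup time, Ribbon retries and calls that are not attacked at all, and the class and method names are hypothetical.

```java
import java.util.Random;

final class TimeoutFraction {
    // Share of delays, uniform over [start, end), that are longer than timeoutMs.
    static double analytic(int start, int end, int timeoutMs) {
        if (timeoutMs <= start) {
            return 1.0;
        }
        if (timeoutMs >= end) {
            return 0.0;
        }
        return (double) (end - timeoutMs) / (end - start);
    }

    // Monte Carlo estimate of the same share using whole-millisecond delays.
    static double simulated(int start, int end, int timeoutMs, int trials, long seed) {
        Random random = new Random(seed);
        int timedOut = 0;
        for (int i = 0; i < trials; i++) {
            int delay = start + random.nextInt(end - start);
            if (delay > timeoutMs) {
                timedOut++;
            }
        }
        return (double) timedOut / trials;
    }

    public static void main(String[] args) {
        // Latency range from the chaos.monkey settings, timeout from the Feign/Ribbon settings.
        System.out.printf("analytic=%.3f simulated=%.3f%n",
                analytic(1000, 10000, 5000), simulated(1000, 10000, 5000, 200_000, 7L));
    }
}
```

With these values both numbers should come out close to 0.56, which is the "roughly 50/50" split among attacked calls.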

8. Observations

The Gatling results indicate increased latency due to Chaos Monkey attacks and client timeout settings. The logs show the Chaos Monkey configuration for each service instance, confirming that the attacks are active during the performance run.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by Programmer DD, a tinkering programmer and author of "Spring Cloud Microservices in Action".