Resilience Patterns: Retry, Fallback, Timeout, and Circuit Breaker with Vert.x and Kotlin
This article explains software resilience concepts, introduces the four latency‑control patterns—retry, fallback, timeout, and circuit breaker—illustrates them with a payment‑fraud‑check example, and shows how to implement them in Vert.x using Kotlin code.
What Is Resilience?
Software is not an end in itself; it supports business processes and satisfies customers. Production software must be correct, reliable, and available. In resilient software design the main goal is to build robust components that can tolerate their own failures as well as failures of the components they depend on.
Even simple web applications consist of web servers, databases, firewalls, proxies, load balancers, and caches, all of which can fail. Besides total failures, services may respond slowly or return semantically incorrect results, and the more components a system has, the higher the chance of a fault.
Availability is a key quality attribute, expressed as the proportion of time a component is actually usable versus the time it should be usable.
Traditional approaches aim to increase uptime, while modern approaches focus on reducing recovery time (MTTR) to minimise downtime. Uwe Friedrichsen classifies resilience design patterns into four categories: loose coupling, isolation, latency control, and supervision.
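The link between recovery time and availability can be made precise with the standard formula (MTTF = mean time to failure, MTTR = mean time to recovery):

```latex
A = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}
```

Shrinking MTTR raises availability even when failures keep occurring at the same rate, which is why modern approaches invest in fast recovery rather than only in failure prevention.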
Patterns
Example Scenario
To demonstrate the patterns we use a simple payment service that calls a fraud‑check service via HTTP. A successful call returns a 200 response with a boolean indicating fraud. What happens if the fraud service returns an internal‑server‑error (500) or does not respond?
Retry
The retry pattern re‑issues a failed request a configurable number of times before marking the operation as failed. It is well suited to transient faults such as:
- Packet loss or temporary network glitches
- Internal errors in the target service (e.g., a short database outage)
- High request volume causing slow or missing responses
Be careful not to exacerbate overload‑induced failures; combine retry with exponential back‑off or a circuit breaker.
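The core idea can be sketched in plain Kotlin, independently of any framework; `retryWithBackoff`, `maxRetries`, and the delay values below are illustrative choices, not a standard API:

```kotlin
// Minimal retry helper with exponential back-off.
// Retries the block up to maxRetries times, doubling the delay between attempts;
// the final attempt lets the exception propagate to the caller.
fun <T> retryWithBackoff(
    maxRetries: Int = 2,
    initialDelayMs: Long = 100,
    block: () -> T
): T {
    var delayMs = initialDelayMs
    repeat(maxRetries) {
        try {
            return block()
        } catch (e: Exception) {
            Thread.sleep(delayMs) // wait before the next attempt
            delayMs *= 2          // exponential back-off tempers overload
        }
    }
    return block() // last attempt; failure is reported to the caller
}
```

Wrapping the fraud‑check HTTP call in such a helper absorbs one or two transient 500s, while the growing delay avoids hammering a service that is already struggling.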
Fallback
The fallback pattern provides an alternative value when a downstream service fails, allowing the calling service to continue processing. In the payment example, a fallback might assume the transaction is not fraudulent, which can be risky.
A well‑chosen fallback balances risk and availability, such as allowing a small percentage of transactions to pass while flagging the rest for manual review.
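A fallback is easy to express as a small wrapper; `withFallback` below is an illustrative sketch, and the default of "not fraudulent" mirrors the deliberately risky choice discussed above:

```kotlin
// Fallback wrapper: return the primary result, or a default when the call fails.
fun <T> withFallback(fallback: T, block: () -> T): T =
    try {
        block()
    } catch (e: Exception) {
        fallback // degrade locally instead of propagating the failure
    }

// Example: treat a transaction as non-fraudulent when the fraud service is down.
fun checkFraud(call: () -> Boolean): Boolean = withFallback(false, call)
```

The payment service keeps processing orders during a fraud‑service outage; the price is that fraudulent transactions may slip through, which is exactly the risk/availability trade‑off a fallback encodes.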
Timeout
Timeout limits how long a client waits for a response before treating the request as failed. It prevents indefinite hangs but introduces challenges: a timed‑out order may have actually succeeded, leading to duplicate processing or customer confusion.
Choosing an appropriate timeout requires balancing slow legitimate responses against the need to abort never‑ending waits.
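A timeout can be sketched with the JDK's `CompletableFuture`, which supports a bounded `get`; `withTimeout` and the durations are illustrative, not part of any framework discussed here:

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.TimeUnit

// Run the block on another thread and give up after timeoutMs milliseconds.
// A TimeoutException tells the caller to treat the request as failed -- even
// though the underlying call may still complete later, which is exactly the
// duplicate-processing hazard described above.
fun <T> withTimeout(timeoutMs: Long, block: () -> T): T =
    CompletableFuture.supplyAsync { block() }
        .get(timeoutMs, TimeUnit.MILLISECONDS)
```

Note that aborting the wait does not abort the remote operation; a timed‑out fraud check may still finish on the server side, so the caller needs an idempotent or reconciliation strategy.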
Circuit Breaker
A circuit breaker protects a service from repeatedly invoking a failing downstream service. It has three states: closed (requests flow normally), open (requests are rejected immediately), and half‑open (a test request determines whether to close the circuit).
When the fraud‑check service returns two consecutive 500 errors, the circuit opens; after a cooldown period it moves to half‑open, allowing a single request to decide whether to close or reopen.
Circuit breakers work well together with retry, timeout, and fallback to avoid cascading failures and denial‑of‑service conditions.
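The three‑state machine can be sketched in plain Kotlin; `SimpleCircuitBreaker` and its parameters are an illustrative model of the behaviour described above, not the Vert.x implementation shown later:

```kotlin
// Minimal circuit-breaker state machine: CLOSED -> OPEN after maxFailures,
// OPEN -> HALF_OPEN after resetTimeoutMs, then one trial call decides
// whether to close the circuit again or reopen it.
class SimpleCircuitBreaker(
    private val maxFailures: Int = 2,
    private val resetTimeoutMs: Long = 5000
) {
    enum class State { CLOSED, OPEN, HALF_OPEN }

    var state = State.CLOSED
        private set
    private var failures = 0
    private var openedAt = 0L

    fun <T> execute(block: () -> T): T {
        if (state == State.OPEN) {
            if (System.currentTimeMillis() - openedAt >= resetTimeoutMs) {
                state = State.HALF_OPEN // allow a single trial request
            } else {
                throw IllegalStateException("circuit is open") // fail fast
            }
        }
        return try {
            val result = block()
            failures = 0
            state = State.CLOSED // success closes the circuit
            result
        } catch (e: Exception) {
            failures++
            if (state == State.HALF_OPEN || failures >= maxFailures) {
                state = State.OPEN // trip (or re-trip) the breaker
                openedAt = System.currentTimeMillis()
            }
            throw e
        }
    }
}
```

While open, the breaker rejects calls immediately instead of tying up threads on a service that is known to be failing, which is what prevents the cascading failures mentioned above.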
Implementation in Vert.x
Vert.x provides a CircuitBreaker class configured via CircuitBreakerOptions. The following Kotlin snippet creates a circuit breaker that opens after one failure, retries twice, stays open for 5000 ms, and times out calls after 2000 ms.
val vertx = Vertx.vertx()
val options = circuitBreakerOptionsOf(
fallbackOnFailure = false,
maxFailures = 1,
maxRetries = 2,
resetTimeout = 5000,
timeout = 2000
)
val circuitBreaker = CircuitBreaker.create("my-circuit-breaker", vertx, options)
To execute a command, provide a handler for the asynchronous operation and a handler for the result:
circuitBreaker.executeCommand(
Handler<Promise<String>> { it.complete("OK") },
Handler { println(it) }
)
Vert.x also supports coroutine‑based suspend functions and offers advanced features such as event‑bus notifications, metrics for Hystrix dashboards, and custom state‑change callbacks.
Alternative Implementations
Not all frameworks ship built‑in resilience patterns. Libraries like Hystrix (now in maintenance mode), Resilience4j, and Failsafe provide programmatic APIs, while service‑mesh solutions like Istio implement resilience at the infrastructure level via sidecars.
Sidecars keep resilience logic separate from business code, making configuration changes easier, but they cannot implement all patterns (e.g., bulkhead isolation) and may rely on the application to supply sensible fallback values.
Conclusion
Loose coupling, isolation, latency control, and supervision together improve system resilience. Retry handles recoverable communication errors, fallback provides local degradation, timeout caps latency, and circuit breakers prevent overload cascades. Frameworks such as Vert.x offer ready‑made patterns, while dedicated libraries and service meshes give additional flexibility; the best solution depends on the team’s context and requirements.