Build a Production‑Ready Retry + Circuit Breaker Decorator in Python
This article shows how to create a Python decorator that automatically retries flaky API calls with exponential backoff and activates a circuit‑breaker to prevent cascading failures, providing a lightweight, zero‑intrusion solution for stable automated testing pipelines.
In API automated testing, intermittent failures such as 502 Bad Gateway or timeouts often cause flaky test results. Manually rerunning tests is inefficient, so a programmatic solution that automatically retries and protects downstream services is needed.
Core Goals
Automatic retry for recoverable errors (default 3 attempts).
Exponential backoff (1 s → 2 s → 4 s) to avoid overwhelming the service.
Circuit‑breaker protection: after N consecutive failures the circuit opens and rejects further calls.
Automatic recovery: after a cooldown period (e.g., 30 s) the circuit half‑opens and tests the service again.
Zero intrusion: apply a single decorator to test functions without changing business logic.
Circuit‑Breaker States
The breaker has three states:
Closed : normal operation, failures are counted.
Open : failure threshold exceeded, all requests are rejected immediately.
Half‑Open : after the recovery timeout, a single trial request is allowed; success returns to Closed , failure goes back to Open .
Implementation
import time
import functools
from datetime import datetime, timedelta
from typing import Callable, Any
# Global storage for circuit‑breaker instances (keyed by function name)
_circuit_breakers = {}
class CircuitBreaker:
def __init__(self, failure_threshold=3, recovery_timeout=30):
self.failure_threshold = failure_threshold # failures to trigger open
self.recovery_timeout = recovery_timeout # seconds to stay open
self.failure_count = 0
self.last_failure_time = None
self.state = "CLOSED" # CLOSED, OPEN, HALF_OPEN
def can_execute(self) -> bool:
if self.state == "CLOSED":
return True
if self.state == "OPEN":
if datetime.now() - self.last_failure_time > timedelta(seconds=self.recovery_timeout):
self.state = "HALF_OPEN"
return True
return False
if self.state == "HALF_OPEN":
return True
return False
def record_success(self):
self.failure_count = 0
self.state = "CLOSED"
def record_failure(self):
self.failure_count += 1
self.last_failure_time = datetime.now()
if self.failure_count >= self.failure_threshold:
self.state = "OPEN"
def retry_with_circuit_breaker(
max_retries: int = 3,
delay: float = 1.0,
backoff: float = 2.0,
exceptions=(TimeoutError, ConnectionError, Exception),
failure_threshold: int = 3,
recovery_timeout: int = 30,
):
"""Decorator that adds automatic retry + circuit‑breaker to a function.
:param max_retries: maximum retry attempts
:param delay: initial delay in seconds
:param backoff: multiplier for exponential backoff
:param exceptions: exception types that trigger a retry
:param failure_threshold: failures needed to open the circuit
:param recovery_timeout: seconds the circuit stays open
"""
def decorator(func: Callable) -> Callable:
func_key = f"{func.__module__}.{func.__name__}"
@functools.wraps(func)
def wrapper(*args, **kwargs) -> Any:
if func_key not in _circuit_breakers:
_circuit_breakers[func_key] = CircuitBreaker(
failure_threshold=failure_threshold,
recovery_timeout=recovery_timeout,
)
cb = _circuit_breakers[func_key]
if not cb.can_execute():
raise Exception(f"Circuit breaker is open ({func_key}), try later")
current_delay = delay
last_exception = None
for attempt in range(max_retries + 1):
try:
result = func(*args, **kwargs)
cb.record_success()
return result
except exceptions as e:
last_exception = e
cb.record_failure()
if attempt < max_retries and cb.state != "OPEN":
print(f"⚠️ {func_key} attempt {attempt + 1} failed: {e}, retry after {current_delay}s")
time.sleep(current_delay)
current_delay *= backoff
else:
break
raise last_exception
return wrapper
return decoratorUsage Example
import requests
@retry_with_circuit_breaker(
max_retries=2,
delay=1,
backoff=2,
exceptions=(requests.Timeout, requests.ConnectionError, ValueError),
failure_threshold=2,
recovery_timeout=20,
)
def get_user_info(user_id: int):
resp = requests.get(f"https://api.example.com/users/{user_id}", timeout=3)
if resp.status_code >= 500:
raise ValueError(f"Server error: {resp.status_code}")
resp.raise_for_status()
return resp.json()
def test_fetch_user():
user = get_user_info(123)
assert user["id"] == 123Simulation Scenario
Assume get_user_info(123) is unstable:
T+0 s – first request returns 502; decorator retries after 1 s.
T+1 s – second request times out; retry after 2 s.
T+3 s – third request again returns 502; failure threshold reached, circuit opens.
T+10 s – a new request is blocked immediately with “circuit breaker is open”.
T+35 s – after the recovery timeout the circuit half‑opens, the next request succeeds and the circuit returns to Closed .
In CI/CD this means early flaky failures are isolated, preventing a cascade of red tests.
Advanced Tips
Fine‑grained circuit breaking per URL by including the URL in func_key.
Integrate logging/metrics (e.g., Prometheus, ELK) inside record_failure for observability.
Support async functions by detecting iscoroutinefunction and returning an async wrapper.
Combine with pytest‑rerunfailures for whole‑test retries; this decorator focuses on individual API calls.
Precautions
Do not retry non‑idempotent operations such as payments or SMS sending.
Choose sensible thresholds: too low triggers frequent opens, too high reduces protection.
The provided implementation stores state in‑process; for multi‑process or distributed environments use a shared store like Redis.
Conclusion
Adding this decorator to your test utilities gives automatic retry with exponential backoff, fast‑fail circuit breaking, and self‑healing recovery, turning flaky API tests into stable, reliable checks with just one line of code.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
