Client‑Side Circuit Breaking Strategies: State Machine, Google SRE Breaker, and Mitigation Techniques
This article explains why client‑side circuit breaking is essential, describes common state‑machine and Google SRE breaker strategies, provides practical pseudocode, and discusses mitigation methods such as Gutter mode, jittered exponential backoff, and graceful degradation to protect system stability.
Preface: A severe incident caused by mis‑estimated execution time led to a service outage during peak hours. Controlling traffic with degradation could have limited the impact.
This article looks at how traffic control protects systems, focusing on client‑side circuit breaking (the "circuit breaker" pattern).
Why Implement Circuit Breaking
Circuit breaking allows a client to proactively drop requests based on metric feedback, preventing cascading failures when downstream services, gateways, or networks become unhealthy.
It keeps the client itself healthy and reduces pressure on whatever it calls downstream (e.g., HTTP services, MySQL, Redis).
1. State‑Machine Circuit Breaker
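The breaker lives in the client and has three states: closed (normal, requests pass), open (circuit broken, requests are rejected locally), and half‑open (probing). After a cooldown period in the open state, it transitions to half‑open and lets a probe request through; a successful probe closes the breaker, a failed probe reopens it.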
Basic parameters:
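Silent period (minimum request volume in the window; below this threshold the breaker never trips)
Cooldown period (how long the breaker stays open and stops calling the downstream service before probing)
Statistical window (sliding window divided into buckets)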
Break criteria:
Slow‑call ratio
Error ratio
Error count
What counts as an error is left to the business to define (e.g., non‑200 HTTP codes, response body checks, latency).
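The parameters and break criteria above can be sketched as a single check over the window statistics. This is a minimal illustration, not the Sentinel implementation: `WindowStats`, `BreakerConfig`, `ShouldOpen`, and all threshold values are hypothetical names and numbers chosen for the example.

```go
package main

import "fmt"

// WindowStats is a hypothetical summary of the current statistical window.
type WindowStats struct {
	Total  int // requests completed in the window
	Errors int // requests the business classified as errors
	Slow   int // requests slower than the slow-call threshold
}

// BreakerConfig holds illustrative thresholds; names and values are assumptions.
type BreakerConfig struct {
	MinRequests      int     // silent period: below this volume the breaker never trips
	MaxErrorRatio    float64 // trip when Errors/Total reaches this ratio (0 disables)
	MaxSlowCallRatio float64 // trip when Slow/Total reaches this ratio (0 disables)
	MaxErrorCount    int     // trip when Errors reaches this count (0 disables)
}

// ShouldOpen reports whether the window statistics meet any break criterion.
func ShouldOpen(s WindowStats, c BreakerConfig) bool {
	if s.Total == 0 || s.Total < c.MinRequests {
		return false // too few samples to judge
	}
	total := float64(s.Total)
	if c.MaxErrorRatio > 0 && float64(s.Errors)/total >= c.MaxErrorRatio {
		return true
	}
	if c.MaxSlowCallRatio > 0 && float64(s.Slow)/total >= c.MaxSlowCallRatio {
		return true
	}
	if c.MaxErrorCount > 0 && s.Errors >= c.MaxErrorCount {
		return true
	}
	return false
}

func main() {
	cfg := BreakerConfig{MinRequests: 20, MaxErrorRatio: 0.5, MaxSlowCallRatio: 0.8}
	// Only 10 requests: still inside the silent period, so the breaker stays closed.
	fmt.Println(ShouldOpen(WindowStats{Total: 10, Errors: 10}, cfg)) // false
	// 25 errors out of 40 requests is a ratio of 0.625, above the 0.5 threshold.
	fmt.Println(ShouldOpen(WindowStats{Total: 40, Errors: 25}, cfg)) // true
}
```

The silent-period check comes first so that a handful of failures on a quiet interface cannot trip the breaker.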
Core pseudocode:
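// ----- On request entry: decide whether the call may proceed -----
curStatus := b.CurrentState()
if curStatus == Closed {
	// Breaker is closed (healthy): let the request through.
	return true
} else if curStatus == Open {
	// Once the cooldown has elapsed, switch to half-open and admit this request as the probe.
	if b.retryTimeoutArrived() && b.fromOpenToHalfOpen(ctx) {
		return true
	}
}
// Still open, or half-open with a probe already in flight: reject locally.
return false

// ----- On request completion: update the breaker from the outcome -----
curStatus := b.CurrentState()
if curStatus == Open {
	// Breaker is open: nothing to update.
	return
}
if curStatus == HalfOpen {
	if err == nil {
		// Probe succeeded: close the breaker and reset the statistics.
		b.fromHalfOpenToClosed()
		b.resetMetric()
	} else {
		// Probe failed: reopen the breaker.
		b.fromHalfOpenToOpen(1)
	}
	return
}
2. Google SRE Breaker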
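Google SRE proposes adaptive client‑side throttling: over a sliding window, each client tracks how many requests it attempted (requests) and how many the backend accepted (accepts), and rejects a new request locally with probability max(0, (requests - K * accepts) / (requests + 1)). The tunable factor K controls how much failure is tolerated: a lower K rejects more aggressively, while a higher K lets more requests reach the backend. The SRE book generally prefers K = 2.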
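To get a feel for the numbers, the rejection probability max(0, (requests - K * accepts) / (requests + 1)) from the SRE book can be evaluated for a few window states. The helper name `RejectProbability` and the sample counts below are made up for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// RejectProbability returns max(0, (requests - k*accepts) / (requests + 1)).
// requests and accepts are the counts observed in the current window.
func RejectProbability(requests, accepts int64, k float64) float64 {
	return math.Max(0, (float64(requests)-k*float64(accepts))/float64(requests+1))
}

func main() {
	// Healthy backend: every request accepted, so nothing is rejected locally.
	fmt.Printf("%.3f\n", RejectProbability(100, 100, 2)) // 0.000
	// Only 30 of 100 accepted: (100 - 60) / 101.
	fmt.Printf("%.3f\n", RejectProbability(100, 30, 2)) // 0.396
	// Nothing accepted: 100 / 101, almost every request is dropped client-side.
	fmt.Printf("%.3f\n", RejectProbability(100, 0, 2)) // 0.990
}
```

With K = 2 the client only starts rejecting once fewer than half of its requests are being accepted, and the `+ 1` in the denominator keeps the probability just below 1, so a few requests always get through to detect recovery.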
Core pseudocode:
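// Accepted (successfully handled) requests and total requests in the window.
accepts, total := b.summary()
// Tolerated request volume: K * accepts.
tolerated := b.k * float64(accepts)
// Rejection probability: max(0, (requests - K*accepts) / (requests + 1)).
dr := math.Max(0, (float64(total)-tolerated)/float64(total+1))
// Draw a uniform random number in [0, 1) and compare it with the rejection probability.
// The package-level generator is goroutine-safe and avoids reseeding on every call.
r := rand.Float64()
if r < dr {
	// Reject the request locally.
	return false
}
// Let the request through.
return true
3. Reducing the Impact of Circuit Breaking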
Common mitigation techniques include:
Gutter mode: fail over to a standby cluster with limited capacity, which absorbs the traffic the primary cluster cannot serve.
Jittered exponential backoff: retry with exponentially growing delays plus random jitter, so that clients do not retry in lockstep and cause traffic spikes.
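A minimal sketch of jittered exponential backoff, using the "full jitter" variant described in the AWS article listed in the references: the delay is drawn uniformly from zero up to a capped exponential ceiling. The function names and the base and cap values are assumptions for the example.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// BackoffCeiling returns min(maxDelay, base * 2^attempt) for attempt >= 0.
func BackoffCeiling(base, maxDelay time.Duration, attempt int) time.Duration {
	d := base
	// Stop doubling once the cap is reached, which also avoids overflow.
	for i := 0; i < attempt && d < maxDelay; i++ {
		d *= 2
	}
	if d > maxDelay {
		d = maxDelay
	}
	return d
}

// FullJitter picks a random delay in [0, ceiling) for the given attempt.
func FullJitter(base, maxDelay time.Duration, attempt int) time.Duration {
	ceiling := BackoffCeiling(base, maxDelay, attempt)
	if ceiling <= 0 {
		return 0
	}
	return time.Duration(rand.Int63n(int64(ceiling)))
}

func main() {
	base, maxDelay := 100*time.Millisecond, 2*time.Second
	for attempt := 0; attempt < 6; attempt++ {
		fmt.Printf("attempt %d: ceiling=%v delay=%v\n",
			attempt,
			BackoffCeiling(base, maxDelay, attempt),
			FullJitter(base, maxDelay, attempt))
	}
}
```

The ceilings grow as 100ms, 200ms, 400ms, 800ms, 1.6s and then stay at the 2s cap, while the actual delays differ on every run, which is what spreads the retries of many clients over time.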
4. Rate‑Limiting Configuration
Rate‑limiting can be applied at interface or application level, using classifications from Google SRE such as CRITICAL_PLUS, CRITICAL, SHEDDABLE_PLUS, and SHEDDABLE to prioritize traffic.
Clients should propagate importance levels via headers so downstream services can apply appropriate limits.
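One way this propagation could look is sketched below. The header name `X-Request-Criticality`, the helper names, and the choice to treat unlabeled requests as CRITICAL are all assumptions made for the example; the source does not prescribe a header or a default.

```go
package main

import (
	"fmt"
	"net/http"
)

// Criticality orders the four classes from least to most important.
type Criticality int

const (
	Sheddable Criticality = iota
	SheddablePlus
	Critical
	CriticalPlus
)

// criticalityHeader is a hypothetical header name, not a standard.
const criticalityHeader = "X-Request-Criticality"

var criticalityNames = map[Criticality]string{
	Sheddable:     "SHEDDABLE",
	SheddablePlus: "SHEDDABLE_PLUS",
	Critical:      "CRITICAL",
	CriticalPlus:  "CRITICAL_PLUS",
}

func (c Criticality) String() string { return criticalityNames[c] }

// ParseCriticality maps a header value to a level.
// Assumption: unknown or missing values are treated as Critical.
func ParseCriticality(s string) Criticality {
	for c, name := range criticalityNames {
		if name == s {
			return c
		}
	}
	return Critical
}

// Tag sets the criticality header on an outgoing request (client side).
func Tag(req *http.Request, c Criticality) {
	req.Header.Set(criticalityHeader, c.String())
}

// AdmitLevel reports whether a request carrying the given header value should
// be served while the server only accepts levels at or above minLevel.
func AdmitLevel(value string, minLevel Criticality) bool {
	return ParseCriticality(value) >= minLevel
}

func main() {
	req, err := http.NewRequest(http.MethodGet, "http://example.invalid/recommend", nil)
	if err != nil {
		panic(err)
	}
	Tag(req, SheddablePlus)
	value := req.Header.Get(criticalityHeader)
	fmt.Println(AdmitLevel(value, Sheddable)) // true: normal load, everything is served
	fmt.Println(AdmitLevel(value, Critical))  // false: under pressure, sheddable traffic is dropped
}
```

As load rises, the downstream service raises `minLevel` step by step, so the least important traffic is shed first.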
5. Graceful Degradation
When a service is unavailable, degrade gracefully by reducing response quality or computational effort, e.g., returning less precise recommendations.
Keep degradation logic simple and well‑tested.
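A deliberately small sketch of such a fallback, assuming a recommendation call that may fail (for instance because the breaker rejected it) and a precomputed, less precise list to return instead. `Recommend`, `ErrUnavailable`, and the sample data are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUnavailable stands in for any failure of the primary path,
// such as a request rejected by an open circuit breaker.
var ErrUnavailable = errors.New("recommendation service unavailable")

// Recommend tries the primary source and, on any error, returns the
// fallback list. The second result reports whether the answer is degraded.
func Recommend(primary func() ([]string, error), fallback []string) ([]string, bool) {
	items, err := primary()
	if err != nil {
		return fallback, true
	}
	return items, false
}

func healthyPrimary() ([]string, error) { return []string{"personalized-1", "personalized-2"}, nil }

func failingPrimary() ([]string, error) { return nil, ErrUnavailable }

func main() {
	popular := []string{"popular-1", "popular-2"} // cheap, less precise result

	items, degraded := Recommend(healthyPrimary, popular)
	fmt.Println(items, degraded) // [personalized-1 personalized-2] false

	items, degraded = Recommend(failingPrimary, popular)
	fmt.Println(items, degraded) // [popular-1 popular-2] true
}
```

Returning the `degraded` flag lets the caller record how often the fallback path runs, which helps keep a rarely exercised code path visible and tested.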
Summary
Effective circuit breaking and rate limiting require rich metrics, observability, and continuous tuning. As systems grow, combining client‑side protection with downstream safeguards and proper classification of request criticality is essential for maintaining reliability.
References:
Circuit breaking concepts: https://sentinelguard.io/zh-cn/docs/golang/circuit-breaking.html
State‑machine implementation: https://github.com/alibaba/sentinel-golang
Google SRE breaker implementation: https://github.com/go-kratos/aegis/tree/main/circuitbreaker
Exponential backoff and jitter: https://aws.amazon.com/cn/blogs/architecture/exponential-backoff-and-jitter/
Google SRE handling overload: https://sre.google/sre-book/handling-overload/
TAL Education Technology
TAL Education is a technology-driven education company committed to the mission of 'making education better through love and technology'. The TAL technology team has always been dedicated to educational technology research and innovation. This is the external platform of the TAL technology team, sharing weekly curated technical articles and recruitment information.