Rate Limiting, Circuit Breaking, and Service Degradation: Key Fault‑Tolerance Patterns for Distributed Systems
The article explains why distributed systems need fault‑tolerance mechanisms such as rate limiting, circuit breaking, and service degradation, describes common metrics (TPS, HPS, QPS), outlines several limiting algorithms (counter, sliding window, leaky bucket, token bucket, distributed and Hystrix‑based), and discusses circuit‑breaker states, considerations, and practical Hystrix usage.
1 Rate Limiting
In distributed systems a faulty or slow service can block callers, exhaust resources, and cause a cascade failure (system avalanche). Proper rate‑limiting improves overall fault tolerance.
1.1 Rate‑Limiting Metrics
1.1.1 TPS
Transactions per second is a natural metric, but in practice a single transaction may involve many services and take a long time, making TPS too coarse‑grained.
1.1.2 HPS
Hits per second (requests received per second) measures raw request volume.
❝If a request completes a transaction, TPS and HPS are equivalent, but in distributed scenarios they differ because a transaction may span multiple requests.❞
1.1.3 QPS
Queries per second counts how many client queries the server can answer per second.
❝With a single server, HPS and QPS are the same, but in distributed setups each request may involve many servers, so they are not interchangeable.❞
1.2 Rate‑Limiting Methods
1.2.1 Counter
The simplest method limits the number of requests per second, e.g., reject any request beyond 100 per second.
Problem 1: the window boundary is hard to control — requests clustered at the edge of two adjacent windows can briefly pass at up to twice the intended rate.
Problem 2: Short spikes may not require limiting, yet the counter would reject them.
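The counter approach can be sketched in a few lines. This is a minimal fixed-window counter, not from the article; the class and method names are illustrative:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Fixed-window counter: allow at most `limit` requests per window.
class CounterLimiter {
    private final int limit;               // max requests per window
    private final long windowMillis;       // window length, e.g. 1000 ms
    private final AtomicInteger count = new AtomicInteger();
    private long windowStart = System.currentTimeMillis();

    CounterLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= windowMillis) {  // a new window begins: reset
            windowStart = now;
            count.set(0);
        }
        return count.incrementAndGet() <= limit;  // reject beyond the limit
    }
}
```

Both problems above are visible here: the reset at the window edge allows back-to-back bursts across the boundary, and any spike within a single window is rejected outright.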
1.2.2 Sliding Time Window
The sliding‑window algorithm divides time into small slices and advances the window one slice at a time. For example, with five 1 s slices and a limit of 50 requests per second, the sum of requests across slices t1~t5 must not exceed 250. When the window slides to t2~t6, the oldest slice (t1) is dropped and the newest (t6) is added.
Advantages: solves the counter’s granularity problem. Drawbacks: still needs to drop traffic or degrade when the limit is exceeded, and cannot smooth short‑term spikes.
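The slide-and-sum logic can be sketched as follows; this is an illustrative implementation with assumed names, not a production limiter:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sliding window of fixed-size slices; the counts of the slices still
// inside the window are summed before admitting a request.
class SlidingWindowLimiter {
    private final int limit;          // max requests across the whole window
    private final long sliceMillis;   // length of one slice
    private final int slices;         // number of slices per window
    private final Deque<long[]> window = new ArrayDeque<>(); // [sliceStart, count]

    SlidingWindowLimiter(int limit, long sliceMillis, int slices) {
        this.limit = limit;
        this.sliceMillis = sliceMillis;
        this.slices = slices;
    }

    synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        long sliceStart = now - now % sliceMillis;
        long oldest = sliceStart - (long) (slices - 1) * sliceMillis;
        // Drop slices that have slid out of the window.
        while (!window.isEmpty() && window.peekFirst()[0] < oldest) {
            window.pollFirst();
        }
        long total = window.stream().mapToLong(s -> s[1]).sum();
        if (total >= limit) return false;
        if (window.isEmpty() || window.peekLast()[0] != sliceStart) {
            window.addLast(new long[]{sliceStart, 0});
        }
        window.peekLast()[1]++;
        return true;
    }
}
```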
1.2.3 Leaky Bucket
The leaky‑bucket algorithm buffers incoming requests in a fixed‑size queue and releases them at a steady rate, preventing burst traffic from overwhelming the service.
Issues to consider: bucket size, output rate, and increased response latency.
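A minimal sketch of the idea, assuming a scheduler drains the bucket at the fixed output rate (the class and method names are illustrative):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Leaky bucket: requests queue up to `capacity` and are drained at a
// constant rate by a scheduler; overflow is rejected immediately.
class LeakyBucket {
    private final BlockingQueue<Runnable> bucket;

    LeakyBucket(int capacity) {
        this.bucket = new ArrayBlockingQueue<>(capacity);
    }

    /** Returns false (overflow) when the bucket is full. */
    boolean offer(Runnable request) {
        return bucket.offer(request);
    }

    /** Called at the fixed output rate, e.g. every 10 ms by a scheduler. */
    void leakOne() {
        Runnable r = bucket.poll();
        if (r != null) r.run();
    }
}
```

The three concerns above map directly to this sketch: `capacity` is the bucket size, the scheduler period is the output rate, and any time a request spends queued is added latency.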
1.2.4 Token Bucket
Clients must obtain a token before sending a request; tokens are replenished periodically. This algorithm combines burst tolerance with a steady rate and is widely used (e.g., Google Guava).
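The same idea can be implemented with lazy refill, which is roughly how Guava's RateLimiter works internally; this is a simplified sketch with illustrative names, not Guava's actual code:

```java
// Token bucket with lazy refill: tokens accrue continuously up to
// `capacity`, so an idle period earns a burst allowance.
class TokenBucket {
    private final long capacity;        // maximum burst size
    private final double refillPerMs;   // refill rate in tokens per millisecond
    private double tokens;
    private long lastRefill = System.currentTimeMillis();

    TokenBucket(long capacity, double tokensPerSecond) {
        this.capacity = capacity;
        this.refillPerMs = tokensPerSecond / 1000.0;
        this.tokens = capacity;         // start full to allow an initial burst
    }

    synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Credit tokens for the elapsed time, capped at the bucket capacity.
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerMs);
        lastRefill = now;
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;
    }
}
```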
1.2.5 Distributed Rate Limiting
When the token bucket is stored centrally (e.g., in Redis), every service in a distributed call chain must interact with it, adding a network round trip per call. A common optimization is to acquire a batch of tokens before invoking the composite service and share them among the downstream calls.
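The batch-acquisition idea can be sketched as below. This is a hedged illustration: the AtomicLong stands in for the shared store (with Redis, a Lua script would perform the decrement atomically), and all names are assumptions:

```java
import java.util.concurrent.atomic.AtomicLong;

// One round trip acquires enough tokens for the whole downstream chain,
// instead of each downstream call contacting the central bucket.
class BatchTokenClient {
    private final AtomicLong sharedTokens;   // stand-in for the central bucket

    BatchTokenClient(AtomicLong sharedTokens) {
        this.sharedTokens = sharedTokens;
    }

    boolean acquireBatch(int downstreamCalls) {
        long remaining = sharedTokens.addAndGet(-downstreamCalls);
        if (remaining < 0) {
            sharedTokens.addAndGet(downstreamCalls); // roll back on failure
            return false;
        }
        return true;
    }
}
```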
1.2.6 Hystrix Rate Limiting
1.2.6.1 Semaphore Limiting
@HystrixCommand(
    commandProperties = {
        @HystrixProperty(name = "execution.isolation.strategy", value = "SEMAPHORE"),
        @HystrixProperty(name = "execution.isolation.semaphore.maxConcurrentRequests", value = "20")
    },
    fallbackMethod = "errMethod"
)
1.2.6.2 Thread‑Pool Limiting
@HystrixCommand(
    commandProperties = {
        @HystrixProperty(name = "execution.isolation.strategy", value = "THREAD")
    },
    threadPoolKey = "createOrderThreadPool",
    threadPoolProperties = {
        @HystrixProperty(name = "coreSize", value = "20"),
        @HystrixProperty(name = "maxQueueSize", value = "100"),
        @HystrixProperty(name = "maximumSize", value = "30"),
        @HystrixProperty(name = "queueSizeRejectionThreshold", value = "120")
    },
    fallbackMethod = "errMethod"
)
❝In a Java thread pool, once active threads reach coreSize new tasks go to the queue; only when the queue is full are extra threads created, up to maximumSize. Hystrix adds queueSizeRejectionThreshold on top: if that threshold is lower than maxQueueSize, requests are rejected before the queue ever fills, so the pool never grows toward maximumSize.❞
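Both annotations above route failures to errMethod. A fallback is simply a method on the same class with a compatible signature; Hystrix (javanica) also allows an extra trailing Throwable to inspect the cause. A hedged sketch with assumed names:

```java
// Illustrative service: createOrder stands in for a guarded method
// (its @HystrixCommand annotation is omitted here for brevity).
class OrderService {

    String createOrder(String itemId) {
        throw new RuntimeException("downstream unavailable");
    }

    // Invoked when the command is rejected, times out, or fails.
    String errMethod(String itemId, Throwable cause) {
        return "order service degraded, please retry later";
    }
}
```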
2 Circuit Breaking
A circuit breaker acts like a fuse: when failures exceed a threshold, it opens to stop traffic, preventing further damage.
2.1 Circuit‑Breaker States
CLOSED : normal operation; failure rate below threshold.
OPEN : failures exceed threshold; requests are short‑circuited.
HALF OPEN : after a timeout, a limited number of requests are allowed to test recovery; success returns to CLOSED, failure goes back to OPEN.
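The three-state machine above can be sketched directly; thresholds and names are illustrative, not taken from any particular library:

```java
// Minimal circuit breaker mirroring the CLOSED / OPEN / HALF_OPEN states.
class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private State state = State.CLOSED;
    private int failures;
    private final int failureThreshold;  // failures that trip the breaker
    private final long openMillis;       // how long to stay OPEN
    private long openedAt;

    CircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    synchronized boolean allowRequest() {
        if (state == State.OPEN && System.currentTimeMillis() - openedAt >= openMillis) {
            state = State.HALF_OPEN;     // break time expired: allow a probe
        }
        return state != State.OPEN;      // OPEN short-circuits everything
    }

    synchronized void recordSuccess() {
        failures = 0;
        state = State.CLOSED;            // recovery confirmed
    }

    synchronized void recordFailure() {
        failures++;
        if (state == State.HALF_OPEN || failures >= failureThreshold) {
            state = State.OPEN;          // trip (or re-trip after a failed probe)
            openedAt = System.currentTimeMillis();
        }
    }
}
```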
2.2 Considerations
Define different fallback logic for different exception types.
Set a break‑time; after it expires the breaker moves to HALF OPEN for retry.
Log failures for monitoring.
Probe actively (e.g., network reachability checks via telnet) before letting traffic through again.
Provide a manual compensation interface for operators.
When retrying, ensure idempotency of the original request.
2.3 Use Cases
Service outage or upgrade – fast failure for callers.
Easy definition of failure handling logic.
Long read timeouts that could cause massive retries.
3 Service Degradation
Degradation is a strategy taken from a global view of the system: when a circuit opens or load spikes, non‑critical requests are routed to fallback paths so that core services keep working.
3.1 Use Cases
Return error directly for non‑critical services.
Cache the request and return an intermediate response, retry later.
Disable non‑core features during traffic spikes.
Serve cached data when DB pressure is high.
Convert heavy write operations to asynchronous processing.
Temporarily stop batch jobs to save resources.
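One of the cases above — serving cached data when the database is under pressure — can be sketched as follows. This is an illustrative example with assumed names, not a recommended production design:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Degradation path: under normal load, query the DB and keep the cache
// warm; when degraded, skip the DB and serve the last known value.
class ProductQuery {
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    String findProduct(String id, Supplier<String> dbQuery, boolean degraded) {
        if (!degraded) {
            String fresh = dbQuery.get();
            cache.put(id, fresh);        // refresh the cache on every hit
            return fresh;
        }
        // Degraded: possibly stale data beats an error or a timeout.
        return cache.getOrDefault(id, "temporarily unavailable");
    }
}
```

The `degraded` flag would typically come from a configuration center or a circuit-breaker signal rather than being passed by the caller.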
3.2 Hystrix Degradation
3.2.1 Exception Degradation
Use @HystrixCommand with ignoreExceptions to let specific exceptions bypass fallback.
@HystrixCommand(
    fallbackMethod = "errMethod",
    ignoreExceptions = {ParamErrorException.class, BusinessTypeException.class}
)
3.2.2 Timeout Degradation
Define a timeout (e.g., 3000 ms) after which the call falls back.
@HystrixCommand(
    commandProperties = {
        @HystrixProperty(name = "execution.timeout.enabled", value = "true"),
        @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "3000")
    },
    fallbackMethod = "errMethod"
)
Conclusion
Rate limiting, circuit breaking, and service degradation are essential fault‑tolerance patterns. Rate limiting protects services from overload, while circuit breaking and degradation sacrifice non‑core functionality to keep core services available. Choosing the right algorithm (the token bucket is often preferred) and configuring thresholds from load‑test results are critical, and these settings should live in a configuration center so they can be updated dynamically.