Designing Rate Limiting and Circuit Breaking for Microservices and API Gateways
The article explains the concepts, problem scenarios, resource granularity, rule definition, sliding‑window calculation, and implementation flow of rate limiting and circuit breaking in microservice architectures and API gateways, providing practical guidance for building robust backend services.
Today we discuss rate limiting and circuit breaking in microservice architectures and API gateways. Frameworks such as Spring Cloud provide Hystrix, gateway products such as Kong include these capabilities, and open‑source solutions such as Alibaba Sentinel can be integrated with Dubbo or Spring Cloud.
Rather than re‑introducing each open‑source product, the article examines the problem from a business‑scenario perspective, covering resource definition, thread‑pool isolation, and sliding‑window calculations.
Problem and Background
High‑concurrency or large‑payload API calls can exhaust server threads and memory, saturating thread pools or causing a JVM OutOfMemoryError. A single API can then monopolize shared resources and set off a cascading failure (the snowball effect) across dependent services.
Basic Concepts of Rate Limiting and Circuit Breaking
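Rate limiting caps how many requests or concurrent threads a resource may consume, queuing or rejecting the excess. Circuit breaking goes further: once a threshold such as response latency is crossed, calls to the affected service are rejected outright for a period, so that the failure cannot spread to its callers.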
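The two mechanisms can be sketched in a few lines. This is a minimal illustration, not the API of Hystrix or Sentinel: the class names, `failure_threshold`, `reset_after`, and `wait_seconds` are all hypothetical, and the breaker here trips on consecutive failures purely for simplicity.

```python
import threading
import time


class ConcurrencyLimiter:
    """Caps in-flight calls on a resource (hypothetical helper)."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def try_acquire(self, wait_seconds=0.0):
        # wait_seconds > 0 models a bounded queue: wait briefly for a slot.
        # wait_seconds == 0 rejects immediately when no slot is free.
        if wait_seconds <= 0:
            return self._slots.acquire(blocking=False)
        return self._slots.acquire(timeout=wait_seconds)

    def release(self):
        self._slots.release()


class CircuitBreaker:
    """Opens after N consecutive failures, then rejects calls for a cool-down."""

    def __init__(self, failure_threshold, reset_after, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # seconds the breaker stays open
        self._clock = clock             # injectable so the sketch is testable
        self._failures = 0
        self._opened_at = None

    def allow(self):
        if self._opened_at is None:
            return True  # closed: traffic flows normally
        # After the cool-down, let trial calls through (half-open state).
        return self._clock() - self._opened_at >= self.reset_after

    def record(self, success):
        if success:
            self._failures = 0
            self._opened_at = None  # close the breaker again
        else:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
```

A caller would check `allow()` and `try_acquire()` before invoking the backend, then call `record(...)` and `release()` afterwards.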
Overall Implementation Approach
The solution considers three levels of resource granularity: (1) API consumer + API service + provider, the finest level; (2) API service + provider, the service‑level circuit‑breaking scope; and (3) provider only, the provider‑wide circuit‑breaking scope. Rules are defined against resources at these granularities.
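One way to picture the three granularities is as lookup keys derived from each call. The sketch below is an assumption for illustration only: `CallContext`, `resource_keys`, and the `|` separator are hypothetical names and conventions, not part of any gateway product.

```python
from typing import List, NamedTuple


class CallContext(NamedTuple):
    consumer: str  # the API consumer (calling application)
    service: str   # the API service being invoked
    provider: str  # the backend system that provides the service


def resource_keys(ctx: CallContext) -> List[str]:
    """Return the resource keys for one call, from finest to coarsest."""
    return [
        f"{ctx.consumer}|{ctx.service}|{ctx.provider}",  # level 1
        f"{ctx.service}|{ctx.provider}",                 # level 2
        ctx.provider,                                    # level 3
    ]
```

Rules and collected metrics can then both be indexed by these keys, so a single call contributes to the statistics at all three levels.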
Rules consist of dimensions such as service latency, request count per unit time, and data volume, with thresholds that trigger actions when exceeded. Composite rules can combine multiple conditions using logical AND/OR.
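A rule of this shape can be modeled as a list of threshold conditions plus a combinator. The following is a minimal sketch; the metric names (`avg_latency_ms`, `request_count`, `bytes`), the action strings, and the class names are hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Condition:
    metric: str       # e.g. "avg_latency_ms", "request_count", "bytes"
    threshold: float

    def matches(self, metrics: Dict[str, float]) -> bool:
        # A condition fires only when the metric strictly exceeds its threshold.
        return metrics.get(self.metric, 0) > self.threshold


@dataclass
class Rule:
    conditions: List[Condition]
    combine: str      # "AND" or "OR"
    action: str       # "rate_limit" or "circuit_break"

    def matches(self, metrics: Dict[str, float]) -> bool:
        results = [c.matches(metrics) for c in self.conditions]
        if not results:
            return False  # a rule without conditions never fires
        return all(results) if self.combine == "AND" else any(results)
```

For example, `Rule([Condition("avg_latency_ms", 500), Condition("request_count", 1000)], "AND", "circuit_break")` fires only when both latency and request volume are over their thresholds in the same window.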
Rule and Resource Matching Logic
Records of incoming service calls are collected in a temporary buffer. At each minimal interval (e.g., 10 seconds) the buffer is aggregated into one slot and pushed into a sliding‑time‑window array, displacing the oldest slot. A second aggregation across the whole window then checks whether any configured rule is satisfied and, if so, triggers rate limiting or circuit breaking.
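The two-stage aggregation can be sketched as follows. This is a simplified single-threaded illustration under stated assumptions: `SlidingWindow`, its method names, and the recorded fields are hypothetical, and a real gateway would call `flush()` from a timer and guard the buffer with a lock.

```python
from collections import deque


class SlidingWindow:
    """Fixed number of time slots; each flush adds one and drops the oldest."""

    def __init__(self, bucket_count):
        self._buckets = deque(maxlen=bucket_count)
        self._buffer = []  # raw (latency_ms, payload_bytes) call records

    def record(self, latency_ms, payload_bytes):
        self._buffer.append((latency_ms, payload_bytes))

    def flush(self):
        """First aggregation: run once per minimal interval (e.g. every 10 s)."""
        self._buckets.append({
            "count": len(self._buffer),
            "latency_sum": sum(r[0] for r in self._buffer),
            "bytes": sum(r[1] for r in self._buffer),
        })  # deque(maxlen=...) discards the oldest slot automatically
        self._buffer = []

    def summary(self):
        """Second aggregation: totals across every slot still in the window."""
        count = sum(b["count"] for b in self._buckets)
        latency_sum = sum(b["latency_sum"] for b in self._buckets)
        return {
            "request_count": count,
            "avg_latency_ms": latency_sum / count if count else 0.0,
            "bytes": sum(b["bytes"] for b in self._buckets),
        }
```

With 10-second slots, `SlidingWindow(6)` would cover the most recent minute, and the dictionary returned by `summary()` is what the rule thresholds would be compared against.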
Decoupling from API Gateway
Rate limiting and circuit breaking are implemented as independent interceptors that evaluate rule activation before allowing or rejecting requests, enabling clean separation from the API‑gateway implementation.
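The interceptor idea can be illustrated with a small chain that the gateway runs before dispatching to the backend. Everything here is a hypothetical sketch: `BlockListInterceptor`, `Rejected`, and `handle` are invented names, and the `blocked` set stands in for state that a separate rule-evaluation job would maintain.

```python
class Rejected(Exception):
    """Raised when an interceptor blocks a request."""


class BlockListInterceptor:
    """Rejects requests whose resource key is currently flagged.

    The interceptor only reads `blocked`; deciding which keys belong in it
    is left to whatever evaluates the rules.
    """

    def __init__(self, name, blocked):
        self.name = name
        self.blocked = blocked

    def before(self, resource_key):
        if resource_key in self.blocked:
            raise Rejected(f"{self.name}: {resource_key}")


def handle(resource_key, interceptors, backend):
    """Stand-in for gateway dispatch: run every interceptor, then the backend."""
    for interceptor in interceptors:
        interceptor.before(resource_key)
    return backend(resource_key)
```

Because the gateway only knows the `before` hook, the rate-limiting and circuit-breaking logic can change, or be removed from the chain, without touching the dispatch code.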