Designing Rate Limiting and Circuit Breaking for Microservices and API Gateways
This article explains the motivations, resource granularity, rule definition, calculation logic, and implementation steps for building effective rate‑limiting and circuit‑breaking mechanisms in microservice architectures and API gateways, emphasizing sliding‑window statistics and decoupled interceptor design.
In modern microservice systems, high‑concurrency API calls can exhaust server threads and memory, leading to service degradation or JVM crashes. To prevent a single API from causing a cascade failure, rate limiting and circuit breaking are essential.
Problem background : When an API receives a flood of concurrent requests or very large payloads, it can monopolize server resources, exhausting the thread pool and causing out‑of‑memory errors. The failure then snowballs as the services that depend on it start failing too.
Basic concepts : Rate limiting caps the request rate or the number of concurrent threads for a resource, queuing or rejecting the excess. Circuit breaking goes further: once a threshold is breached, calls to the service are rejected outright until it recovers.
Resource granularity : Resources are defined at three levels: consumer + API + provider (the finest), API + provider (circuit breaking for a single API), and provider only (circuit breaking for a whole service). Rules are configured, and real‑time data is aggregated, at the chosen level.
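As a rough illustration of the three levels, the sketch below derives one key per level from a single call, so the same call can be counted against each of them. The names `ResourceKey` and `keys_for_call`, and the sample consumer name, are hypothetical and not taken from the article.

```python
from typing import List, NamedTuple, Optional


class ResourceKey(NamedTuple):
    """Identifies a resource; unset fields mean 'any'."""
    provider: str
    api: Optional[str] = None
    consumer: Optional[str] = None


def keys_for_call(consumer: str, api: str, provider: str) -> List[ResourceKey]:
    """Return the keys one call contributes to, finest level first."""
    return [
        ResourceKey(provider, api, consumer),  # consumer + API + provider
        ResourceKey(provider, api),            # API + provider
        ResourceKey(provider),                 # provider only
    ]


keys = keys_for_call("web-portal", "getCustomer", "CRM")
```

Aggregating under all three keys is one possible design; a system could also track only the levels that actually have rules configured.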
Rule definition : Rules consist of a resource, a metric (e.g., request count, latency, error rate), a time window, and a threshold. Simple rules use a single metric comparison; composite rules combine multiple conditions with logical AND/OR.
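One way to model such rules is sketched below, assuming metrics arrive as a plain dictionary of already aggregated values. The class names, field names, and the `"AND"`/`"OR"` combine flag are illustrative choices rather than the article's actual schema.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List

# Supported comparison operators for a single condition.
_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


@dataclass
class Condition:
    metric: str       # e.g. "request_count", "error_rate", "avg_latency_s"
    op: str           # one of the keys in _OPS
    threshold: float

    def holds(self, metrics: Dict[str, float]) -> bool:
        return _OPS[self.op](metrics[self.metric], self.threshold)


@dataclass
class Rule:
    resource: str
    window_seconds: int
    conditions: List[Condition]
    combine: str = "AND"  # "AND" or "OR"; irrelevant for a single condition

    def is_triggered(self, metrics: Dict[str, float]) -> bool:
        results = [c.holds(metrics) for c in self.conditions]
        return all(results) if self.combine == "AND" else any(results)


# Simple rule: one metric comparison.
simple = Rule("CRM/getCustomer", 600, [Condition("request_count", ">", 10_000)])

# Composite rule: either condition is enough to trigger.
composite = Rule(
    "ERP",
    60,
    [Condition("error_rate", ">", 0.01), Condition("avg_latency_s", ">", 30)],
    combine="OR",
)
```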
Computation logic : Data is collected in short intervals (e.g., 10 seconds), aggregated, and pushed into a sliding‑window array. A second aggregation over the configured time window (5 min, 10 min, etc.) determines whether the rule is triggered.
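With 10-second intervals, a 5-minute window holds 300 / 10 = 30 slots. A minimal sketch of the two aggregation levels follows, assuming each interval has already been reduced to a request count, an error count, and a latency total; the class and metric names are made up for illustration.

```python
from collections import deque


class SlidingWindow:
    """Fixed number of per-interval buckets; the oldest bucket drops off."""

    def __init__(self, bucket_seconds: int = 10, window_seconds: int = 300):
        if window_seconds % bucket_seconds != 0:
            raise ValueError("window must be a whole number of buckets")
        self.slots = window_seconds // bucket_seconds
        self._buckets = deque(maxlen=self.slots)

    def push(self, requests: int, errors: int, latency_ms_total: float) -> None:
        """Store the first-level aggregate for one collection interval."""
        self._buckets.append((requests, errors, latency_ms_total))

    def aggregate(self) -> dict:
        """Second-level aggregation over every bucket still in the window."""
        requests = sum(b[0] for b in self._buckets)
        errors = sum(b[1] for b in self._buckets)
        latency = sum(b[2] for b in self._buckets)
        return {
            "request_count": requests,
            "error_rate": errors / requests if requests else 0.0,
            "avg_latency_ms": latency / requests if requests else 0.0,
        }


window = SlidingWindow(bucket_seconds=10, window_seconds=30)  # 3 slots
for errors in (1, 0, 2, 5):           # four intervals, 100 requests each
    window.push(100, errors, 1000.0)  # the first bucket is evicted on the 4th push
totals = window.aggregate()
```

Storing counts and totals per bucket, rather than precomputed rates, keeps the window-level averages correct when intervals carry different traffic volumes.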
Implementation flow :
Match incoming service instances to configured resources and store them in a temporary cache.
Perform first‑level aggregation and push results into the sliding window.
Execute second‑level aggregation based on rule definitions.
Decide whether to activate rate limiting or circuit breaking.
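The four steps above can be sketched for a single request-count rule as follows. This is a simplified, single-threaded illustration: `RulePipeline`, its method names, and the `provider/api` resource string are assumptions, and a real system would drive `on_interval_end` from a timer.

```python
from collections import deque


class RulePipeline:
    def __init__(self, resource: str, threshold: int, slots: int):
        self.resource = resource
        self.threshold = threshold
        self.pending = 0                   # temporary cache for the current interval
        self.window = deque(maxlen=slots)  # sliding window of interval totals
        self.active = False

    def on_call(self, provider: str, api: str) -> None:
        # Step 1: match the call to the configured resource and cache it.
        if f"{provider}/{api}" == self.resource:
            self.pending += 1

    def on_interval_end(self) -> bool:
        # Step 2: first-level aggregation, pushed into the sliding window.
        self.window.append(self.pending)
        self.pending = 0
        # Step 3: second-level aggregation over the rule's time window.
        total = sum(self.window)
        # Step 4: decide whether the rule is active.
        self.active = total > self.threshold
        return self.active


pipeline = RulePipeline("CRM/getCustomer", threshold=5, slots=2)
decisions = []
for calls in (3, 3, 0):                      # calls per interval
    for _ in range(calls):
        pipeline.on_call("CRM", "getCustomer")
    pipeline.on_call("CRM", "listOrders")    # not matched, so not counted
    decisions.append(pipeline.on_interval_end())
```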
Example rules:
CRM getCustomer API: block if calls > 10,000 in 10 min.
Product info API: trigger circuit break if error rate > 1% in 5 min.
ERP services: circuit break if average latency > 30 s in 1 min.
Each rule requires its own temporary storage and sliding window to avoid interference.
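To make the per-rule storage concrete, the sketch below writes the three example rules as configuration data and gives each its own window, sized for the 10-second collection interval mentioned earlier. The rule ids, resource strings (including the `ERP/*` wildcard and the `ProductService` name), and dictionary keys are hypothetical.

```python
from collections import deque

BUCKET_SECONDS = 10  # collection interval, as in the article's example

RULES = [
    {"id": "crm-get-customer", "resource": "CRM/getCustomer",
     "metric": "request_count", "threshold": 10_000,
     "window_seconds": 600, "action": "block"},
    {"id": "product-info", "resource": "ProductService/getProductInfo",
     "metric": "error_rate", "threshold": 0.01,
     "window_seconds": 300, "action": "circuit_break"},
    {"id": "erp-latency", "resource": "ERP/*",
     "metric": "avg_latency_s", "threshold": 30,
     "window_seconds": 60, "action": "circuit_break"},
]

# One independent sliding window per rule: 60, 30, and 6 slots respectively.
windows = {
    rule["id"]: deque(maxlen=rule["window_seconds"] // BUCKET_SECONDS)
    for rule in RULES
}

windows["erp-latency"].append(12.5)  # only the ERP rule's window changes
```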
Decoupling from API gateway : The rate‑limiting/circuit‑breaking logic is implemented as an independent interceptor that checks rule status before allowing a request. If a rule is active, the request is rejected; otherwise it proceeds. This design keeps the gateway lightweight and enables independent scaling.
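A minimal sketch of such an interceptor is shown below, assuming requests and responses are plain dictionaries and that rule status lives in a small in-memory store updated by the evaluation side. The names `RuleStatus` and `protect`, and the choice of HTTP 429 for rejections, are illustrative assumptions rather than the article's design.

```python
from typing import Callable, Dict, Set

Request = Dict[str, str]
Response = Dict[str, object]
Handler = Callable[[Request], Response]


class RuleStatus:
    """Holds the resources whose rule is currently active."""

    def __init__(self) -> None:
        self._active: Set[str] = set()

    def activate(self, resource: str) -> None:
        self._active.add(resource)

    def clear(self, resource: str) -> None:
        self._active.discard(resource)

    def is_active(self, resource: str) -> bool:
        return resource in self._active


def protect(handler: Handler, status: RuleStatus) -> Handler:
    """Wrap a gateway handler so that blocked resources are rejected early."""

    def intercepted(request: Request) -> Response:
        resource = f"{request['provider']}/{request['api']}"
        if status.is_active(resource):
            return {"status": 429, "body": "rejected: rule active for " + resource}
        return handler(request)

    return intercepted


def forward(request: Request) -> Response:
    # Stand-in for the gateway's normal routing logic.
    return {"status": 200, "body": "ok"}


status = RuleStatus()
guarded = protect(forward, status)
call = {"provider": "CRM", "api": "getCustomer"}

before = guarded(call)
status.activate("CRM/getCustomer")
during = guarded(call)
status.clear("CRM/getCustomer")
after = guarded(call)
```

Because the interceptor only reads a status flag, the statistics and rule evaluation can run elsewhere without adding work to the request path.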
After a service is circuit‑broken, a cooldown timer (e.g., 5‑10 min) can be set, and a scheduled task will re‑evaluate the service to restore normal traffic once conditions improve.
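The cooldown and re-evaluation could look roughly like the following, with the clock passed in explicitly so the scheduled task stays easy to test. `CircuitCooldown` and its methods are hypothetical names, and restarting the cooldown when the rule is still triggered is one possible policy, not something the article specifies.

```python
from typing import Optional


class CircuitCooldown:
    def __init__(self, cooldown_seconds: float = 300.0):
        self.cooldown_seconds = cooldown_seconds
        self.opened_at: Optional[float] = None  # None means traffic flows normally

    @property
    def is_open(self) -> bool:
        return self.opened_at is not None

    def trip(self, now: float) -> None:
        """Open the circuit and start the cooldown timer."""
        self.opened_at = now

    def reevaluate(self, now: float, still_triggered: bool) -> bool:
        """Run by the scheduled task; returns True while the circuit stays open."""
        if self.opened_at is None:
            return False
        if now - self.opened_at < self.cooldown_seconds:
            return True                # cooldown has not elapsed yet
        if still_triggered:
            self.opened_at = now       # conditions still bad: restart the cooldown
            return True
        self.opened_at = None          # conditions improved: restore traffic
        return False


breaker = CircuitCooldown(cooldown_seconds=300)
breaker.trip(now=0)
checks = [
    breaker.reevaluate(now=100, still_triggered=False),  # still cooling down
    breaker.reevaluate(now=300, still_triggered=True),   # cooldown restarted
    breaker.reevaluate(now=500, still_triggered=False),  # only 200 s since restart
    breaker.reevaluate(now=600, still_triggered=False),  # restored
]
```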
Overall, the article provides a practical blueprint for building fine‑grained, rule‑driven rate limiting and circuit breaking that can be integrated with any API gateway or microservice framework.
Top Architect