Backend Development

Designing Rate Limiting and Circuit Breaking for Microservices and API Gateways

The article explains the concepts, problem scenarios, resource granularity, rule definition, sliding‑window calculation, and implementation flow of rate limiting and circuit breaking in microservice architectures and API gateways, providing practical guidance for building robust backend services.

Top Architect

Today we discuss rate limiting and circuit breaking in microservice architectures and API gateways, noting that frameworks like Spring Cloud provide Hystrix and products like Kong include these capabilities, while open‑source solutions such as Alibaba Sentinel can be integrated with Dubbo or Spring Cloud.

Rather than re‑introducing each open‑source product, the article examines the problem from a business‑scenario perspective, covering resource definition, thread‑pool isolation, and sliding‑window calculations.

Problem and Background

High‑concurrency or large‑payload API calls can exhaust server threads and memory, leading to thread‑pool saturation or JVM OOM, and causing a single API to monopolize resources and trigger a cascade failure (snowball effect) across dependent services.

Basic Concepts of Rate Limiting and Circuit Breaking

Rate limiting constrains load before it overwhelms a service, for example by queuing incoming requests or capping the number of concurrent worker threads; circuit breaking goes further and, once configured thresholds are crossed, rejects all calls to the affected service so that callers fail fast rather than pile up waiting on a struggling dependency.
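The thread-limiting side of rate limiting can be sketched with a plain semaphore: callers queue briefly for a permit and are rejected if none frees up in time. This is an illustrative sketch (class and parameter names are assumptions, not taken from any framework mentioned above):

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Cap the number of concurrent in-flight calls; excess callers queue
// for up to queueTimeoutMs before being rejected.
public class ConcurrencyLimiter {
    private final Semaphore permits;
    private final long queueTimeoutMs;

    public ConcurrencyLimiter(int maxConcurrent, long queueTimeoutMs) {
        this.permits = new Semaphore(maxConcurrent);
        this.queueTimeoutMs = queueTimeoutMs;
    }

    /** Returns true if the call may proceed, false if it was rate-limited. */
    public boolean tryAcquire() {
        try {
            return permits.tryAcquire(queueTimeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Must be called once a permitted call completes. */
    public void release() {
        permits.release();
    }
}
```

In practice the downstream call would be wrapped in try/finally so the permit is always released, even when the call throws.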

Overall Implementation Approach

The solution considers three resource granularity levels: (1) API consumer + API service + provider, (2) service + provider (circuit‑break layer), and (3) provider only (circuit‑break scope). Rules are defined on these granularities.
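One way to apply these three granularities is to match each incoming call against the most specific rule first, falling back to coarser keys. The key formats and names below are assumptions for illustration:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Rules are registered against one of the three granularity keys; an
// incoming call matches the most specific rule available.
public class RuleRegistry {
    private final Map<String, String> rules = new HashMap<>();

    public void register(String resourceKey, String ruleName) {
        rules.put(resourceKey, ruleName);
    }

    /** Look up a rule, falling back from finest to coarsest granularity. */
    public Optional<String> match(String consumer, String api, String provider) {
        String[] candidates = {
            consumer + ":" + api + "@" + provider, // (1) consumer + API + provider
            api + "@" + provider,                  // (2) service + provider
            provider                               // (3) provider only
        };
        for (String key : candidates) {
            String rule = rules.get(key);
            if (rule != null) {
                return Optional.of(rule);
            }
        }
        return Optional.empty();
    }
}
```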

Rules consist of dimensions such as service latency, request count per unit time, and data volume, with thresholds that trigger actions when exceeded. Composite rules can combine multiple conditions using logical AND/OR.
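Composite rules of this kind map naturally onto predicate composition: each dimension becomes a predicate over a metrics snapshot, combined with `Predicate.and`/`or`. The `Metrics` fields and thresholds here are assumptions chosen for illustration:

```java
import java.util.function.Predicate;

// Each rule dimension (latency, request count, data volume) is a predicate
// over an aggregated metrics snapshot; composites are built with and/or.
public class Rules {
    public static class Metrics {
        final double avgLatencyMs;
        final long requestsPerWindow;
        final long bytesPerWindow;

        public Metrics(double avgLatencyMs, long requestsPerWindow, long bytesPerWindow) {
            this.avgLatencyMs = avgLatencyMs;
            this.requestsPerWindow = requestsPerWindow;
            this.bytesPerWindow = bytesPerWindow;
        }
    }

    public static Predicate<Metrics> latencyOver(double ms) {
        return m -> m.avgLatencyMs > ms;
    }

    public static Predicate<Metrics> countOver(long requests) {
        return m -> m.requestsPerWindow > requests;
    }

    public static Predicate<Metrics> volumeOver(long bytes) {
        return m -> m.bytesPerWindow > bytes;
    }
}
```

A composite such as "(latency over 500 ms AND more than 1000 requests) OR more than 10 MB transferred" then reads as `Rules.latencyOver(500).and(Rules.countOver(1000)).or(Rules.volumeOver(10_000_000))`.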

Rule and Resource Matching Logic

Metrics for incoming requests are first stored in temporary buffers, aggregated once per minimal interval (e.g., every 10 seconds), and pushed into a sliding‑time‑window array. A second aggregation across the whole window then evaluates whether any configured rule is satisfied and decides whether to apply rate limiting or circuit breaking.
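The two-level aggregation above can be sketched as a ring of buckets: raw counts accumulate in a current bucket, every minimal interval the bucket rotates into a fixed-size ring that forms the window, and rules are evaluated over the window total. This is a minimal single-threaded sketch; method names are assumptions:

```java
// Sliding-time-window counter: a ring of per-interval buckets plus the
// not-yet-rotated current bucket.
public class SlidingWindow {
    private final long[] buckets; // one slot per minimal interval
    private int head = 0;         // next slot to overwrite
    private long currentBucket = 0;

    public SlidingWindow(int bucketCount) {
        this.buckets = new long[bucketCount];
    }

    /** Record one request in the current (not yet rotated) bucket. */
    public void record() {
        currentBucket++;
    }

    /** Called once per minimal interval (e.g. every 10 s): push the current
     *  bucket into the ring, evicting the oldest bucket. */
    public void rotate() {
        buckets[head] = currentBucket;
        head = (head + 1) % buckets.length;
        currentBucket = 0;
    }

    /** Second-level aggregation: total requests across the whole window. */
    public long windowTotal() {
        long sum = currentBucket;
        for (long b : buckets) {
            sum += b;
        }
        return sum;
    }
}
```

A real implementation would additionally need thread-safe counters (e.g. `LongAdder`) and per-bucket latency/volume fields, but the rotate-and-aggregate shape stays the same.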

Decoupling from API Gateway

Rate limiting and circuit breaking are implemented as independent interceptors that evaluate rule activation before allowing or rejecting requests, enabling clean separation from the API‑gateway implementation.
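The decoupling can be sketched as a plain interceptor interface that any gateway or RPC framework calls before dispatching a request; the limiter only answers allow/reject. All names here are illustrative assumptions:

```java
import java.util.HashSet;
import java.util.Set;

// The limiter exposes a framework-agnostic interceptor, keeping rate
// limiting and circuit breaking separate from the gateway implementation.
public class LimitingInterceptor {
    /** Minimal interceptor contract, independent of any gateway API. */
    public interface RequestInterceptor {
        /** Return true to let the request through, false to reject it. */
        boolean preHandle(String resourceKey);
    }

    // Resources whose rules are currently active (rate-limited or broken).
    private final Set<String> tripped = new HashSet<>();

    public void trip(String resourceKey) {
        tripped.add(resourceKey);
    }

    public void reset(String resourceKey) {
        tripped.remove(resourceKey);
    }

    public RequestInterceptor asInterceptor() {
        return key -> !tripped.contains(key);
    }
}
```

The rule-evaluation loop (sliding window plus rules) decides when to call `trip` and `reset`; the gateway only ever sees the `preHandle` answer.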

Tags: Backend, Microservices, API Gateway, Rate Limiting, Sliding Window, Circuit Breaking, Resource Granularity
Written by

Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
