Rate Limiting in Spring Cloud Gateway: Scenarios, Algorithms, Open‑Source Tools, and Practical Implementations
This article explains rate‑limiting concepts for Spring Cloud Gateway. It covers common throttling scenarios and the major algorithms (Fixed Window, Sliding Window, Leaky Bucket, and Token Bucket), reviews popular open‑source libraries, and demonstrates both single‑node and distributed implementations with code examples.
1. Introduction
The author, a senior architect, shares a systematic study of request‑rate limiting for Spring Cloud Gateway, describing why throttling is essential for high‑traffic systems and how it fits into micro‑service architectures.
2. Common Limiting Scenarios
Interface call count limits (e.g., 100 requests per minute)
Download speed caps (e.g., 100 KB/s per user)
Concurrent request caps per user or IP
IP‑wide black‑listing
These scenarios reduce to two main targets: request‑frequency limiting and concurrent‑request limiting.
3. Typical Limiting Algorithms
3.1 Fixed Window
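Counts requests in discrete, non‑overlapping time windows and resets the counter when a new window starts. It is simple, but it suffers from boundary spikes: a burst that straddles two windows can let through up to twice the limit in a short span.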
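A minimal sketch of a fixed‑window counter follows. The class and method names are illustrative and not taken from the article, and the caller passes the current time in milliseconds so the behaviour is easy to reason about.

```java
/** Fixed-window counter sketch; all names here are hypothetical. */
final class FixedWindowLimiter {
    private final int limit;          // max requests admitted per window
    private final long windowMillis;  // window length in milliseconds
    private long currentWindow = -1;  // index of the window being counted
    private int count;                // requests admitted in that window

    FixedWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    /** Returns true if the request arriving at nowMillis is admitted. */
    synchronized boolean tryAcquire(long nowMillis) {
        long window = nowMillis / windowMillis;
        if (window != currentWindow) { // a new window starts: reset the counter
            currentWindow = window;
            count = 0;
        }
        if (count >= limit) {
            return false;
        }
        count++;
        return true;
    }
}
```

With a limit of 2 per 1,000 ms, calls at 900, 950, 1000, and 1001 ms are all admitted: four requests in about 100 ms, which is the boundary spike in practice.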
3.2 Sliding Window
Divides a larger window into smaller sub‑windows and aggregates their counters for smoother control.
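One way to sketch the sub‑window idea is a ring of counters, each tagged with the slot it currently holds. The names are hypothetical, and the sketch assumes that the window length is an exact multiple of the slot count and that the caller supplies non‑negative timestamps in milliseconds.

```java
import java.util.Arrays;

/** Sliding-window counter sketch built from fixed sub-windows; names are hypothetical. */
final class SlidingWindowLimiter {
    private final int limit;        // max requests admitted per full window
    private final long slotMillis;  // length of one sub-window
    private final int[] counts;     // requests admitted per sub-window
    private final long[] slotIds;   // absolute slot index stored in each cell

    SlidingWindowLimiter(int limit, long windowMillis, int slots) {
        this.limit = limit;
        this.slotMillis = windowMillis / slots;
        this.counts = new int[slots];
        this.slotIds = new long[slots];
        Arrays.fill(slotIds, -1);
    }

    /** Returns true if the request arriving at nowMillis is admitted. */
    synchronized boolean tryAcquire(long nowMillis) {
        long slot = nowMillis / slotMillis;
        int idx = (int) (slot % counts.length);
        if (slotIds[idx] != slot) { // the cell holds an expired slot: recycle it
            slotIds[idx] = slot;
            counts[idx] = 0;
        }
        int total = 0;
        for (int i = 0; i < counts.length; i++) {
            // only cells whose slot lies inside the trailing window are counted
            if (slotIds[i] > slot - counts.length) {
                total += counts[i];
            }
        }
        if (total >= limit) {
            return false;
        }
        counts[idx]++;
        return true;
    }
}
```

With a limit of 2, a 1,000 ms window, and 10 slots, requests at 900 and 950 ms are admitted, requests at 1000 and 1899 ms are rejected, and a request at 1900 ms is admitted again because the slot holding the first two has slid out of the window.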
3.3 Leaky Bucket
Queues incoming requests and drains them at a fixed rate, smoothing burst traffic.
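The queue‑and‑drain behaviour can be sketched as below. This is an illustration under stated assumptions rather than a production design: the names are hypothetical, the caller supplies timestamps in milliseconds, and some scheduler is assumed to call `drain` periodically.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Leaky-bucket queue sketch; names are hypothetical. */
final class LeakyBucket<T> {
    private final int capacity;             // max queued requests
    private final long leakIntervalMillis;  // one request leaves per interval
    private final ArrayDeque<T> queue = new ArrayDeque<>();
    private long nextLeakAt;                // earliest time the head may leave

    LeakyBucket(int capacity, long leakIntervalMillis) {
        this.capacity = capacity;
        this.leakIntervalMillis = leakIntervalMillis;
    }

    /** Queues the request, or returns false when the bucket overflows. */
    synchronized boolean offer(T request, long nowMillis) {
        if (queue.size() >= capacity) {
            return false;
        }
        if (queue.isEmpty() && nextLeakAt < nowMillis) {
            nextLeakAt = nowMillis; // idle time earns no extra drain credit
        }
        queue.addLast(request);
        return true;
    }

    /** Releases every request that is due by nowMillis, spaced one interval apart. */
    synchronized List<T> drain(long nowMillis) {
        List<T> released = new ArrayList<>();
        while (!queue.isEmpty() && nextLeakAt <= nowMillis) {
            released.add(queue.pollFirst());
            nextLeakAt += leakIntervalMillis;
        }
        return released;
    }
}
```

Unlike a token bucket, this shape never forwards a burst downstream: however many requests arrive at once, they leave one interval apart, and anything beyond the queue capacity is rejected.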
3.4 Token Bucket
Generates tokens at a steady rate up to a fixed capacity; each request consumes a token, so bursts are allowed up to the number of tokens that have accumulated.
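The refill‑then‑consume arithmetic might look like the sketch below. The class name, constructor, and methods are assumptions made for illustration, not the article's omitted code, and the caller passes the current time in milliseconds instead of the class reading a clock.

```java
/** Token-bucket sketch; constructor and methods are hypothetical. */
final class TokenBucketSketch {
    private final long capacity;
    private final double refillTokensPerOneMillis;
    private double availableTokens;
    private long lastRefillTimestamp;

    TokenBucketSketch(long capacity, long refillTokens, long refillPeriodMillis, long nowMillis) {
        this.capacity = capacity;
        this.refillTokensPerOneMillis = (double) refillTokens / refillPeriodMillis;
        this.availableTokens = capacity; // start full so an initial burst is allowed
        this.lastRefillTimestamp = nowMillis;
    }

    /** Consumes the requested tokens if enough are available at nowMillis. */
    synchronized boolean tryConsume(int tokens, long nowMillis) {
        refill(nowMillis);
        if (availableTokens < tokens) {
            return false;
        }
        availableTokens -= tokens;
        return true;
    }

    /** Adds the tokens earned since the last refill, capped at capacity. */
    private void refill(long nowMillis) {
        if (nowMillis > lastRefillTimestamp) {
            long elapsed = nowMillis - lastRefillTimestamp;
            availableTokens = Math.min(capacity, availableTokens + elapsed * refillTokensPerOneMillis);
            lastRefillTimestamp = nowMillis;
        }
    }
}
```

Refilling lazily on each call avoids a background thread. With a capacity of 3 and one token per second, three immediate requests pass and a fourth is rejected, and after a long idle period the bucket holds 3 tokens again rather than an unbounded backlog.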
public class TokenBucket {
    private final long capacity;
    private final double refillTokensPerOneMillis;
    private double availableTokens;
    private long lastRefillTimestamp;
    // constructor and methods omitted for brevity
}
4. Open‑Source Rate‑Limiting Projects
Guava RateLimiter (token‑bucket based, smooth burst & warm‑up)
Bucket4j (token‑bucket, supports distributed caches)
Resilience4j (rate‑limiter & bulkhead for concurrency control)
Each library’s usage pattern is illustrated with concise code snippets.
// Guava RateLimiter example: 5 permits per second
RateLimiter limiter = RateLimiter.create(5);
System.out.println(limiter.acquire()); // blocks for a permit, prints seconds spent waiting

// Bucket4j bucket creation: 10 tokens per minute
// (Bucket4j.builder() is the older entry point; newer releases expose Bucket.builder())
Bucket bucket = Bucket4j.builder()
    .addLimit(Bandwidth.simple(10, Duration.ofMinutes(1)))
    .build();

// Resilience4j bulkhead & rate‑limiter composition
// (RateLimiter here is the Resilience4j type, not Guava's class of the same name)
Bulkhead bulkhead = Bulkhead.of("backend",
    BulkheadConfig.custom().maxConcurrentCalls(150).build());
RateLimiter rateLimiter = RateLimiter.of("backend",
    RateLimiterConfig.custom()
        .limitForPeriod(1)
        .limitRefreshPeriod(Duration.ofSeconds(1))
        .build());
5. Implementing Limiting in Spring Cloud Gateway
5.1 Single‑Node Request‑Frequency Limiting
Gateway defines a RateLimiter interface; a local implementation can use Resilience4j or Bucket4j.
public interface RateLimiter<C> extends StatefulConfigurable<C> {
    Mono<RateLimiter.Response> isAllowed(String routeId, String id);
}
5.2 Distributed Request‑Frequency Limiting
Spring Cloud Gateway already provides RedisRateLimiter, which executes a Lua script atomically in Redis.
local tokens_key = KEYS[1]
local timestamp_key = KEYS[2]
local rate = tonumber(ARGV[1])
local capacity = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local requested = tonumber(ARGV[4])
-- script body omitted for brevity
return { allowed_num, new_tokens }
5.3 Single‑Node Concurrent Limiting
Uses Resilience4j bulkhead or Java Semaphore to cap simultaneous executions.
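Semaphore semaphore = new Semaphore(10);
semaphore.acquire(); // blocks until a permit is free; may throw InterruptedException
try {
    // process request
} finally {
    semaphore.release(); // always return the permit, even if processing throws
}
5.4 Distributed Concurrent Limiting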
Approaches include TTL‑based counters in Redis, per‑instance keys, or a custom “dual‑window sliding” algorithm that keeps only the current and previous minute windows in Redis for atomic MGET checks.
// Dual‑window sliding algorithm sketch (pseudo‑code; `redis` stands for any Redis client)
String curKey = "gw:cnt:" + currentMinute();
String prevKey = "gw:cnt:" + previousMinute();
// Shown as two reads for clarity; a single MGET fetches both counters atomically
Long cur = redis.get(curKey);
Long prev = redis.get(prevKey);
long total = (cur == null ? 0 : cur) + (prev == null ? 0 : prev);
if (total > limit) reject(); else allow();
6. Summary
Rate limiting is a cornerstone of gateway stability. The article walks through scenarios, classic algorithms, useful libraries, and concrete implementations for both single‑node and distributed environments. It also highlights pitfalls such as token‑bucket granularity limits and the need to release semaphores in an exception‑safe way.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
