Mastering Rate Limiting in Spring Cloud Gateway: Algorithms, Implementations, and Best Practices
This article explores the evolution of Spring Cloud Gateway, explains common rate‑limiting scenarios and algorithms, reviews open‑source libraries such as Guava, Bucket4j and Resilience4j, and provides detailed guidance for implementing both local and distributed request‑frequency and concurrency limits within the gateway.
Before Spring Cloud Gateway, Netflix Zuul was the default gateway in Spring Cloud, but its blocking API and lack of WebSocket support led the community to create a reactive, non‑blocking alternative built on Spring Framework 5, Spring Boot 2 and Project Reactor.
Spring Cloud Gateway’s key features include integration with Spring Cloud DiscoveryClient, Hystrix circuit breaker, easy predicate and filter definitions, request rate limiting, and path rewriting.
Common Rate‑Limiting Scenarios
Rate limiting, together with caching and degradation, is one of the three standard defenses of high‑concurrency systems; it caps the rate of incoming requests so that a service stays responsive during traffic spikes such as flash sales or ticket‑booking bursts.
Limiting Targets
Maximum 100 requests per minute for a specific API.
Maximum download speed of 100 KB/s per user.
Maximum 5 concurrent requests per user for an endpoint.
Block all requests from a particular IP.
Typical limiting objects are request frequency (rate limiting) and concurrent request count (concurrency limiting).
Handling Strategies
Reject the request (e.g., HTTP 429).
Queue the request for later processing.
Provide fallback data (service degradation).
Limiting Architecture
Two deployment modes exist: single‑instance (in‑memory) and cluster (centralized component such as a gateway or Redis). The gateway layer can perform access‑level limiting, while middleware (e.g., Redis, Hazelcast, Ignite) can provide distributed limiting.
Common Limiting Algorithms
Fixed Window
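A simple counter per time window; easy to implement with AtomicLong, LongAdder or Redis INCR / EXPIRE. The main drawback is the “boundary problem”: a burst at the end of one window followed by a burst at the start of the next can let through up to twice the limit within a span much shorter than one window.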
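As a rough illustration, a fixed-window counter can be sketched in plain Java as follows. The class name `FixedWindowLimiter` and the injected `clock` are assumptions made for this sketch (the clock is injected only so the behaviour can be checked deterministically); a production version would more likely rely on `AtomicLong`, `LongAdder` or Redis as mentioned above.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch: at most `limit` requests per aligned window of `windowMillis`. */
final class FixedWindowLimiter {
    private final long limit;
    private final long windowMillis;
    private final LongSupplier clock;   // returns the current time in milliseconds
    private long windowStart;
    private long count;

    FixedWindowLimiter(long limit, long windowMillis, LongSupplier clock) {
        this.limit = limit;
        this.windowMillis = windowMillis;
        this.clock = clock;
        this.windowStart = alignedStart(clock.getAsLong());
    }

    synchronized boolean tryAcquire() {
        long start = alignedStart(clock.getAsLong());
        if (start != windowStart) {     // a new window has begun: reset the counter
            windowStart = start;
            count = 0;
        }
        if (count >= limit) {
            return false;               // limit for this window already used up
        }
        count++;
        return true;
    }

    private long alignedStart(long now) {
        return now - (now % windowMillis);
    }
}
```

With a limit of 100 per minute, 100 requests at 0:59 and another 100 at 1:00 are all accepted by this sketch, so 200 requests pass within about one second. That is the boundary problem in practice.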
Sliding Window
Divides a large window into smaller sub‑windows and sums their counters, providing smoother limiting at the cost of additional computation.
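One possible way to implement the sub-window idea is a small ring of counters, sketched below. `SlidingWindowLimiter`, its parameters and the injected `clock` are hypothetical names for this example, and the sketch assumes the window length is evenly divisible by the number of slots.

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

/** Hypothetical sketch: a window split into `slots` sub-windows whose counters are summed. */
final class SlidingWindowLimiter {
    private final long limit;
    private final long slotMillis;
    private final long[] slotIds;   // which sub-window each ring position currently holds
    private final long[] counts;    // request count of that sub-window
    private final LongSupplier clock;

    SlidingWindowLimiter(long limit, long windowMillis, int slots, LongSupplier clock) {
        this.limit = limit;
        this.slotMillis = windowMillis / slots;
        this.slotIds = new long[slots];
        this.counts = new long[slots];
        this.clock = clock;
        Arrays.fill(slotIds, -1L);  // -1 marks an unused ring position
    }

    synchronized boolean tryAcquire() {
        long currentId = clock.getAsLong() / slotMillis;
        long oldestId = currentId - counts.length + 1;
        long total = 0;
        for (int i = 0; i < counts.length; i++) {
            // Only sub-windows that still overlap the sliding window are counted.
            if (slotIds[i] >= oldestId && slotIds[i] <= currentId) {
                total += counts[i];
            }
        }
        if (total >= limit) {
            return false;
        }
        int index = (int) (currentId % counts.length);
        if (slotIds[index] != currentId) {  // ring position holds an expired sub-window
            slotIds[index] = currentId;
            counts[index] = 0;
        }
        counts[index]++;
        return true;
    }
}
```

More slots give a smoother limit, and each call pays for it by summing more counters, which is the extra computation mentioned above.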
Leaky Bucket
Queues requests and processes them at a fixed rate, visualized as water leaking from a bucket; useful for smoothing bursty traffic.
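A minimal sketch of the queueing form of a leaky bucket is shown below. `LeakyBucket`, `offer` and `leak` are hypothetical names; the sketch assumes some external scheduler calls `leak()` at a fixed interval, which is what produces the constant outflow.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical sketch: a bounded queue drained one request at a time. */
final class LeakyBucket {
    private final int capacity;
    private final Queue<Runnable> queue = new ArrayDeque<>();

    LeakyBucket(int capacity) {
        this.capacity = capacity;
    }

    /** Adds a request to the bucket; returns false when the bucket is full (overflow). */
    synchronized boolean offer(Runnable request) {
        if (queue.size() >= capacity) {
            return false;
        }
        return queue.add(request);
    }

    /** Processes the oldest queued request, if any. Intended to be called at a fixed rate. */
    void leak() {
        Runnable next;
        synchronized (this) {
            next = queue.poll();
        }
        if (next != null) {
            next.run();   // run outside the lock so slow requests do not block offer()
        }
    }
}
```

For example, passing `bucket::leak` to `ScheduledExecutorService.scheduleAtFixedRate` with a period of 100 milliseconds would process at most ten requests per second, however bursty the arrivals are.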
Token Bucket
Generates tokens at a fixed rate up to a capacity; each request consumes a token, allowing bursts when tokens have accumulated.
    public class TokenBucket {
        private final long capacity;                    // maximum number of tokens the bucket can hold
        private final double refillTokensPerOneMillis;  // refill rate in tokens per millisecond
        private double availableTokens;
        private long lastRefillTimestamp;

        public TokenBucket(long capacity, long refillTokens, long refillPeriodMillis) {
            this.capacity = capacity;
            this.refillTokensPerOneMillis = (double) refillTokens / (double) refillPeriodMillis;
            this.availableTokens = capacity;            // start with a full bucket
            this.lastRefillTimestamp = System.currentTimeMillis();
        }

        // Takes numberTokens from the bucket if enough are available; never blocks.
        public synchronized boolean tryConsume(int numberTokens) {
            refill();
            if (availableTokens < numberTokens) {
                return false;
            } else {
                availableTokens -= numberTokens;
                return true;
            }
        }

        // Adds the tokens earned since the last refill, capped at capacity.
        private void refill() {
            long currentTimeMillis = System.currentTimeMillis();
            if (currentTimeMillis > lastRefillTimestamp) {
                long millisSinceLastRefill = currentTimeMillis - lastRefillTimestamp;
                double refill = millisSinceLastRefill * refillTokensPerOneMillis;
                this.availableTokens = Math.min(capacity, availableTokens + refill);
                this.lastRefillTimestamp = currentTimeMillis;
            }
        }
    }
Open‑Source Rate‑Limiter Projects
Guava RateLimiter
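Implements a token bucket in two modes: smooth bursty (SmoothBursty) and warm‑up (SmoothWarmingUp).
    // Allow roughly 5 permits per second.
    RateLimiter limiter = RateLimiter.create(5);
    // acquire() blocks until a permit is available and returns the seconds spent waiting.
    System.out.println(limiter.acquire());
Bucket4j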
Provides both in‑memory and distributed token‑bucket implementations using JCache‑compatible stores.
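    // Illustrative limit (not defined in the original snippet): 10 tokens per minute.
    Bandwidth limit = Bandwidth.simple(10, Duration.ofMinutes(1));
    Bucket bucket = Bucket4j.builder().addLimit(limit).build();
    if (bucket.tryConsume(1)) {
        System.out.println("ok");       // a token was available: let the request through
    } else {
        System.out.println("error");    // the bucket is empty: reject the request
    }
Resilience4j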
Offers a RateLimiter, which grants a fixed number of permits per refresh period, and a Bulkhead (semaphore or thread‑pool based) for concurrency limiting.
    RateLimiterConfig cfg = RateLimiterConfig.custom()
        .limitForPeriod(1)                          // permits available in each refresh period
        .limitRefreshPeriod(Duration.ofSeconds(1))  // length of one refresh period
        .timeoutDuration(Duration.ofMillis(100))    // how long a caller waits for a permit
        .build();
    RateLimiter limiter = RateLimiter.of("backend", cfg);
Implementing Rate Limiting in Spring Cloud Gateway
Local (single‑instance) request‑frequency limiting
Implement the RateLimiter interface and use a KeyResolver (e.g., IP‑based) to identify the limiting key.
    public interface RateLimiter<C> extends StatefulConfigurable<C> {
        Mono<RateLimiter.Response> isAllowed(String routeId, String id);
    }
Distributed request‑frequency limiting
Spring Cloud Gateway provides RedisRateLimiter backed by a Lua script that atomically updates tokens.
    local tokens_key = KEYS[1]
    local timestamp_key = KEYS[2]
    local rate = tonumber(ARGV[1])
    local capacity = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])
    local requested = tonumber(ARGV[4])
    -- token bucket logic omitted for brevity
    return { allowed_num, new_tokens }
Local concurrency limiting
Use Resilience4j’s Bulkhead (semaphore) to restrict simultaneous executions.
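The idea behind a semaphore bulkhead can be sketched with the JDK's `java.util.concurrent.Semaphore` alone. `ConcurrencyLimiter`, `execute` and the fallback supplier are hypothetical names for this illustration and are not part of Resilience4j.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical sketch: at most `maxConcurrentCalls` executions run at the same time. */
final class ConcurrencyLimiter {
    private final Semaphore permits;
    private final long maxWaitMillis;

    ConcurrencyLimiter(int maxConcurrentCalls, long maxWaitMillis) {
        this.permits = new Semaphore(maxConcurrentCalls);
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Runs `call` if a permit is obtained within the wait time, otherwise `fallback`. */
    <T> T execute(Supplier<T> call, Supplier<T> fallback) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // keep the interrupt visible to the caller
            acquired = false;
        }
        if (!acquired) {
            return fallback.get();
        }
        try {
            return call.get();
        } finally {
            permits.release();                    // always return the permit
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

Resilience4j's semaphore Bulkhead is configured with the same two settings, the maximum number of concurrent calls and the maximum time to wait for a free slot: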
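    BulkheadConfig bulkheadConfig = BulkheadConfig.custom()
        .maxConcurrentCalls(150)                    // at most 150 calls in flight
        // Older Resilience4j releases used maxWaitTime(100), in milliseconds.
        .maxWaitDuration(Duration.ofMillis(100))    // how long a caller waits for a free slot
        .build();
    Bulkhead bulkhead = Bulkhead.of("backend", bulkheadConfig);
Distributed concurrency limiting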
Approaches include Redis‑based distributed semaphores (e.g., Redisson RSemaphore) or per‑instance counters with TTL, as well as a custom "dual‑window sliding" algorithm that keeps only the current and previous minute windows in Redis for efficient MGET checks.
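The source does not spell out how the two windows are combined. The sketch below assumes a weighted estimate, a common way of approximating a sliding window from two counters: the previous minute's count is scaled by the share of it that still overlaps the last 60 seconds. `DualWindowLimiter`, the `key:minute` key format and the in-memory map standing in for Redis are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a two-window check; a HashMap stands in for Redis. */
final class DualWindowLimiter {
    private static final long WINDOW_MILLIS = 60_000L;
    private final long limit;
    private final Map<String, Long> store = new HashMap<>();

    DualWindowLimiter(long limit) {
        this.limit = limit;
    }

    synchronized boolean tryAcquire(String key, long nowMillis) {
        long window = nowMillis / WINDOW_MILLIS;
        String currentKey = key + ":" + window;
        String previousKey = key + ":" + (window - 1);
        // With Redis, both counters could be fetched in a single MGET.
        long current = store.getOrDefault(currentKey, 0L);
        long previous = store.getOrDefault(previousKey, 0L);
        // Fraction of the current minute that has already elapsed.
        double elapsed = (nowMillis % WINDOW_MILLIS) / (double) WINDOW_MILLIS;
        // Assumed formula: weight the previous window by its remaining overlap.
        double estimate = previous * (1.0 - elapsed) + current;
        if (estimate >= limit) {
            return false;
        }
        store.merge(currentKey, 1L, Long::sum);       // with Redis: INCR plus EXPIRE
        store.remove(key + ":" + (window - 2));       // older windows are no longer needed
        return true;
    }
}
```

Against a real Redis instance the read and the increment are separate commands, so the check is only approximate under contention unless both steps run inside one Lua script.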
Conclusion
Rate limiting is essential for gateway stability; understanding scenarios, algorithms, and available libraries enables developers to choose the right strategy—whether in‑memory, Redis‑backed, or hybrid—to meet both request‑frequency and concurrency requirements.
ITFLY8 Architecture Home
