Mastering Rate Limiting in Spring Cloud Gateway: Algorithms, Implementations, and Best Practices

This article explores the evolution of Spring Cloud Gateway, explains common rate‑limiting scenarios and algorithms, reviews open‑source libraries such as Guava, Bucket4j and Resilience4j, and provides detailed guidance for implementing both local and distributed request‑frequency and concurrency limits within the gateway.

ITFLY8 Architecture Home

Before Spring Cloud Gateway, Netflix Zuul was the default gateway in Spring Cloud, but its blocking API and lack of WebSocket support led the community to create a reactive, non‑blocking alternative built on Spring Framework 5, Spring Boot 2 and Project Reactor.

Spring Cloud Gateway’s key features include integration with Spring Cloud DiscoveryClient, Hystrix circuit breaker, easy predicate and filter definitions, request rate limiting, and path rewriting.

Common Rate‑Limiting Scenarios

Rate limiting, together with caching and degradation, forms the "three horsemen" of high‑concurrency systems; it controls request rates to improve resilience during traffic spikes such as flash sales or ticket‑booking bursts.

Limiting Targets

Maximum 100 requests per minute for a specific API.

Maximum download speed of 100 KB/s per user.

Maximum 5 concurrent requests per user for an endpoint.

Block all requests from a particular IP.

Typical limiting objects are request frequency (rate limiting) and concurrent request count (concurrency limiting).

Handling Strategies

Reject the request (e.g., HTTP 429).

Queue the request for later processing.

Provide fallback data (service degradation).

Limiting Architecture

Two deployment modes exist: single‑instance (in‑memory) and cluster (centralized component such as a gateway or Redis). The gateway layer can perform access‑level limiting, while middleware (e.g., Redis, Hazelcast, Ignite) can provide distributed limiting.

Common Limiting Algorithms

Fixed Window

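A single counter per time window, reset when the window rolls over; easy to implement with AtomicLong, LongAdder or Redis INCR / EXPIRE. The main drawback is the “boundary problem”: a burst at the end of one window followed by another at the start of the next can let through up to twice the limit within a span shorter than one window.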
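
As a minimal sketch of the counter described above, the following Java class resets its count whenever the clock enters a new window. The class name FixedWindowLimiter and the explicit nowMillis parameter are illustrative assumptions (passing the time in makes the behaviour easy to reproduce); a plain synchronized counter is used here instead of AtomicLong to keep the logic readable.

```java
/** Fixed-window counter: admits at most `limit` requests per window. */
final class FixedWindowLimiter {
    private final long limit;
    private final long windowMillis;
    private long windowStart; // start of the current window, in millis
    private long count;       // requests admitted in the current window

    FixedWindowLimiter(long limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    /** Returns true if the request at time nowMillis is admitted. */
    synchronized boolean tryAcquire(long nowMillis) {
        // Align the timestamp to the start of its window.
        long currentWindow = nowMillis - (nowMillis % windowMillis);
        if (currentWindow != windowStart) {
            // A new window has begun: start counting from zero again.
            windowStart = currentWindow;
            count = 0;
        }
        if (count < limit) {
            count++;
            return true;
        }
        return false;
    }
}
```

With a limit of 2 per 1000 ms, requests at 900 ms and 950 ms are admitted, and so are requests at 1000 ms and 1010 ms, because the counter has just been reset. Four requests pass within about 110 ms, which is the boundary problem in miniature.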

Sliding Window

Divides a large window into smaller sub‑windows and sums their counters, providing smoother limiting at the cost of additional computation.
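
One way to sketch the sub‑window idea in Java is a small ring of counters, each tagged with the sub‑window it belongs to. The names SlidingWindowLimiter, slots and nowMillis are assumptions made for illustration, and the sketch assumes the window length divides evenly by the number of sub‑windows.

```java
import java.util.Arrays;

/** Sliding window built from `slots` sub-window counters. */
final class SlidingWindowLimiter {
    private final long limit;
    private final long slotMillis;
    private final long[] counts;  // one counter per sub-window
    private final long[] slotIds; // absolute sub-window index held by each counter

    SlidingWindowLimiter(long limit, long windowMillis, int slots) {
        this.limit = limit;
        this.slotMillis = windowMillis / slots;
        this.counts = new long[slots];
        this.slotIds = new long[slots];
        Arrays.fill(slotIds, Long.MIN_VALUE); // no counter is in use yet
    }

    synchronized boolean tryAcquire(long nowMillis) {
        long currentSlot = nowMillis / slotMillis;
        int n = counts.length;
        // Sum only the counters that fall inside the last n sub-windows.
        long total = 0;
        for (int i = 0; i < n; i++) {
            if (slotIds[i] > currentSlot - n && slotIds[i] <= currentSlot) {
                total += counts[i];
            }
        }
        if (total >= limit) {
            return false;
        }
        // Reuse the ring position for the current sub-window if it is stale.
        int idx = (int) (currentSlot % n);
        if (slotIds[idx] != currentSlot) {
            slotIds[idx] = currentSlot;
            counts[idx] = 0;
        }
        counts[idx]++;
        return true;
    }
}
```

With a limit of 2 per 1000 ms split into 10 sub‑windows, two requests at 900 ms and 950 ms still block a request at 1000 ms, because their sub‑window remains inside the sliding window until 1900 ms. The extra cost is the loop over the sub‑window counters on every call.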

Leaky Bucket

Queues requests and processes them at a fixed rate, visualized as water leaking from a bucket; useful for smoothing bursty traffic.
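
The queue‑and‑drain behaviour can be sketched as follows. This is an illustrative simulation rather than a production scheduler: the class name LeakyBucket, the offer and leak methods, and the explicit nowMillis parameter are all assumptions, and a real implementation would typically call leak from a timer.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Leaky bucket: bounded queue drained at one request per interval. */
final class LeakyBucket {
    private final int capacity;
    private final long leakIntervalMillis;
    private final Queue<Runnable> queue = new ArrayDeque<>();
    private long nextLeakAt; // earliest time the next request may leave

    LeakyBucket(int capacity, long leakIntervalMillis) {
        this.capacity = capacity;
        this.leakIntervalMillis = leakIntervalMillis;
    }

    /** Queues a request, or returns false if the bucket overflows. */
    synchronized boolean offer(long nowMillis, Runnable request) {
        if (queue.size() >= capacity) {
            return false;
        }
        // After an idle period, do not let unused time accumulate as credit.
        if (queue.isEmpty() && nextLeakAt < nowMillis) {
            nextLeakAt = nowMillis;
        }
        queue.add(request);
        return true;
    }

    /** Runs every queued request that is due by nowMillis; returns how many ran. */
    synchronized int leak(long nowMillis) {
        int processed = 0;
        while (!queue.isEmpty() && nowMillis >= nextLeakAt) {
            queue.poll().run();
            nextLeakAt += leakIntervalMillis;
            processed++;
        }
        return processed;
    }
}
```

Unlike the token bucket, idle time earns no burst allowance here: however the requests arrive, they leave no faster than one per interval, and anything beyond the queue capacity is rejected.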

Token Bucket

Generates tokens at a fixed rate up to a capacity; each request consumes a token, allowing bursts when tokens have accumulated.

public class TokenBucket {
    private final long capacity;
    private final double refillTokensPerOneMillis;
    private double availableTokens;
    private long lastRefillTimestamp;

    public TokenBucket(long capacity, long refillTokens, long refillPeriodMillis) {
        this.capacity = capacity;
        this.refillTokensPerOneMillis = (double) refillTokens / (double) refillPeriodMillis;
        this.availableTokens = capacity;
        this.lastRefillTimestamp = System.currentTimeMillis();
    }

    public synchronized boolean tryConsume(int numberTokens) {
        refill();
        if (availableTokens < numberTokens) {
            return false;
        } else {
            availableTokens -= numberTokens;
            return true;
        }
    }

    private void refill() {
        long currentTimeMillis = System.currentTimeMillis();
        if (currentTimeMillis > lastRefillTimestamp) {
            long millisSinceLastRefill = currentTimeMillis - lastRefillTimestamp;
            double refill = millisSinceLastRefill * refillTokensPerOneMillis;
            this.availableTokens = Math.min(capacity, availableTokens + refill);
            this.lastRefillTimestamp = currentTimeMillis;
        }
    }
}

Open‑Source Rate‑Limiter Projects

Guava RateLimiter

Implements a token‑bucket with smooth bursty and warm‑up modes.

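import com.google.common.util.concurrent.RateLimiter;

RateLimiter limiter = RateLimiter.create(5); // 5 permits per second
System.out.println(limiter.acquire());       // blocks until a permit is free; prints seconds spent waiting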

Bucket4j

Provides both in‑memory and distributed token‑bucket implementations using JCache‑compatible stores.

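// Illustrative limit: 10 tokens, refilled over one minute
Bandwidth limit = Bandwidth.simple(10, Duration.ofMinutes(1));
// Bucket4j.builder() is the older entry point; newer releases use Bucket.builder()
Bucket bucket = Bucket4j.builder().addLimit(limit).build();
if (bucket.tryConsume(1)) {
    System.out.println("ok");
} else {
    System.out.println("error");
}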

Resilience4j

Offers a RateLimiter, which grants a configured number of permits per refresh period, and a Bulkhead (semaphore or thread‑pool) for concurrency limiting.

RateLimiterConfig cfg = RateLimiterConfig.custom()
    .limitForPeriod(1)
    .limitRefreshPeriod(Duration.ofSeconds(1))
    .timeoutDuration(Duration.ofMillis(100))
    .build();
RateLimiter limiter = RateLimiter.of("backend", cfg);

Implementing Rate Limiting in Spring Cloud Gateway

Local (single‑instance) request‑frequency limiting

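Implement the gateway's own RateLimiter interface (org.springframework.cloud.gateway.filter.ratelimit.RateLimiter, not the Guava or Resilience4j class of the same name) and use a KeyResolver (e.g., IP‑based) to derive the key that each request is counted against.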

public interface RateLimiter<C> extends StatefulConfigurable<C> {
    Mono<RateLimiter.Response> isAllowed(String routeId, String id);
}
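
The per‑key bookkeeping that such an implementation needs can be sketched with plain JDK classes. LocalKeyedLimiter is a hypothetical helper, not part of Spring Cloud Gateway; an isAllowed implementation could delegate to it and wrap the boolean result in the gateway's response type. The explicit nowMillis parameter and the routeId + ":" + id key format are likewise assumptions.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical helper: one in-memory token bucket per (routeId, id) pair. */
final class LocalKeyedLimiter {
    /** Mutable token-bucket state for a single key. */
    private static final class Bucket {
        double tokens;
        long lastRefillMillis;

        Bucket(double tokens, long nowMillis) {
            this.tokens = tokens;
            this.lastRefillMillis = nowMillis;
        }
    }

    private final double capacity;
    private final double tokensPerMilli;
    private final ConcurrentMap<String, Bucket> buckets = new ConcurrentHashMap<>();

    LocalKeyedLimiter(long capacity, long refillTokensPerSecond) {
        this.capacity = capacity;
        this.tokensPerMilli = refillTokensPerSecond / 1000.0;
    }

    boolean isAllowed(String routeId, String id, long nowMillis) {
        // New keys start with a full bucket.
        Bucket b = buckets.computeIfAbsent(routeId + ":" + id,
                k -> new Bucket(capacity, nowMillis));
        synchronized (b) {
            // Refill in proportion to the time elapsed, capped at capacity.
            long elapsed = Math.max(0, nowMillis - b.lastRefillMillis);
            b.tokens = Math.min(capacity, b.tokens + elapsed * tokensPerMilli);
            b.lastRefillMillis = Math.max(b.lastRefillMillis, nowMillis);
            if (b.tokens < 1) {
                return false;
            }
            b.tokens -= 1;
            return true;
        }
    }
}
```

The map grows with the number of distinct keys, so a real deployment would also need eviction of idle entries; that is left out of this sketch. Because the state lives in one JVM, each gateway instance enforces its own limit independently.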

Distributed request‑frequency limiting

Spring Cloud Gateway provides RedisRateLimiter backed by a Lua script that atomically updates tokens.

local tokens_key = KEYS[1]
local timestamp_key = KEYS[2]
local rate = tonumber(ARGV[1])
local capacity = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local requested = tonumber(ARGV[4])
-- token bucket logic omitted for brevity
return { allowed_num, new_tokens }

Local concurrency limiting

Use Resilience4j’s Bulkhead (semaphore) to restrict simultaneous executions.

BulkheadConfig bulkheadConfig = BulkheadConfig.custom()
    .maxConcurrentCalls(150)
    .maxWaitDuration(Duration.ofMillis(100)) // older 0.x releases used maxWaitTime(100)
    .build();
Bulkhead bulkhead = Bulkhead.of("backend", bulkheadConfig);
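
To show what a semaphore bulkhead does underneath, here is a rough JDK‑only analogue built on java.util.concurrent.Semaphore. It is not Resilience4j's implementation; the class name ConcurrencyLimiter and the fallback parameter are assumptions for illustration.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Caps the number of calls that may run at the same time. */
final class ConcurrencyLimiter {
    private final Semaphore permits;
    private final long maxWaitMillis;

    ConcurrencyLimiter(int maxConcurrentCalls, long maxWaitMillis) {
        this.permits = new Semaphore(maxConcurrentCalls);
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Runs task if a permit is obtained within the wait time, else fallback. */
    <T> T call(Supplier<T> task, Supplier<T> fallback) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            acquired = false;
        }
        if (!acquired) {
            return fallback.get();
        }
        try {
            return task.get();
        } finally {
            permits.release(); // always hand the permit back
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

In a reactive gateway, waiting on a semaphore would block an event‑loop thread, so a wait time of zero (fail fast) is the safer choice in that setting.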

Distributed concurrency limiting

Approaches include Redis‑based distributed semaphores (e.g., Redisson RSemaphore) or per‑instance counters with TTL, as well as a custom "dual‑window sliding" algorithm that keeps only the current and previous minute windows in Redis for efficient MGET checks.
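
The text does not spell out the dual‑window algorithm, so the following is only one plausible reading: a counter for the current minute and one for the previous minute, with the previous count weighted by how much of it still overlaps the sliding window. The weighting formula, the class name DualWindowLimiter, the key format, and the in‑memory map standing in for Redis are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Two-window sliding estimate; a HashMap stands in for Redis keys. */
final class DualWindowLimiter {
    private static final long WINDOW_MILLIS = 60_000;
    private final long limit;
    private final Map<String, Long> store = new HashMap<>();

    DualWindowLimiter(long limit) {
        this.limit = limit;
    }

    synchronized boolean tryAcquire(String key, long nowMillis) {
        long window = nowMillis / WINDOW_MILLIS;
        String currKey = key + ":" + window;
        String prevKey = key + ":" + (window - 1);
        // With Redis, both values could be fetched in a single MGET.
        long curr = store.getOrDefault(currKey, 0L);
        long prev = store.getOrDefault(prevKey, 0L);
        // Fraction of the current minute that has already elapsed.
        double elapsed = (nowMillis % WINDOW_MILLIS) / (double) WINDOW_MILLIS;
        // Weight the previous minute by the part still inside the window.
        double estimated = prev * (1.0 - elapsed) + curr;
        if (estimated >= limit) {
            return false;
        }
        store.put(currKey, curr + 1);           // with Redis: INCR plus EXPIRE
        store.remove(key + ":" + (window - 2)); // stand-in for TTL expiry
        return true;
    }
}
```

Only two keys per limited entity are live at any time, which keeps storage and lookups small. Against a real Redis, the read and the increment are separate round trips, so they would need a Lua script or similar to stay atomic.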

Conclusion

Rate limiting is essential for gateway stability; understanding scenarios, algorithms, and available libraries enables developers to choose the right strategy—whether in‑memory, Redis‑backed, or hybrid—to meet both request‑frequency and concurrency requirements.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

ITFLY8 Architecture Home