
Rate Limiting in Spring Cloud Gateway: Algorithms, Implementations, and Practical Guide

This article provides a comprehensive overview of rate‑limiting techniques for Spring Cloud Gateway, covering common scenarios, classic algorithms such as fixed‑window, sliding‑window, leaky‑bucket and token‑bucket, and practical implementations using Redis, Resilience4j, Bucket4j, Guava and custom local limiters.


Spring Cloud Gateway was introduced to replace Netflix Zuul as the default API gateway in the Spring Cloud ecosystem, offering reactive, non‑blocking APIs and tight integration with Spring Boot 2.0 and Project Reactor.

The gateway supports a built‑in Request Rate Limiting filter, but its default implementation (RedisRateLimiter) has limitations such as lack of sub‑second precision and no support for concurrent‑request limiting.

Common rate‑limiting scenarios include limiting request frequency per minute, per‑user download speed, concurrent requests per endpoint, and IP‑based blocking. These scenarios fall into two primary categories of limiting target: request‑frequency limits and concurrent‑request limits.

Several classic algorithms are described:

Fixed Window – simple counter per time bucket, suffers from burst edge cases.

Sliding Window – divides a larger window into smaller sub‑windows for smoother limits.

Leaky Bucket – queues requests and drains them at a fixed rate.

Token Bucket – generates tokens at a fixed rate and allows bursts up to the bucket capacity.
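To make the simplest of these concrete, here is a minimal fixed‑window counter in Java (the class and method names are illustrative, not from any library). It also shows why the burst edge case arises: the counter resets abruptly at each window boundary, so two back‑to‑back bursts straddling a boundary can briefly pass at up to twice the limit.

```java
// Minimal fixed-window counter: allows at most `limit` requests per window.
// The counter resets at each window boundary, which is the source of the
// burst edge case mentioned above.
public class FixedWindowLimiter {
    private final int limit;
    private final long windowMillis;
    private long windowStart = System.currentTimeMillis();
    private int count;

    public FixedWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= windowMillis) {
            windowStart = now; // start a new window
            count = 0;         // and reset the counter
        }
        return ++count <= limit;
    }
}
```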

Implementation examples:

Token bucket in Java:

public class TokenBucket {
    private final long capacity;
    private final double refillTokensPerOneMillis;
    private double availableTokens;
    private long lastRefillTimestamp;

    public TokenBucket(long capacity, long refillTokens, long refillPeriodMillis) {
        this.capacity = capacity;
        this.refillTokensPerOneMillis = (double) refillTokens / (double) refillPeriodMillis;
        this.availableTokens = capacity;
        this.lastRefillTimestamp = System.currentTimeMillis();
    }

    public synchronized boolean tryConsume(int numberTokens) {
        refill();
        if (availableTokens < numberTokens) {
            return false;
        } else {
            availableTokens -= numberTokens;
            return true;
        }
    }

    private void refill() {
        long currentTimeMillis = System.currentTimeMillis();
        if (currentTimeMillis > lastRefillTimestamp) {
            long millisSinceLastRefill = currentTimeMillis - lastRefillTimestamp;
            double refill = millisSinceLastRefill * refillTokensPerOneMillis;
            this.availableTokens = Math.min(capacity, availableTokens + refill);
            this.lastRefillTimestamp = currentTimeMillis;
        }
    }
}

Creating a token bucket with a capacity of 100 tokens, refilled at 100 tokens per 1000 ms (i.e., 100 requests per second):

TokenBucket limiter = new TokenBucket(100, 100, 1000);

Guava RateLimiter example (the default SmoothBursty implementation):

RateLimiter limiter = RateLimiter.create(5);
System.out.println(limiter.acquire()); // 0.0 – acquired immediately
System.out.println(limiter.acquire()); // ~0.2 – waited ~200 ms (5 permits/second)

Bucket4j usage:

Bucket bucket = Bucket4j.builder()
    .addLimit(Bandwidth.simple(10, Duration.ofMinutes(1)))
    .build();
if (bucket.tryConsume(1)) {
    System.out.println("ok");
} else {
    System.out.println("error");
}

Resilience4j RateLimiter (token‑bucket based) example:

RateLimiterConfig config = RateLimiterConfig.custom()
    .limitForPeriod(5)
    .limitRefreshPeriod(Duration.ofSeconds(1))
    .timeoutDuration(Duration.ofMillis(100))
    .build();
RateLimiter limiter = RateLimiter.of("myLimiter", config);
boolean allowed = limiter.acquirePermission();

Spring Cloud Gateway’s RateLimiter interface:

public interface RateLimiter<C> extends StatefulConfigurable<C> {
    Mono<Response> isAllowed(String routeId, String id);
}

LocalRateLimiter (based on Resilience4j) implementation:

public Mono<Response> isAllowed(String routeId, String id) {
    Config routeConfig = loadConfiguration(routeId);
    int replenishRate = routeConfig.getReplenishRate();
    int refreshPeriod = routeConfig.getRefreshPeriod();
    int requestedTokens = routeConfig.getRequestedTokens();
    RateLimiter rateLimiter = RateLimiterRegistry.ofDefaults()
        .rateLimiter(id, createRateLimiterConfig(refreshPeriod, replenishRate));
    boolean allowed = rateLimiter.acquirePermission(requestedTokens);
    long tokensLeft = rateLimiter.getMetrics().getAvailablePermissions();
    return Mono.just(new Response(allowed, getHeaders(routeConfig, tokensLeft)));
}
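The snippet above calls a createRateLimiterConfig helper that is not shown. A plausible sketch, assuming the route's refresh period is given in seconds and the replenish rate is the number of permits per period (the class name LocalRateLimiterSupport is our own):

```java
import java.time.Duration;
import io.github.resilience4j.ratelimiter.RateLimiterConfig;

// Hypothetical helper assumed by the LocalRateLimiter snippet above:
// maps the route's settings onto a Resilience4j RateLimiterConfig.
final class LocalRateLimiterSupport {
    static RateLimiterConfig createRateLimiterConfig(int refreshPeriod, int replenishRate) {
        return RateLimiterConfig.custom()
            .limitRefreshPeriod(Duration.ofSeconds(refreshPeriod)) // window length
            .limitForPeriod(replenishRate)                         // permits per window
            .timeoutDuration(Duration.ZERO)                        // fail fast instead of blocking
            .build();
    }
}
```

Setting timeoutDuration to zero matters in a reactive gateway: isAllowed must answer immediately with an allow/deny decision rather than block the event loop waiting for a permit.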

Distributed rate limiting can be achieved with the built‑in RedisRateLimiter, which executes a Lua script (request_rate_limiter.lua) atomically in Redis to enforce token‑bucket semantics across multiple instances.

local tokens_key = KEYS[1]
local timestamp_key = KEYS[2]
local rate = tonumber(ARGV[1])
local capacity = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local requested = tonumber(ARGV[4])
-- token bucket logic ...
return { allowed_num, new_tokens }
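Wiring RedisRateLimiter into a route requires a KeyResolver bean that decides which bucket each request maps to. A minimal per‑client‑IP resolver might look like the following (the bean name ipKeyResolver is our choice; the filter itself is typically enabled per route with the redis-rate-limiter.replenishRate and redis-rate-limiter.burstCapacity filter arguments in application.yml):

```java
// A KeyResolver tells RedisRateLimiter which key each request's token
// bucket lives under in Redis - here, the client's IP address.
@Bean
public KeyResolver ipKeyResolver() {
    return exchange -> Mono.just(
        exchange.getRequest().getRemoteAddress().getAddress().getHostAddress());
}
```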

For concurrent‑request limiting, the article discusses Resilience4j Bulkhead (semaphore‑based) and thread‑pool‑based approaches, as well as custom distributed semaphore implementations built on Redis, Redisson, or Ignite.
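The semaphore idea behind a Bulkhead can be sketched with a plain java.util.concurrent.Semaphore (a single‑JVM illustration, not the Resilience4j API):

```java
import java.util.concurrent.Semaphore;

// Single-JVM sketch of semaphore-based concurrent-request limiting:
// at most `maxConcurrent` requests may be in flight at once.
public class ConcurrencyLimiter {
    private final Semaphore permits;

    public ConcurrencyLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    public boolean tryEnter() {
        return permits.tryAcquire(); // non-blocking; false means reject (e.g., HTTP 429)
    }

    public void exit() {
        permits.release(); // must run in a finally block after the request completes
    }
}
```

The crash problem discussed next follows directly from this shape: if an instance dies between tryEnter and exit, the permit is never released.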

Finally, the article outlines challenges such as handling crashes that may leave semaphores unreleased, and proposes two solutions: TTL‑based per‑request keys in Redis, so that abandoned permits expire on their own; and a "dual‑window sliding" algorithm that keeps only the current and previous minute counters, migrating expired counts out periodically to maintain accuracy while limiting Redis load.
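A local, single‑JVM sketch of the dual‑window idea (in production the two counters would live in Redis and roll over via TTLs) weights the previous window's count by how much of it still overlaps the sliding window; all names below are illustrative:

```java
// Sliding estimate over one window using only two counters: the previous
// window's count is weighted by its remaining overlap with the sliding
// window, so expired counts "migrate" out gradually instead of all at once.
public class DualWindowLimiter {
    private final int limit;
    private final long windowMillis;
    private long currentWindowStart;
    private long currentCount;
    private long previousCount;

    public DualWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
        this.currentWindowStart = System.currentTimeMillis() / windowMillis * windowMillis;
    }

    public synchronized boolean tryAcquire(long nowMillis) {
        long windowStart = nowMillis / windowMillis * windowMillis;
        if (windowStart != currentWindowStart) {
            // Roll over: current becomes previous; anything older is dropped.
            previousCount = (windowStart - currentWindowStart == windowMillis) ? currentCount : 0;
            currentCount = 0;
            currentWindowStart = windowStart;
        }
        double overlap = 1.0 - (double) (nowMillis - windowStart) / windowMillis;
        double estimated = previousCount * overlap + currentCount;
        if (estimated + 1 > limit) {
            return false;
        }
        currentCount++;
        return true;
    }
}
```

Passing the timestamp in explicitly keeps the sketch deterministic and testable; a production version would read the clock itself.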

Tags: Microservices, Redis, rate limiting, token bucket, Spring Cloud Gateway, Resilience4j
Written by Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
