Rate Limiting in Spring Cloud Gateway: Scenarios, Algorithms, Open‑Source Tools, and Practical Implementations

This article explains rate limiting for Spring Cloud Gateway: it covers common throttling scenarios and the major algorithms (Fixed Window, Sliding Window, Leaky Bucket, and Token Bucket), reviews popular open‑source libraries, and demonstrates both single‑node and distributed implementations with code examples.

Top Architect

1. Introduction

The author, a senior architect, shares a systematic study of request‑rate limiting for Spring Cloud Gateway, describing why throttling is essential for high‑traffic systems and how it fits into micro‑service architectures.

2. Common Limiting Scenarios

Interface call count limits (e.g., 100 requests per minute)

Download speed caps (e.g., 100 KB/s per user)

Concurrent request caps per user or IP

IP‑wide black‑listing

These scenarios reduce to two main targets: request‑frequency limiting and concurrent‑request limiting.

3. Typical Limiting Algorithms

3.1 Fixed Window

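Counts requests in a discrete time bucket and resets the counter when the next bucket begins; simple, but a burst that straddles a window boundary can briefly admit up to twice the limit.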
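The fixed‑window idea can be sketched in a few lines of plain Java. This is a minimal illustration, not code from the original article: the class name FixedWindowLimiter and its method are hypothetical, and the caller passes the current time in explicitly so the behaviour is easy to reason about (non‑negative timestamps are assumed).

```java
// Minimal fixed-window counter (illustrative names, not from any library).
final class FixedWindowLimiter {
    private final int limit;                     // max requests admitted per window
    private final long windowMillis;             // window length in milliseconds
    private long windowStart = Long.MIN_VALUE;   // aligned start of the current window
    private int count;                           // requests admitted in the current window

    FixedWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    /** Returns true if a request arriving at nowMillis is admitted. */
    synchronized boolean tryAcquire(long nowMillis) {
        // Align the timestamp to the start of its window.
        long start = nowMillis - (nowMillis % windowMillis);
        if (start != windowStart) {
            // A new window has begun: reset the counter.
            windowStart = start;
            count = 0;
        }
        if (count < limit) {
            count++;
            return true;
        }
        return false;
    }
}
```

With a limit of 2 per 1000 ms, requests at 900 ms and 950 ms are admitted, and so are requests at 1000 ms and 1010 ms, because the counter resets at the boundary. Four requests pass within about 110 ms, which is the boundary spike described above.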

3.2 Sliding Window

Divides a larger window into smaller sub‑windows and aggregates their counters for smoother control.
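One possible way to realise the sub‑window idea is a ring buffer of counters, sketched below in plain Java. The class SlidingWindowLimiter is hypothetical, the window length is assumed to be evenly divisible by the number of sub‑windows, and the time is passed in by the caller.

```java
import java.util.Arrays;

// Sliding window built from fixed sub-windows stored in a ring buffer (illustrative sketch).
final class SlidingWindowLimiter {
    private final int limit;             // max requests across the whole window
    private final long subWindowMillis;  // length of one sub-window
    private final int[] counts;          // one counter per ring-buffer slot
    private final long[] slotWindow;     // absolute sub-window number held by each slot

    SlidingWindowLimiter(int limit, long windowMillis, int subWindows) {
        this.limit = limit;
        this.subWindowMillis = windowMillis / subWindows;
        this.counts = new int[subWindows];
        this.slotWindow = new long[subWindows];
        Arrays.fill(slotWindow, Long.MIN_VALUE); // mark every slot as empty
    }

    synchronized boolean tryAcquire(long nowMillis) {
        long current = nowMillis / subWindowMillis;   // absolute sub-window number
        long oldest = current - counts.length + 1;    // oldest sub-window still in range
        int total = 0;
        for (int i = 0; i < counts.length; i++) {
            // Aggregate only the slots that still fall inside the sliding window.
            if (slotWindow[i] >= oldest && slotWindow[i] <= current) {
                total += counts[i];
            }
        }
        if (total >= limit) {
            return false;
        }
        int slot = (int) (current % counts.length);
        if (slotWindow[slot] != current) {
            // The slot holds a stale sub-window: reuse it for the current one.
            slotWindow[slot] = current;
            counts[slot] = 0;
        }
        counts[slot]++;
        return true;
    }
}
```

With a limit of 2 per 1000 ms and ten sub‑windows, requests at 900 ms and 950 ms cause a request at 1000 ms to be rejected, because the 900 ms sub‑window is still inside the sliding range. The precision is bounded by the sub‑window length (100 ms here).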

3.3 Leaky Bucket

Queues incoming requests and drains them at a fixed rate, smoothing burst traffic.
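The sketch below shows the "meter" variant of a leaky bucket in plain Java: instead of holding a real queue, it tracks a water level that drains at a fixed rate and rejects a request when the bucket would overflow. This is a simplification of the queueing description above, the class name LeakyBucket is hypothetical, and time is supplied by the caller.

```java
// Leaky bucket as a meter: the level rises by 1 per request and drains at a fixed rate.
final class LeakyBucket {
    private final double capacity;       // maximum water level (pending requests)
    private final double leakPerSecond;  // drain rate in requests per second
    private double water;                // current water level
    private long lastLeakMillis;         // last time the level was updated

    LeakyBucket(double capacity, double leakPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.leakPerSecond = leakPerSecond;
        this.lastLeakMillis = nowMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        if (nowMillis > lastLeakMillis) {
            // Drain the amount that leaked out since the last update.
            double leaked = (nowMillis - lastLeakMillis) * leakPerSecond / 1000.0;
            water = Math.max(0.0, water - leaked);
            lastLeakMillis = nowMillis;
        }
        if (water + 1.0 <= capacity) {
            water += 1.0;   // the request fits into the bucket
            return true;
        }
        return false;       // the bucket would overflow: reject
    }
}
```

A queue‑based variant would instead place requests into a bounded queue and have a scheduled worker take one out at each drain interval; the admission decision (reject when full) is the same.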

3.4 Token Bucket

Generates tokens at a steady rate; each request consumes a token, so a full bucket permits a burst of up to its capacity.

public class TokenBucket {
    private final long capacity;
    private final double refillTokensPerOneMillis;
    private double availableTokens;
    private long lastRefillTimestamp;
    // constructor and methods omitted for brevity
}
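Since the constructor and methods are omitted above, here is one way the class could be completed. It is a sketch under stated assumptions rather than the original author's code: the class is renamed SimpleTokenBucket to make that clear, the field names mirror the ones above, the bucket is assumed to start full, and the current time is passed in by the caller.

```java
// One possible completion of a token bucket (hypothetical, not the article's code).
final class SimpleTokenBucket {
    private final long capacity;                    // maximum number of stored tokens
    private final double refillTokensPerOneMillis;  // refill rate in tokens per millisecond
    private double availableTokens;                 // tokens currently in the bucket
    private long lastRefillTimestamp;               // time of the last refill, in milliseconds

    SimpleTokenBucket(long capacity, long refillTokens, long refillPeriodMillis, long nowMillis) {
        this.capacity = capacity;
        this.refillTokensPerOneMillis = (double) refillTokens / refillPeriodMillis;
        this.availableTokens = capacity;            // assumption: the bucket starts full
        this.lastRefillTimestamp = nowMillis;
    }

    /** Tries to take numTokens at time nowMillis; returns false if too few are available. */
    synchronized boolean tryConsume(int numTokens, long nowMillis) {
        refill(nowMillis);
        if (availableTokens < numTokens) {
            return false;
        }
        availableTokens -= numTokens;
        return true;
    }

    // Adds the tokens generated since the last refill, capped at capacity.
    private void refill(long nowMillis) {
        if (nowMillis > lastRefillTimestamp) {
            long elapsed = nowMillis - lastRefillTimestamp;
            availableTokens = Math.min(capacity, availableTokens + elapsed * refillTokensPerOneMillis);
            lastRefillTimestamp = nowMillis;
        }
    }
}
```

With a capacity of 3 and one token every 2 ms, three requests at time 0 succeed as a burst and a fourth fails; after a long idle period the bucket refills only up to 3, so the burst size stays bounded by the capacity.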

4. Open‑Source Rate‑Limiting Projects

Guava RateLimiter (token‑bucket based, smooth burst & warm‑up)

Bucket4j (token‑bucket, supports distributed caches)

Resilience4j (rate‑limiter & bulkhead for concurrency control)

Each library’s usage pattern is illustrated with concise code snippets.

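// Guava RateLimiter example: 5 permits per second
RateLimiter limiter = RateLimiter.create(5);
System.out.println(limiter.acquire()); // blocks until a permit is available; prints the seconds spent waiting

// Bucket4j bucket creation: 10 tokens per minute
// (older Bucket4j API; newer releases use Bucket.builder() instead)
Bucket bucket = Bucket4j.builder()
    .addLimit(Bandwidth.simple(10, Duration.ofMinutes(1)))
    .build();

// Resilience4j bulkhead & rate‑limiter composition
// (RateLimiter is fully qualified here because its name collides with Guava's class above)
Bulkhead bulkhead = Bulkhead.of("backend", BulkheadConfig.custom().maxConcurrentCalls(150).build());
io.github.resilience4j.ratelimiter.RateLimiter rateLimiter = io.github.resilience4j.ratelimiter.RateLimiter.of("backend", RateLimiterConfig.custom().limitForPeriod(1).limitRefreshPeriod(Duration.ofSeconds(1)).build());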

5. Implementing Limiting in Spring Cloud Gateway

5.1 Single‑Node Request‑Frequency Limiting

Gateway defines a RateLimiter interface; a local implementation can use Resilience4j or Bucket4j.

public interface RateLimiter<C> extends StatefulConfigurable<C> {
    Mono<RateLimiter.Response> isAllowed(String routeId, String id);
}

5.2 Distributed Request‑Frequency Limiting

Spring Cloud Gateway already provides RedisRateLimiter, which executes a Lua script atomically in Redis.

local tokens_key = KEYS[1]
local timestamp_key = KEYS[2]
local rate = tonumber(ARGV[1])
local capacity = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local requested = tonumber(ARGV[4])
-- script body omitted for brevity
return { allowed_num, new_tokens }

5.3 Single‑Node Concurrent Limiting

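Use a Resilience4j Bulkhead or a java.util.concurrent.Semaphore to cap simultaneous executions. Release the permit in a finally block so that a failing request cannot leak it.

Semaphore semaphore = new Semaphore(10);
semaphore.acquire(); // blocks until a permit is free; may throw InterruptedException
try {
    // process request
} finally {
    semaphore.release(); // always return the permit, even if processing throws
}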
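A gateway usually prefers to reject excess requests rather than block a thread waiting for a permit. The sketch below wraps a Semaphore with a non‑blocking tryAcquire and a fallback value. The class ConcurrencyLimiter and its methods are hypothetical names chosen for illustration.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Non-blocking concurrency cap: run the task if a permit is free, else return a fallback.
final class ConcurrencyLimiter {
    private final Semaphore permits;

    ConcurrencyLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Runs task if a permit is available; otherwise returns fallback immediately. */
    <T> T execute(Supplier<T> task, T fallback) {
        if (!permits.tryAcquire()) {
            return fallback;        // limit reached: reject without waiting
        }
        try {
            return task.get();
        } finally {
            permits.release();      // released even when the task throws
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

In a gateway filter, the fallback would typically map to an HTTP 429 response. Because the release sits in a finally block, a task that throws still returns its permit.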

5.4 Distributed Concurrent Limiting

Approaches include TTL‑based counters in Redis, per‑instance keys, or a custom “dual‑window sliding” algorithm that keeps only the current and previous minute windows in Redis for atomic MGET checks.

// Dual‑window sliding algorithm sketch (pseudo‑code)
String curKey = "gw:cnt:" + currentMinute();
String prevKey = "gw:cnt:" + previousMinute();
// Read both windows in a single MGET round trip; a missing key counts as 0
List<Long> counts = redis.mget(curKey, prevKey);
Long cur = counts.get(0);
Long prev = counts.get(1);
long total = (cur == null ? 0 : cur) + (prev == null ? 0 : prev);
if (total > limit) reject(); else allow();
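To make the two‑key bookkeeping concrete, the following plain‑Java sketch simulates it with an in‑memory map standing in for Redis. Everything here is an assumption for illustration: the class DualWindowCounter is hypothetical, the check and the increment are combined in one synchronized method (in real Redis this would need a Lua script or a transaction to stay atomic), the request is rejected once the sum reaches the limit, and key expiry is simulated by deleting every key other than the current and previous minute.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory simulation of the dual-window counter; the map stands in for Redis.
final class DualWindowCounter {
    private final long limit;
    private final Map<String, Long> store = new HashMap<>(); // stand-in for Redis keys

    DualWindowCounter(long limit) {
        this.limit = limit;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        long minute = nowMillis / 60_000;
        String curKey = "gw:cnt:" + minute;
        String prevKey = "gw:cnt:" + (minute - 1);

        // Simulate TTL expiry: only the current and previous windows survive.
        store.keySet().removeIf(k -> !k.equals(curKey) && !k.equals(prevKey));

        // Equivalent of MGET curKey prevKey, treating missing keys as 0.
        long total = store.getOrDefault(curKey, 0L) + store.getOrDefault(prevKey, 0L);
        if (total >= limit) {
            return false;
        }
        store.merge(curKey, 1L, Long::sum); // equivalent of INCR curKey
        return true;
    }
}
```

Because the whole previous minute is counted, this variant is conservative: with a limit of 2, two requests in minute 0 block every request in minute 1, and capacity returns only in minute 2.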

6. Summary

Rate limiting is a cornerstone of gateway stability; the article walks through scenarios, classic algorithms, useful libraries, and concrete implementations for both single‑node and distributed environments, while also highlighting pitfalls such as token‑bucket granularity limits and exception‑safe semaphore release.
