High‑Availability Rate‑Limiting Solutions for Spring Boot Systems

The article explains why rate limiting is essential for high‑traffic Spring Boot services, compares counter, sliding‑window, token‑bucket and leaky‑bucket algorithms, demonstrates implementations with plain Java, Guava RateLimiter, AOP, and distributed approaches using Redis‑Lua and Nginx‑Lua, and provides practical configuration examples.

Shepherd Advanced Notes

1. What Is Rate Limiting?

Rate limiting caps the number of requests a system processes within a given time window, which keeps it stable and prevents slowdowns or crashes during traffic spikes. Real‑world analogies include visitor caps at scenic spots and queues at popular restaurants. Online, a sudden surge (for example, a celebrity announcement that drives traffic from 500 k to 5 M requests) calls for limiting so that the system stays usable.

2. Rate‑Limiting Algorithms

Counter algorithm: the simplest method. It caps the number of requests accepted per fixed time window (for example, per second); the same counting idea is also used to cap concurrent resources such as database connections or thread‑pool size. The implementation increments a counter for each request and resets it when the time window expires.

/**
 * Fixed‑window rate‑limiting algorithm
 */
public class Counter {
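    private long timeStamp = System.currentTimeMillis(); // start of the current window
    private int reqCount = 0; // requests counted in the current window
    private final int limit = 1000; // max requests per window
    private final long interval = 1000 * 60; // window size in ms

    // synchronized: concurrent requests must not race on timeStamp and reqCount
    public synchronized boolean limit() {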
        long now = System.currentTimeMillis();
        if (now < timeStamp + interval) {
            // inside window
            reqCount++;
            return reqCount <= limit;
        } else {
            timeStamp = now;
            // reset after timeout
            reqCount = 1;
            return true;
        }
    }
}

The fixed‑window approach suffers from a boundary problem: if the limit is 5 requests per second, 5 requests arriving between 0.8 s and 1 s and another 5 between 1 s and 1.2 s each stay within their own window, yet the 0.4 s span from 0.8 s to 1.2 s carries 10 requests, double the intended limit.
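The boundary problem can be reproduced deterministically with a small sketch. This is not the Counter class above: the class name FixedWindowDemo, the explicit nowMs parameter, and the choice to align windows to multiples of the window size are all assumptions made so the timing can be controlled by hand.

```java
/**
 * Fixed-window counter driven by an explicit clock (illustrative only).
 * Windows are aligned to multiples of windowMs: [0,1000), [1000,2000), ...
 */
final class FixedWindowDemo {
    private final int limit;      // max requests per window
    private final long windowMs;  // window size in ms
    private long windowStart;     // start of the window being counted
    private int count;            // requests admitted in that window

    FixedWindowDemo(int limit, long windowMs) {
        this.limit = limit;
        this.windowMs = windowMs;
    }

    /** Returns true if a request arriving at nowMs is admitted. */
    synchronized boolean tryAcquire(long nowMs) {
        long alignedStart = nowMs / windowMs * windowMs;
        if (alignedStart != windowStart) {
            // a new window has begun: forget everything counted before
            windowStart = alignedStart;
            count = 0;
        }
        if (count >= limit) {
            return false;
        }
        count++;
        return true;
    }

    /** Sends n requests at the same instant and returns how many were admitted. */
    int burst(int n, long nowMs) {
        int admitted = 0;
        for (int i = 0; i < n; i++) {
            if (tryAcquire(nowMs)) {
                admitted++;
            }
        }
        return admitted;
    }
}
```

With a limit of 5 per 1000 ms, burst(5, 900) and burst(5, 1100) both admit all 5 requests, so 10 requests pass within 200 ms even though each window respects its own limit.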

Sliding‑window algorithm solves this by dividing the window into smaller sub‑periods (e.g., five 0.2 s slots) and maintaining a counter for each slot. As time slides, old slots are discarded, providing a smoother and more accurate count.

When the limit is 5 requests per second, the sliding window will reject the second batch of 5 requests because the current 1‑second window (e.g., 0.2‑1.2 s) already contains 5 earlier requests.

Tip: The more sub‑periods you use, the smoother the sliding window and the more precise the limiting.

private final int SUB_CYCLE = 10; // sub‑window length in seconds; a 1‑minute window holds 6 of them
private final int thresholdPerMin = 100; // max requests per minute
private final TreeMap<Long, Integer> counters = new TreeMap<>(); // sub‑window start (epoch s) -> count

// synchronized because TreeMap is not thread‑safe
synchronized boolean slidingWindowsTryAcquire() {
    // align the current time to the start of its sub‑window
    long currentWindowTime = LocalDateTime.now().toEpochSecond(ZoneOffset.UTC) / SUB_CYCLE * SUB_CYCLE;
    int currentWindowNum = countCurrentWindow(currentWindowTime);
    if (currentWindowNum >= thresholdPerMin) {
        return false; // limit exceeded
    }
    counters.merge(currentWindowTime, 1, Integer::sum); // create or increment the sub‑window counter
    return true;
}

private int countCurrentWindow(long currentWindowTime) {
    long startTime = currentWindowTime - SUB_CYCLE * (60 / SUB_CYCLE - 1);
    int count = 0;
    Iterator<Map.Entry<Long, Integer>> iterator = counters.entrySet().iterator();
    while (iterator.hasNext()) {
        Map.Entry<Long, Integer> entry = iterator.next();
        if (entry.getKey() < startTime) {
            iterator.remove();
        } else {
            count += entry.getValue();
        }
    }
    return count;
}

Sliding windows still reject all traffic once the limit is reached, which can be harsh. Alternative algorithms address this.

Leaky‑bucket algorithm models incoming requests as water poured into a bucket that leaks at a constant rate. When the bucket overflows, excess requests are dropped.
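A minimal sketch of the leaky bucket, assuming the variant that rejects on overflow instead of queueing requests for later processing. The class name LeakyBucket, its parameters, and the explicit nowMs clock are illustrative choices, not part of any library.

```java
/**
 * Leaky bucket used as a meter (illustrative only): each admitted request
 * adds one unit of water, and water drains at a constant rate.
 */
final class LeakyBucket {
    private final long capacity;         // max units the bucket can hold
    private final double leakPerSecond;  // constant drain rate
    private double water;                // current level
    private long lastLeakMs;             // last time the level was updated

    LeakyBucket(long capacity, double leakPerSecond, long nowMs) {
        this.capacity = capacity;
        this.leakPerSecond = leakPerSecond;
        this.lastLeakMs = nowMs;
    }

    /** Returns true if a request arriving at nowMs fits into the bucket. */
    synchronized boolean tryAcquire(long nowMs) {
        // drain whatever leaked out since the previous call
        long elapsedMs = Math.max(0, nowMs - lastLeakMs);
        water = Math.max(0.0, water - elapsedMs * leakPerSecond / 1000.0);
        lastLeakMs = nowMs;

        if (water + 1 > capacity) {
            return false; // bucket would overflow: drop the request
        }
        water += 1;
        return true;
    }
}
```

With a capacity of 3 and a leak rate of 1 per second, three requests at t = 0 are admitted and a fourth is dropped; one second later exactly one more request fits, because only one unit has drained.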

Token‑bucket algorithm maintains a bucket of tokens that are added at a fixed rate. A request consumes a token; if none are available, the request is rejected or queued. This allows bursts while enforcing an average rate.
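The token bucket can be sketched in a similar way. This version rejects when no token is available (it does not queue), refills lazily on each call, and starts with a full bucket; the class name TokenBucket, its parameters, and the explicit nowMs clock are assumptions for illustration.

```java
/**
 * Token bucket with lazy refill (illustrative only): tokens accumulate at a
 * fixed rate up to capacity, and each admitted request consumes one token.
 */
final class TokenBucket {
    private final long capacity;           // max tokens, i.e. the largest burst
    private final double refillPerSecond;  // steady token production rate
    private double tokens;                 // tokens currently available
    private long lastRefillMs;             // last time tokens were topped up

    TokenBucket(long capacity, double refillPerSecond, long nowMs) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity; // start full so an initial burst is allowed
        this.lastRefillMs = nowMs;
    }

    /** Returns true if a token was available for a request arriving at nowMs. */
    synchronized boolean tryAcquire(long nowMs) {
        // add the tokens produced since the previous call, capped at capacity
        long elapsedMs = Math.max(0, nowMs - lastRefillMs);
        tokens = Math.min(capacity, tokens + elapsedMs * refillPerSecond / 1000.0);
        lastRefillMs = nowMs;

        if (tokens < 1.0) {
            return false; // no token left: reject
        }
        tokens -= 1.0;
        return true;
    }
}
```

With a capacity of 5 and a refill rate of 2 per second, a burst of 5 requests at t = 0 passes and the sixth is rejected; 500 ms later one token has been produced, so one more request passes. After a long idle period the bucket refills only up to 5, which bounds the size of any burst.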

Key differences:

Token bucket adds tokens at a steady rate and permits bursts; leaky bucket drains at a constant rate and smooths traffic.

Token bucket limits average inflow, allowing occasional spikes; leaky bucket limits outflow, preventing spikes.

Token bucket is suitable when burst tolerance is needed; leaky bucket is used for strict smoothing.

3. Guava RateLimiter (Token‑Bucket)

Google Guava provides RateLimiter, an easy‑to‑use token‑bucket implementation. It supports smooth burst and warm‑up modes.

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>29.0-jre</version>
</dependency>

Example controller:

@Slf4j
@RestController
@RequestMapping("/test")
public class TestController {
    // limit to 2 requests per second
    private final RateLimiter limiter = RateLimiter.create(2.0);
    private DateTimeFormatter dtf = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    @GetMapping("/rateLimit")
    public String testLimiter() {
        // try to acquire a token within 50 ms
        boolean tryAcquire = limiter.tryAcquire(50, TimeUnit.MILLISECONDS);
        if (!tryAcquire) {
            log.warn("Service degraded at {}", LocalDateTime.now().format(dtf));
            return "Current queue is long, please try later!";
        }
        log.info("Token acquired at {}", LocalDateTime.now().format(dtf));
        return "Request succeeded";
    }
}

Core methods:

create(double permitsPerSecond): builds a RateLimiter with the given permits‑per‑second rate.

tryAcquire(): returns true if a permit can be obtained immediately, otherwise false.

Overloads of tryAcquire accept the number of permits to take and a timeout to wait for them, as in the controller above.

Guava’s RateLimiter keeps its state in the memory of a single JVM, so it limits traffic per instance only; it cannot coordinate a shared limit across multiple nodes.

4. Elegant Integration with AOP

To avoid repetitive token checks, define a custom annotation and an AOP interceptor.

@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.TYPE})
@Documented
public @interface RateLimit {
    /** Unique key for the resource */
    String key() default "";
    /** Maximum permits per second */
    double permitsPerSecond();
    /** Maximum wait time for a token */
    long timeout();
    /** Time unit for timeout, default ms */
    TimeUnit timeunit() default TimeUnit.MILLISECONDS;
    /** Message when token acquisition fails */
    String msg() default "System busy, please try later.";
}

The interceptor caches a RateLimiter per key and applies the same logic as the controller example.

@Slf4j
@Component
@Aspect
public class RateLimitAop {
    private final Map<String, RateLimiter> limitMap = Maps.newConcurrentMap();

    @Around("@annotation(com.shepherd.mall.seckill.annotation.RateLimit)")
    public Object around(ProceedingJoinPoint joinPoint) throws Throwable {
        MethodSignature signature = (MethodSignature) joinPoint.getSignature();
        Method method = signature.getMethod();
        RateLimit limit = method.getAnnotation(RateLimit.class);
        if (limit != null) {
            String key = limit.key();
            RateLimiter rateLimiter = limitMap.computeIfAbsent(key, k -> {
                RateLimiter rl = RateLimiter.create(limit.permitsPerSecond());
                log.info("Created token bucket {} with rate {}", k, limit.permitsPerSecond());
                return rl;
            });
            boolean acquire = rateLimiter.tryAcquire(limit.timeout(), limit.timeunit());
            if (!acquire) {
                log.debug("Token bucket {} acquisition failed", key);
                responseFail(limit.msg());
                return null;
            }
        }
        return joinPoint.proceed();
    }

    private void responseFail(String msg) {
        HttpServletResponse response = ((ServletRequestAttributes) RequestContextHolder.getRequestAttributes()).getResponse();
        ResponseVO<Object> responseVO = ResponseVO.failure(400, msg);
        WebUtil.writeJson(response, responseVO);
    }
}

Applying the annotation to a controller method automatically enforces the configured rate limit.

@GetMapping("/limit2")
@RateLimit(key = "limit2", permitsPerSecond = 1, timeout = 50, timeunit = TimeUnit.MILLISECONDS,
           msg = "Current queue is long, please try later!")
public String limit2() {
    log.info("Token bucket limit2 acquired token");
    return "ok";
}

5. Distributed Rate Limiting

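For multi‑node deployments, the counter must live in shared storage and the check‑and‑increment step must be atomic. Redis runs a Lua script as a single atomic operation, so a counter kept in Redis and updated by a script meets both requirements.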

local key = "rate.limit:" .. KEYS[1]
local limit = tonumber(ARGV[1])
local expire_time = ARGV[2]
local is_exists = redis.call("EXISTS", key)
if is_exists == 1 then
    if redis.call("INCR", key) > limit then
        return 0
    else
        return 1
    end
else
    redis.call("SET", key, 1)
    redis.call("EXPIRE", key, expire_time)
    return 1
end

Another approach uses Nginx with Lua to perform per‑IP counting.

local locks = require "resty.lock"
function acquire()
    local lock = locks:new("locks")
    local elapsed, err = lock:lock("limit_key")
    local limit_counter = ngx.shared.limit_counter
    -- one counter per client IP per second
    local key = "ip:" .. ngx.var.remote_addr .. ":" .. os.time()
    local limit = 5
    local current = limit_counter:get(key)
    if current ~= nil and current + 1 > limit then
        lock:unlock()
        return 0
    end
    if current == nil then
        limit_counter:set(key, 1, 1)
    else
        limit_counter:incr(key, 1)
    end
    lock:unlock()
    return 1
end
ngx.print(acquire())

Nginx also provides built‑in modules:

ngx_http_limit_conn_module: limits concurrent connections per key (e.g., IP).

ngx_http_limit_req_module: limits request rate with optional burst and nodelay parameters.

http {
    limit_req_zone $binary_remote_addr zone=contentRateLimit:10m rate=2r/s;
    limit_conn_zone $binary_remote_addr zone=addr:1m;
    server {
        listen 80;
        location /brand {
            limit_conn addr 2;
            proxy_pass http://192.168.211.1:18081;
        }
        location /read_content {
            limit_req zone=contentRateLimit burst=4 nodelay;
            content_by_lua_file /root/lua/read_content.lua;
        }
    }
}

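The burst parameter defines how many requests above the configured rate may be accepted and queued; anything beyond that is rejected (with status 503 by default). Without nodelay, queued requests are released at the configured rate (one every 500 ms at 2r/s). With nodelay they are processed immediately, while the burst slots still free up only at the configured rate.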
