Mastering Rate Limiting: Algorithms, Java Implementations, and Guava Tips

This article explains why rate limiting is essential for high‑traffic systems, defines common rate‑limiting algorithms (fixed window, sliding window, leaky bucket, token bucket), provides complete Java code examples for each, and demonstrates practical usage with Guava's RateLimiter in real‑world applications.

Alibaba Cloud Developer
Alibaba Cloud Developer
Alibaba Cloud Developer
Mastering Rate Limiting: Algorithms, Java Implementations, and Guava Tips

1. Rate Limiting

Rate limiting restricts the number of requests processed within a given time window to protect system stability and prevent overload caused by traffic spikes or malicious users.

Why Apply Rate Limiting?

Instant traffic spikes can crash services.

Malicious high‑frequency requests may cause server failure.

Fast message consumption can overload databases.

What Is Rate Limiting?

It caps the request count in a specific time window, ensuring availability and preventing slowdowns or crashes in high‑concurrency scenarios.

2. Rate‑Limiting Algorithms

2.1 Fixed Window

The Fixed Window (or Counter) algorithm counts requests in a fixed period and blocks further requests once a threshold is reached. When the window expires, the counter resets.

public class FixedWindowRateLimiter {
    Logger logger = LoggerFactory.getLogger(FixedWindowRateLimiter.class);
    // window size in ms
    long windowSize;
    // max allowed requests per window
    int maxRequestCount;
    // current count
    AtomicInteger counter = new AtomicInteger(0);
    // window end timestamp
    long windowBorder;

    public FixedWindowRateLimiter(long windowSize, int maxRequestCount) {
        this.windowSize = windowSize;
        this.maxRequestCount = maxRequestCount;
        this.windowBorder = System.currentTimeMillis() + windowSize;
    }

    public synchronized boolean tryAcquire() {
        long currentTime = System.currentTimeMillis();
        if (windowBorder < currentTime) {
            logger.info("window reset");
            do {
                windowBorder += windowSize;
            } while (windowBorder < currentTime);
            counter = new AtomicInteger(0);
        }
        if (counter.intValue() < maxRequestCount) {
            counter.incrementAndGet();
            logger.info("tryAcquire success");
            return true;
        } else {
            logger.info("tryAcquire fail");
            return false;
        }
    }
}

Advantages: Simple to implement and easy to understand.

Disadvantages: Not smooth; requests arriving early in the window can cause all later requests to be rejected, and window boundary effects may allow a burst that exceeds the intended limit.

2.2 Sliding Window

The Sliding Window refines the Fixed Window by dividing the window into smaller sub‑windows, moving forward as time progresses. This smooths traffic and mitigates boundary spikes, though it requires storing timestamps for each request.

public class SlidingWindowRateLimiter {
    Logger logger = LoggerFactory.getLogger(FixedWindowRateLimiter.class);
    // total window size in ms
    long windowSize;
    // number of sub‑windows
    int shardNum;
    // max allowed requests per total window
    int maxRequestCount;
    // request count per sub‑window
    int[] shardRequestCount;
    // total requests in current window
    int totalCount;
    // current sub‑window index
    int shardId;
    // size of each sub‑window in ms
    long tinyWindowSize;
    // window end timestamp
    long windowBorder;

    public SlidingWindowRateLimiter(long windowSize, int shardNum, int maxRequestCount) {
        this.windowSize = windowSize;
        this.shardNum = shardNum;
        this.maxRequestCount = maxRequestCount;
        this.shardRequestCount = new int[shardNum];
        this.tinyWindowSize = windowSize / shardNum;
        this.windowBorder = System.currentTimeMillis();
    }

    public synchronized boolean tryAcquire() {
        long currentTime = System.currentTimeMillis();
        if (windowBorder < currentTime) {
            logger.info("window reset");
            do {
                shardId = (++shardId) % shardNum;
                totalCount -= shardRequestCount[shardId];
                shardRequestCount[shardId] = 0;
                windowBorder += tinyWindowSize;
            } while (windowBorder < currentTime);
        }
        if (totalCount < maxRequestCount) {
            logger.info("tryAcquire success:{}", shardId);
            shardRequestCount[shardId]++;
            totalCount++;
            return true;
        } else {
            logger.info("tryAcquire fail");
            return false;
        }
    }
}

Advantages: Solves the boundary problem of Fixed Window and handles burst traffic.

Disadvantages: Still not perfectly smooth; a burst can still cause temporary overload.

2.3 Leaky Bucket

The Leaky Bucket shapes traffic by allowing requests to “leak” at a constant rate. Excess requests overflow and are dropped, providing smooth output but potentially discarding data under heavy load.

public class LeakyBucketRateLimiter {
    Logger logger = LoggerFactory.getLogger(LeakyBucketRateLimiter.class);
    // bucket capacity
    int capacity;
    // current water level
    AtomicInteger water = new AtomicInteger();
    // timestamp of last leak
    long leakTimestamp;
    // leak rate (requests per second)
    int leakRate;

    public LeakyBucketRateLimiter(int capacity, int leakRate) {
        this.capacity = capacity;
        this.leakRate = leakRate;
    }

    public synchronized boolean tryAcquire() {
        if (water.get() == 0) {
            logger.info("start leaking");
            leakTimestamp = System.currentTimeMillis();
            water.incrementAndGet();
            return water.get() < capacity;
        }
        long currentTime = System.currentTimeMillis();
        int leakedWater = (int) ((currentTime - leakTimestamp) / 1000 * leakRate);
        logger.info("lastTime:{}, currentTime:{}, leakedWater:{}", leakTimestamp, currentTime, leakedWater);
        if (leakedWater != 0) {
            int leftWater = water.get() - leakedWater;
            water.set(Math.max(0, leftWater));
            leakTimestamp = System.currentTimeMillis();
        }
        logger.info("remaining capacity:{}", capacity - water.get());
        if (water.get() < capacity) {
            logger.info("tryAcquire success");
            water.incrementAndGet();
            return true;
        } else {
            logger.info("tryAcquire fail");
            return false;
        }
    }
}

Advantages: Smooths traffic and prevents overload by discarding excess requests.

Disadvantages: Cannot handle sudden bursts efficiently and may waste resources when the system is idle.

2.4 Token Bucket

The Token Bucket improves on the Leaky Bucket by allowing tokens to accumulate at a fixed rate; requests consume tokens, enabling occasional bursts while still enforcing an average rate.

// Guava RateLimiter is a ready‑made token‑bucket implementation
RateLimiter limiter = RateLimiter.create(5); // 5 permits per second
for (int i = 0; i < 10; i++) {
    double wait = limiter.acquire();
    logger.info("wait time: {}s", wait);
}

3. Practical Application with Guava

Guava's RateLimiter provides a token‑bucket based rate limiter that can be directly used in Java projects. Add the dependency:

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>32.1.3-jre</version>
</dependency>

Example of fixed token generation:

@Test
public void acquireTest() {
    // generate 5 tokens per second
    RateLimiter rateLimiter = RateLimiter.create(5);
    for (int i = 0; i < 10; i++) {
        double time = rateLimiter.acquire();
        logger.info("wait time: {}s", time);
    }
}

Example of handling bursts with warm‑up period:

@Test
public void acquireSmoothly() {
    RateLimiter rateLimiter = RateLimiter.create(5, 3, TimeUnit.SECONDS);
    long start = System.currentTimeMillis();
    for (int i = 0; i < 15; i++) {
        double time = rateLimiter.acquire();
        logger.info("wait: {}s, total: {}ms", time, System.currentTimeMillis() - start);
    }
}

4. AOP‑Based Rate Limiting

Define a custom annotation @Limit to specify key, permits per second, timeout, and message. Implement an Aspect that creates a Guava RateLimiter per key and applies tryAcquire before method execution.

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@Documented
public @interface Limit {
    String key() default "";
    double permitsPerSeconds();
    long timeout();
    TimeUnit timeUnit() default TimeUnit.MILLISECONDS;
    String msg() default "System busy, please try again later";
}
@Aspect
@Component
public class LimitAspect {
    Logger logger = LoggerFactory.getLogger(LimitAspect.class);
    private final Map<String, RateLimiter> limitMap = Maps.newConcurrentMap();

    @Around("@annotation(com.example.annotation.Limit)")
    public Object around(ProceedingJoinPoint joinPoint) throws Throwable {
        Method method = ((MethodSignature) joinPoint.getSignature()).getMethod();
        Limit limit = method.getAnnotation(Limit.class);
        if (limit != null) {
            String key = limit.key();
            RateLimiter rateLimiter = limitMap.computeIfAbsent(key, k -> {
                RateLimiter rl = RateLimiter.create(limit.permitsPerSeconds());
                logger.info("Created token bucket={}, capacity={}", k, limit.permitsPerSeconds());
                return rl;
            });
            boolean acquire = rateLimiter.tryAcquire(limit.timeout(), limit.timeUnit());
            if (!acquire) {
                logger.debug("Token bucket={}, acquire failed", key);
                throw new RuntimeException(limit.msg());
            }
        }
        return joinPoint.proceed();
    }
}

Apply the annotation on a controller or service method:

@Limit(key = "query", permitsPerSeconds = 1, timeout = 1, msg = "Rate limit triggered, please retry")
public Result query(...) { ... }

5. Summary

The discussed techniques are application‑level rate limiting suitable for a single instance. For multi‑node deployments, distributed or edge‑level rate limiting is required. Combining rate limiting with circuit breaking and degradation strategies ensures robust, high‑availability services.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

BackendDistributed SystemsJavaalgorithmGuavarate limiting
Alibaba Cloud Developer
Written by

Alibaba Cloud Developer

Alibaba's official tech channel, featuring all of its technology innovations.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.