Backend Development 17 min read

Rate Limiting Algorithms: Fixed Window, Sliding Window, Leaky Bucket, and Token Bucket – Principles, Java Implementations, Pros & Cons

This article explains why rate limiting is essential for high‑concurrency systems, defines rate limiting, introduces four common algorithms (fixed‑window, sliding‑window, leaky‑bucket, token‑bucket), provides Java code examples for each, compares their advantages and disadvantages, and shows practical usage with Guava's RateLimiter and AOP annotations.

High Availability Architecture

Rate limiting controls the number of requests within a time window to keep services stable under sudden traffic spikes, malicious users, or fast message consumption that could overload databases.

In distributed high‑concurrency environments, limiting traffic is a primary protection mechanism; it ensures availability and prevents system crashes.

The article lists four typical rate‑limiting algorithms: Fixed Window, Sliding Window, Leaky Bucket, and Token Bucket.

1. Fixed Window

Implementation principle: count requests in a fixed period; when the count reaches a threshold, further requests are rejected until the window resets. Example threshold: no more than 150 requests in 3 seconds.

Java implementation:

import java.util.concurrent.atomic.AtomicInteger;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class FixedWindowRateLimiter {
    Logger logger = LoggerFactory.getLogger(FixedWindowRateLimiter.class);
    long windowSize;        // window length in ms
    int maxRequestCount;    // request threshold per window
    AtomicInteger counter = new AtomicInteger(0); // requests seen in the current window
    long windowBorder;      // end timestamp of the current window
    public FixedWindowRateLimiter(long windowSize, int maxRequestCount) {
        this.windowSize = windowSize;
        this.maxRequestCount = maxRequestCount;
        this.windowBorder = System.currentTimeMillis() + windowSize;
    }
    public synchronized boolean tryAcquire() {
        long currentTime = System.currentTimeMillis();
        if (windowBorder < currentTime) {
            logger.info("window reset");
            do { windowBorder += windowSize; } while (windowBorder < currentTime);
            counter.set(0);
        }
        if (counter.intValue() < maxRequestCount) {
            counter.incrementAndGet();
            logger.info("tryAcquire success");
            return true;
        } else {
            logger.info("tryAcquire fail");
            return false;
        }
    }
}

Pros: simple, easy to understand. Cons: not smooth (burst requests at the start of a window are all accepted, later requests are blocked) and suffers from window‑boundary effects.
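To make the boundary effect concrete, here is a minimal, self-contained sketch. The class names (`SimpleFixedWindow`, `FixedWindowBoundaryDemo`) and the injectable clock are illustrative additions, not from the article; the clock is injected so the scenario is deterministic.

```java
import java.util.function.LongSupplier;

// Minimal fixed-window counter with an injectable clock (illustrative names).
class SimpleFixedWindow {
    private final long windowMs;
    private final int max;
    private final LongSupplier clock;
    private long windowStart;
    private int count;

    SimpleFixedWindow(long windowMs, int max, LongSupplier clock) {
        this.windowMs = windowMs;
        this.max = max;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        if (now - windowStart >= windowMs) {  // window expired: reset counter
            windowStart += ((now - windowStart) / windowMs) * windowMs;
            count = 0;
        }
        if (count < max) { count++; return true; }
        return false;
    }
}

public class FixedWindowBoundaryDemo {
    public static void main(String[] args) {
        long[] now = {0};
        SimpleFixedWindow limiter = new SimpleFixedWindow(1000, 3, () -> now[0]);

        // 3 requests at t=999ms (end of window 1) all pass...
        now[0] = 999;
        int passedLate = 0;
        for (int i = 0; i < 3; i++) if (limiter.tryAcquire()) passedLate++;

        // ...and 3 more at t=1001ms (start of window 2) also pass.
        now[0] = 1001;
        int passedEarly = 0;
        for (int i = 0; i < 3; i++) if (limiter.tryAcquire()) passedEarly++;

        // 6 requests within ~2ms despite a "3 per second" limit: the boundary effect.
        System.out.println(passedLate + passedEarly); // prints 6
    }
}
```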

2. Sliding Window

Improves Fixed Window by dividing the window into smaller sub‑windows and sliding them forward, thus reducing boundary spikes. It requires storing timestamps or counters for each sub‑window.

Java implementation:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SlidingWindowRateLimiter {
    Logger logger = LoggerFactory.getLogger(SlidingWindowRateLimiter.class);
    long windowSize;          // total window length in ms
    int shardNum;             // number of sub-windows
    int maxRequestCount;      // request threshold for the whole window
    int[] shardRequestCount;  // per-sub-window counters
    int totalCount;           // sum of all sub-window counters
    int shardId;              // index of the current sub-window
    long tinyWindowSize;      // sub-window length in ms
    long windowBorder;        // end timestamp of the current sub-window
    public SlidingWindowRateLimiter(long windowSize, int shardNum, int maxRequestCount) {
        this.windowSize = windowSize;
        this.shardNum = shardNum;
        this.maxRequestCount = maxRequestCount;
        this.shardRequestCount = new int[shardNum];
        this.tinyWindowSize = windowSize / shardNum;
        this.windowBorder = System.currentTimeMillis();
    }
    public synchronized boolean tryAcquire() {
        long currentTime = System.currentTimeMillis();
        if (windowBorder < currentTime) {
            logger.info("window reset");
            do {
                shardId = (++shardId) % shardNum;
                totalCount -= shardRequestCount[shardId];
                shardRequestCount[shardId] = 0;
                windowBorder += tinyWindowSize;
            } while (windowBorder < currentTime);
        }
        if (totalCount < maxRequestCount) {
            logger.info("tryAcquire success:{}", shardId);
            shardRequestCount[shardId]++;
            totalCount++;
            return true;
        } else {
            logger.info("tryAcquire fail");
            return false;
        }
    }
}

Pros: mitigates the window-boundary spike and spreads bursts more evenly. Cons: still not perfectly smooth (the same boundary issue recurs at sub-window granularity) and more complex to implement, since a counter must be kept per sub-window.
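A deterministic sketch shows why the boundary burst from the fixed-window example is rejected here. The names (`SimpleSlidingWindow`, `SlidingWindowDemo`) and the injectable clock are illustrative assumptions, not part of the article's code.

```java
import java.util.function.LongSupplier;

// Minimal sliding-window counter with an injectable clock (illustrative names).
class SimpleSlidingWindow {
    private final long shardMs;
    private final int shardNum, max;
    private final int[] shards;
    private final LongSupplier clock;
    private long border;   // end of the current sub-window
    private int shardId, total;

    SimpleSlidingWindow(long windowMs, int shardNum, int max, LongSupplier clock) {
        this.shardMs = windowMs / shardNum;
        this.shardNum = shardNum;
        this.max = max;
        this.shards = new int[shardNum];
        this.clock = clock;
        this.border = clock.getAsLong() + shardMs;
    }

    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        while (border <= now) {            // slide forward, evicting expired shards
            shardId = (shardId + 1) % shardNum;
            total -= shards[shardId];
            shards[shardId] = 0;
            border += shardMs;
        }
        if (total < max) { shards[shardId]++; total++; return true; }
        return false;
    }
}

public class SlidingWindowDemo {
    public static void main(String[] args) {
        long[] now = {0};
        // 3 requests per sliding 1-second window, split into 10 sub-windows of 100 ms.
        SimpleSlidingWindow limiter = new SimpleSlidingWindow(1000, 10, 3, () -> now[0]);

        now[0] = 950;                     // fill the quota near the window's end
        for (int i = 0; i < 3; i++) limiter.tryAcquire();

        now[0] = 1050;                    // just past the boundary: the 3 requests
        boolean burst = limiter.tryAcquire(); // still count, so this one is rejected
        System.out.println(burst);        // prints false

        now[0] = 2000;                    // a full window later, the quota has slid out
        System.out.println(limiter.tryAcquire()); // prints true
    }
}
```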

3. Leaky Bucket

Models a bucket that leaks at a constant rate; incoming requests fill the bucket, and excess requests are dropped when the bucket is full. It smooths traffic but cannot handle large bursts.

Java implementation:

import java.util.concurrent.atomic.AtomicInteger;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LeakyBucketRateLimiter {
    Logger logger = LoggerFactory.getLogger(LeakyBucketRateLimiter.class);
    int capacity;                              // bucket capacity
    AtomicInteger water = new AtomicInteger(); // current water level
    long leakTimestamp;                        // time of the last leak
    int leakRate;                              // permits leaked per second
    public LeakyBucketRateLimiter(int capacity, int leakRate) {
        this.capacity = capacity;
        this.leakRate = leakRate;
    }
    public synchronized boolean tryAcquire() {
        // First request after the bucket drained: restart the leak clock and accept it.
        if (water.get() == 0) {
            logger.info("start leaking");
            leakTimestamp = System.currentTimeMillis();
            water.incrementAndGet();
            return true;
        }
        long currentTime = System.currentTimeMillis();
        int leakedWater = (int) ((currentTime - leakTimestamp) / 1000 * leakRate);
        logger.info("lastTime:{}, currentTime:{}, leakedWater:{}", leakTimestamp, currentTime, leakedWater);
        if (leakedWater != 0) {
            int leftWater = water.get() - leakedWater;
            water.set(Math.max(0, leftWater));
            leakTimestamp = System.currentTimeMillis();
        }
        logger.info("remaining capacity:{}", capacity - water.get());
        if (water.get() < capacity) {
            logger.info("tryAcquire success");
            water.incrementAndGet();
            return true;
        } else {
            logger.info("tryAcquire fail");
            return false;
        }
    }
}

Pros: smooths traffic, prevents overload. Cons: cannot handle bursts, may discard requests, and can waste resources when the system is idle.
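The drop-on-full behavior can be seen in a small deterministic sketch. The names (`SimpleLeakyBucket`, `LeakyBucketDemo`) and the injectable clock are illustrative assumptions; the article's class uses the wall clock directly.

```java
import java.util.function.LongSupplier;

// Minimal leaky bucket with an injectable clock (illustrative names):
// capacity 2, leaking 1 request per second.
class SimpleLeakyBucket {
    private final int capacity, leakRatePerSec;
    private final LongSupplier clock;
    private long lastLeak;
    private int water;

    SimpleLeakyBucket(int capacity, int leakRatePerSec, LongSupplier clock) {
        this.capacity = capacity;
        this.leakRatePerSec = leakRatePerSec;
        this.clock = clock;
        this.lastLeak = clock.getAsLong();
    }

    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        int leaked = (int) ((now - lastLeak) / 1000 * leakRatePerSec);
        if (leaked > 0) {                  // drain whole elapsed seconds' worth
            water = Math.max(0, water - leaked);
            lastLeak += (leaked / leakRatePerSec) * 1000L;
        }
        if (water < capacity) { water++; return true; }
        return false;                      // bucket full: the request is dropped
    }
}

public class LeakyBucketDemo {
    public static void main(String[] args) {
        long[] now = {0};
        SimpleLeakyBucket bucket = new SimpleLeakyBucket(2, 1, () -> now[0]);

        System.out.println(bucket.tryAcquire()); // true  (water 0 -> 1)
        System.out.println(bucket.tryAcquire()); // true  (water 1 -> 2)
        System.out.println(bucket.tryAcquire()); // false (full: burst dropped)

        now[0] = 1000;                           // one second later, one unit leaked
        System.out.println(bucket.tryAcquire()); // true
    }
}
```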

4. Token Bucket

Improves Leaky Bucket by adding tokens to a bucket at a fixed rate; a request consumes a token. If tokens are available, the request proceeds, allowing bursts up to the bucket capacity.
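Since the article defers the implementation to Guava, here is a minimal sketch of the refill logic itself. The names (`SimpleTokenBucket`, `TokenBucketDemo`) and the injectable clock are illustrative assumptions, and lazy refill on each call is one of several possible designs.

```java
import java.util.function.LongSupplier;

// Minimal token bucket with an injectable clock (illustrative names): tokens
// are refilled lazily at a fixed rate; a full bucket lets a burst through.
class SimpleTokenBucket {
    private final int capacity, refillPerSec;
    private final LongSupplier clock;
    private long lastRefill;
    private int tokens;

    SimpleTokenBucket(int capacity, int refillPerSec, LongSupplier clock) {
        this.capacity = capacity;
        this.refillPerSec = refillPerSec;
        this.clock = clock;
        this.lastRefill = clock.getAsLong();
        this.tokens = capacity;            // start full, so bursts are allowed
    }

    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        int refill = (int) ((now - lastRefill) / 1000 * refillPerSec);
        if (refill > 0) {                  // add tokens for whole elapsed seconds
            tokens = Math.min(capacity, tokens + refill);
            lastRefill += (refill / refillPerSec) * 1000L;
        }
        if (tokens > 0) { tokens--; return true; }
        return false;
    }
}

public class TokenBucketDemo {
    public static void main(String[] args) {
        long[] now = {0};
        SimpleTokenBucket bucket = new SimpleTokenBucket(3, 1, () -> now[0]);

        // A burst of 3 passes immediately (unlike a leaky bucket)...
        for (int i = 0; i < 3; i++) System.out.println(bucket.tryAcquire()); // true x3
        System.out.println(bucket.tryAcquire()); // false: tokens exhausted

        now[0] = 2000;                           // 2 s later: 2 tokens refilled
        System.out.println(bucket.tryAcquire()); // true
    }
}
```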

Guava’s RateLimiter is a ready‑made token‑bucket implementation.

Typical usage:

@Test
public void acquireTest() {
    // generate 5 tokens per second
    RateLimiter rateLimiter = RateLimiter.create(5);
    for (int i = 0; i < 10; i++) {
        double wait = rateLimiter.acquire();
        logger.info("wait time: {}s", wait);
    }
}

When a large request consumes many tokens, Guava allows it immediately but delays subsequent requests proportionally, smoothing burst traffic.

@Test
public void acquireSmoothly() {
    // 5 permits/second, ramping up over a 3-second warm-up period
    RateLimiter rateLimiter = RateLimiter.create(5, 3, TimeUnit.SECONDS);
    long start = System.currentTimeMillis();
    for (int i = 0; i < 15; i++) {
        double wait = rateLimiter.acquire();
        logger.info("wait:{}s, total:{}ms", wait, System.currentTimeMillis() - start);
    }
}

Pros: allows controlled bursts up to the bucket capacity while capping the average rate, and the rate can be adjusted at runtime. Cons: an oversized bucket or refill rate still lets bursts reach downstream systems; token state must be tracked per limiter; the implementation is slightly more complex than a simple counter.

Practical Integration

For application‑level limiting, the article shows how to define a custom @Limit annotation, implement an AOP aspect that creates and manages a RateLimiter per key, and apply the annotation on controller or service methods.

import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.concurrent.TimeUnit;

@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD})
@Documented
public @interface Limit {
    String key() default "";
    double permitsPerSeconds();
    long timeout();
    TimeUnit timeUnit() default TimeUnit.MILLISECONDS;
    String msg() default "System busy, please try again later";
}

import java.lang.reflect.Method;
import java.util.Map;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.reflect.MethodSignature;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;
import com.google.common.collect.Maps;
import com.google.common.util.concurrent.RateLimiter;

@Aspect
@Component
public class LimitAspect {
    Logger logger = LoggerFactory.getLogger(LimitAspect.class);
    private final Map<String, RateLimiter> limitMap = Maps.newConcurrentMap();
    @Around("@annotation(com.example.annotation.Limit)")
    public Object around(ProceedingJoinPoint joinPoint) throws Throwable {
        Method method = ((MethodSignature) joinPoint.getSignature()).getMethod();
        Limit limit = method.getAnnotation(Limit.class);
        if (limit != null) {
            String key = limit.key();
            RateLimiter rl = limitMap.computeIfAbsent(key, k -> RateLimiter.create(limit.permitsPerSeconds()));
            if (!rl.tryAcquire(limit.timeout(), limit.timeUnit())) {
                logger.debug("Token bucket {}, acquire failed", key);
                throw new RuntimeException(limit.msg());
            }
        }
        return joinPoint.proceed();
    }
}

Applying the annotation:

@Limit(key = "query", permitsPerSeconds = 1, timeout = 1, msg = "Rate limit triggered, please retry")
public Result query(...) { ... }

The article concludes that application‑level rate limiting is useful for single‑instance services, but distributed systems require centralized or edge‑level limiting, often combined with circuit breaking and degradation strategies to ensure overall system resilience.

Written by High Availability Architecture (official account).