Mastering Rate Limiting: Token Bucket & Sliding Window Algorithms in Java

This article explains the principles and implementation details of common rate‑limiting algorithms—token bucket and sliding‑window counting—including their core concepts, key processes, Java code examples, and how to extend them to distributed scenarios with Redis Lua scripts.

Java Baker
Java Baker
Java Baker
Mastering Rate Limiting: Token Bucket & Sliding Window Algorithms in Java

Introduction

Rate limiting is a fundamental technique for protecting services from overload and ensuring fair resource usage. The article covers two widely used algorithms—token bucket and sliding‑window counting—detailing their concepts, characteristics, key processing steps, and practical Java implementations. It also shows how to adapt these algorithms for distributed environments using Redis Lua scripts.

Basic Interface

tryAcquire : non‑blocking attempt that returns true if a permit is granted, false otherwise.

acquire : blocking call that waits (optionally with a timeout) until a permit becomes available.

Token Bucket Algorithm

Concept

Token: a permit representing a unit of allowed QPS or resource consumption.

Bucket: holds a maximum number of tokens, defining the capacity.

Refill rate: how many tokens are added per second.

Available tokens never exceed the bucket size.

Characteristics

Works for both bursty and steady traffic.

Implementation is relatively simple compared to other algorithms.

Key Process

The bucket is refilled lazily: each call recomputes the number of tokens that should have been added since the last refill.

tryAcquire :

Recalculate available tokens.

If enough tokens exist, deduct the requested amount and return success.

Otherwise return failure.

acquire :

If permits ≤ available tokens, deduct and return.

Otherwise compute wait time as (permits - tokens) * refillIntervalNanos, then block using Condition.awaitNanos until tokens are replenished.

Critical Refill Logic

Tokens to add = (now - lastRefillTime) / refillIntervalNanos.

Update tokens = min(capacity, tokens + tokensToAdd) and set lastRefillTime = now.

If tokens become available, wake waiting threads with condition.signalAll().

Java Implementation

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Token‑bucket rate limiter – supports non‑blocking and blocking token acquisition.
 */
public class TokenBucketRateLimiter {
    private final long capacity;               // bucket size
    private final double refillRate;          // tokens per second
    private final long refillIntervalNanos;   // nanoseconds per token

    private double tokens;                    // current available tokens
    private long lastRefillTime;               // last refill timestamp

    private final ReentrantLock lock;
    private final Condition condition;

    public TokenBucketRateLimiter(long capacity, double refillRate) {
        if (capacity <= 0 || refillRate <= 0) {
            throw new IllegalArgumentException("capacity and refillRate must be > 0");
        }
        this.capacity = capacity;
        this.refillRate = refillRate;
        this.refillIntervalNanos = (long) (1_000_000_000.0 / refillRate);
        this.tokens = capacity;
        this.lastRefillTime = System.nanoTime();
        this.lock = new ReentrantLock();
        this.condition = lock.newCondition();
    }

    public boolean tryAcquire() {
        return tryAcquire(1);
    }

    public boolean tryAcquire(long permits) {
        if (permits <= 0) {
            throw new IllegalArgumentException("permits must be > 0");
        }
        if (permits > capacity) {
            return false; // exceeds bucket capacity
        }
        lock.lock();
        try {
            refillTokens();
            if (tokens >= permits) {
                tokens -= permits;
                return true;
            }
            return false;
        } finally {
            lock.unlock();
        }
    }

    public void acquire() throws InterruptedException {
        acquire(1);
    }

    public void acquire(long permits) throws InterruptedException {
        if (permits <= 0) {
            throw new IllegalArgumentException("permits must be > 0");
        }
        if (permits > capacity) {
            throw new IllegalArgumentException("permits exceed bucket capacity");
        }
        lock.lock();
        try {
            while (tokens < permits) {
                double needed = permits - tokens;
                long waitNanos = (long) (needed * refillIntervalNanos);
                condition.awaitNanos(waitNanos);
                refillTokens();
            }
            tokens -= permits;
        } finally {
            lock.unlock();
        }
    }

    private void refillTokens() {
        long now = System.nanoTime();
        double toAdd = (now - lastRefillTime) / (double) refillIntervalNanos;
        if (toAdd > 0) {
            tokens = Math.min(capacity, tokens + toAdd);
            lastRefillTime = now;
            if (tokens > 0) {
                condition.signalAll();
            }
        }
    }
}

Sliding Window Counting Algorithm

Concept

Time window: a configurable duration (seconds, minutes, etc.) within which a maximum number of requests is allowed.

Slots (or cells): the window is divided into finer‑grained intervals (e.g., 10 slots of 100 ms each).

Each slot records the request count for its sub‑interval.

The total count is the sum of all slot counts.

Characteristics

Provides smoother throttling by preventing traffic spikes.

Implementation is more complex because slot counts must be updated and expired.

Key Process (Sliding Window)

tryAcquire :

Clear expired slots (reset count and update start time).

Check if totalRequests + permits ≤ maxRequests.

If allowed, increment the current slot and total count.

acquire (blocking):

Repeatedly attempt tryAcquireInternal.

If not enough tokens, compute wait time until the earliest non‑empty slot expires.

Block with Condition.await (or timed await) and retry.

Java Implementation

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Sliding‑window rate limiter – supports non‑blocking and blocking acquisition.
 */
public class SlidingWindowRateLimiter {
    private final long windowSizeMs;   // total window length
    private final int maxRequests;    // allowed requests per window
    private final int slotCount;      // number of slots
    private final long slotSizeMs;     // slot duration

    private final AtomicInteger[] slots;      // per‑slot counters
    private final long[] slotStartTimes;     // slot start timestamps
    private final AtomicInteger totalRequests; // sum of all slots

    private final ReentrantLock lock;
    private final Condition condition;

    public SlidingWindowRateLimiter(int maxRequests, long windowSizeMs, int slotCount) {
        if (maxRequests <= 0 || windowSizeMs <= 0 || slotCount <= 0) {
            throw new IllegalArgumentException("arguments must be > 0");
        }
        this.maxRequests = maxRequests;
        this.windowSizeMs = windowSizeMs;
        this.slotCount = slotCount;
        this.slotSizeMs = windowSizeMs / slotCount;

        this.slots = new AtomicInteger[slotCount];
        this.slotStartTimes = new long[slotCount];
        this.totalRequests = new AtomicInteger(0);

        long now = System.currentTimeMillis();
        for (int i = 0; i < slotCount; i++) {
            slots[i] = new AtomicInteger(0);
            slotStartTimes[i] = now - (slotCount - i) * slotSizeMs;
        }
        this.lock = new ReentrantLock();
        this.condition = lock.newCondition();
    }

    public boolean tryAcquire() {
        return tryAcquire(1);
    }

    public boolean tryAcquire(int permits) {
        if (permits <= 0) {
            throw new IllegalArgumentException("permits must be > 0");
        }
        if (permits > maxRequests) {
            return false;
        }
        lock.lock();
        try {
            slideWindow();
            if (totalRequests.get() + permits <= maxRequests) {
                long now = System.currentTimeMillis();
                int idx = getCurrentSlotIndex(now);
                slots[idx].addAndGet(permits);
                totalRequests.addAndGet(permits);
                return true;
            }
            return false;
        } finally {
            lock.unlock();
        }
    }

    public void acquire() throws InterruptedException {
        acquire(1);
    }

    public void acquire(int permits) throws InterruptedException {
        if (permits <= 0) {
            throw new IllegalArgumentException("permits must be > 0");
        }
        if (permits > maxRequests) {
            throw new IllegalArgumentException("permits exceed window capacity");
        }
        lock.lock();
        try {
            while (!tryAcquireInternal(permits)) {
                long waitMs = calculateWaitTime(permits);
                if (waitMs > 0) {
                    condition.await(waitMs, TimeUnit.MILLISECONDS);
                } else {
                    condition.await();
                }
            }
        } finally {
            lock.unlock();
        }
    }

    private boolean tryAcquireInternal(int permits) {
        slideWindow();
        if (totalRequests.get() + permits <= maxRequests) {
            long now = System.currentTimeMillis();
            int idx = getCurrentSlotIndex(now);
            slots[idx].addAndGet(permits);
            totalRequests.addAndGet(permits);
            return true;
        }
        return false;
    }

    private void slideWindow() {
        long now = System.currentTimeMillis();
        long windowStart = now - windowSizeMs;
        for (int i = 0; i < slotCount; i++) {
            if (slotStartTimes[i] < windowStart) {
                int expired = slots[i].getAndSet(0);
                if (expired > 0) {
                    totalRequests.addAndGet(-expired);
                    slotStartTimes[i] = now - (slotCount - i - 1) * slotSizeMs;
                }
            }
        }
        if (totalRequests.get() < maxRequests) {
            condition.signalAll();
        }
    }

    private int getCurrentSlotIndex(long currentTime) {
        return (int) ((currentTime / slotSizeMs) % slotCount);
    }

    private long calculateWaitTime(int permits) {
        long now = System.currentTimeMillis();
        long earliest = Long.MAX_VALUE;
        for (int i = 0; i < slotCount; i++) {
            if (slots[i].get() > 0 && slotStartTimes[i] < earliest) {
                earliest = slotStartTimes[i];
            }
        }
        if (earliest == Long.MAX_VALUE) {
            return 0;
        }
        long earliestExpire = earliest + windowSizeMs;
        long wait = earliestExpire - now;
        int available = maxRequests - totalRequests.get();
        if (available <= 0) {
            wait = Math.max(wait, slotSizeMs);
        }
        return Math.max(0, wait);
    }

    public int getCurrentRequests() {
        lock.lock();
        try {
            slideWindow();
            return totalRequests.get();
        } finally {
            lock.unlock();
        }
    }
}

Distributed Rate Limiting with Redis

Both algorithms can be adapted for a distributed environment by storing counters in Redis and using Lua scripts to guarantee atomicity.

Fixed‑Window Counter via Lua

-- Fixed‑window rate‑limiting script
-- KEYS[1]: limit key
-- ARGV[1]: window size (seconds)
-- ARGV[2]: max allowed requests

local key = KEYS[1]
local window = tonumber(ARGV[1])
local maxCount = tonumber(ARGV[2])

local current = redis.call('GET', key)
if current == false then
    -- key does not exist, create with initial count 1 and set TTL
    redis.call('SET', key, 1, 'EX', window)
    return 1
else
    local count = tonumber(current)
    if count < maxCount then
        redis.call('INCR', key)
        return 1
    else
        return 0
    end
end

Conclusion

Understanding the mechanics of token‑bucket and sliding‑window rate‑limiting algorithms enables developers to choose the most suitable strategy for their services. Token bucket excels at handling bursts and is widely adopted (e.g., Guava RateLimiter, Sentinel, Redisson). Sliding‑window provides stricter smoothing, while distributed implementations using Redis Lua scripts extend these concepts across multiple nodes.

RedisDistributedRate LimitingSliding WindowToken Bucket
Java Baker
Written by

Java Baker

Java architect and Raspberry Pi enthusiast, dedicated to writing high-quality technical articles; the same name is used across major platforms.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.