Backend Development 14 min read

Master Rate Limiting: Fixed, Sliding, Leaky & Token Bucket in Java

This article explains common rate‑limiting algorithms—including fixed‑window counters, sliding windows, leaky buckets, and token buckets—detailing their principles, advantages, and pitfalls, and provides complete Java code examples to illustrate implementation and testing of each method.

Java High-Performance Architecture

Aug 28, 2023

Master Rate Limiting: Fixed, Sliding, Leaky & Token Bucket in Java

Rate Limiting Implementation

Common Rate Limiting Algorithms

Rate limiting restricts the number of requests within a time window to maintain system availability and stability, preventing slowdowns or crashes caused by traffic spikes.

Counter Rate Limiting (Fixed Window)

Principle:

Time is divided into multiple independent fixed-size windows.

Requests falling into a window increment a counter.

If the counter exceeds the threshold, subsequent requests in that window are rejected; the counter resets when the next window starts.

Example:

package com.example.studyproject.algorithm;

import java.time.LocalTime;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * @ClassName: FixedWindow
 * @Description: Fixed window algorithm
 */
public class FixedWindow {
    private static Integer QPS = 2;
    private static long TIME_WINDOWS = 1000;
    private static AtomicInteger REQ_COUNT = new AtomicInteger();
    private static long START_TIME = System.currentTimeMillis();

    public synchronized static boolean tryAcquire() {
        if ((System.currentTimeMillis() - START_TIME) > TIME_WINDOWS) {
            REQ_COUNT.set(0);
            START_TIME = System.currentTimeMillis();
        }
        return REQ_COUNT.incrementAndGet() <= QPS;
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 10; i++) {
            Thread.sleep(250);
            LocalTime now = LocalTime.now();
            if (!tryAcquire()) {
                System.out.println(now + " 被限流");
            } else {
                System.out.println(now + " 做点什么");
            }
        }
    }
}

Problem: When a time window’s boundary splits requests (e.g., the last 500 ms of one second and the first 500 ms of the next), the algorithm may allow up to four requests despite a QPS limit of 2.

Sliding Window

Sliding window algorithm improves upon the fixed window approach.

Time is divided into many small intervals. Each interval has its own counter. As time progresses, the window slides, discarding the oldest interval and adding a new one. The total count across all intervals determines whether requests are allowed.

Shorter slide intervals reduce the likelihood of boundary‑spike issues, though they cannot eliminate them entirely.

Code Example:

package com.example.studyproject.algorithm;

import lombok.Data;
import java.time.LocalTime;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * @ClassName: SlidingWindow
 * @Description: Sliding window algorithm
 */
public class SlidingWindow {
    private int qps = 2;
    private long windowSize = 1000;
    private Integer windowCount = 10;
    private WindowInfo[] windowArray = new WindowInfo[windowCount];

    public SlidingWindow(int qps) {
        this.qps = qps;
        long currentTimeMillis = System.currentTimeMillis();
        for (int i = 0; i < windowArray.length; i++) {
            windowArray[i] = new WindowInfo(currentTimeMillis, new AtomicInteger(0));
        }
    }

    public synchronized boolean tryAcquire() {
        long currentTimeMillis = System.currentTimeMillis();
        int currentIndex = (int) (currentTimeMillis % windowSize / (windowSize / windowCount));
        int sum = 0;
        for (int i = 0; i < windowArray.length; i++) {
            WindowInfo windowInfo = windowArray[i];
            if ((currentTimeMillis - windowInfo.getTime()) > windowSize) {
                windowInfo.getNumber().set(0);
                windowInfo.setTime(currentTimeMillis);
            }
            if (currentIndex == i && windowInfo.getNumber().get() < qps) {
                windowInfo.getNumber().incrementAndGet();
            }
            sum += windowInfo.getNumber().get();
        }
        return sum <= qps;
    }

    @Data
    private class WindowInfo {
        private Long time;
        private AtomicInteger number;
        public WindowInfo(long time, AtomicInteger number) {
            this.time = time;
            this.number = number;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int qps = 2, count = 20, sleep = 300;
        System.out.println(String.format("Current QPS limit: %d, test count: %d, interval: %dms", qps, count, sleep));
        SlidingWindow limiter = new SlidingWindow(qps);
        int success = 0;
        for (int i = 0; i < count; i++) {
            Thread.sleep(sleep);
            if (limiter.tryAcquire()) {
                success++;
                System.out.println(LocalTime.now() + ": success");
            } else {
                System.out.println(LocalTime.now() + ": fail");
            }
        }
        System.out.println("Actual successful attempts: " + success);
    }
}

Leaky Bucket Algorithm

The leaky bucket treats incoming requests as water filling a bucket; water leaks out at a constant rate. If incoming rate exceeds the leak rate, excess water overflows and requests are rejected, protecting the system from sudden traffic bursts.

Code Example:

package com.example.studyproject.algorithm;

import java.time.LocalTime;
import java.util.LinkedList;
import java.util.Queue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * @ClassName: LeakyBucket
 * @Description: Leaky bucket algorithm
 */
public class LeakyBucket {
    private final int bucket;
    private int qps;
    private long water;
    private long timeStamp = System.currentTimeMillis();

    public LeakyBucket(int bucket, int qps) {
        this.bucket = bucket;
        this.qps = qps;
    }

    public boolean tryAcquire() {
        long now = System.currentTimeMillis();
        long timeGap = (now - timeStamp) / 1000;
        water = Math.max(0, water - timeGap * qps);
        timeStamp = now;
        if (water < bucket) {
            water += 1;
            return true;
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        ExecutorService singleThread = Executors.newSingleThreadExecutor();
        LeakyBucket rateLimiter = new LeakyBucket(20, 2);
        Queue<Integer> queue = new LinkedList<>();
        singleThread.execute(() -> {
            int count = 0;
            while (true) {
                count++;
                boolean allowed = rateLimiter.tryAcquire();
                if (allowed) {
                    queue.offer(count);
                    System.out.println(count + "--------flow allowed--------");
                } else {
                    System.out.println(count + " flow limited");
                }
                try { Thread.sleep((long) (Math.random() * 1000)); } catch (InterruptedException e) { e.printStackTrace(); }
            }
        });
        scheduler.scheduleAtFixedRate(() -> {
            if (!queue.isEmpty()) {
                System.out.println(queue.poll() + " processed");
            }
        }, 0, 100, TimeUnit.MILLISECONDS);
        while (true) { Thread.sleep(10000); }
    }
}

Token Bucket Algorithm

The token bucket works like a hospital registration system: tokens are added to a bucket at a constant rate; a request must acquire a token before proceeding. If no tokens are available, the request is rejected.

Tokens are produced at a fixed frequency (e.g., QPS = 2 → one token every 500 ms).

Requests consume tokens; if the bucket is empty, the request is denied.

The bucket capacity equals the rate‑limit threshold, allowing burst traffic up to that limit.

Code Example (using Guava RateLimiter):

// Using Google Guava's token bucket RateLimiter
public static void main(String[] args) throws InterruptedException {
    RateLimiter rateLimiter = RateLimiter.create(2);
    for (int i = 0; i < 10; i++) {
        String time = LocalDateTime.now().format(DateTimeFormatter.ISO_LOCAL_TIME);
        System.out.println(time + ":" + rateLimiter.tryAcquire());
        Thread.sleep(250);
    }
}

Other Rate Limiting Approaches

Guava library implementation

AOP‑based rate limiting

Redis‑based distributed rate limiting

Redisson for distributed environments

Sentinel for distributed rate limiting

Nginx/Gateway rate limiting

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Backend performance algorithm

Written by

Java High-Performance Architecture

Sharing Java development articles and resources, including SSM architecture and the Spring ecosystem (Spring Boot, Spring Cloud, MyBatis, Dubbo, Docker), Zookeeper, Redis, architecture design, microservices, message queues, Git, etc.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.