Backend Development 15 min read

Java Rate Limiting: Fixed, Sliding, Leaky & Token Bucket Algorithms Explained

This article introduces the concept of rate limiting, explains three core algorithms—fixed window, sliding window, and leaky bucket—along with the token bucket approach, provides Java code examples for each, discusses their principles, advantages, and pitfalls, and outlines practical implementation considerations.

Java High-Performance Architecture

Dec 12, 2022

Java Rate Limiting: Fixed, Sliding, Leaky & Token Bucket Algorithms Explained

Implementation of Rate Limiting

Rate limiting restricts the number of requests within a given time window to maintain system availability and stability, preventing performance degradation or crashes caused by traffic spikes.

Fixed Window (Counter) Rate Limiting

Principle

The timeline is divided into multiple independent fixed-size windows.

Each request that falls into a window increments a counter.

If the counter exceeds the threshold, subsequent requests in that window are rejected; the counter resets to zero when the next window starts.

Example:

package com.example.studyproject.algorithm;

import java.time.LocalTime;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Fixed window algorithm
 */
public class FixedWindow {

    // threshold
    private static Integer QPS = 2;

    // time window (ms)
    private static long TIME_WINDOWS = 1000;

    // request counter
    private static AtomicInteger REQ_COUNT = new AtomicInteger();

    // window start time
    private static long START_TIME = System.currentTimeMillis();

    public synchronized static boolean tryAcquire() {
        // reset window if timeout
        if ((System.currentTimeMillis() - START_TIME) > TIME_WINDOWS) {
            REQ_COUNT.set(0);
            START_TIME = System.currentTimeMillis();
        }
        return REQ_COUNT.incrementAndGet() <= QPS;
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 10; i++) {
            Thread.sleep(250);
            LocalTime now = LocalTime.now();
            if (!tryAcquire()) {
                System.out.println(now + " 被限流");
            } else {
                System.out.println(now + " 做点什么");
            }
        }
    }
}

Problem: When the time window straddles a boundary (e.g., the last 500 ms of one second and the first 500 ms of the next), four requests can be processed even though the overall QPS is set to 2.

Sliding Window Rate Limiting

The sliding window algorithm improves upon the fixed window approach.

Principle:

The total time interval is divided into many small sub‑intervals.

Each sub‑interval has its own counter; a request falling into a sub‑interval increments that counter.

When the window slides forward, the oldest sub‑interval is discarded and a new one is added.

The sum of all counters within the current window determines whether the request should be allowed.

The shorter the slide interval, the lower the probability of the boundary‑crossing issue, though it can still occur.

package com.example.studyproject.algorithm;

import lombok.Data;
import java.time.LocalTime;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Sliding window algorithm
 */
public class SlidingWindow {

    private int qps = 2;
    private long windowSize = 1000;
    private Integer windowCount = 10;
    private WindowInfo[] windowArray = new WindowInfo[windowCount];

    public SlidingWindow(int qps) {
        this.qps = qps;
        long currentTimeMillis = System.currentTimeMillis();
        for (int i = 0; i < windowArray.length; i++) {
            windowArray[i] = new WindowInfo(currentTimeMillis, new AtomicInteger(0));
        }
    }

    public synchronized boolean tryAcquire() {
        long currentTimeMillis = System.currentTimeMillis();
        int currentIndex = (int) (currentTimeMillis % windowSize / (windowSize / windowCount));
        int sum = 0;
        for (int i = 0; i < windowArray.length; i++) {
            WindowInfo windowInfo = windowArray[i];
            if ((currentTimeMillis - windowInfo.getTime()) > windowSize) {
                windowInfo.getNumber().set(0);
                windowInfo.setTime(currentTimeMillis);
            }
            if (currentIndex == i && windowInfo.getNumber().get() < qps) {
                windowInfo.getNumber().incrementAndGet();
            }
            sum += windowInfo.getNumber().get();
        }
        return sum <= qps;
    }

    @Data
    private class WindowInfo {
        private Long time;
        private AtomicInteger number;

        public WindowInfo(long time, AtomicInteger number) {
            this.time = time;
            this.number = number;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int qps = 2, count = 20, sleep = 300, success = count * sleep / 1000 * qps;
        System.out.println(String.format("Current QPS limit:%d, test count:%d, interval:%dms, expected success:%d", qps, count, sleep, success));
        success = 0;
        SlidingWindow myRateLimiter = new SlidingWindow(qps);
        for (int i = 0; i < count; i++) {
            Thread.sleep(sleep);
            if (myRateLimiter.tryAcquire()) {
                success++;
                if (success % qps == 0) {
                    System.out.println(LocalTime.now() + ": success, ");
                } else {
                    System.out.print(LocalTime.now() + ": success, ");
                }
            } else {
                System.out.println(LocalTime.now() + ": fail");
            }
        }
        System.out.println();
        System.out.println("Actual successful count:" + success);
    }
}

Output:

已连接到目标 VM, 地址: '127.0.0.1:50101', 传输: '套接字'
当前QPS限制为:2,当前测试次数:20,间隔:300ms,预计成功次数:12
14:20:38.833: success, 14:20:39.142: success,
14:20:39.455: success, 14:20:39.766: success,
14:20:40.077: fail
14:20:40.377: fail
14:20:40.678: success, 14:20:40.992: success,
14:20:41.307: fail
14:20:41.621: fail
14:20:41.922: success, 14:20:42.229: success,
14:20:42.539: fail
14:20:42.840: fail
14:20:43.140: success, 14:20:43.455: success,
14:20:43.756: fail
14:20:44.070: fail
14:20:44.386: success, 14:20:44.687: success,

实际测试成功次数:12

Leaky Bucket Algorithm

The leaky bucket visualizes requests as water flowing into a bucket with a fixed capacity; water leaks out at a constant rate. If incoming rate exceeds the leak rate, excess water overflows and requests are rejected, protecting the system from sudden traffic bursts.

The consumption in a leaky bucket proceeds at a steady pace, which safeguards the system, but it also means that simultaneous requests cannot be processed concurrently.

package com.example.studyproject.algorithm;

import java.time.LocalTime;
import java.util.LinkedList;
import java.util.Queue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Leaky bucket algorithm
 */
public class LeakyBucket {

    private final int bucket;
    private int qps;
    private long water;
    private long timeStamp = System.currentTimeMillis();

    public LeakyBucket(int bucket, int qps) {
        this.bucket = bucket;
        this.qps = qps;
    }

    public boolean tryAcquire() {
        long now = System.currentTimeMillis();
        long timeGap = (now - timeStamp) / 1000;
        water = Math.max(0, water - timeGap * qps);
        timeStamp = now;
        if (water < bucket) {
            water += 1;
            return true;
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
        ExecutorService singleThread = Executors.newSingleThreadExecutor();

        LeakyBucket rateLimiter = new LeakyBucket(20, 2);
        Queue<Integer> queue = new LinkedList<>();

        singleThread.execute(() -> {
            int count = 0;
            while (true) {
                count++;
                boolean flag = rateLimiter.tryAcquire();
                if (flag) {
                    queue.offer(count);
                    System.out.println(count + "--------流量被放行--------");
                } else {
                    System.out.println(count + "流量被限制");
                }
                try {
                    Thread.sleep((long) (Math.random() * 1000));
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        });

        scheduledExecutorService.scheduleAtFixedRate(() -> {
            if (!queue.isEmpty()) {
                System.out.println(queue.poll() + "被处理");
            }
        }, 0, 100, TimeUnit.MILLISECONDS);

        while (true) {
            Thread.sleep(10000);
        }
    }
}

Token Bucket Algorithm

The token bucket works like a hospital registration system: a token is required before a request can be processed. Tokens are added to the bucket at a constant rate; if the bucket is empty, incoming requests are rejected or delayed.

The producer (system) adds tokens at a fixed frequency (e.g., QPS = 2 → one token every 500 ms).

The consumer (request) must acquire a token before proceeding; if none are available, the request is denied or delayed.

The bucket capacity equals the rate‑limit threshold, allowing short bursts of traffic.

There is an initialization phase where the bucket starts empty.

Code Example (Google Guava RateLimiter):

import com.google.common.util.concurrent.RateLimiter;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class TokenBucketDemo {
    public static void main(String[] args) throws InterruptedException {
        RateLimiter rateLimiter = RateLimiter.create(2); // 2 permits per second
        for (int i = 0; i < 10; i++) {
            String time = LocalDateTime.now().format(DateTimeFormatter.ISO_LOCAL_TIME);
            System.out.println(time + ":" + rateLimiter.tryAcquire());
            Thread.sleep(250);
        }
    }
}

Other Rate‑Limiting Implementations

Guava based rate limiting

AOP based rate limiting

Redis based (distributed)

Redisson based (distributed)

Sentinel (distributed)

Nginx / Gateway (distributed)

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Backend Distributed Systems java performance algorithm rate limiting

Written by

Java High-Performance Architecture

Sharing Java development articles and resources, including SSM architecture and the Spring ecosystem (Spring Boot, Spring Cloud, MyBatis, Dubbo, Docker), Zookeeper, Redis, architecture design, microservices, message queues, Git, etc.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.