Rate Limiting Demystified: Token Bucket, Leaky Bucket & Counter Algorithms in Java

Under heavy traffic, a service can become unavailable. Rate-limiting techniques such as the token bucket, leaky bucket, and counter algorithms help smooth bursts, control concurrency, and prevent system overload. This article illustrates them with Java examples built on Guava's RateLimiter, AtomicInteger, and Semaphore.

Overview

When a system faces a flood of concurrent requests, individual services or interfaces may become unavailable, and the failure can cascade until the entire system goes down. Rate limiting is a common way to mitigate this problem: it caps the number of requests that can be processed in a given period or at the same time.

Rate Limiting Algorithms

The three most widely used rate‑limiting algorithms are Token Bucket, Leaky Bucket, and Counter.

1. Token Bucket Algorithm

The token bucket algorithm adds tokens to a bucket at a constant rate. A request can only be processed if it successfully takes a token from the bucket; otherwise the request is rejected. When the bucket is full, newly generated tokens are discarded.

Token Bucket Example (Guava RateLimiter):

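import com.google.common.util.concurrent.RateLimiter;
import java.util.concurrent.TimeUnit;

public class RateLimiterDemo {
    // Issues 5 permits per second, i.e. one permit every 200 ms.
    private static final RateLimiter limiter = RateLimiter.create(5);

    public static void exec() {
        // Blocks until a permit is available.
        limiter.acquire(1);
        try {
            // core logic
            TimeUnit.SECONDS.sleep(1);
            System.out.println("--" + System.currentTimeMillis() / 1000);
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can observe the interruption.
            Thread.currentThread().interrupt();
        }
    }
}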

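Guava's RateLimiter implements the token bucket algorithm. In the example, five tokens are generated per second, i.e., one token every 200 ms. limiter.acquire(1) consumes one token; if none is available, the call blocks until a token becomes available rather than rejecting the request. To reject immediately instead of waiting, RateLimiter also provides tryAcquire(), which returns false when no token can be taken.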
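
To make the refill-and-consume mechanics concrete, here is a minimal hand-rolled sketch that does not depend on Guava. The class name SimpleTokenBucket, its parameters, and the choice to pass the current time in explicitly (so the behaviour is deterministic) are assumptions made for illustration; this is not how RateLimiter works internally. Unlike acquire(), the sketch rejects the request when the bucket is empty instead of blocking.

```java
import java.util.concurrent.TimeUnit;

/** Illustrative token bucket: tokens are refilled lazily on each call, not by a timer thread. */
final class SimpleTokenBucket {
    private final long capacity;         // maximum number of tokens the bucket can hold
    private final double tokensPerNano;  // constant refill rate
    private double tokens;               // current tokens (fractional while refilling)
    private long lastRefillNanos;        // timestamp of the last refill

    SimpleTokenBucket(long capacity, double tokensPerSecond, long nowNanos) {
        this.capacity = capacity;
        this.tokensPerNano = tokensPerSecond / TimeUnit.SECONDS.toNanos(1);
        this.tokens = capacity;          // start full, so an initial burst is allowed
        this.lastRefillNanos = nowNanos;
    }

    /** Takes one token if available; returns false (reject) otherwise. */
    synchronized boolean tryAcquire(long nowNanos) {
        if (nowNanos > lastRefillNanos) {
            double refilled = (nowNanos - lastRefillNanos) * tokensPerNano;
            // Tokens generated while the bucket is full are discarded.
            tokens = Math.min(capacity, tokens + refilled);
            lastRefillNanos = nowNanos;
        }
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

In real code the caller would typically pass System.nanoTime() as the timestamp; taking it as a parameter simply keeps the sketch easy to reason about.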

2. Leaky Bucket Algorithm

The leaky bucket algorithm controls the rate at which data is injected into a network, smoothing burst traffic. Data can arrive at any speed into the bucket, but the bucket releases data at a constant rate. If the bucket overflows, excess data is discarded.
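
The article gives no code for the leaky bucket, so the following is only a rough sketch of one common variant, the "leaky bucket as a meter": each accepted request adds one unit of water, the water drains at a constant rate, and a request that would overflow the bucket is discarded. The class name SimpleLeakyBucket, its parameters, and the explicit timestamp argument are assumptions for illustration. A queue-based variant would instead hold requests and hand them to workers at a fixed pace.

```java
import java.util.concurrent.TimeUnit;

/** Illustrative leaky bucket (meter variant): the fill level drains at a constant rate. */
final class SimpleLeakyBucket {
    private final long capacity;       // how much water the bucket can hold
    private final double leakPerNano;  // constant outflow rate
    private double water;              // current fill level
    private long lastLeakNanos;        // timestamp of the last drain calculation

    SimpleLeakyBucket(long capacity, double leaksPerSecond, long nowNanos) {
        this.capacity = capacity;
        this.leakPerNano = leaksPerSecond / TimeUnit.SECONDS.toNanos(1);
        this.water = 0.0;              // start empty
        this.lastLeakNanos = nowNanos;
    }

    /** Adds one request to the bucket; returns false if it would overflow. */
    synchronized boolean tryAdd(long nowNanos) {
        if (nowNanos > lastLeakNanos) {
            double leaked = (nowNanos - lastLeakNanos) * leakPerNano;
            water = Math.max(0.0, water - leaked);
            lastLeakNanos = nowNanos;
        }
        if (water + 1.0 <= capacity) {
            water += 1.0;
            return true;
        }
        return false; // overflow: the excess request is discarded
    }
}
```

Compared with the token bucket, this variant starts empty and fills up under load, so the sustained admission rate can never exceed the leak rate once the bucket is full.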

3. Counter Rate Limiting Algorithm

This algorithm limits the total number of concurrent operations, for example by capping the size of a database connection pool or thread pool, or the number of API requests being handled at once.

Counter Example 1 (AtomicInteger):

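import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CountRateLimiterDemo1 {
    private static final int MAX_CONCURRENT = 5;
    private static final AtomicInteger count = new AtomicInteger(0);

    public static void exec() {
        // Increment first, then check: a separate get() followed by incrementAndGet()
        // would let several threads pass the check at once and exceed the limit.
        if (count.incrementAndGet() > MAX_CONCURRENT) {
            count.decrementAndGet();
            System.out.println("Too many requests, please try later! " + System.currentTimeMillis() / 1000);
            return;
        }
        try {
            // core logic
            TimeUnit.SECONDS.sleep(1);
            System.out.println("--" + System.currentTimeMillis() / 1000);
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can observe the interruption.
            Thread.currentThread().interrupt();
        } finally {
            count.decrementAndGet();
        }
    }
}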

AtomicInteger tracks the current number of concurrent executions; when the threshold is exceeded, the request is rejected with a simple "Too many requests, please try later!" message.

Drawback: Because requests over the limit are rejected outright, legitimate short-lived spikes are turned away even though the system could have absorbed them after a brief wait.

Counter Example 2 (Semaphore):

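import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class CountRateLimiterDemo2 {
    private static final Semaphore semaphore = new Semaphore(5);

    public static void exec() {
        // getQueueLength() is only an estimate of the number of waiting threads.
        if (semaphore.getQueueLength() > 100) {
            System.out.println("Queue length exceeds 100, please try later...");
            return; // reject instead of joining the queue
        }
        try {
            semaphore.acquire();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return; // no permit was acquired, so there is nothing to release
        }
        try {
            // core logic
            TimeUnit.SECONDS.sleep(1);
            System.out.println("--" + System.currentTimeMillis() / 1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            semaphore.release();
        }
    }
}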

Semaphore limits the number of concurrent executions. When the waiting queue becomes too long, the request can be rejected, providing a smoother throttling effect compared to the atomic counter.

Advantage: For short‑lived spikes, requests are queued instead of being rejected immediately, achieving traffic smoothing.
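
As a possible variation on the queue-length check, waiting can also be bounded by time using Semaphore.tryAcquire(timeout, unit), so a request queues briefly during a spike but is rejected if no permit frees up soon enough. The sketch below is an assumption-laden illustration: the class BoundedWaitLimiter, the method tryRun, and the timeout parameter are hypothetical names, not part of the original example.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Illustrative concurrency limiter that waits for a permit only up to a fixed timeout. */
final class BoundedWaitLimiter {
    private final Semaphore permits;
    private final long maxWaitMillis;

    BoundedWaitLimiter(int maxConcurrent, long maxWaitMillis) {
        this.permits = new Semaphore(maxConcurrent);
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Runs the task if a permit is obtained in time; returns false if the request is rejected. */
    boolean tryRun(Runnable task) {
        try {
            if (!permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS)) {
                return false; // waited long enough, give up
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false; // no permit was acquired, so nothing to release
        }
        try {
            task.run();
            return true;
        } finally {
            permits.release();
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

A caller might use it as `limiter.tryRun(() -> handle(request))`, where `handle` stands in for the core logic, and return a "please try later" response when the result is false.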

Source: https://www.cnblogs.com/java1024/p/7725632.html
