Mastering Guava RateLimiter: Token Bucket Algorithm Explained with Java Examples

This article introduces the fundamentals of rate limiting, compares common algorithms such as counter, leaky bucket, and token bucket, delves into Guava's RateLimiter design and its SmoothBursty and SmoothWarmingUp implementations, and provides practical Java code examples for quick integration.

CoolHome R&D Department

0. Background

A business requirement called for limiting QPS, so we investigated several rate‑limiting schemes and studied Guava's RateLimiter in depth. This article focuses on the design philosophy and usage of RateLimiter; source‑code analysis is covered in a separate article.

1. Common Rate‑Limiting Schemes

Counter method: Count requests within a fixed time window; once the count exceeds the limit, further requests in that window are throttled. Because the counter resets at each window boundary, a burst that straddles two windows can briefly exceed the intended rate. Smaller windows reduce this effect.

Leaky bucket algorithm: Requests enter a bucket of fixed capacity, and the bucket drains at a constant rate. Requests that arrive when the bucket is full are dropped. This smooths traffic, but the fixed drain rate means bursts cannot be served any faster.

Token bucket algorithm: Tokens are added to a bucket at a constant rate, and a request must acquire a token to proceed; if none are available, the request is rejected or made to wait. This allows bursts while respecting the overall QPS limit.
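To make the token bucket description concrete, here is a minimal sketch in plain Java. It is not Guava's implementation: the class name SimpleTokenBucket, the choice to start with a full bucket, and the explicit timestamp parameter are all assumptions made for illustration, so that the behaviour can be traced without a real clock.

```java
/** Minimal token bucket; time is passed in explicitly so the behaviour is easy to trace. */
class SimpleTokenBucket {
    private final double capacity;        // maximum number of tokens the bucket can hold
    private final double refillPerSecond; // tokens added per second
    private double tokens;                // tokens currently available
    private long lastRefillNanos;         // timestamp of the last refill

    SimpleTokenBucket(double capacity, double refillPerSecond, long nowNanos) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity;           // assumption: the bucket starts full
        this.lastRefillNanos = nowNanos;
    }

    /** Consumes one token and returns true if one is available; otherwise rejects. */
    synchronized boolean tryAcquire(long nowNanos) {
        // Add the tokens produced since the last call, capped at the capacity.
        double elapsedSeconds = (nowNanos - lastRefillNanos) / 1e9;
        tokens = Math.min(capacity, tokens + elapsedSeconds * refillPerSecond);
        lastRefillNanos = nowNanos;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        SimpleTokenBucket bucket = new SimpleTokenBucket(5, 5, 0L);
        int accepted = 0;
        for (int i = 0; i < 8; i++) {     // a burst of 8 requests, all at t = 0
            if (bucket.tryAcquire(0L)) {
                accepted++;
            }
        }
        System.out.println("accepted in burst: " + accepted);
        // One second later the bucket has been refilled.
        System.out.println("accepted after 1 s: " + bucket.tryAcquire(1_000_000_000L));
    }
}
```

The burst is served from the tokens already in the bucket, and requests beyond the capacity are rejected until time passes. A variant that makes callers wait instead of rejecting them is closer to what RateLimiter does.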

2. Token Bucket Algorithm

The token bucket originated in computer networking, where it shapes traffic to prevent congestion. It is widely used for traffic shaping and rate limiting, and it can also help mitigate DDoS attacks when applied in reverse‑proxy servers such as Nginx.

Guava implements the token bucket via SmoothBursty and SmoothWarmingUp classes. SmoothBursty permits a limited burst of traffic, while SmoothWarmingUp starts with a slower rate and gradually ramps up to the target QPS.

3. An Introductory Example

An illustrative scenario assumes a maximum of 5 requests per second (one request every 0.2 s). The example walks through token acquisition at different timestamps, showing how stored tokens accumulate during idle periods and how bursts are handled.
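The walkthrough can be reproduced with a small simulation. The sketch below is a simplified, single-threaded model written for this article, not Guava's source: the class name BurstyModel, the one-second cap on stored permits, and the arrival timestamps are assumptions chosen for illustration.

```java
/** Simplified model of a bursty limiter driven by explicit timestamps (microseconds). */
class BurstyModel {
    private final double intervalMicros; // time to produce one permit (200_000 at 5 QPS)
    private final double maxStored;      // cap on permits saved while idle
    private double stored = 0.0;         // permits saved during idle time
    private long nextFreeMicros = 0L;    // earliest time the next request can be served

    BurstyModel(double permitsPerSecond, double maxBurstSeconds) {
        this.intervalMicros = 1_000_000.0 / permitsPerSecond;
        this.maxStored = permitsPerSecond * maxBurstSeconds;
    }

    /** Reserves permits for a request arriving at nowMicros; returns its wait in microseconds. */
    long acquire(int permits, long nowMicros) {
        if (nowMicros > nextFreeMicros) {
            // Idle time is converted into stored permits, up to the cap.
            double produced = (nowMicros - nextFreeMicros) / intervalMicros;
            stored = Math.min(maxStored, stored + produced);
            nextFreeMicros = nowMicros;
        }
        long waitMicros = nextFreeMicros - nowMicros;
        // Stored permits are spent first; only fresh permits push the next free time out.
        double fromStored = Math.min(permits, stored);
        double fresh = permits - fromStored;
        stored -= fromStored;
        nextFreeMicros += (long) (fresh * intervalMicros);
        return waitMicros;
    }

    public static void main(String[] args) {
        BurstyModel limiter = new BurstyModel(5, 1.0);
        // Hypothetical arrivals: three requests at t = 0, then, after an idle gap,
        // a burst of six requests at t = 1.4 s.
        long[] arrivals = {0, 0, 0, 1_400_000, 1_400_000, 1_400_000,
                           1_400_000, 1_400_000, 1_400_000};
        for (long t : arrivals) {
            System.out.println("t=" + t + "us wait=" + limiter.acquire(1, t) + "us");
        }
    }
}
```

In this model the requests at t = 0 are spaced 0.2 s apart, while the burst after the idle gap is served from stored permits until they run out.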

4. Design Philosophy of RateLimiter

RateLimiter aims to provide a stable request rate. It records the *expected* time of the next request rather than the time of the last request, enabling precise calculation of waiting time and stored permits.

When the limiter is idle, the number of stored permits grows up to a configured maximum. When a request arrives, permits are taken from the stored pool first and then from newly generated permits. The core function storedPermitsToWaitTime(double storedPermits, double permitsToTake) maps the stored permits being consumed to the required wait time by integrating a cost curve over them.

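The relationship between stored permits and wait time can be read as the area under a curve that gives the cost per stored permit. If the curve is a flat line at exactly 1/rate, stored permits cost the same as fresh ones and storing them has no effect. A curve below that line makes the limiter faster after idle periods (a curve at zero means stored permits are handed out with no wait, which is the burst behavior), while a curve above the line makes the limiter slower after idle periods, which is the warm‑up behavior.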
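The integral can be sketched in a few lines for a warm-up style curve. The code below assumes a piecewise-linear cost curve: flat at the stable interval up to a threshold, then rising linearly to a cold interval at the maximum stored permits. The class name WarmupCurve and the numbers in main (a 200 ms stable interval, a cold interval three times as long, a threshold of 2.5 and a maximum of 5 permits) are illustrative assumptions, not values read from Guava.

```java
/** Sketch: wait time as the area under a piecewise-linear cost-per-permit curve. */
class WarmupCurve {
    private final double stableIntervalMicros; // cost per permit at the steady rate (1/rate)
    private final double coldIntervalMicros;   // cost per permit when fully idle
    private final double thresholdPermits;     // at or below this level the curve is flat
    private final double maxPermits;           // stored permits are capped here

    WarmupCurve(double stableIntervalMicros, double coldIntervalMicros,
                double thresholdPermits, double maxPermits) {
        this.stableIntervalMicros = stableIntervalMicros;
        this.coldIntervalMicros = coldIntervalMicros;
        this.thresholdPermits = thresholdPermits;
        this.maxPermits = maxPermits;
    }

    /** Height of the curve at a given number of permits above the threshold. */
    private double costAt(double permitsAboveThreshold) {
        double slope = (coldIntervalMicros - stableIntervalMicros)
                / (maxPermits - thresholdPermits);
        return stableIntervalMicros + permitsAboveThreshold * slope;
    }

    /** Area under the curve between (storedPermits - permitsToTake) and storedPermits. */
    double storedPermitsToWaitMicros(double storedPermits, double permitsToTake) {
        double above = storedPermits - thresholdPermits;
        double micros = 0.0;
        if (above > 0.0) {
            double takenAbove = Math.min(above, permitsToTake);
            // Trapezoid area: average of the two edge heights times the width.
            micros += takenAbove * (costAt(above) + costAt(above - takenAbove)) / 2.0;
            permitsToTake -= takenAbove;
        }
        // Whatever remains is taken from the flat part of the curve.
        micros += stableIntervalMicros * permitsToTake;
        return micros;
    }

    public static void main(String[] args) {
        WarmupCurve curve = new WarmupCurve(200_000, 600_000, 2.5, 5.0);
        for (double stored = 5.0; stored >= 2.0; stored -= 1.0) {
            System.out.println("stored=" + stored + " cost of one permit (us)="
                    + curve.storedPermitsToWaitMicros(stored, 1.0));
        }
    }
}
```

With these numbers, a permit taken from a full (cold) pool costs more than twice the stable interval, and the cost falls back to the stable interval once the stored level drops below the threshold.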

[Figures omitted: the relationship between rate and stored permits.]

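When the limiter's QPS is set to 1, a large acquire(100) call on an idle limiter does not wait 100 seconds. The call returns immediately so the task can start right away, and the cost is charged to the next request, which has to wait until those 100 permits have been paid for.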
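This "pay later" behaviour follows directly from tracking the expected time of the next request. The sketch below is a deliberately tiny model, not Guava's code: the class name PayLaterModel is hypothetical, and stored permits are ignored so that only the deferred cost is visible.

```java
/** Tiny model of deferred cost: a request's permits delay the NEXT request, not itself. */
class PayLaterModel {
    private final long intervalMicros;  // time to produce one permit
    private long nextFreeMicros = 0L;   // earliest time the next request can be served

    PayLaterModel(long permitsPerSecond) {
        this.intervalMicros = 1_000_000L / permitsPerSecond;
    }

    /** Returns how long a request arriving at nowMicros must wait, in microseconds. */
    long acquire(int permits, long nowMicros) {
        long startAt = Math.max(nowMicros, nextFreeMicros);
        // The cost of these permits is pushed onto whoever comes next.
        nextFreeMicros = startAt + permits * intervalMicros;
        return startAt - nowMicros;
    }

    public static void main(String[] args) {
        PayLaterModel limiter = new PayLaterModel(1);
        System.out.println("acquire(100) waits " + limiter.acquire(100, 0L) + " us");
        System.out.println("next acquire(1) waits " + limiter.acquire(1, 0L) + " us");
    }
}
```

In the model, the large request is admitted at once and the following request absorbs the full 100 seconds.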

5. Quick Usage

Creating a simple limiter and acquiring permits:

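import com.google.common.util.concurrent.RateLimiter;

public class QuickUsage {
    public static void main(String[] args) {
        // 5 permits per second, i.e. one permit every 0.2 s
        RateLimiter limiter = RateLimiter.create(5);
        for (int i = 0; i < 5; i++) {
            System.out.println(limiter.acquire());
        }
    }
}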

Each line of output is the time, in seconds, that the corresponding acquire() call spent waiting before the request could proceed; the first call returns 0.0.

Another example creates a limiter with a warm‑up period and shows its effect before and after an idle pause:

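import com.google.common.util.concurrent.RateLimiter;
import java.util.concurrent.TimeUnit;

public class WarmupUsage {
    public static void main(String[] args) throws InterruptedException {
        // 5 permits per second with a 1000 ms warm-up period
        RateLimiter limiter = RateLimiter.create(5, 1000, TimeUnit.MILLISECONDS);
        for (int i = 1; i < 5; i++) {
            System.out.println(limiter.acquire());
        }
        // Stay idle long enough for the limiter to cool down again
        Thread.sleep(1000L);
        for (int i = 1; i < 5; i++) {
            System.out.println(limiter.acquire());
        }
    }
}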

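Because this limiter starts cold, the waits after the first call (which returns 0.0) are longer than the steady 0.2 s and shrink toward it as the limiter warms up. After the one‑second sleep the limiter has cooled down again, so the second loop shows the same slow start. The length of the ramp is set by the warmupPeriod parameter.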
