Understanding Rate Limiting: Leaky Bucket, Token Bucket, and Guava RateLimiter in Java
This article explains the principles of rate limiting, compares the leaky bucket and token bucket algorithms, and demonstrates how Guava's RateLimiter (including SmoothBursty and SmoothWarmingUp implementations) works in Java with detailed code examples and execution flow.
High‑concurrency systems often need to limit traffic to maintain responsiveness and stability; this is commonly called rate limiting or traffic shaping. The classic algorithms are the leaky bucket and the token bucket.
Leaky Bucket visualizes requests as water flowing into a bucket with a fixed outflow hole. If the inflow rate exceeds the outflow, the bucket fills and eventually overflows, causing excess requests to be dropped (a denial‑of‑service scenario). The method is simple but cannot handle short traffic spikes well.
Token Bucket stores tokens rather than requests. Tokens are generated at a constant rate up to a maximum bucket size; each request must consume a token, otherwise it is rejected. Because tokens can accumulate, the algorithm can absorb bursts of traffic until the token supply is exhausted, making it more flexible than the leaky bucket.
Guava provides a RateLimiter abstraction that implements token‑bucket based rate limiting. Its concrete implementation is SmoothRateLimiter, which has two internal subclasses:
SmoothBursty : a classic token bucket that allows bursts.
SmoothWarmingUp : a token bucket with a warm‑up period where token generation starts slowly and accelerates to a steady rate.
The most important fields of SmoothRateLimiter are: storedPermits: current number of tokens in the bucket. maxPermits: maximum bucket capacity. stableIntervalMicros: interval (in microseconds) between token generations. nextFreeTicketMicros: timestamp when the next token will be available.
Token generation is performed lazily in resync():
void resync(long nowMicros) {
if (nowMicros > nextFreeTicketMicros) {
double newPermits = (nowMicros - nextFreeTicketMicros) / coolDownIntervalMicros();
storedPermits = Math.min(maxPermits, storedPermits + newPermits);
nextFreeTicketMicros = nowMicros;
}
}
// SmoothBursty.coolDownIntervalMicros()
@Override
double coolDownIntervalMicros() {
return stableIntervalMicros;
}When a request arrives, reserveEarliestAvailable() obtains tokens and computes any required wait time:
@Override
final long reserveEarliestAvailable(int requiredPermits, long nowMicros) {
resync(nowMicros);
long returnValue = nextFreeTicketMicros;
double storedPermitsToSpend = Math.min(requiredPermits, this.storedPermits);
double freshPermits = requiredPermits - storedPermitsToSpend;
long waitMicros =
storedPermitsToWaitTime(this.storedPermits, storedPermitsToSpend)
+ (long) (freshPermits * stableIntervalMicros);
this.nextFreeTicketMicros = LongMath.saturatedAdd(nextFreeTicketMicros, waitMicros);
this.storedPermits -= storedPermitsToSpend;
return returnValue;
}
// SmoothBursty.storedPermitsToWaitTime()
@Override
long storedPermitsToWaitTime(double storedPermits, double permitsToTake) {
return 0L;
}Creating a RateLimiter is straightforward; you only need to specify the desired permits per second (QPS). The internal maxBurstSeconds is fixed at 1, so the bucket capacity equals the QPS:
public static RateLimiter create(double permitsPerSecond) {
return create(permitsPerSecond, SleepingStopwatch.createFromSystemTimer());
}
@VisibleForTesting
static RateLimiter create(double permitsPerSecond, SleepingStopwatch stopwatch) {
RateLimiter rateLimiter = new SmoothBursty(stopwatch, 1.0 /* maxBurstSeconds */);
rateLimiter.setRate(permitsPerSecond);
return rateLimiter;
}
// maxPermits = maxBurstSeconds * permitsPerSecond;The public API offers acquire() (blocking until a token is available) and tryAcquire() (with timeout). Internally they call reserve() and reserveAndGetWaitLength() to compute the wait time:
@CanIgnoreReturnValue
public double acquire() {
return acquire(1);
}
@CanIgnoreReturnValue
public double acquire(int permits) {
long microsToWait = reserve(permits);
stopwatch.sleepMicrosUninterruptibly(microsToWait);
return 1.0 * microsToWait / SECONDS.toMicros(1L);
}
final long reserve(int permits) {
checkPermits(permits);
synchronized (mutex()) {
return reserveAndGetWaitLength(permits, stopwatch.readMicros());
}
}
final long reserveAndGetWaitLength(int permits, long nowMicros) {
long momentAvailable = reserveEarliestAvailable(permits, nowMicros);
return Math.max(momentAvailable - nowMicros, 0);
}Example usage:
public class RateLimiterExample {
public static void main(String[] args) throws Exception {
RateLimiter rateLimiter = RateLimiter.create(10);
Random random = new Random();
for (int i = 0; i < 20; i++) {
int numPermits = random.nextInt(20);
System.out.println(numPermits + "\t" + rateLimiter.acquire(numPermits));
}
}
}The program prints the number of permits requested and the time (in seconds) the limiter forced the thread to wait, confirming that the observed behavior matches the theoretical token‑bucket model.
Overall, the article provides a clear conceptual comparison of leaky‑bucket and token‑bucket algorithms, explains Guava's implementation details, and supplies practical Java code to help developers apply rate limiting in backend services.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
