Understanding Rate Limiting: Leaky Bucket, Token Bucket, and Guava RateLimiter
The article explains the principles of traffic shaping through leaky‑bucket and token‑bucket algorithms, details how Google Guava's RateLimiter implements token‑bucket rate limiting, and provides Java code examples illustrating token generation, acquisition, and practical usage in high‑concurrency backend systems.
High‑concurrency business systems often need to limit risky interfaces to maintain responsiveness and stability, a technique known as rate limiting or traffic shaping. The classic algorithms are the leaky bucket and the token bucket.
Leaky Bucket visualizes requests as water flowing into a bucket with a fixed outflow hole; if the inflow rate exceeds the outflow, the bucket overflows and excess requests are dropped, effectively causing a denial of service.
Token Bucket stores tokens (permits) instead of requests. Tokens are generated at a constant rate up to a maximum bucket size; a request can proceed only if it obtains a token, allowing bursty traffic to be smoothed while preserving flexibility.
In Spark Streaming, rate limiting is implemented via Guava's RateLimiter, which uses a token‑bucket algorithm. The core abstract class RateLimiter is implemented by SmoothRateLimiter, which has two strategies: SmoothBursty (standard token bucket) and SmoothWarmingUp (token generation with a warm‑up period).
The key internal fields of SmoothRateLimiter are: storedPermits: current number of tokens in the bucket. maxPermits: maximum capacity of the bucket. stableIntervalMicros: interval (in microseconds) between token generations. nextFreeTicketMicros: timestamp when the next token will be available.
Token generation is performed lazily in resync(long nowMicros):
void resync(long nowMicros) {</code><code> if (nowMicros > nextFreeTicketMicros) {</code><code> double newPermits = (nowMicros - nextFreeTicketMicros) / coolDownIntervalMicros();</code><code> storedPermits = min(maxPermits, storedPermits + newPermits);</code><code> nextFreeTicketMicros = nowMicros;</code><code> }</code><code>}</code><code>double coolDownIntervalMicros() {</code><code> return stableIntervalMicros;</code><code>}When a request arrives,
reserveEarliestAvailable(int requiredPermits, long nowMicros)obtains tokens:
final long reserveEarliestAvailable(int requiredPermits, long nowMicros) {</code><code> resync(nowMicros);</code><code> long returnValue = nextFreeTicketMicros;</code><code> double storedPermitsToSpend = min(requiredPermits, this.storedPermits);</code><code> double freshPermits = requiredPermits - storedPermitsToSpend;</code><code> long waitMicros = storedPermitsToWaitTime(this.storedPermits, storedPermitsToSpend)</code><code> + (long) (freshPermits * stableIntervalMicros);</code><code> this.nextFreeTicketMicros = LongMath.saturatedAdd(nextFreeTicketMicros, waitMicros);</code><code> this.storedPermits -= storedPermitsToSpend;</code><code> return returnValue;</code><code>}</code><code>long storedPermitsToWaitTime(double storedPermits, double permitsToTake) {</code><code> return 0L;</code><code>}The public API provides acquire() and tryAcquire(). Internally, acquire() calls reserve(), which eventually invokes the above method and blocks the calling thread until a token is available.
@CanIgnoreReturnValue public double acquire() { return acquire(1); }</code><code>@CanIgnoreReturnValue public double acquire(int permits) {</code><code> long microsToWait = reserve(permits);</code><code> stopwatch.sleepMicrosUninterruptibly(microsToWait);</code><code> return 1.0 * microsToWait / SECONDS.toMicros(1L);</code><code>}</code><code>final long reserve(int permits) {</code><code> checkPermits(permits);</code><code> synchronized (mutex()) {</code><code> return reserveAndGetWaitLength(permits, stopwatch.readMicros());</code><code> }</code><code>}</code><code>final long reserveAndGetWaitLength(int permits, long nowMicros) {</code><code> long momentAvailable = reserveEarliestAvailable(permits, nowMicros);</code><code> return Math.max(momentAvailable - nowMicros, 0);</code><code>}Creating a RateLimiter is straightforward:
public static RateLimiter create(double permitsPerSecond) {</code><code> return create(permitsPerSecond, SleepingStopwatch.createFromSystemTimer());</code><code>}</code><code>@VisibleForTesting static RateLimiter create(double permitsPerSecond, SleepingStopwatch stopwatch) {</code><code> RateLimiter rateLimiter = new SmoothBursty(stopwatch, 1.0);</code><code> rateLimiter.setRate(permitsPerSecond);</code><code> return rateLimiter;</code><code>}Example usage:
public class RateLimiterExample {</code><code> public static void main(String[] args) throws Exception {</code><code> RateLimiter rateLimiter = RateLimiter.create(10);</code><code> Random random = new Random();</code><code> for (int i = 0; i < 20; i++) {</code><code> int numPermits = random.nextInt(20);</code><code> System.out.println(numPermits + "\t" + rateLimiter.acquire(numPermits));</code><code> }</code><code> }</code><code>}The program prints the number of requested permits and the time (in seconds) the caller had to wait, demonstrating that the token‑bucket logic correctly throttles bursty traffic according to the configured QPS.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
