Mastering API Rate Limiting with Token Bucket and Redis

This article explains how to protect API servers from abuse by implementing rate limiting using the Token Bucket algorithm, discusses its drawbacks, and provides practical Java and Redis-based solutions with code examples and performance considerations.

21CTO
21CTO
21CTO
Mastering API Rate Limiting with Token Bucket and Redis

When developing API servers, it is common to limit the number of calls a client can make within a certain time interval to protect server resources.

For example, a user may be allowed no more than 100 requests per minute; exceeding this limit results in a rejected request.

Initial Idea

The first impression is to give each user a quota Q that resets after interval P. The pseudo‑code is:

can_access(identity):
    limit_counter = get(identity)
    if limiter_counter exists and
       limiter_counter.timestamp - CURRENT_TIME < LIMIT_INTERVAL and
       limit_counter >= limit:
        return false
    else
        if limiter_counter is nil or
           (limit_counter exists and limiter_counter.timestamp - CURRENT_TIME > LIMIT_INTERVAL):
            put(identity, new LimiterCounter(1, CURRENT_TIME))
        else
            put(identity, limiter_counter.increment())
        end
        return true
    end
end

Redis's INCR command can implement this simple approach, but it has a flaw: a user can send Q requests at the end of one interval and another Q at the start of the next, effectively making 2Q requests in a short time, which can double the server load and be exploitable.

Token Bucket Algorithm

After searching, the Token Bucket algorithm was found, commonly used in networking to limit traffic.

Key points of the algorithm:

All traffic must acquire a token before being allowed.

Tokens are added to a bucket at a fixed rate (1/r seconds per token).

The bucket has a maximum capacity; excess tokens are discarded.

The algorithm can be visualized as a faucet filling a bucket; when the bucket is full, extra water overflows.

In our context, “traffic” becomes API requests. Each request determines which bucket’s token it needs, allowing multiple buckets for different limits (e.g., per‑user, per‑IP, global).

Example: a logged‑in user may have 200 requests per 10 seconds, 5,000 per hour, and 20,000 per day, requiring three separate buckets.

The entire service may have a global bucket limiting all requests to 100,000 per 10 seconds.

IP‑based buckets can also be created.

If a token cannot be acquired, the server should return HTTP 429; otherwise the request proceeds.

Initial Implementation Idea

A naive implementation would set a timer for each bucket to add tokens, but this would require a thread per bucket, which is impractical at scale.

Alternative Implementation

Guava’s RateLimiter implements a Token Bucket using timestamps instead of timers. The core calculation is:

// com.google.common.util.concurrent.SmoothRateLimiter
private void resync(long nowMicros) {
    // if nextFreeTicket is in the past, resync to now
    if (nowMicros > nextFreeTicketMicros) {
        storedPermits = min(maxPermits,
            storedPermits + (nowMicros - nextFreeTicketMicros) / stableIntervalMicros);
        nextFreeTicketMicros = nowMicros;
    }
}

We can store the current token count and the timestamp of the last refill in a bucket, and compute the number of tokens to add on each request.

Advantages:

Avoids a timer per bucket.

Memory usage is minimal (only token count and timestamp).

Computation occurs only on request, reducing overhead.

Buckets can be stored in Redis as hashes with a TTL, making them act like cache entries that expire when unused.

First Java Implementation Using Redis

public boolean access(String userId) {
    String key = genKey(userId);
    try (Jedis jedis = jedisPool.getResource()) {
        Map<String, String> counter = jedis.hgetAll(key);
        if (counter.size() == 0) {
            TokenBucket tokenBucket = new TokenBucket(System.currentTimeMillis(), limit - 1);
            jedis.hmset(key, tokenBucket.toHash());
            return true;
        } else {
            TokenBucket tokenBucket = TokenBucket.fromHash(counter);
            long lastRefillTime = tokenBucket.getLastRefillTime();
            long refillTime = System.currentTimeMillis();
            long intervalSinceLast = refillTime - lastRefillTime;
            long currentTokensRemaining;
            if (intervalSinceLast > intervalInMills) {
                currentTokensRemaining = limit;
            } else {
                long grantedTokens = (long) (intervalSinceLast / intervalPerPermit);
                System.out.println(grantedTokens);
                currentTokensRemaining = Math.min(grantedTokens + tokenBucket.getTokensRemaining(), limit);
            }
            tokenBucket.setLastRefillTime(refillTime);
            assert currentTokensRemaining >= 0;
            if (currentTokensRemaining == 0) {
                tokenBucket.setTokensRemaining(currentTokensRemaining);
                jedis.hmset(key, tokenBucket.toHash());
                return false;
            } else {
                tokenBucket.setTokensRemaining(currentTokensRemaining - 1);
                jedis.hmset(key, tokenBucket.toHash());
                return true;
            }
        }
    }
}

This method works in single‑threaded scenarios but suffers from race conditions under concurrency.

Improved Implementation with Lua Scripts

To avoid race conditions, Redis Lua scripts ( EVAL / EVALSHA) can execute the token‑bucket logic atomically, eliminating the need for explicit locks.

The Lua script and its Java wrapper ( LuaRateLimiter) encapsulate the read‑modify‑write sequence in a single atomic operation.

Refinements and Clean‑up

Two issues were identified:

When a bucket is first used or idle for a long time, the initial token count should be configurable rather than always the maximum capacity.

The LuaRateLimiter class handled both token acquisition and policy definition, violating the Single Responsibility Principle. It was split into RateLimitPolicy and LuaRateLimiter.

The updated design introduces maxBurstTime in the policy to calculate initial tokens and separates concerns for better maintainability.

Author: Achilles and the Tortoise Source: http://yigwoo.com/post/api-throttle
Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

BackendJavaalgorithmredisrate limitingToken Bucket
21CTO
Written by

21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.