Backend Development 13 min read

Understanding Rate Limiting: Counter, Leaky Bucket, and Token Bucket Algorithms

This article explains the fundamentals of rate limiting for high‑concurrency systems, covering why it is needed, the conditions that trigger it, and detailed introductions to the three main algorithms—counter (fixed and sliding windows), leaky bucket, and token bucket—along with their advantages, drawbacks, and sample pseudo‑code implementations.

JD Retail Technology

May 25, 2020

Understanding Rate Limiting: Counter, Leaky Bucket, and Token Bucket Algorithms

High‑concurrency systems rely on three key tools—caching, rate limiting, and degradation—to stay stable; this article focuses on rate limiting, introducing its basic concepts and why it is essential for protecting services from overload.

Rate limiting restricts incoming traffic to keep request volume within a system's capacity, preventing scenarios such as bandwidth saturation (e.g., a 1 Gbps link overwhelmed by traffic) and ensuring smooth operation in contexts like banking queues, restaurant waitlists, or pandemic‑related medical systems.

Typical triggers for rate limiting include rapid user growth, sudden traffic spikes, aggressive crawlers, fraudulent order‑spamming, and attacks.

The three most common rate‑limiting algorithms are:

Counter algorithm (fixed‑window and sliding‑window variants)

Leaky bucket algorithm

Token bucket algorithm

Each algorithm suits different scenarios, and the following sections detail their principles, pros and cons, and provide pseudo‑code examples.

Counter Algorithm

The counter algorithm counts requests within a time unit; if the count exceeds a predefined threshold, requests are blocked. It has two implementations:

Fixed‑window: each time slot is independent; when a slot expires, counting restarts in the next slot.

Sliding‑window: the time window moves continuously, overlapping with previous windows, which mitigates the “burst” problem of fixed windows.

Fixed‑window logic diagram:

Fixed‑window suffers from the “critical” problem where requests may cluster at the edge of a window, causing a temporary overload that exceeds the system’s QPS limit.

Fixed‑window pseudo‑code:

<span><span>int</span> unitTime = <span>1</span>s <span>// set unit time to 1 second</span></span></code>
<code><span><span>int</span> limitCount = <span>1000</span> <span>// allow up to 1000 requests per unit time</span></span></code>
<code><span><span>string</span> limitKey = <span>'limitkey'</span>; <span>// key for storing rate‑limit state</span></span></code>
<code><span></span></code>
<code><span>if (!has(limitKey)) {</span></code>
<code><span>    // initialize state with expiration equal to unitTime (e.g., Redis SET)</span></code>
<code><span>    set(limitKey, 0, unitTime);</span></code>
<code><span>}</span></code>
<code><span></span></code>
<code><span>// atomically increment request count</span></code>
<code><span><span>int</span> counter = incr(limitKey, 1);</span></code>
<code><span></span></code>
<code><span>if (counter > limitCount) {</span></code>
<code><span>    // exceed limit, return 503</span></code>
<code><span>    return 503;</span></code>
<code><span>} else {</span></code>
<code><span>    continue; // proceed with business logic</span></code>
<code><span>}</span>

Sliding‑window refines the fixed‑window by dividing the unit time into smaller granules (e.g., 10 × 100 ms). The window slides forward with each request, so the counted interval continuously overlaps, reducing burst effects.

Sliding‑window incurs higher computational overhead because it must count requests over a moving interval.

Sliding‑window pseudo‑code:

<span><span>int</span> unitTime = <span>1000</span>ms <span>// 1 second</span></span></code>
<code><span><span>int</span> limitCount = <span>1000</span> <span>// max requests per unit time</span></span></code>
<code><span><span>string</span> listKey = <span>'limitkey'</span>; <span>// key for sorted set</span></span></code>
<code><span></span></code>
<code><span>// current timestamp in ms</span></code>
<code><span><span>int</span> curMilliseconds = nowMilliseconds();</span></code>
<code><span>// start of the sliding window</span></code>
<code><span><span>int</span> startMilliseconds = curMilliseconds - unitTime * <span>1000</span>;</span></code>
<code><span></span></code>
<code><span>// count requests in the window (e.g., Redis ZCOUNT)</span></code>
<code><span><span>int</span> counter = ZCOUNT(listKey, startMilliseconds, curMilliseconds);</span></code>
<code><span></span></code>
<code><span>if (counter > limitCount) {</span></code>
<code><span>    return 503;</span></code>
<code><span>} else {</span></code>
<code><span>    ZADD(listKey, curMilliseconds, uniqueId);</span></code>
<code><span>    continue;</span></code>
<code><span>}</span>

Leaky Bucket Algorithm

The leaky bucket treats incoming requests as water droplets poured into a fixed‑size FIFO queue (the bucket). The bucket drains at a constant rate; excess droplets are discarded, providing a smooth, average processing rate.

Leaky bucket pseudo‑code:

<span><span>// pseudo‑code implementation</span></span></code>
<code><span>local rate = <span>1</span>ms <span>// processing rate (1 request per ms)</span></span></code>
<code><span>local bucketSize = <span>1000</span> <span>// max number of droplets</span></span></code>
<code><span>local bucketQueue = <span>new</span> BucketQueue(bucketSize);</span></code>
<code><span>local userRequest = <span>new</span> UserRequest();</span></code>
<code><span>local res = bucketQueue.push(userRequest);</span></code>
<code><span>if (res == <span>false</span>) {</span></code>
<code><span>    return 503; // bucket full</span></code>
<code><span>} else {</span></code>
<code><span>    // request will be processed later</span></code>
<code><span>}</span></code>
<code><span>// consume the queue at fixed rate</span></code>
<code><span>setTimeout(function () {</span></code>
<code><span>    local userRequest = bucketQueue.lpop();</span></code>
<code><span>    // execute business logic</span></code>
<code><span>}, rate);</span>

Token Bucket Algorithm

The token bucket is similar to the leaky bucket but separates token production from token consumption. Tokens are generated at a steady rate and stored in a bucket; a request proceeds only if it can acquire a token, ensuring a controlled request rate while allowing bursts up to the bucket capacity.

Token bucket pseudo‑code:

<span><span>// pseudo‑code implementation</span></span></code>
<code><span>local tokenProducerRate = <span>1</span>ms <span>// token generation rate</span></span></code>
<code><span>local bucketSize = <span>1000</span> <span>// max tokens</span></span></code>
<code><span>local bucketKey = <span>"bucketKey"</span>;</span></code>
<code><span>bucketKey.setSize(bucketSize);</span></code>
<code><span>local token = bucketKey.getValidToken();</span></code>
<code><span>if (token ~= <span>false</span>) {</span></code>
<code><span>    token.lock(); // ensure exclusive use</span></code>
<code><span>} else {</span></code>
<code><span>    return 503; // no token available</span></code>
<code><span>}</span></code>
<code><span>// continue business logic</span></code>
<code><span>token.destroy(); // release token after use</span></code>
<code><span>// token producer (adds a token every 1 ms)</span></code>
<code><span>setTimeout(function () {</span></code>
<code><span>    local token = <span>new</span> Token();</span></code>
<code><span>    local result = bucketKey.push(token);</span></code>
<code><span>    // ignore push result; if bucket full, token is discarded</span></code>
<code><span>}, tokenProducerRate);</span>

Algorithm Comparison

Each algorithm has its own strengths and trade‑offs; selecting the appropriate one depends on the specific business scenario, and hybrid approaches can be employed to satisfy complex requirements.

Note: This article focuses on the theoretical foundations and implementation sketches; future posts will cover practical rate‑limiting configurations in Nginx and Google Guava.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Backend rate limiting Token Bucket leaky bucket counter algorithm

Written by

JD Retail Technology

Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.