Understanding Rate Limiting: Counter, Leaky Bucket, and Token Bucket Algorithms
This article explains the fundamentals of rate limiting for high‑concurrency systems, covering why it is needed, the conditions that trigger it, and detailed introductions to the three main algorithms—counter (fixed and sliding windows), leaky bucket, and token bucket—along with their advantages, drawbacks, and sample pseudo‑code implementations.
High‑concurrency systems rely on three key tools—caching, rate limiting, and degradation—to stay stable; this article focuses on rate limiting, introducing its basic concepts and why it is essential for protecting services from overload.
Rate limiting restricts incoming traffic to keep request volume within a system's capacity, preventing scenarios such as bandwidth saturation (e.g., a 1 Gbps link overwhelmed by traffic) and ensuring smooth operation in contexts like banking queues, restaurant waitlists, or pandemic‑related medical systems.
Typical triggers for rate limiting include rapid user growth, sudden traffic spikes, aggressive crawlers, fraudulent order‑spamming, and attacks.
The three most common rate‑limiting algorithms are:
Counter algorithm (fixed‑window and sliding‑window variants)
Leaky bucket algorithm
Token bucket algorithm
Each algorithm suits different scenarios, and the following sections detail their principles, pros and cons, and provide pseudo‑code examples.
Counter Algorithm
The counter algorithm counts requests within a time unit; if the count exceeds a predefined threshold, requests are blocked. It has two implementations:
Fixed‑window: each time slot is independent; when a slot expires, counting restarts in the next slot.
Sliding‑window: the time window moves continuously, overlapping with previous windows, which mitigates the “burst” problem of fixed windows.
Fixed‑window logic diagram:
Fixed‑window suffers from the “critical” problem where requests may cluster at the edge of a window, causing a temporary overload that exceeds the system’s QPS limit.
Fixed‑window pseudo‑code:
<span><span>int</span> unitTime = <span>1</span>s <span>// set unit time to 1 second</span></span></code>
<code><span><span>int</span> limitCount = <span>1000</span> <span>// allow up to 1000 requests per unit time</span></span></code>
<code><span><span>string</span> limitKey = <span>'limitkey'</span>; <span>// key for storing rate‑limit state</span></span></code>
<code><span></span></code>
<code><span>if (!has(limitKey)) {</span></code>
<code><span> // initialize state with expiration equal to unitTime (e.g., Redis SET)</span></code>
<code><span> set(limitKey, 0, unitTime);</span></code>
<code><span>}</span></code>
<code><span></span></code>
<code><span>// atomically increment request count</span></code>
<code><span><span>int</span> counter = incr(limitKey, 1);</span></code>
<code><span></span></code>
<code><span>if (counter > limitCount) {</span></code>
<code><span> // exceed limit, return 503</span></code>
<code><span> return 503;</span></code>
<code><span>} else {</span></code>
<code><span> continue; // proceed with business logic</span></code>
<code><span>}</span>Sliding‑window refines the fixed‑window by dividing the unit time into smaller granules (e.g., 10 × 100 ms). The window slides forward with each request, so the counted interval continuously overlaps, reducing burst effects.
Sliding‑window incurs higher computational overhead because it must count requests over a moving interval.
Sliding‑window pseudo‑code:
<span><span>int</span> unitTime = <span>1000</span>ms <span>// 1 second</span></span></code>
<code><span><span>int</span> limitCount = <span>1000</span> <span>// max requests per unit time</span></span></code>
<code><span><span>string</span> listKey = <span>'limitkey'</span>; <span>// key for sorted set</span></span></code>
<code><span></span></code>
<code><span>// current timestamp in ms</span></code>
<code><span><span>int</span> curMilliseconds = nowMilliseconds();</span></code>
<code><span>// start of the sliding window</span></code>
<code><span><span>int</span> startMilliseconds = curMilliseconds - unitTime * <span>1000</span>;</span></code>
<code><span></span></code>
<code><span>// count requests in the window (e.g., Redis ZCOUNT)</span></code>
<code><span><span>int</span> counter = ZCOUNT(listKey, startMilliseconds, curMilliseconds);</span></code>
<code><span></span></code>
<code><span>if (counter > limitCount) {</span></code>
<code><span> return 503;</span></code>
<code><span>} else {</span></code>
<code><span> ZADD(listKey, curMilliseconds, uniqueId);</span></code>
<code><span> continue;</span></code>
<code><span>}</span>Leaky Bucket Algorithm
The leaky bucket treats incoming requests as water droplets poured into a fixed‑size FIFO queue (the bucket). The bucket drains at a constant rate; excess droplets are discarded, providing a smooth, average processing rate.
Leaky bucket pseudo‑code:
<span><span>// pseudo‑code implementation</span></span></code>
<code><span>local rate = <span>1</span>ms <span>// processing rate (1 request per ms)</span></span></code>
<code><span>local bucketSize = <span>1000</span> <span>// max number of droplets</span></span></code>
<code><span>local bucketQueue = <span>new</span> BucketQueue(bucketSize);</span></code>
<code><span>local userRequest = <span>new</span> UserRequest();</span></code>
<code><span>local res = bucketQueue.push(userRequest);</span></code>
<code><span>if (res == <span>false</span>) {</span></code>
<code><span> return 503; // bucket full</span></code>
<code><span>} else {</span></code>
<code><span> // request will be processed later</span></code>
<code><span>}</span></code>
<code><span>// consume the queue at fixed rate</span></code>
<code><span>setTimeout(function () {</span></code>
<code><span> local userRequest = bucketQueue.lpop();</span></code>
<code><span> // execute business logic</span></code>
<code><span>}, rate);</span>Token Bucket Algorithm
The token bucket is similar to the leaky bucket but separates token production from token consumption. Tokens are generated at a steady rate and stored in a bucket; a request proceeds only if it can acquire a token, ensuring a controlled request rate while allowing bursts up to the bucket capacity.
Token bucket pseudo‑code:
<span><span>// pseudo‑code implementation</span></span></code>
<code><span>local tokenProducerRate = <span>1</span>ms <span>// token generation rate</span></span></code>
<code><span>local bucketSize = <span>1000</span> <span>// max tokens</span></span></code>
<code><span>local bucketKey = <span>"bucketKey"</span>;</span></code>
<code><span>bucketKey.setSize(bucketSize);</span></code>
<code><span>local token = bucketKey.getValidToken();</span></code>
<code><span>if (token ~= <span>false</span>) {</span></code>
<code><span> token.lock(); // ensure exclusive use</span></code>
<code><span>} else {</span></code>
<code><span> return 503; // no token available</span></code>
<code><span>}</span></code>
<code><span>// continue business logic</span></code>
<code><span>token.destroy(); // release token after use</span></code>
<code><span>// token producer (adds a token every 1 ms)</span></code>
<code><span>setTimeout(function () {</span></code>
<code><span> local token = <span>new</span> Token();</span></code>
<code><span> local result = bucketKey.push(token);</span></code>
<code><span> // ignore push result; if bucket full, token is discarded</span></code>
<code><span>}, tokenProducerRate);</span>Algorithm Comparison
Each algorithm has its own strengths and trade‑offs; selecting the appropriate one depends on the specific business scenario, and hybrid approaches can be employed to satisfy complex requirements.
Note: This article focuses on the theoretical foundations and implementation sketches; future posts will cover practical rate‑limiting configurations in Nginx and Google Guava.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
