Mastering Rate Limiting: Algorithms, Redis & Lua for Scalable Backend APIs
This article explains the concept of rate limiting, compares five common algorithms (fixed window, sliding log, sliding window counter, token bucket, and leaky bucket), covers distributed designs using Redis and Lua scripts, and provides a complete FastAPI implementation with pytest tests for both the fixed‑window and sliding‑log strategies.
Rate limiting is a simple yet crucial concept in backend design patterns; it protects APIs, prevents abuse, and ensures fair resource usage, making a well‑designed rate limiter indispensable for any scalable system.
What is rate limiting?
Essentially, rate limiting controls the number of requests a user or client can make within a given time window.
Example use case:
Each user may send at most 100 requests per 15 minutes.
Rate limiting helps:
Protect backend systems from overload
Prevent denial‑of‑service (DoS) attacks
Ensure fair resource usage
Avoid accidental infinite loops or traffic spikes
Common rate limiting algorithms
Let’s look at five of the most common rate‑limiting algorithms, their use cases, and trade‑offs.
1. Fixed window counter
Maintain a counter per user that resets at the end of each time window (e.g., every 60 seconds).
<code>[time window]
|------|------|------|
100 requests/window ✅
At window boundaries: counter resets ❌ (can cause traffic bursts)
</code>
Advantages: Simple to implement
Disadvantages: Allows traffic bursts at window boundaries (e.g., 100 requests at the end of one window and another 100 at the start of the next)
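As an illustrative sketch (not the Redis implementation shown later in the article), a fixed‑window counter fits in a few lines of in‑process Python:

```python
import time


class FixedWindowLimiter:
    """Naive in-memory fixed-window counter (illustrative sketch only;
    old buckets are never purged, and it works for a single process)."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counts = {}  # window index -> request count

    def allow(self, now=None):
        now = time.time() if now is None else now
        bucket = int(now // self.window)  # requests in the same window share one counter
        self.counts.setdefault(bucket, 0)
        if self.counts[bucket] >= self.limit:
            return False  # limit exceeded for this window
        self.counts[bucket] += 1
        return True
```

Because the counter resets at each window boundary, a client can legally burst up to twice the limit across two adjacent windows, which is exactly the weakness noted above.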
2. Sliding window log
Track each request’s timestamp in a list/queue and check whether the number of requests in the past N minutes exceeds the threshold.
<code>[sliding window]
|--- req1 --- req2 --- reqN ---|
Keep timestamps, evict the old ones
</code>
Advantages: Precise
Disadvantages: High memory usage and slower for high‑traffic systems
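A minimal in‑memory sketch of the log approach (the article's Redis sorted‑set version appears later) keeps one timestamp per accepted request:

```python
import time
from collections import deque


class SlidingLogLimiter:
    """Illustrative in-memory sliding-window log: one timestamp per request,
    so memory grows with traffic (the disadvantage noted above)."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.log = deque()  # timestamps of accepted requests, oldest first

    def allow(self, now=None):
        now = time.time() if now is None else now
        # Evict timestamps that fell out of the window
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) >= self.limit:
            return False
        self.log.append(now)
        return True
```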
3. Sliding window counter (approximate)
Divide the window into sub‑windows (e.g., one‑minute buckets) and sum the recent buckets.
<code>[15-minute window]
= sum(last 15 one-minute buckets)
</code>
Advantages: Memory‑efficient, smooth request limiting
Disadvantages: Slightly less accurate than the log‑based method
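The bucketed variant can be sketched like this (an illustrative in‑memory version, assuming the window divides evenly into buckets):

```python
import time


class SlidingWindowCounter:
    """Approximate sliding window: sum per-bucket counters instead of
    storing every timestamp (illustrative sketch)."""

    def __init__(self, limit, window_seconds, bucket_seconds):
        self.limit = limit
        self.window = window_seconds
        self.bucket_size = bucket_seconds
        self.buckets = {}  # bucket index -> request count

    def allow(self, now=None):
        now = time.time() if now is None else now
        current = int(now // self.bucket_size)
        oldest = current - self.window // self.bucket_size + 1
        # Drop buckets that are entirely outside the window
        self.buckets = {b: c for b, c in self.buckets.items() if b >= oldest}
        if sum(self.buckets.values()) >= self.limit:
            return False
        self.buckets[current] = self.buckets.get(current, 0) + 1
        return True
```

Memory is bounded by the number of buckets (15 here) rather than by the number of requests, at the cost of bucket‑granularity accuracy.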
4. Token bucket
A bucket stores tokens; tokens are added at a fixed rate, and each request consumes one token.
<code>[bucket]
Tokens are added over time
If a token is available -> allow
Otherwise -> reject
</code>
Advantages: Allows traffic bursts, more flexible than fixed windows
Disadvantages: Implementation is a bit more complex
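An in‑memory token bucket can be sketched as follows (illustrative only; the distributed Redis variant is discussed below):

```python
import time


class TokenBucket:
    """Illustrative token bucket: refills at refill_rate tokens/second,
    up to capacity; each request consumes one token."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)  # start full, so bursts up to capacity are allowed
        self.last = None

    def allow(self, now=None):
        now = time.time() if now is None else now
        if self.last is not None:
            # Refill lazily, based on the time since the last request
            elapsed = now - self.last
            self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The lazy refill on each call is what makes the implementation slightly more involved than a plain counter: state is two numbers (tokens, last timestamp) that must be updated together.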
5. Leaky bucket
Similar to the token bucket, but requests are queued and processed at a fixed rate.
Advantages: Guarantees a steady request rate
Disadvantages: Not ideal when traffic bursts are needed
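A leaky bucket can be sketched the same way; this illustrative version rejects when the queue is full rather than blocking, which a queueing variant would do instead:

```python
import time


class LeakyBucket:
    """Illustrative leaky bucket: a bounded queue drained at a fixed
    leak_rate (requests/second)."""

    def __init__(self, capacity, leak_rate):
        self.capacity = capacity
        self.leak_rate = leak_rate
        self.level = 0.0  # current queue depth
        self.last = None

    def allow(self, now=None):
        now = time.time() if now is None else now
        if self.last is not None:
            # Drain the queue at the fixed leak rate
            self.level = max(0.0, self.level - (now - self.last) * self.leak_rate)
        self.last = now
        if self.level + 1 > self.capacity:
            return False  # bucket full (a blocking variant would enqueue instead)
        self.level += 1
        return True
```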
Distributed rate limiter design
For distributed systems, local‑memory rate limiting (e.g., Flask or Django middleware) is insufficient; a centralized system is required. Below is a high‑level architecture.
Redis‑based token bucket
Tech stack:
Redis: centralized store with atomic operations
Lua scripts: atomic token check and update
Optional: Kafka or pub/sub for rate‑limit event tracking
<code># Pseudo‑code (illustrative only: the GET/DECR pair below is not atomic
# across processes; the Lua scripts shown later close that race)
key = f"rate_limit:{user_id}"
tokens = redis.get(key) or BUCKET_SIZE
if tokens > 0:
    redis.decr(key)
    process()
else:
    reject()
</code>
Implementation of rate limiting
We support two rate‑limiting strategies:
Fixed window rate limiting: Limit the number of requests within a fixed interval (e.g., max 5 requests per 10 seconds).
Sliding log rate limiting: Maintain a sliding window of request timestamps for more precise control (e.g., max 3 requests per 60 seconds).
We use Redis as the backend store and Lua scripts for atomic operations.
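The snippets below import redis_client from a config module that the article does not show. A minimal sketch, assuming a local Redis on the default port (the host, port, and db values are placeholders for your environment):

```python
# config.py -- minimal Redis client setup (assumed; not shown in the original project)
import redis

# decode_responses=True makes GET return str instead of bytes
redis_client = redis.Redis(host="localhost", port=6379, db=0, decode_responses=True)
```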
Fixed window rate limiter
The fixed‑window limiter allows a fixed number of requests within a specific time window (e.g., up to 5 requests every 10 seconds).
Lua script ensures atomic increment and expiration handling:
<code>fixed_rate_lua_script = ""
"local key = KEYS[1]
local window_size = tonumber(ARGV[1])
local limit = tonumber(ARGV[2])
-- Get current request count
local request_count = redis.call("GET", key)
if request_count then
request_count = tonumber(request_count)
else
request_count = 0
end
-- Check limit
if request_count >= limit then
return 0 -- limit exceeded
end
-- Increment and set expiry if first request
request_count = redis.call("INCR", key)
if request_count == 1 then
redis.call("EXPIRE", key, window_size)
end
return 1 -- request allowed
"
""
</code>The script is executed in Python via the
evalcommand:
<code>from fastapi import HTTPException, Request
from .config import redis_client
from .lua_scripts import fixed_rate_lua_script
import time

def fixed_rate_limit_with_lua_script(request: Request):
    client_ip = request.client.host
    current_time = int(time.time())
    window_size = 10  # 10 seconds
    limit = 5  # 5 requests
    key = f"fixed_rate_limit:{client_ip}:{current_time // window_size}"

    result = redis_client.eval(fixed_rate_lua_script, 1, key, window_size, limit)
    if result == 0:
        raise HTTPException(status_code=429, detail="Rate limit exceeded. Try again later.")
</code>
Sliding log rate limiter
The sliding‑log limiter maintains a timestamp‑sorted set to control requests more precisely.
Lua script removes expired timestamps, checks the current count, and adds a new timestamp:
<code>sliding_log_lua_script = ""
"local key = KEYS[1]
local current_time = tonumber(ARGV[1])
local window_size = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
-- Remove timestamps older than the window
redis.call("ZREMRANGEBYSCORE", key, 0, current_time - window_size)
-- Check request count
local request_count = redis.call("ZCARD", key)
if request_count >= limit then
return 0 -- limit exceeded
end
-- Add current timestamp
redis.call("ZADD", key, current_time, current_time)
-- Set expiration
redis.call("EXPIRE", key, window_size)
return 1 -- request allowed
"
""
</code>Execution in Python mirrors the fixed‑window approach:
<code>from fastapi import HTTPException, Request
from .config import redis_client
from .lua_scripts import sliding_log_lua_script
import time

def sliding_log_rate_limit_with_lua_script(request: Request):
    client_ip = request.client.host
    current_time = int(time.time())
    window_size = 60  # 60 seconds
    limit = 3  # 3 requests
    key = f"sliding_log_rate_limit:{client_ip}"

    result = redis_client.eval(sliding_log_lua_script, 1, key, current_time, window_size, limit)
    if result == 0:
        raise HTTPException(status_code=429, detail="Rate limit exceeded. Try again later.")
</code>
An alternative implementation can use direct Redis commands instead of Lua; switch the dependencies in main.py to toggle between the Lua‑based and direct client versions.
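For reference, one hypothetical shape of the direct‑command variant of the fixed‑window check (this is a sketch of the idea, not the article's actual code; a FakeRedis stand‑in is included only so the sketch runs without a server, and a real redis‑py client drops in unchanged):

```python
import time


class _FakeRedisPipeline:
    """In-memory stand-in for a redis-py pipeline (demo only; applies
    commands immediately and elides TTL handling)."""

    def __init__(self, store):
        self.store = store
        self.results = []

    def incr(self, key):
        self.store[key] = self.store.get(key, 0) + 1
        self.results.append(self.store[key])

    def expire(self, key, seconds):
        self.results.append(True)

    def execute(self):
        out, self.results = self.results, []
        return out


class FakeRedis:
    """Stand-in for redis.Redis, just enough for this sketch."""

    def __init__(self):
        self.store = {}

    def pipeline(self):
        return _FakeRedisPipeline(self.store)


def fixed_rate_limit_direct(redis_client, client_ip, now=None, window_size=10, limit=5):
    """Direct-command variant of the fixed-window check (hypothetical sketch;
    the article's Lua script remains the atomic option)."""
    now = int(time.time()) if now is None else now
    key = f"fixed_rate_limit:{client_ip}:{now // window_size}"
    pipe = redis_client.pipeline()
    pipe.incr(key)                 # count this request
    pipe.expire(key, window_size)  # refresh the window's TTL
    count, _ = pipe.execute()
    return count <= limit
```

Counting with INCR before checking keeps the count consistent even without Lua; the trade‑off is that rejected requests still bump the counter, and the two commands are not a single atomic unit the way the script is.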
<code>from fastapi import Depends
from ..dependencies import fixed_rate_limit_with_lua_script, sliding_log_rate_limit_with_lua_script

@app.get("/fixed-limited", dependencies=[Depends(fixed_rate_limit_with_lua_script)])
async def get_fixed_limited_resource():
    return {"message": "You accessed a fixed‑window limited resource!"}
</code>
Testing the rate limiter
We use pytest to test the limiter, clearing the Redis database before each test to ensure isolation.
<code>import pytest
from fastapi.testclient import TestClient
from app.main import app
from app.config import redis_client  # your Redis client

client = TestClient(app)

@pytest.fixture(autouse=True)
def clear_redis():
    """Clear Redis keys before each test"""
    print("Clearing Redis database...")
    redis_client.flushdb()

def test_read_root():
    response = client.get("/")
    assert response.status_code == 200
    assert response.json() == {"message": "Welcome to the rate limiter demo!"}

def test_fixed_limited_resource():
    response = client.get("/fixed-limited")
    assert response.status_code == 200
    assert response.json() == {"message": "You accessed a fixed‑window limited resource!"}

def test_sliding_limited_resource():
    response = client.get("/sliding-limited")
    assert response.status_code == 200
    assert response.json() == {"message": "You accessed a sliding‑log limited resource!"}

def test_fixed_limited_exceed_limit():
    for _ in range(5):
        client.get("/fixed-limited")
    response = client.get("/fixed-limited")
    assert response.status_code == 429  # Too Many Requests

def test_sliding_limited_exceed_limit():
    for _ in range(3):
        client.get("/sliding-limited")
    response = client.get("/sliding-limited")
    assert response.status_code == 429
</code>
Conclusion
From protecting your API to preventing abuse and ensuring fair usage, a well‑designed rate limiter is an essential component of any scalable system.
Code Mala Tang
Read source code together, write articles together, and enjoy spicy hot pot together.