
Mastering Rate Limiting: Algorithms, Redis & Lua for Scalable Backend APIs

This article explains the concept of rate limiting, compares five common algorithms (fixed-window, sliding-log, sliding-window counter, token bucket, and leaky bucket), covers distributed designs using Redis and Lua scripts, and provides complete FastAPI implementations and pytest tests for the fixed-window and sliding-log strategies.

Code Mala Tang

Rate limiting is a simple yet crucial concept in backend design patterns; it protects APIs, prevents abuse, and ensures fair resource usage, making a well‑designed rate limiter indispensable for any scalable system.

What is rate limiting?

Essentially, rate limiting controls the number of requests a user or client can make within a given time window.

Example use case:

Each user may send at most 100 requests per 15 minutes.

Rate limiting helps:

Protect backend systems from overload

Prevent denial‑of‑service (DoS) attacks

Ensure fair resource usage

Avoid accidental infinite loops or traffic spikes

Common rate limiting algorithms

Let’s look at five of the most common rate‑limiting algorithms, their use cases, and trade‑offs.

1. Fixed window counter

Maintain a counter per user that resets at the end of each time window (e.g., every 60 seconds).

<code>[Time window]
|------|------|------|
100 requests/window ✅
At the window boundary: the counter resets ❌ (can cause a traffic burst)
</code>

Advantages: Simple to implement

Disadvantages: Allows traffic bursts at window boundaries (e.g., 100 requests at the end of one window and another 100 at the start of the next)

2. Sliding window log

Track each request’s timestamp in a list/queue and check whether the number of requests in the past N minutes exceeds the threshold.

<code>[Sliding window]
|--- req1 --- req2 --- reqN ---|
Keep timestamps, remove old ones
</code>

Advantages: Precise

Disadvantages: High memory usage and slower for high‑traffic systems

3. Sliding window counter (approximate)

Divide the window into sub‑windows (e.g., one‑minute buckets) and sum the recent buckets.

<code>[15-minute window]
= sum(last 15 one-minute buckets)
</code>

Advantages: Memory‑efficient, smooth request limiting

Disadvantages: Slightly less accurate than the log‑based method

4. Token bucket

A bucket stores tokens; tokens are added at a fixed rate, and each request consumes one token.

<code>[Bucket]
Tokens added over time
If a token is available -> allow
Otherwise -> reject
</code>

Advantages: Allows traffic bursts, more flexible than fixed windows

Disadvantages: Implementation is a bit more complex
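One common way to sketch a token bucket is to refill lazily, on each request, in proportion to elapsed time (the explicit `now` clock is just for testability; in production you would use `time.monotonic()`):

```python
class TokenBucket:
    def __init__(self, capacity: float, refill_rate: float, now: float = 0.0):
        self.capacity = capacity        # max tokens the bucket can hold
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # start full, allowing an initial burst
        self.last_refill = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1  # each request consumes one token
            return True
        return False
```

Because the bucket starts full and refills continuously, short bursts up to `capacity` are allowed while the long‑term rate stays bounded by `refill_rate`.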

5. Leaky bucket

Similar to the token bucket, but requests are queued and processed at a fixed rate.

Advantages: Guarantees a steady request rate

Disadvantages: Not ideal when traffic bursts are needed
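A simplified leaky‑bucket sketch (here modeling the queue as a draining counter rather than an actual worker queue; names are illustrative):

```python
class LeakyBucket:
    """Requests join a fixed-capacity queue that drains at a constant rate;
    requests arriving when the queue is full are rejected."""

    def __init__(self, capacity: int, leak_rate: float, now: float = 0.0):
        self.capacity = capacity    # max queued requests
        self.leak_rate = leak_rate  # requests drained per second
        self.queue_len = 0.0
        self.last_leak = now

    def allow(self, now: float) -> bool:
        # Drain the queue at the fixed leak rate
        elapsed = now - self.last_leak
        self.queue_len = max(0.0, self.queue_len - elapsed * self.leak_rate)
        self.last_leak = now
        if self.queue_len < self.capacity:
            self.queue_len += 1
            return True
        return False  # bucket overflow: reject
```

Note the contrast with the token bucket: output is smoothed to `leak_rate` no matter how bursty the input is, which is exactly why this variant is a poor fit when bursts should be allowed through.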

Distributed rate limiter design

For distributed systems, local‑memory rate limiting (e.g., Flask or Django middleware) is insufficient; a centralized system is required. Below is a high‑level architecture.

Redis‑based token bucket

Tech stack:

Redis: centralized store with atomic operations

Lua scripts: atomic token check and update

Optional: Kafka or pub/sub for rate‑limit event tracking

<code># Pseudo-code (token refill omitted for brevity)
key = f"rate_limit:{user_id}"
tokens = redis.get(key) or BUCKET_SIZE
if tokens > 0:
    redis.decr(key)
    process()
else:
    reject()
# Note: GET followed by DECR is two round trips, so concurrent requests
# can race between them. This is exactly what the Lua scripts fix.
</code>

Implementation of rate limiting

We support two rate‑limiting strategies:

Fixed window rate limiting: Limit the number of requests within a fixed interval (e.g., max 5 requests per 10 seconds).

Sliding log rate limiting: Maintain a sliding window of request timestamps for more precise control (e.g., max 3 requests per 60 seconds).

We use Redis as the backend store and Lua scripts for atomic operations.

Fixed window rate limiter

The fixed‑window limiter allows a fixed number of requests within a specific time window (e.g., up to 5 requests every 10 seconds).

Lua script ensures atomic increment and expiration handling:

<code>fixed_rate_lua_script = ""
"local key = KEYS[1]
local window_size = tonumber(ARGV[1])
local limit = tonumber(ARGV[2])
-- Get current request count
local request_count = redis.call("GET", key)
if request_count then
    request_count = tonumber(request_count)
else
    request_count = 0
end
-- Check limit
if request_count >= limit then
    return 0  -- limit exceeded
end
-- Increment and set expiry if first request
request_count = redis.call("INCR", key)
if request_count == 1 then
    redis.call("EXPIRE", key, window_size)
end
return 1  -- request allowed
"
""
</code>

The script is executed in Python via the eval command:

<code>from fastapi import HTTPException, Request
from .config import redis_client
from .lua_scripts import fixed_rate_lua_script
import time

def fixed_rate_limit_with_lua_script(request: Request):
    client_ip = request.client.host
    current_time = int(time.time())
    window_size = 10  # 10 seconds
    limit = 5          # 5 requests
    key = f"fixed_rate_limit:{client_ip}:{current_time // window_size}"
    result = redis_client.eval(fixed_rate_lua_script, 1, key, window_size, limit)
    if result == 0:
        raise HTTPException(status_code=429, detail="Rate limit exceeded. Try again later.")
</code>

Sliding log rate limiter

The sliding‑log limiter maintains a timestamp‑sorted set to control requests more precisely.

Lua script removes expired timestamps, checks the current count, and adds a new timestamp:

<code>sliding_log_lua_script = ""
"local key = KEYS[1]
local current_time = tonumber(ARGV[1])
local window_size = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
-- Remove timestamps older than the window
redis.call("ZREMRANGEBYSCORE", key, 0, current_time - window_size)
-- Check request count
local request_count = redis.call("ZCARD", key)
if request_count >= limit then
    return 0  -- limit exceeded
end
-- Add current timestamp
redis.call("ZADD", key, current_time, current_time)
-- Set expiration
redis.call("EXPIRE", key, window_size)
return 1  -- request allowed
"
""
</code>

Execution in Python mirrors the fixed‑window approach:

<code>from fastapi import HTTPException, Request
from .config import redis_client
from .lua_scripts import sliding_log_lua_script
import time

def sliding_log_rate_limit_with_lua_script(request: Request):
    client_ip = request.client.host
    current_time = int(time.time())
    window_size = 60  # 60 seconds
    limit = 3         # 3 requests
    key = f"sliding_log_rate_limit:{client_ip}"
    result = redis_client.eval(sliding_log_lua_script, 1, key, current_time, window_size, limit)
    if result == 0:
        raise HTTPException(status_code=429, detail="Rate limit exceeded. Try again later.")
</code>

An alternative implementation can use direct Redis commands; switch dependencies in main.py to toggle between the Lua‑based and direct client versions.

<code>from ..dependencies import fixed_rate_limit_with_lua_script, sliding_log_rate_limit_with_lua_script

@app.get("/fixed-limited", dependencies=[Depends(fixed_rate_limit_with_lua_script)])
async def get_fixed_limited_resource():
    return {"message": "You accessed a fixed‑window limited resource!"}
</code>

Testing the rate limiter

We use pytest to test the limiter, clearing the Redis database before each test to ensure isolation.

<code>import pytest
from fastapi.testclient import TestClient
from app.main import app
from app.config import redis_client  # your Redis client

client = TestClient(app)

@pytest.fixture(autouse=True)
def clear_redis():
    """Clear Redis keys before each test"""
    print("Clearing Redis database...")
    redis_client.flushdb()

def test_read_root():
    response = client.get("/")
    assert response.status_code == 200
    assert response.json() == {"message": "Welcome to the rate limiter demo!"}

def test_fixed_limited_resource():
    response = client.get("/fixed-limited")
    assert response.status_code == 200
    assert response.json() == {"message": "You accessed a fixed‑window limited resource!"}

def test_sliding_limited_resource():
    response = client.get("/sliding-limited")
    assert response.status_code == 200
    assert response.json() == {"message": "You accessed a sliding‑log limited resource!"}

def test_fixed_limited_exceed_limit():
    for _ in range(5):
        client.get("/fixed-limited")
    response = client.get("/fixed-limited")
    assert response.status_code == 429  # Too Many Requests

def test_sliding_limited_exceed_limit():
    for _ in range(3):
        client.get("/sliding-limited")
    response = client.get("/sliding-limited")
    assert response.status_code == 429
</code>

Conclusion

From protecting your API to preventing abuse and ensuring fair usage, a well‑designed rate limiter is an essential component of any scalable system.

Tags: backend, algorithm, Redis, rate limiting, Lua, FastAPI