Designing a Million‑QPS Rate Limiter for Backend System Interviews
This article walks through a complete, interview‑ready design of a high‑performance rate‑limiting system that can handle up to one million queries per second, covering requirements, core entities, algorithm choices, distributed state storage with Redis, scalability, high availability, latency optimization, hot‑key mitigation, and dynamic rule configuration.
1. What Is a Rate Limiter
A rate limiter is a mechanism that controls request traffic in high‑concurrency systems, preventing overload by limiting the number of requests a user or service can make within a given time window (e.g., 100 requests per minute).
2. Requirement Analysis
Interviewer: “Let’s discuss system design. How would you implement a high‑performance rate limiter?”
2.1 Functional Requirements
2.1.1 Core Requirements
The system must identify clients by user ID, IP address, or API key to apply appropriate limits.
Configurable rules should limit HTTP requests (e.g., 100 API calls per minute per user).
When the limit is exceeded, the system must reject the request with HTTP 429 and include useful headers such as remaining quota and reset time.
2.2 Non‑functional Requirements
Assume the target scale is 100 million daily active users generating 1 million QPS.
2.2.1 Core Non‑functional Requirements
Low latency : the rate‑limit check must add less than 10 ms overhead per request.
High availability : the limiter is a core infrastructure component and may tolerate eventual consistency.
High concurrency : the system must reliably handle 1 M QPS.
3. Low‑level Design
Define core entities and interfaces first.
3.1 Core Entity Definitions
Rule : defines a rate‑limit policy for a specific scenario (e.g., 1000 requests per hour for authenticated users).
Client : the entity being limited (user, IP, or API key) with an associated rate‑limit state.
Request : an incoming API call carrying context such as client identity, endpoint, and timestamp.
When a request arrives, the system identifies the client, looks up the applicable rule, checks the current usage, and decides to allow or reject.
3.2 System Interface Design
/**
* Check whether a request is allowed
* @param clientId unique client identifier (user ID, IP, or API key)
* @param ruleId unique rule identifier
* @return object containing pass flag, remaining count and reset time
*/
isRequestAllowed(clientId, ruleId) -> {
passes: boolean, // whether allowed
remaining: number, // remaining requests
resetTime: timestamp // counter reset timestamp
}4. High‑level Architecture
4.1 Where to Deploy the Rate Limiter?
Interviewer: “Where would you place the limiter in the architecture?”
4.1.1 In‑process Limiting
Each service instance keeps its own counter in memory. This is fast but the state is not shared, leading to inaccurate global limits.
4.1.2 Dedicated Service Limiting
A separate microservice holds a centralized counter. It provides accurate limits but adds network latency and introduces a single‑point‑of‑failure risk.
4.1.3 API Gateway / Load Balancer Layer
Preferred solution: embed the limiter at the entry point (API gateway or load balancer) so every external request is checked before reaching any backend service.
4.2 How to Identify Clients?
Typical identifiers:
User ID : extracted from JWT or session token.
IP address : taken from X‑Forwarded‑For header.
API key : provided via X‑API‑Key header.
In practice, multiple identifiers can be combined to enforce layered rules.
4.3 Rate‑Limiting Algorithm Choice
4.3.1 Fixed Window Counter
Simple hash table mapping client‑window to a count. Easy to implement but suffers from boundary spikes.
{
"alice:12:00:00": 100,
"alice:12:01:00": 5,
"bob:12:00:00": 20
}4.3.2 Sliding Window Counter
Divides a large window into smaller sub‑windows to smooth spikes. Improves accuracy but still has granularity limits.
4.3.3 Leaky Bucket
Requests fill a bucket that leaks at a fixed rate; excess requests are dropped. Limited flexibility for dynamic traffic.
4.3.4 Token Bucket (Recommended)
Each client has a bucket with a capacity and a refill rate. Tokens are consumed per request; if none are available, the request is rejected. Balances burst handling and steady‑state rate control.
4.4 Storing Token‑Bucket State
Use Redis as a high‑performance, in‑memory store accessible by all gateway instances. Perform atomic read‑modify‑write using a Lua script to avoid race conditions.
MULTI
HSET user:bucket tokens <new_token_count>
HSET user:bucket last_refill <current_timestamp>
EXPIRE user:bucket 3600
EXEC4.5 High Availability of Redis
Deploy a Redis Cluster with sharding (16384 hash slots) and master‑slave replication. Sentinel or built‑in cluster failover ensures continuity when a node crashes.
4.6 Minimizing Latency
Maintain a connection pool between the gateway and Redis to avoid per‑request TCP handshakes.
Co‑locate gateways and Redis nodes in the same data center or availability zone to keep network round‑trip time minimal.
4.7 Hot‑key Mitigation
Detect abusive clients that repeatedly trigger limits and automatically blacklist them, or provide higher‑tier packages with dedicated resources for legitimate high‑traffic users.
4.8 Dynamic Rule Configuration
Store rate‑limit rules in an external configuration center (e.g., Nacos, Apollo) and let gateways either poll (pull) the config at regular intervals or receive push notifications via a long‑lived connection. Pull is simple; push offers real‑time updates.
5. Summary
The article presents a systematic, interview‑ready design of a million‑QPS rate limiter, covering requirement gathering, entity modeling, algorithm selection, distributed state management with Redis, scalability through sharding, high availability, latency optimization, hot‑key handling, and dynamic configuration.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
