Rate Limiting Strategies and Algorithms for High‑Concurrency Services
The article explains why rate limiting is needed in high‑traffic scenarios, describes common algorithms such as window counting and the token bucket (as implemented by Guava's RateLimiter), compares single‑node and distributed implementations, and outlines a Redis‑based solution for precise flow control across multiple service instances.
When a massive traffic event like a Double‑Eleven coupon grab occurs, the number of user requests can far exceed the service interface's capacity (e.g., 500 requests/s), leading to overload and dropped requests. To maintain service availability, the request rate must be limited.
What is rate limiting? It is the control of inbound and outbound traffic to prevent resource exhaustion and system instability. A rate‑limiting system typically provides two functions: a limiting strategy and a circuit‑breaker strategy, though this article focuses on the limiting strategy.
Rate limiting algorithms
Instant concurrency limit : Guava's RateLimiter implements the token‑bucket algorithm and offers two modes: SmoothBursty, which permits short bursts, and SmoothWarmingUp, which ramps the permit rate up gradually after an idle period.
Window‑based request limit : Restricts the maximum number of requests within a time window (per second, minute, day). This can be implemented by storing counters with expiration, e.g., using Guava's cache with a 2‑second TTL.
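A window‑based limiter can be sketched as a counter that resets when the window expires. The class and method names below are our own, and an in‑process counter is used instead of Guava's cache purely to keep the sketch self‑contained:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal fixed-window counter: allows at most `limit` requests per window.
// Illustrative sketch only; in the article's setup the counter would live in
// a Guava cache with a TTL instead of a plain field.
public class FixedWindowLimiter {
    private final int limit;
    private final long windowMillis;
    private long windowStart;
    private final AtomicInteger count = new AtomicInteger();

    public FixedWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
        this.windowStart = System.currentTimeMillis();
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= windowMillis) { // window expired: start a new one
            windowStart = now;
            count.set(0);
        }
        return count.incrementAndGet() <= limit; // accept only while under the cap
    }
}
```

With a limit of 3 per 2 seconds, the first three calls to `tryAcquire()` return true and subsequent calls in the same window return false.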
Token bucket : If the configured average rate is r , a token is added to the bucket every 1/r seconds. The bucket holds at most b tokens; tokens that arrive when it is full are discarded. Incoming traffic at rate v consumes tokens: if a token is available, the request passes; otherwise it is rejected (this is where the circuit‑breaker strategy applies).
Properties of the token‑bucket algorithm:
Long‑term throughput stabilizes at the token addition rate r .
It can absorb traffic bursts due to its buffer capacity.
Additional formulas: let M be the maximum possible transmission rate (bytes/s), with M > r. Then T_max = b/(M − r) is the longest time a burst at the full rate M can be sustained, and B_max = T_max × M is the total data transferred during that burst.
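The burst formulas can be checked with concrete numbers; the values below (r = 500, b = 1000, M = 1000) are illustrative and not from the article:

```java
// Worked example of the burst formulas above with illustrative numbers:
// token rate r = 500/s, bucket depth b = 1000, peak rate M = 1000/s.
public class BurstMath {
    public static void main(String[] args) {
        double r = 500, b = 1000, M = 1000;
        double tMax = b / (M - r); // longest sustainable burst: 1000 / (1000 - 500) = 2 s
        double bMax = tMax * M;    // traffic passed during the burst: 2 * 1000 = 2000
        System.out.println("T_max = " + tMax + " s, B_max = " + bMax);
    }
}
```

So with these numbers, a client pushing the full rate M exhausts the bucket after 2 seconds, having moved 2000 units of data.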
Advantages: Traffic is smoothed and the algorithm can tolerate bursty loads.
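The behavior described above can be sketched with a lazily refilled bucket. This is our own minimal sketch of the algorithm, not Guava's actual implementation (which additionally supports warm‑up and blocking acquisition):

```java
// Minimal token bucket with lazy refill, mirroring the algorithm described above.
public class TokenBucket {
    private final double rate;     // r: tokens added per second
    private final double capacity; // b: maximum tokens the bucket holds
    private double tokens;
    private long lastRefillNanos;

    public TokenBucket(double rate, double capacity) {
        this.rate = rate;
        this.capacity = capacity;
        this.tokens = capacity; // start full so an initial burst is absorbed
        this.lastRefillNanos = System.nanoTime();
    }

    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        // Add the tokens accrued since the last call, capped at the bucket size.
        tokens = Math.min(capacity, tokens + (now - lastRefillNanos) / 1e9 * rate);
        lastRefillNanos = now;
        if (tokens >= 1) {
            tokens -= 1;
            return true;  // token available: let the request through
        }
        return false;     // bucket empty: reject the request
    }
}
```

Starting full is what gives the algorithm its burst tolerance: up to b requests can pass at once, after which throughput settles at the refill rate r.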
Implementation options
Guava's RateLimiter class works on a single node only; it is not suitable for distributed systems, where the total QPS equals the per‑node QPS multiplied by the number of nodes.
Redis‑based solution: store two keys, one acting as a timer and one as a counter. Each request increments the counter; as long as the counter stays below the threshold within the timer's window, the request is processed. Because all service instances share the same keys, this approach limits the total rate precisely across multiple instances.
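In production this scheme maps onto Redis's INCR and EXPIRE commands (ideally combined atomically, e.g. in a Lua script). The sketch below models those two commands with an in‑memory map purely to illustrate the counting logic; the `RedisLike` class and the key naming are our own, not a real Redis client:

```java
import java.util.HashMap;
import java.util.Map;

// Models the Redis-based scheme: increment a counter key, set a TTL when the
// window starts, and allow the request only while the counter is under the
// threshold. RedisLike stands in for a real Redis client for illustration.
public class RedisStyleLimiter {
    static class RedisLike {
        private final Map<String, Integer> counters = new HashMap<>();
        private final Map<String, Long> expiries = new HashMap<>();

        // Mimics INCR with an EXPIRE applied at the start of each window.
        synchronized int incr(String key, long ttlMillis) {
            long now = System.currentTimeMillis();
            Long deadline = expiries.get(key);
            if (deadline == null || now >= deadline) { // key expired: new window
                counters.put(key, 0);
                expiries.put(key, now + ttlMillis);
            }
            int next = counters.get(key) + 1;
            counters.put(key, next);
            return next;
        }
    }

    private final RedisLike redis = new RedisLike();

    // One shared key per interface; every instance talking to the same store
    // sees the same count, which is what makes the limit globally precise.
    public boolean allow(String apiKey, int threshold, long windowMillis) {
        return redis.incr("ratelimit:" + apiKey, windowMillis) <= threshold;
    }
}
```

Against a real Redis, the map operations would be replaced by the corresponding commands, and the check‑and‑increment should be made atomic so concurrent instances cannot both slip past the threshold.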
The article references further reading on Redis‑based rate limiting designs and provides links to example code repositories.
References
Design of a Redis‑based rate limiting system: https://www.zybuluo.com/kay2/note/949160
Sample implementations: https://github.com/wukq/rate-limiter
Source: http://www.54tianzhisheng.cn/2017/11/18/flow-control/