Comprehensive Guide to Rate Limiting: Concepts, Algorithms, and Implementation Strategies
This article provides a comprehensive overview of rate limiting, covering basic concepts, common algorithms such as token bucket, leaky bucket, and sliding window, and practical implementation methods across Nginx, Tomcat, Guava, Redis, and Sentinel for distributed backend systems.
Article Directory
Basic concepts of rate limiting
Common algorithms used in rate limiting
Typical rate‑limiting solutions
Architectural considerations for rate limiting
Specific implementation techniques
Basic Concepts
Rate limiting usually involves two dimensions: time (a time window such as per second or per minute) and resources (maximum request count or concurrent connections). Combining these dimensions, a rule may limit, for example, 100 requests per second.
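As a minimal sketch of how the two dimensions combine, the fixed-window counter below enforces "at most `limit` requests per `window_s` seconds". The class name, parameter names, and the injectable `clock` are illustrative assumptions, not part of any library mentioned in this article.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests in each window of `window_s` seconds."""

    def __init__(self, limit, window_s, clock=time.monotonic):
        self.limit = limit          # resource dimension: max requests
        self.window_s = window_s    # time dimension: window length in seconds
        self.clock = clock          # injectable so the sketch can be tested
        self.window_start = clock()
        self.count = 0

    def allow(self):
        now = self.clock()
        if now - self.window_start >= self.window_s:
            # The previous window has ended: start a new one and reset the count.
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

With `FixedWindowLimiter(limit=100, window_s=1.0)`, the 101st call to `allow()` inside the same second returns `False`.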
QPS and Connection Control
Limits can be applied per IP, per server, or per server group, allowing multiple rules to work together (e.g., IP QPS < 10, connections < 5, server QPS < 1000, connections < 200).
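Connection limits differ from QPS limits in that they cap what is in flight at one moment rather than arrivals per unit of time. The sketch below shows one way to combine a per-IP cap with a per-server cap. `ConnectionLimiter`, `try_open`, and `close` are hypothetical names chosen for illustration.

```python
import threading
from collections import defaultdict


class ConnectionLimiter:
    """Cap simultaneous connections per IP and for the server as a whole."""

    def __init__(self, per_ip_max, server_max):
        self.per_ip_max = per_ip_max
        self.server_max = server_max
        self.per_ip = defaultdict(int)  # open connections per client IP
        self.total = 0                  # open connections on this server
        self.lock = threading.Lock()

    def try_open(self, ip):
        with self.lock:
            # Both rules must pass for the connection to be admitted.
            if self.total >= self.server_max or self.per_ip[ip] >= self.per_ip_max:
                return False
            self.per_ip[ip] += 1
            self.total += 1
            return True

    def close(self, ip):
        with self.lock:
            if self.per_ip[ip] > 0:
                self.per_ip[ip] -= 1
                self.total -= 1
```

Every successful `try_open(ip)` has to be paired with a `close(ip)` when the connection ends, otherwise the counters never come back down.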
Transmission Rate
Different user groups may have different download speeds, e.g., regular users 100 KB/s, premium users 10 MB/s, which is another form of rate limiting.
Black/White List
IP addresses that exceed a threshold can be added to a blacklist ("blocking IP"), while trusted accounts can be placed on a whitelist to bypass limits.
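The sketch below shows one possible ordering of the checks: whitelist first, then blacklist, then the normal limiter verdict. The blacklisting rule (a fixed number of rejected requests) is an assumption made for illustration, since the article does not specify the threshold, and `AccessLists` is a hypothetical name.

```python
class AccessLists:
    """Whitelist bypasses limits; repeat offenders are moved to a blacklist."""

    def __init__(self, max_violations):
        self.max_violations = max_violations  # assumed blacklisting threshold
        self.whitelist = set()
        self.blacklist = set()
        self.violations = {}                  # client id -> rejected requests

    def check(self, client_id, within_limit):
        """Return True if the request may proceed.

        `within_limit` is the verdict of whatever rate limiter is in use.
        """
        if client_id in self.whitelist:
            return True    # trusted clients bypass the limiter entirely
        if client_id in self.blacklist:
            return False   # blocked clients are rejected outright
        if within_limit:
            return True
        count = self.violations.get(client_id, 0) + 1
        self.violations[client_id] = count
        if count >= self.max_violations:
            self.blacklist.add(client_id)
        return False
```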
Distributed Environment
In a distributed setup, the limit applies to the whole cluster, not just a single node. Centralized storage (e.g., Redis) is needed to share counters across nodes.
Common Algorithms
Token Bucket
The token bucket has two key elements: a bucket that holds tokens up to a fixed capacity, and a token generator that adds tokens at a fixed rate (tokens that arrive while the bucket is full are discarded). A request proceeds only if it obtains a token. Otherwise it is dropped or, if an optional buffer queue is configured, held until new tokens appear.
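A minimal sketch of the idea follows. Instead of a background generator thread, it refills lazily from the elapsed time on each call, which is a common simplification. The names `TokenBucket`, `rate`, `capacity`, and `clock` are assumptions for illustration, and this version drops rather than queues requests that find no token.

```python
import time


class TokenBucket:
    """Tokens accrue at `rate` per second, up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity      # start with a full bucket
        self.last = clock()

    def allow(self, cost=1):
        now = self.clock()
        # Add the tokens generated since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Because a full bucket can be spent at once, a token bucket tolerates short bursts of up to `capacity` requests while holding the long-run average to `rate`.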
Leaky Bucket
Requests are placed into a bucket and leak out at a constant rate, ensuring a steady output regardless of bursty input. If the bucket is full, new requests are discarded.
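The behavior can be sketched as a bounded queue that is drained at a fixed rate. In the sketch below, `offer` and `drain` are hypothetical method names, and the caller is assumed to invoke `drain()` periodically (for example from a worker loop) to collect the requests that are due.

```python
import time
from collections import deque


class LeakyBucket:
    """Bounded queue whose contents leak out at `leak_rate` requests/second."""

    def __init__(self, leak_rate, capacity, clock=time.monotonic):
        self.leak_rate = leak_rate
        self.capacity = capacity
        self.clock = clock
        self.queue = deque()
        self.last_leak = clock()

    def offer(self, request):
        """Enqueue a request; return False if the bucket is full."""
        if len(self.queue) >= self.capacity:
            return False
        if not self.queue:
            # Restart the leak timer so idle time cannot be spent as a burst.
            self.last_leak = self.clock()
        self.queue.append(request)
        return True

    def drain(self):
        """Return the requests that are due to leak out by now."""
        now = self.clock()
        due = int((now - self.last_leak) * self.leak_rate)
        released = []
        if due > 0:
            # Advance by whole leak intervals so fractional time carries over.
            self.last_leak += due / self.leak_rate
            for _ in range(min(due, len(self.queue))):
                released.append(self.queue.popleft())
        return released
```

Unlike the token bucket, the output here never exceeds `leak_rate`, however bursty the input is.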
Sliding Window
A sliding window counts requests within the last *N* seconds; as the window moves forward, old counts expire, providing smoother throttling for varying traffic patterns.
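One simple way to realize this is a sliding-window log, which stores the timestamp of each admitted request and evicts those older than the window. The sketch below assumes that variant. Other variants (such as bucketed sub-window counters) trade accuracy for memory, and the names used here are illustrative.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests within any trailing `window_s` seconds."""

    def __init__(self, limit, window_s, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.timestamps = deque()   # admission times, oldest first

    def allow(self):
        now = self.clock()
        # Expire entries that have slid out of the window.
        while self.timestamps and self.timestamps[0] <= now - self.window_s:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```

Memory grows with `limit`, because one timestamp is kept per admitted request inside the window.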
Typical Rate‑Limiting Solutions
Legitimacy Verification
CAPTCHA, IP blacklists, and other verification methods help block malicious traffic.
Guava RateLimiter
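Guava provides RateLimiter for single‑machine throttling. Because each instance sees only its own traffic, a cluster‑wide budget has to be split by hand: for example, with two servers, Server 1 and Server 2, limiting each to ≤10 RPS keeps the combined traffic ≤20 RPS.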
Gateway‑Level Limiting
Gateways (e.g., Nginx, Spring Cloud Gateway, Zuul) act as the first entry point. Traffic is first filtered here before reaching backend services.
Nginx Rate Limiting
Use limit_req_zone to define a rate‑limit zone and limit_req to enforce it. For example, rate=2r/s limits each IP to 2 requests per second, which Nginx enforces as one request every 500 ms. The burst=4 parameter lets up to four requests beyond that rate wait in a queue instead of being rejected immediately.
Concurrency limiting uses limit_conn_zone and limit_conn. Example: limit_conn perip 10 restricts a single IP to 10 simultaneous connections, while limit_conn perserver 100 caps total concurrent connections per server.
Note: A connection is counted only when it has a request being processed by the server and the whole request header has already been read.
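The directives described above might be combined as in the following configuration fragment. It is a sketch, not the original article's configuration: the zone name `req_perip`, the 10 MB zone sizes, the choice of `$binary_remote_addr` and `$server_name` as keys, and the `listen`/`location` layout are assumptions. The names `perip` and `perserver` and the numeric limits come from the text.

```nginx
http {
    # Shared-memory zone keyed by client IP: 2 requests per second per IP.
    limit_req_zone  $binary_remote_addr zone=req_perip:10m rate=2r/s;

    # Connection-count zones: one keyed by client IP, one by virtual server.
    limit_conn_zone $binary_remote_addr zone=perip:10m;
    limit_conn_zone $server_name        zone=perserver:10m;

    server {
        listen 80;

        location / {
            # Queue up to 4 requests above the rate; reject anything beyond.
            limit_req  zone=req_perip burst=4;

            limit_conn perip 10;       # concurrent connections per client IP
            limit_conn perserver 100;  # concurrent connections for this server
        }
    }
}
```

Queued burst requests are delayed so that they still leave at the configured rate. Adding `nodelay` to `limit_req` forwards them immediately while still counting them against the burst allowance.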
Middleware Limiting
Store counters in a central component such as Redis. Redis' expiration and Lua scripting enable precise distributed throttling. Redis sorted sets can implement sliding windows; Redis‑Cell can realize leaky‑bucket behavior.
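A sliding window on a Redis sorted set could look roughly like the sketch below. It assumes a redis-py style client (one exposing `pipeline`, `zremrangebyscore`, `zadd`, `zcard`, `expire`, and `zrem`) passed in by the caller. The function name, key layout, and member format are illustrative assumptions. Timestamps come from the calling node here, so clocks are assumed to be reasonably synchronized.

```python
import time
import uuid


def allow_request(client, key, limit, window_s, now=None):
    """Sliding-window check shared by every node that uses the same `key`."""
    now = time.time() if now is None else now
    member = f"{now}:{uuid.uuid4().hex}"   # unique member per request

    pipe = client.pipeline()   # redis-py pipelines are transactional by default
    pipe.zremrangebyscore(key, 0, now - window_s)   # drop expired entries
    pipe.zadd(key, {member: now})                   # record this request
    pipe.zcard(key)                                 # count requests in window
    pipe.expire(key, int(window_s) + 1)             # let idle keys disappear
    _, _, count, _ = pipe.execute()

    if count > limit:
        # Over the limit: undo our own entry so it does not use up the window.
        client.zrem(key, member)
        return False
    return True
```

The same steps can also be moved into a Lua script, so that the count and the decision happen in a single atomic call on the server.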
Sentinel
Alibaba's Sentinel offers rich APIs and a visual console for flow control, circuit breaking, and isolation.
Architectural Design Considerations
Real‑world projects often combine multiple techniques (gateway, middleware, component‑level limits) to achieve layered protection and optimal resource utilization.
Specific Implementation Techniques
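Tomcat: configure maxThreads on the Connector in conf/server.xml to limit concurrently processed requests; requests beyond that wait for a free thread, and once the connector's connection and accept‑queue limits (maxConnections, acceptCount) are also reached, new connections are refused.
Nginx: use limit_req_zone / limit_req (with burst) for rate limiting and limit_conn_zone / limit_conn for concurrency control.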
Redis: implement time‑window algorithms with sorted sets; use Redis‑Cell for leaky bucket; use Lua scripts for atomic counter updates.
Guava: apply RateLimiter for single‑node throttling.
Redis‑based limits work in distributed systems, while Guava limits are limited to a single JVM.
Tomcat Specifics
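The default maxThreads is 200 in Tomcat 8.5; the often‑quoted 150 is the value in the commented‑out sample Executor in the stock server.xml. Raising the value costs memory, since each thread reserves a stack of roughly 1 MB by default, and increases GC pressure. The operating system also caps the number of threads per process. The figures cited here are about 2000 on Windows and about 1000 on Linux, though the real ceiling depends on system configuration.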
Important Notes
Guava cannot coordinate across multiple machines.
Container‑level limits (Nginx, Tomcat) are simple to configure but coarse‑grained; confirm that they actually satisfy the business requirement before relying on them alone.
For further reading, the original article is sourced from CSDN.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
