Comprehensive Guide to Rate Limiting: Concepts, Common Algorithms, and Practical Implementation Strategies
This article explains the fundamental concepts of rate limiting, reviews popular algorithms such as token bucket, leaky bucket, and sliding window, and details practical implementation methods across single‑machine and distributed environments using tools like Guava, Nginx, Redis, and Sentinel.
Author: liuec1002 Source: blog.csdn.net/liuerchong/article/details/118882053
Rate Limiting Basic Concepts
Rate limiting typically involves two dimensions: time (a time window such as per second or per minute) and resources (e.g., maximum request count or concurrent connections). Combining these dimensions, rate limiting restricts resource access within a specific time window.
Time: limits based on a time window.
Resource: limits based on available resources such as request count or connection count.
In real scenarios, multiple rules work together: for example, per‑IP QPS < 10 and connections < 5, while each server allows QPS ≤ 1000 and total connections ≤ 200.
QPS and Connection Control
IP‑level and server‑level limits can be combined; a whole server group or data center can also be treated as a single entity for high‑level limits.
Transmission Rate
Different user groups may have different download speeds, e.g., 100 KB/s for regular users and 10 MB/s for premium users, implemented via user‑group based rate limiting.
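Per‑group bandwidth limiting can be implemented by wrapping the response stream. The following is a minimal sketch, assuming a simple one‑second byte budget; the class and field names are my own, not from the article:

```java
// Sketch: cap an output stream at a per-user byte rate (names are illustrative).
import java.io.IOException;
import java.io.OutputStream;

public class ThrottledOutputStream extends OutputStream {
    private final OutputStream out;
    private final long bytesPerSec;   // e.g. 100 KB/s for regular users, 10 MB/s for premium
    private long windowStart = System.currentTimeMillis();
    private long bytesInWindow;       // bytes written in the current one-second window

    public ThrottledOutputStream(OutputStream out, long bytesPerSec) {
        this.out = out;
        this.bytesPerSec = bytesPerSec;
    }

    @Override
    public void write(int b) throws IOException {
        long now = System.currentTimeMillis();
        if (now - windowStart >= 1000) {       // a new one-second window has started
            windowStart = now;
            bytesInWindow = 0;
        }
        if (bytesInWindow >= bytesPerSec) {    // budget spent: wait for the next window
            try {
                Thread.sleep(Math.max(1, 1000 - (now - windowStart)));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            windowStart = System.currentTimeMillis();
            bytesInWindow = 0;
        }
        out.write(b);
        bytesInWindow++;
    }
}
```

The per‑user rate would be selected from the user's group at stream creation time.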
Black/White Lists
Dynamic blacklists block abusive IPs, while whitelists grant privileged access to trusted accounts or services.
Distributed Environment
Three common distributed rate‑limiting approaches:
Gateway‑level limiting (apply rules at the entry point).
Middleware‑level limiting (store counters in a shared component such as Redis).
Sentinel, a Spring Cloud Alibaba component that provides distributed rate limiting and circuit breaking.
Common Rate‑Limiting Algorithms
Token Bucket Algorithm
The token bucket uses two key elements:
Token: a request must acquire a token to be processed.
Bucket: stores tokens with a fixed capacity.
Token Generation – Tokens are added to the bucket at a steady rate (e.g., 100 tokens per second). If the bucket is full, excess tokens are discarded.
Token Acquisition – A request proceeds only after obtaining a token. If tokens are exhausted, requests may be queued (optional buffer queue) or dropped.
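The two steps above (steady refill, acquire‑or‑reject) can be sketched in Java as follows. This is an illustration under my own naming, not the article's code; a production limiter would also need fairness and overflow handling:

```java
// Minimal token bucket: tokens refill at a fixed rate, capacity caps bursts.
public class TokenBucket {
    private final long capacity;
    private final double ratePerSec;   // tokens added per second
    private double tokens;             // current token count (fractional during refill)
    private long lastRefillNanos;

    public TokenBucket(long capacity, double ratePerSec) {
        this.capacity = capacity;
        this.ratePerSec = ratePerSec;
        this.tokens = capacity;        // start full so an initial burst is allowed
        this.lastRefillNanos = System.nanoTime();
    }

    // Refill proportionally to elapsed time, then try to take one token.
    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        double elapsedSec = (now - lastRefillNanos) / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + elapsedSec * ratePerSec); // excess is discarded
        lastRefillNanos = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;                  // bucket empty: caller may queue or drop
    }
}
```

Because the bucket starts full, a burst up to `capacity` is admitted immediately, which is exactly the behavior that distinguishes token bucket from leaky bucket below.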
Leaky Bucket Algorithm
Requests are placed into a bucket and are emitted at a constant rate, regardless of the incoming burst, ensuring a steady outflow.
Difference from Token Bucket – Token bucket allows bursts by pre‑storing tokens; leaky bucket smooths traffic by enforcing a fixed outflow rate.
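A leaky bucket can be sketched the same way, with the roles reversed: requests add "water" and the bucket drains at a constant rate. Again a minimal illustration with assumed names, not the article's code:

```java
// Leaky bucket: arrivals fill the bucket, which drains at a fixed rate.
public class LeakyBucket {
    private final int capacity;        // max queued requests
    private final double leakPerSec;   // constant outflow rate
    private double water;              // current fill level
    private long lastLeakNanos;

    public LeakyBucket(int capacity, double leakPerSec) {
        this.capacity = capacity;
        this.leakPerSec = leakPerSec;
        this.lastLeakNanos = System.nanoTime();
    }

    public synchronized boolean tryAdd() {
        long now = System.nanoTime();
        double elapsedSec = (now - lastLeakNanos) / 1_000_000_000.0;
        water = Math.max(0.0, water - elapsedSec * leakPerSec); // drain at fixed rate
        lastLeakNanos = now;
        if (water + 1.0 <= capacity) {
            water += 1.0;              // request accepted into the bucket
            return true;
        }
        return false;                  // bucket full: request overflows and is dropped
    }
}
```

Note that `leakPerSec` bounds the processing rate even during a burst, whereas the token bucket admits the burst as long as pre‑stored tokens remain.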
Sliding Window
Counts requests within a window that moves with time; the finer the window's granularity, the smoother the limiting.
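At its finest granularity the sliding window keeps a log of individual request timestamps. A minimal sketch (illustrative naming, not the article's code):

```java
// Sliding-window log: count only requests inside the last `windowMillis`.
import java.util.ArrayDeque;
import java.util.Deque;

public class SlidingWindowLimiter {
    private final int limit;
    private final long windowMillis;
    private final Deque<Long> hits = new ArrayDeque<>(); // accepted-request timestamps

    public SlidingWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Evict timestamps that have slid out of the window.
        while (!hits.isEmpty() && now - hits.peekFirst() >= windowMillis) {
            hits.pollFirst();
        }
        if (hits.size() < limit) {
            hits.addLast(now);
            return true;
        }
        return false;
    }
}
```

The memory cost grows with the limit; coarser implementations trade precision for space by counting per sub‑window instead of per request.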
Typical Rate‑Limiting Solutions
Legality Verification
CAPTCHA, IP blacklists, etc., to block malicious traffic.
Guava RateLimiter
Guava provides RateLimiter for single‑machine limiting. Example:
RateLimiter limiter = RateLimiter.create(100); // 100 permits per second
limiter.acquire(); // blocks until a permit is available

It cannot coordinate across multiple JVMs or servers.
Gateway‑Level Limiting
Using Nginx or Spring Cloud Gateway to filter traffic before it reaches backend services.
Rate control: limit_req_zone and burst.
Connection control: limit_conn_zone and limit_conn.
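The directives above combine as in the following sketch; the zone names, sizes, and rates are illustrative values, not recommendations:

```nginx
# In the http{} block: a 10 MB shared zone keyed by client IP, 10 requests/second.
limit_req_zone $binary_remote_addr zone=perip:10m rate=10r/s;
limit_conn_zone $binary_remote_addr zone=addr:10m;

server {
    location / {
        # Queue up to 20 excess requests before rejecting, without delaying them.
        limit_req zone=perip burst=20 nodelay;
        # At most 5 concurrent connections per client IP.
        limit_conn addr 5;
    }
}
```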
Middleware‑Level Limiting
Store counters in Redis (or Redis‑Cell) and use Lua scripts for atomic operations.
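A common fixed‑window counter runs as a single Lua script (executed via EVAL, so the read‑and‑update is atomic). A sketch, with key and argument conventions of my own choosing:

```lua
-- Fixed-window counter. KEYS[1] = counter key (e.g. per user or per IP),
-- ARGV[1] = window length in seconds, ARGV[2] = request limit per window.
local current = redis.call('INCR', KEYS[1])
if current == 1 then
    -- First request in this window: start the window's expiry clock.
    redis.call('EXPIRE', KEYS[1], ARGV[1])
end
if current > tonumber(ARGV[2]) then
    return 0  -- over the limit: reject
end
return 1      -- allowed
```

Because every application instance talks to the same Redis, this limit holds across the whole cluster, unlike the Guava limiter above.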
Sentinel
Alibaba's open‑source component integrated in Spring Cloud Alibaba, offering rich APIs and a visual console for rate limiting.
Architectural Considerations for Rate Limiting
In production, multiple limiting mechanisms are combined to achieve layered protection, from gateway to middleware to service‑level controls.
Concrete Implementation Techniques
Tomcat: configure maxThreads in conf/server.xml to limit concurrent requests.
Nginx: use limit_req_zone with burst for rate limiting.
Nginx: use limit_conn_zone and limit_conn for concurrent connection limiting.
Time‑window algorithm implemented with Redis sorted sets.
Leaky bucket via Redis‑Cell.
Token bucket via Guava RateLimiter.
Note: Redis‑based limits work in distributed systems, while Guava limits are limited to a single machine.
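The sorted‑set variant mentioned above implements a true sliding window: each request is stored with its timestamp as the score, and old entries are pruned before counting. A sketch (key and argument layout are my assumptions):

```lua
-- Sliding-window limiter over a Redis sorted set.
-- KEYS[1] = per-client key, ARGV[1] = now in ms, ARGV[2] = window in ms, ARGV[3] = limit.
redis.call('ZREMRANGEBYSCORE', KEYS[1], 0, ARGV[1] - ARGV[2])  -- drop expired entries
local count = redis.call('ZCARD', KEYS[1])
if count < tonumber(ARGV[3]) then
    redis.call('ZADD', KEYS[1], ARGV[1], ARGV[1])  -- record this request
    redis.call('PEXPIRE', KEYS[1], ARGV[2])        -- let idle keys expire
    return 1
end
return 0
```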
Tomcat Limiting Details
Set maxThreads in conf/server.xml. The default is 150; adjust it based on server resources. Each thread consumes roughly 1 MB of JVM memory.
Operating system limits: Windows ~2000 threads per process, Linux ~1000 threads per process.
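In conf/server.xml the setting sits on the Connector element; the values below are illustrative, not tuning advice:

```xml
<!-- Sketch: capping Tomcat's request-processing threads.
     acceptCount bounds the queue of connections waiting for a free thread. -->
<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="150"
           acceptCount="100"
           connectionTimeout="20000" />
```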