Why Rate Limiting Matters: Strategies, Algorithms, and Real-World Implementations
Rate limiting protects services from overload by controlling traffic, using techniques such as circuit breaking, degradation, buffering, privilege handling, and algorithms like counters, leaky bucket, and token bucket, with implementations ranging from Guava in Java to Nginx+Lua for distributed systems.
Why Rate Limiting Is Needed
In everyday life and online services, uncontrolled traffic can cause congestion, accidents, or system crashes. Limiting traffic keeps a system usable for the number of users it was sized for, while excess requests wait in a queue or are rejected.
Rate‑Limiting Approaches
Circuit Breaking
When a service cannot recover quickly, a circuit breaker opens and automatically rejects traffic so the failing service is not overloaded further. Once the service stabilizes, the breaker closes and normal traffic resumes. Common tools include Hystrix and Alibaba Sentinel.
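The open/close behavior described above can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not how Hystrix or Sentinel work internally: the class name `SimpleCircuitBreaker`, the consecutive-failure threshold, and the fixed cool-down period are all hypothetical, and the caller passes in the current time so the logic stays deterministic.

```java
/** Minimal circuit breaker sketch; names and thresholds are illustrative only. */
final class SimpleCircuitBreaker {
    private final int failureThreshold; // consecutive failures that open the breaker
    private final long openMillis;      // how long to reject traffic before allowing a probe
    private int consecutiveFailures = 0;
    private long openedAt = -1;         // -1 means the breaker is closed

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /** Returns true if a call may proceed at time nowMillis. */
    synchronized boolean allowRequest(long nowMillis) {
        if (openedAt < 0) {
            return true; // closed: traffic flows normally
        }
        // open: reject until the cool-down has elapsed, then let probe calls through
        return nowMillis - openedAt >= openMillis;
    }

    /** A successful call closes the breaker and resets the failure count. */
    synchronized void recordSuccess() {
        consecutiveFailures = 0;
        openedAt = -1;
    }

    /** A failed call counts toward the threshold and opens (or re-opens) the breaker. */
    synchronized void recordFailure(long nowMillis) {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            openedAt = nowMillis;
        }
    }
}
```

A caller would check `allowRequest(System.currentTimeMillis())` before invoking the backend and report the outcome with `recordSuccess()` or `recordFailure(...)`.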
Service Degradation
Non‑critical features are temporarily disabled during spikes, freeing resources for core functions. For example, an e‑commerce site may suspend comments or loyalty points during a flash sale.
Buffering (Delay Handling)
Requests are placed in a buffer (e.g., a queue) and processed sequentially, reducing immediate load on the backend.
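As a rough sketch of buffering, a bounded queue can accept requests up to a fixed capacity and hand them to a worker in arrival order. The class name `RequestBuffer` and the use of plain strings as requests are assumptions made for illustration; refusing new requests when the buffer is full is one possible policy among several.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Bounded FIFO buffer between request intake and backend processing (illustrative). */
final class RequestBuffer {
    private final BlockingQueue<String> queue;

    RequestBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Non-blocking submit: returns false when the buffer is full. */
    boolean submit(String request) {
        return queue.offer(request);
    }

    /** Returns the oldest buffered request, or null if the buffer is empty. */
    String next() {
        return queue.poll();
    }

    /** Number of requests currently waiting. */
    int pending() {
        return queue.size();
    }
}
```

A worker thread would call `next()` (or a blocking variant such as `take()` on the underlying queue) at whatever pace the backend can sustain.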
Privilege Handling
Users are classified, and high‑priority users receive service before others during congestion.
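One way to picture privilege handling is a priority queue in which each request carries a user tier. In the sketch below, the class name `PriorityDispatcher` and the convention that tier 0 is the highest priority are assumptions; a sequence number keeps requests within the same tier in arrival order.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Serves higher-priority users first; equal tiers are served first-come, first-served. */
final class PriorityDispatcher {
    private static final class Request {
        final String user;
        final int tier;  // lower value = higher priority (assumed convention)
        final long seq;  // arrival order, used as a tie-breaker

        Request(String user, int tier, long seq) {
            this.user = user;
            this.tier = tier;
            this.seq = seq;
        }
    }

    private final PriorityQueue<Request> queue = new PriorityQueue<>(
            Comparator.<Request>comparingInt(r -> r.tier).thenComparingLong(r -> r.seq));
    private long nextSeq = 0;

    synchronized void submit(String user, int tier) {
        queue.add(new Request(user, tier, nextSeq++));
    }

    /** Returns the user whose request should be served next, or null if none is waiting. */
    synchronized String next() {
        Request r = queue.poll();
        return r == null ? null : r.user;
    }
}
```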
Difference Between Cache, Degradation, and Rate Limiting
Cache increases throughput and speeds up access.
Degradation temporarily disables failing components while providing fallback data.
Rate Limiting restricts request frequency when cache and degradation are insufficient, protecting the service before it becomes unavailable.
Rate‑Limiting Algorithms
Counter Algorithm
Simple counting of active resources (threads, DB connections, etc.) or requests within a time window. Example: allow at most 100 requests per minute.
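A minimal sketch of the time-window variant follows. The class name `FixedWindowCounter` is hypothetical, and the current time is passed in by the caller so the behavior is easy to reason about; for the example above, it would be constructed with a limit of 100 and a window of 60,000 ms.

```java
/** Fixed-window request counter: at most `limit` requests per window (illustrative). */
final class FixedWindowCounter {
    private final int limit;
    private final long windowMillis;
    private long windowIndex = -1; // index of the window the count belongs to
    private int count = 0;

    FixedWindowCounter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    /** Returns true if the request at nowMillis is within the limit for its window. */
    synchronized boolean tryAcquire(long nowMillis) {
        long currentWindow = nowMillis / windowMillis;
        if (currentWindow != windowIndex) {
            // a new window has started: reset the counter
            windowIndex = currentWindow;
            count = 0;
        }
        if (count >= limit) {
            return false;
        }
        count++;
        return true;
    }
}
```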
Leaky Bucket Algorithm
Requests enter a bucket that leaks at a constant rate; excess requests overflow and are dropped, smoothing traffic bursts.
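The sketch below models the bucket as a water level that drains at a constant rate; a request is dropped when adding it would overflow the bucket. This is the "meter" form of the algorithm, which decides admit or drop without actually queuing requests. The class name `LeakyBucket` and its parameters are assumptions for illustration.

```java
/** Leaky bucket used as a meter: constant drain rate, overflow is rejected (illustrative). */
final class LeakyBucket {
    private final double capacity;      // maximum water level, in requests
    private final double leakPerSecond; // constant outflow rate
    private double water = 0;
    private long lastLeak;

    LeakyBucket(int capacity, double leakPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.leakPerSecond = leakPerSecond;
        this.lastLeak = nowMillis;
    }

    /** Returns true if the request fits in the bucket at nowMillis. */
    synchronized boolean tryAcquire(long nowMillis) {
        // drain whatever leaked out since the last call
        double leaked = (nowMillis - lastLeak) * leakPerSecond / 1000.0;
        water = Math.max(0, water - leaked);
        lastLeak = nowMillis;
        if (water + 1 > capacity) {
            return false; // bucket would overflow: drop the request
        }
        water += 1;
        return true;
    }
}
```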
Token Bucket Algorithm
Tokens are added to a bucket at a steady rate; a request proceeds only if a token is available. This allows occasional bursts while still limiting overall rate.
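A minimal token bucket can be sketched as follows. The class name `TokenBucket` is hypothetical, the bucket is assumed to start full, and tokens are refilled lazily on each call rather than by a background thread. The capacity bounds the burst size, while the refill rate bounds the long-run average.

```java
/** Token bucket: refills at a steady rate, allows bursts up to `capacity` (illustrative). */
final class TokenBucket {
    private final double capacity;
    private final double refillPerSecond;
    private double tokens;
    private long lastRefill;

    TokenBucket(int capacity, double refillPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity; // assumption: the bucket starts full
        this.lastRefill = nowMillis;
    }

    /** Returns true and consumes one token if one is available at nowMillis. */
    synchronized boolean tryAcquire(long nowMillis) {
        // add the tokens earned since the last call, capped at capacity
        double added = (nowMillis - lastRefill) * refillPerSecond / 1000.0;
        tokens = Math.min(capacity, tokens + added);
        lastRefill = nowMillis;
        if (tokens < 1) {
            return false;
        }
        tokens -= 1;
        return true;
    }
}
```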
Concurrency Limiting
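Every service has a capacity ceiling, so set a global threshold on concurrent connections and requests (or on QPS). For example, Tomcat’s acceptCount, maxConnections, and maxThreads bound the accept queue length, the number of open connections, and the number of worker threads.
Other common limits include: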
Limit total concurrency (e.g., DB connection pool, thread pool)
Limit instantaneous concurrency (e.g., Nginx limit_conn)
Limit average rate within a time window (e.g., Guava RateLimiter, Nginx limit_req)
Limit remote API call rate or MQ consumption rate
Limit based on network, CPU, or memory load
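Capping total concurrency, as connection pools and thread pools do, can be sketched with a counting semaphore. The class name `ConcurrencyLimiter` and the choice to return a fallback value instead of waiting are assumptions for illustration; a real pool might block or queue instead.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Limits how many tasks run at the same time (illustrative). */
final class ConcurrencyLimiter {
    private final Semaphore permits;

    ConcurrencyLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Runs the task if a slot is free; otherwise returns the fallback without waiting. */
    <T> T call(Supplier<T> task, T fallback) {
        if (!permits.tryAcquire()) {
            return fallback;
        }
        try {
            return task.get();
        } finally {
            permits.release(); // always free the slot, even if the task throws
        }
    }

    /** Number of free slots right now. */
    int available() {
        return permits.availablePermits();
    }
}
```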
Interface Limiting
Interface‑level limiting usually combines two parts: a fixed‑window counter that caps total calls and a sliding‑window algorithm for finer‑grained control.
Fixed Window Issues
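A fixed 1‑minute window cannot see bursts that straddle a window boundary. With a limit of 100 requests per minute, a client can send 100 requests in the last second of one window and 100 more in the first second of the next, so 200 requests pass within about two seconds without tripping the limit.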
Sliding Window
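A sliding window divides the interval into smaller slots (e.g., milliseconds) and counts the requests in the slots that fall within the most recent interval. This gives smoother, more precise rate limiting, at the cost of higher memory usage.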
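A slotted sliding window can be sketched with a small ring of counters. The class name `SlidingWindowCounter` is hypothetical, and the sketch assumes the window length is an exact multiple of the slot count. Its precision is one slot width: the count covers the current slot plus the preceding slots, so more slots mean a closer approximation and more memory.

```java
import java.util.Arrays;

/** Sliding-window counter built from fixed-size slots kept in a ring (illustrative). */
final class SlidingWindowCounter {
    private final int limit;
    private final long slotMillis;
    private final long[] slotIds; // absolute slot number stored in each ring cell
    private final int[] counts;   // requests counted in that slot

    SlidingWindowCounter(int limit, long windowMillis, int slots) {
        this.limit = limit;
        this.slotMillis = windowMillis / slots; // assumes windowMillis is divisible by slots
        this.slotIds = new long[slots];
        this.counts = new int[slots];
        Arrays.fill(slotIds, -1);
    }

    /** Returns true if the request at nowMillis keeps the window total within the limit. */
    synchronized boolean tryAcquire(long nowMillis) {
        long currentSlot = nowMillis / slotMillis;
        int total = 0;
        for (int i = 0; i < counts.length; i++) {
            // only slots that still fall inside the window contribute
            if (currentSlot - slotIds[i] < counts.length) {
                total += counts[i];
            }
        }
        if (total >= limit) {
            return false;
        }
        int idx = (int) (currentSlot % counts.length);
        if (slotIds[idx] != currentSlot) {
            // the ring cell holds an expired slot: reuse it for the current one
            slotIds[idx] = currentSlot;
            counts[idx] = 0;
        }
        counts[idx]++;
        return true;
    }
}
```

With a limit of 2 per 1000 ms and 10 slots, two requests at 900 ms and 950 ms cause a request at 1050 ms to be rejected, whereas a fixed window that resets at 1000 ms would have let it through.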
Implementation Examples
Guava (Java) Counter
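import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// One counter per second; entries expire after 2 seconds so old windows are discarded.
LoadingCache<Long, AtomicLong> counter = CacheBuilder.newBuilder()
    .expireAfterWrite(2, TimeUnit.SECONDS)
    .build(new CacheLoader<Long, AtomicLong>() {
        @Override
        public AtomicLong load(Long second) {
            return new AtomicLong(0);
        }
    });
long currentSecond = System.currentTimeMillis() / 1000; // key by the current second rather than a fixed 1L
// getUnchecked() avoids the checked ExecutionException declared by get(); compare the result with the limit.
counter.getUnchecked(currentSecond).incrementAndGet();
Guava Token Bucket (SmoothBursty)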
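import com.google.common.util.concurrent.RateLimiter;

RateLimiter limiter = RateLimiter.create(2); // 2 permits per second, i.e. one every 500 ms
// acquire() blocks until a permit is available and returns the seconds spent waiting.
System.out.println(limiter.acquire());
Thread.sleep(2000); // idle time lets unused permits accumulate
// The first calls after the pause typically print 0.0 because stored permits absorb the burst;
// once those are used up, each call waits roughly 0.5 s.
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
Guava Token Bucket (SmoothWarmingUp)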
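import com.google.common.util.concurrent.RateLimiter;
import java.util.concurrent.TimeUnit;

// Warm-up variant: after an idle period the limiter ramps up to 2 permits per second
// over 1000 ms instead of serving a burst at full rate.
RateLimiter limiter = RateLimiter.create(2, 1000L, TimeUnit.MILLISECONDS);
System.out.println(limiter.acquire());
Thread.sleep(2000); // long enough for the limiter to cool down again
// Unlike the bursty limiter, the calls after the pause are not all free: waits are
// typically longer at first and settle toward the steady 0.5 s.
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
Distributed Limiting with Nginx + Lua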
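-- Assumes nginx.conf declares both shared dicts (sizes are illustrative):
--   lua_shared_dict locks 100k;
--   lua_shared_dict limit_counter 10m;
-- Shared dicts are visible to all workers of one Nginx instance, not across servers.
local resty_lock = require "resty.lock"

-- Returns 1 if the request is allowed, 0 if it is rejected.
local function acquire()
    local lock, err = resty_lock:new("locks")
    if not lock then
        ngx.log(ngx.ERR, "failed to create lock: ", err)
        return 0 -- rejecting on lock failure is a conservative choice
    end
    local elapsed, lock_err = lock:lock("limit_key")
    if not elapsed then
        ngx.log(ngx.ERR, "failed to acquire lock: ", lock_err)
        return 0
    end
    local limit_counter = ngx.shared.limit_counter
    -- One counter per second. The key does not include the client address, so the limit is global.
    local key = "ip:" .. ngx.time()
    local limit = 5
    local current = limit_counter:get(key)
    if current ~= nil and current + 1 > limit then
        lock:unlock()
        return 0
    end
    if current == nil then
        limit_counter:set(key, 1, 1) -- first request in this second; expires after 1 s
    else
        limit_counter:incr(key, 1)
    end
    lock:unlock()
    return 1
end

ngx.print(acquire())
These examples illustrate how rate limiting can be applied at various layers, from in‑process Java code to distributed Nginx/Lua scripts, to keep services responsive under heavy load.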
ITFLY8 Architecture Home
