Master Rate Limiting: Token & Leaky Buckets, Distributed Strategies
This article explains how caching, degradation, and especially rate limiting—using token bucket, leaky bucket, and counter‑based methods—protect high‑concurrency systems, covering algorithm basics, application‑level techniques, and distributed implementations with Redis+Lua and Nginx+Lua.
Rate Limiting Algorithms
Common rate‑limiting algorithms include the token bucket and leaky bucket; a simple counter can also be used for coarse limiting.
Token Bucket Algorithm
The token bucket stores a fixed number of tokens that are added at a constant rate. When a request arrives, it consumes tokens; if insufficient tokens exist, the request is rejected or delayed.
Assume a limit of 2 r/s, adding tokens every 500 ms.
The bucket can hold up to b tokens; excess tokens are discarded.
When a packet of n bytes arrives, n tokens are removed and the packet is sent.
If the bucket has fewer than n tokens, the packet is throttled (dropped or queued).
Leaky Bucket Algorithm
The leaky bucket acts as a meter for traffic shaping and policing. It has a fixed capacity and leaks tokens at a constant rate; incoming bursts are either stored (if space permits) or discarded.
A fixed‑capacity bucket releases tokens at a constant rate.
If the bucket is empty, no tokens are released.
Tokens can be added at any rate.
Overflowing tokens are discarded.
Comparison:
Token bucket adds tokens at a fixed rate and allows bursts while tokens are available.
Leaky bucket releases tokens at a fixed rate, smoothing incoming bursts.
Token bucket limits average inbound rate and permits bursts; leaky bucket limits output rate to smooth traffic.
Simple counter‑based limiting can restrict total concurrency (e.g., database connections, thread pools) by checking a global request count against a threshold.
Application‑Level Rate Limiting
Typical limits include total concurrent connections, total resource count, and per‑API request limits. For example, Tomcat’s acceptCount, maxConnections, and maxThreads control connection queues, maximum simultaneous connections, and thread pool size. Similar settings exist for MySQL ( max_connections) and Redis ( tcp-backlog).
Java code using AtomicLong can implement simple total‑concurrency limiting:
try {
if (atomic.incrementAndGet() > limit) {
// reject request
}
// process request
} finally {
atomic.decrementAndGet();
}Guava’s RateLimiter provides token‑bucket implementations for smooth bursty limiting ( SmoothBursty) and warm‑up limiting ( SmoothWarmingUp).
RateLimiter limiter = RateLimiter.create(5);
System.out.println(limiter.acquire()); // repeat as neededUsing RateLimiter.acquire() consumes tokens; if none are available, the call blocks until a token is added, effectively smoothing request rates.
Distributed Rate Limiting
Distributed limiting requires atomic operations across nodes. Common solutions use Redis+Lua or Nginx+Lua.
Redis+Lua Example
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local current = tonumber(redis.call('get', key) or '0')
if current + 1 > limit then
return 0
else
redis.call('INCRBY', key, '1')
redis.call('expire', key, '2')
return 1
endJava integration evaluates the script via Jedis:
public static boolean acquire() throws Exception {
String luaScript = Files.readString(Paths.get("limit.lua"));
Jedis jedis = new Jedis("192.168.147.52", 6379);
String key = "ip:" + System.currentTimeMillis() / 1000;
String limit = "3";
return (Long) jedis.eval(luaScript, Collections.singletonList(key), Collections.singletonList(limit)) == 1;
}Nginx+Lua Example
local locks = require "resty.lock"
local function acquire()
local lock = locks:new("locks")
lock:lock("limit_key")
local limit_counter = ngx.shared.limit_counter
local key = "ip:" .. os.time()
local limit = 5
local current = limit_counter:get(key)
if current and current + 1 > limit then
lock:unlock()
return 0
end
if not current then
limit_counter:set(key, 1, 1)
else
limit_counter:incr(key, 1)
end
lock:unlock()
return 1
end
ngx.print(acquire())Both approaches ensure atomicity (Redis Lua runs single‑threaded; Nginx Lua uses resty.lock and shared dictionaries).
When traffic is extremely high, consider sharding the distributed limiter via consistent hashing or falling back to application‑level limiting. Large e‑commerce platforms (e.g., JD.com) successfully use Redis+Lua for flash‑sale throttling.
For entry‑point limiting, it is recommended to handle it at the access layer (e.g., Nginx) rather than within the application.
Reference Materials:
https://en.wikipedia.org/wiki/Token_bucket
https://en.wikipedia.org/wiki/Leaky_bucket
http://redis.io/commands/incr
http://nginx.org/en/docs/http/ngx_http_limit_req_module.html
http://nginx.org/en/docs/http/ngx_http_limit_conn_module.html
https://github.com/openresty/lua-resty-limit-traffic
http://nginx.org/en/docs/http/ngx_http_core_module.html#limit_rate
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITFLY8 Architecture Home
ITFLY8 Architecture Home - focused on architecture knowledge sharing and exchange, covering project management and product design. Includes large-scale distributed website architecture (high performance, high availability, caching, message queues...), design patterns, architecture patterns, big data, project management (SCRUM, PMP, Prince2), product design, and more.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
