Mastering Rate Limiting: From Fixed Windows to Redis Distributed Solutions
This article explains why rate limiting is essential for microservice stability, introduces basic concepts like thresholds and rejection strategies, and walks through multiple algorithms—including fixed‑window, sliding‑window, sliding‑log, leaky‑bucket, token‑bucket—and their Java implementations as well as Redis‑based distributed approaches, complete with code samples and performance considerations.
Introduction
With the rise of micro‑services, inter‑service dependencies have become stronger and call relationships more complex, making service stability increasingly important. When faced with sudden traffic spikes, malicious requests, or high request frequencies that pressure downstream services, techniques such as caching, circuit breaking, load balancing, and especially rate limiting are required to maintain stability.
1. Rate Limiting
Rate limiting restricts the number of requests or concurrent calls within a time window to protect system resources. Two key concepts are:
Threshold : the allowed request count per unit time (e.g., QPS = 10 means at most 10 requests per second).
Rejection strategy : actions taken when the threshold is exceeded, such as immediate rejection or queuing.
2. Fixed‑Window Algorithm
The fixed‑window (counter) algorithm uses an atomic counter to count requests within a 1‑second window. When the counter reaches the limit, the rejection strategy is triggered, and the counter resets every second.
2.1. Code Example
/**
* @author https://www.wdbyte.com
*/
public class RateLimiterSimpleWindow {
// Threshold
private static Integer QPS = 2;
// Time window (ms)
private static long TIME_WINDOWS = 1000;
// Counter
private static AtomicInteger REQ_COUNT = new AtomicInteger();
private static long START_TIME = System.currentTimeMillis();
public synchronized static boolean tryAcquire() {
if ((System.currentTimeMillis() - START_TIME) > TIME_WINDOWS) {
REQ_COUNT.set(0);
START_TIME = System.currentTimeMillis();
}
return REQ_COUNT.incrementAndGet() <= QPS;
}
public static void main(String[] args) throws InterruptedException {
for (int i = 0; i < 10; i++) {
Thread.sleep(250);
LocalTime now = LocalTime.now();
if (!tryAcquire()) {
System.out.println(now + " 被限流");
} else {
System.out.println(now + " 做点什么");
}
}
}
}The output shows that with QPS = 2, roughly one request per second is rejected, but the fixed‑window suffers from boundary‑condition issues when requests span two adjacent windows.
3. Sliding‑Window Algorithm
The sliding‑window improves on the fixed‑window by moving the window forward in smaller steps (e.g., every 500 ms), reducing the chance of boundary spikes.
3.1. Code Example
package com.wdbyte.rate.limiter;
import java.time.LocalTime;
import java.util.concurrent.atomic.AtomicInteger;
public class RateLimiterSlidingWindow {
private int qps = 2;
private long windowSize = 1000;
private Integer windowCount = 10;
private WindowInfo[] windowArray = new WindowInfo[windowCount];
public RateLimiterSlidingWindow(int qps) {
this.qps = qps;
long currentTimeMillis = System.currentTimeMillis();
for (int i = 0; i < windowArray.length; i++) {
windowArray[i] = new WindowInfo(currentTimeMillis, new AtomicInteger(0));
}
}
public synchronized boolean tryAcquire() {
long currentTimeMillis = System.currentTimeMillis();
int currentIndex = (int)(currentTimeMillis % windowSize / (windowSize / windowCount));
int sum = 0;
for (int i = 0; i < windowArray.length; i++) {
WindowInfo windowInfo = windowArray[i];
if ((currentTimeMillis - windowInfo.getTime()) > windowSize) {
windowInfo.getNumber().set(0);
windowInfo.setTime(currentTimeMillis);
}
if (currentIndex == i && windowInfo.getNumber().get() < qps) {
windowInfo.getNumber().incrementAndGet();
}
sum += windowInfo.getNumber().get();
}
return sum <= qps;
}
private class WindowInfo {
private Long time;
private AtomicInteger number;
public WindowInfo(long time, AtomicInteger number) {
this.time = time;
this.number = number;
}
public Long getTime() { return time; }
public void setTime(Long time) { this.time = time; }
public AtomicInteger getNumber() { return number; }
}
}4. Sliding‑Log Algorithm
This method records each request timestamp; a new request checks how many timestamps fall within the last second. It avoids window‑boundary problems but consumes more memory.
4.1. Code Example
package com.wdbyte.rate.limiter;
import java.time.LocalTime;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeMap;
public class RateLimiterSlidingLog {
private Integer qps = 2;
private TreeMap<Long, Long> treeMap = new TreeMap<>();
private long clearTime = 60 * 1000;
public RateLimiterSlidingLog(Integer qps) { this.qps = qps; }
public synchronized boolean tryAcquire() {
long now = System.currentTimeMillis();
if (!treeMap.isEmpty() && (treeMap.firstKey() - now) > clearTime) {
Set<Long> keySet = new HashSet<>(treeMap.subMap(0L, now - 1000).keySet());
for (Long key : keySet) { treeMap.remove(key); }
}
int sum = 0;
for (Long value : treeMap.subMap(now - 1000, now).values()) { sum += value; }
if (sum + 1 > qps) { return false; }
treeMap.compute(now, (k, v) -> v == null ? 1L : v + 1);
return sum <= qps;
}
}5. Leaky‑Bucket Algorithm
The leaky‑bucket treats requests as water droplets entering a bucket; the bucket leaks at a constant rate (e.g., one token every 500 ms for QPS = 2). When the bucket overflows, excess requests are rejected.
6. Token‑Bucket Algorithm
Tokens are added to a bucket at a fixed rate (e.g., one token every 500 ms for QPS = 2). A request consumes a token; if none are available, the request is rejected. Google Guava’s RateLimiter implements this logic.
6.1. Code Example (Guava)
// qps 2
RateLimiter rateLimiter = RateLimiter.create(2);
for (int i = 0; i < 10; i++) {
String time = LocalDateTime.now().format(DateTimeFormatter.ISO_LOCAL_TIME);
System.out.println(time + ":" + rateLimiter.tryAcquire());
Thread.sleep(250);
}6.2. Implementation Insight
Guava calculates token availability lazily during each acquire call, storing the next token generation time and avoiding a dedicated replenishment thread.
void resync(long nowMicros) {
if (nowMicros > this.nextFreeTicketMicros) {
double newPermits = (double)(nowMicros - this.nextFreeTicketMicros) / this.coolDownIntervalMicros();
this.storedPermits = Math.min(this.maxPermits, this.storedPermits + newPermits);
this.nextFreeTicketMicros = nowMicros;
}
}7. Redis Distributed Rate Limiting
Redis can be used for distributed rate limiting because it is fast, in‑memory, and single‑threaded.
7.1. Fixed‑Window with Redis
Using INCR and EXPIRE in an atomic Lua script to count requests per second.
local count = redis.call("incr", KEYS[1])
if count == 1 then
redis.call('expire', KEYS[1], ARGV[2])
end
if count > tonumber(ARGV[1]) then
return 0
end
return 17.2. Sliding‑Window with Redis ZSET
Store request timestamps in a sorted set; remove entries older than the window, count current entries, and decide whether to allow the request.
-- Remove old entries
redis.call('zremrangeByScore', KEYS[1], 0, ARGV[1])
-- Count current entries
local res = redis.call('zcard', KEYS[1])
if (res == nil) or (res < tonumber(ARGV[3])) then
redis.call('zadd', KEYS[1], ARGV[2], ARGV[4])
return 1
else
return 0
endBoth implementations demonstrate how to achieve per‑second QPS limits (e.g., QPS = 4) in a distributed environment.
8. Summary
The article compares window‑based and bucket‑based rate‑limiting algorithms. Window algorithms are simple and give clear QPS visibility but suffer from boundary spikes; bucket algorithms provide smoother traffic shaping—leaky bucket guarantees constant consumption, while token bucket handles bursts more gracefully. Single‑node implementations suit monolithic services, whereas Redis‑backed solutions enable distributed rate limiting. Additional tools such as Guava, Redisson, or Alibaba Sentinel can further simplify integration.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
