Rate Limiting Algorithms and Implementations in Java Microservices
The article explains service rate limiting and demonstrates six Java implementations—Fixed Window, Sliding Window, Leaky Bucket, Token Bucket, Sentinel middleware, and Spring Cloud Gateway—detailing their algorithms, code examples, and configuration to protect microservices from overload.
This article introduces the concept of service rate limiting, which controls request rates to protect services from overload. It covers six common algorithms—Fixed Window, Sliding Window, Leaky Bucket, Token Bucket, middleware‑based limiting (Sentinel), and gateway limiting (Spring Cloud Gateway)—and provides practical Java implementations.
Fixed Window Algorithm
The fixed window maintains a counter for a defined time slot and rejects requests once the limit is reached. The implementation uses a window size (ms) and a maximum request count.
@Slf4j
public class FixedWindowRateLimiter {
// time window size in ms
private long windowSize;
// allowed request count per window
private int maxRequestCount;
// current count
private AtomicInteger count = new AtomicInteger(0);
// window right border
private long windowBorder;
public FixedWindowRateLimiter(long windowSize, int maxRequestCount) {
this.windowSize = windowSize;
this.maxRequestCount = maxRequestCount;
windowBorder = System.currentTimeMillis() + windowSize;
}
public synchronized boolean tryAcquire() {
long currentTime = System.currentTimeMillis();
if (windowBorder < currentTime) {
log.info("window reset");
do {
windowBorder += windowSize;
} while (windowBorder < currentTime);
count = new AtomicInteger(0);
}
if (count.intValue() < maxRequestCount) {
count.incrementAndGet();
log.info("tryAcquire success");
return true;
} else {
log.info("tryAcquire fail");
return false;
}
}
}Test code shows that 5 requests are allowed per 1000 ms.
void test() throws InterruptedException {
FixedWindowRateLimiter limiter = new FixedWindowRateLimiter(1000, 5);
for (int i = 0; i < 10; i++) {
if (limiter.tryAcquire()) {
System.out.println("执行任务");
} else {
System.out.println("被限流");
TimeUnit.MILLISECONDS.sleep(300);
}
}
}The fixed window is simple but can cause traffic spikes at window boundaries.
Sliding Window Algorithm
Sliding window refines the fixed window by dividing the interval into smaller shards and sliding one shard at a time. It keeps an array of counters for each shard.
@Slf4j
public class SlidingWindowRateLimiter {
private long windowSize;
private int shardNum;
private int maxRequestCount;
private int[] shardRequestCount;
private int totalCount;
private int shardId;
private long tinyWindowSize;
private long windowBorder;
public SlidingWindowRateLimiter(long windowSize, int shardNum, int maxRequestCount) {
this.windowSize = windowSize;
this.shardNum = shardNum;
this.maxRequestCount = maxRequestCount;
shardRequestCount = new int[shardNum];
tinyWindowSize = windowSize / shardNum;
windowBorder = System.currentTimeMillis();
}
public synchronized boolean tryAcquire() {
long currentTime = System.currentTimeMillis();
if (currentTime > windowBorder) {
do {
shardId = (++shardId) % shardNum;
totalCount -= shardRequestCount[shardId];
shardRequestCount[shardId] = 0;
windowBorder += tinyWindowSize;
} while (windowBorder < currentTime);
}
if (totalCount < maxRequestCount) {
log.info("tryAcquire success,{}", shardId);
shardRequestCount[shardId]++;
totalCount++;
return true;
} else {
log.info("tryAcquire fail,{}", shardId);
return false;
}
}
}Testing with a 1‑second window split into 10 shards (100 ms each) demonstrates smoother request distribution.
void test() throws InterruptedException {
SlidingWindowRateLimiter limiter = new SlidingWindowRateLimiter(1000, 10, 10);
TimeUnit.MILLISECONDS.sleep(800);
for (int i = 0; i < 15; i++) {
boolean acquire = limiter.tryAcquire();
System.out.println(acquire ? "执行任务" : "被限流");
TimeUnit.MILLISECONDS.sleep(10);
}
}Leaky Bucket Algorithm
The leaky bucket models a bucket that leaks at a constant rate. Requests fill the bucket; excess requests are dropped.
@Slf4j
public class LeakyBucketRateLimiter {
private int capacity;
private AtomicInteger water = new AtomicInteger(0);
private long leakTimeStamp;
private int leakRate;
public LeakyBucketRateLimiter(int capacity, int leakRate) {
this.capacity = capacity;
this.leakRate = leakRate;
}
public synchronized boolean tryAcquire() {
if (water.get() == 0) {
log.info("start leaking");
leakTimeStamp = System.currentTimeMillis();
water.incrementAndGet();
return water.get() < capacity;
}
long currentTime = System.currentTimeMillis();
int leakedWater = (int) ((currentTime - leakTimeStamp) / 1000 * leakRate);
if (leakedWater != 0) {
int leftWater = water.get() - leakedWater;
water.set(Math.max(0, leftWater));
leakTimeStamp = System.currentTimeMillis();
}
if (water.get() < capacity) {
log.info("tryAcquire success");
water.incrementAndGet();
return true;
} else {
log.info("tryAcquire fail");
return false;
}
}
}Test code with capacity 3 and leak rate 1 req/s shows alternating success and throttling.
void test() throws InterruptedException {
LeakyBucketRateLimiter limiter = new LeakyBucketRateLimiter(3, 1);
for (int i = 0; i < 15; i++) {
if (limiter.tryAcquire()) {
System.out.println("执行任务");
} else {
System.out.println("被限流");
}
TimeUnit.MILLISECONDS.sleep(500);
}
}Token Bucket Algorithm
Token bucket improves leaky bucket by allowing bursts. Tokens are generated at a steady rate and stored up to a maximum. Guava’s RateLimiter implements this.
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>29.0-jre</version>
</dependency>Example that creates 5 tokens per second:
void acquireTest() {
RateLimiter rateLimiter = RateLimiter.create(5);
for (int i = 0; i < 10; i++) {
double wait = rateLimiter.acquire();
log.info("等待时间:{}s", wait);
}
}Guava also supports pre‑acquire (multiple permits) and warm‑up periods, which help handle sudden bursts without long delays.
void acquireMultiTest() {
RateLimiter limiter = RateLimiter.create(1);
for (int i = 0; i < 3; i++) {
int num = 2 * i + 1;
log.info("获取{}个令牌", num);
double cost = limiter.acquire(num);
log.info("获取{}个令牌结束,耗时{}ms", num, cost);
}
}Warm‑up example:
void acquireSmoothly() {
RateLimiter limiter = RateLimiter.create(5, 3, TimeUnit.SECONDS);
long start = System.currentTimeMillis();
for (int i = 0; i < 15; i++) {
double time = limiter.acquire();
log.info("等待时间:{}s, 总时间:{}ms", time, System.currentTimeMillis() - start);
}
}Middleware Rate Limiting – Sentinel
For distributed microservices, Sentinel (Spring Cloud Alibaba) provides cluster‑wide flow control. Annotate service methods with @SentinelResource and define a block handler.
@Service
public class QueryService {
public static final String KEY = "query";
@SentinelResource(value = KEY, blockHandler = "blockHandlerMethod")
public String query(String name) {
return "begin query,name=" + name;
}
public String blockHandlerMethod(String name, BlockException e) {
e.printStackTrace();
return "blockHandlerMethod for Query : " + name;
}
}Rules are loaded programmatically (e.g., QPS = 1) and can be visualized in the Sentinel dashboard.
@Component
public class SentinelConfig {
@PostConstruct
private void init() {
List
rules = new ArrayList<>();
FlowRule rule = new FlowRule(QueryService.KEY);
rule.setCount(1);
rule.setGrade(RuleConstant.FLOW_GRADE_QPS);
rule.setLimitApp("default");
rules.add(rule);
FlowRuleManager.loadRules(rules);
}
}Configuration in application.yml sets the Sentinel port and dashboard address.
spring:
application:
name: sentinel-test
cloud:
sentinel:
transport:
port: 8719
dashboard: localhost:8088Gateway Rate Limiting – Spring Cloud Gateway
Gateway uses Redis + Lua to implement a token‑bucket limiter. The configuration defines replenish rate, burst capacity, and a key resolver (e.g., request path).
spring:
cloud:
gateway:
routes:
- id: limit_route
uri: lb://sentinel-test
predicates:
- Path=/sentinel-test/**
filters:
- name: RequestRateLimiter
args:
redis-rate-limiter.replenishRate: 1
redis-rate-limiter.burstCapacity: 2
key-resolver: "#{@pathKeyResolver}"
redis:
host: 127.0.0.1
port: 6379Key resolver example:
@Component
public class PathKeyResolver implements KeyResolver {
public Mono
resolve(ServerWebExchange exchange) {
String path = exchange.getRequest().getPath().toString();
log.info("Request path: {}", path);
return Mono.just(path);
}
}When the QPS limit is exceeded, the gateway returns HTTP 429. The same mechanism can be extended to other dimensions such as headers or query parameters.
Summary
Rate limiting is essential for protecting services against traffic spikes. Different algorithms trade off simplicity, precision, and resource consumption. Combining rate limiting with circuit breaking and degradation yields a robust strategy for high‑traffic microservice systems.
Java Tech Enthusiast
Sharing computer programming language knowledge, focusing on Java fundamentals, data structures, related tools, Spring Cloud, IntelliJ IDEA... Book giveaways, red‑packet rewards and other perks await!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.