Six Practical Rate‑Limiting Techniques for Microservices

The article walks through six concrete rate‑limiting methods—fixed window, sliding window, leaky bucket, token bucket (Guava RateLimiter), Sentinel middleware, and Spring Cloud Gateway—showing Java implementations, test results, advantages, drawbacks, and how they fit into microservice architectures.

Programmer XiaoFu

Service rate limiting protects microservices by controlling request frequency. This guide presents four common algorithms (fixed window, sliding window, leaky bucket, and token bucket) with their Java implementations, test cases, and practical trade‑offs, followed by middleware (Sentinel) and gateway (Spring Cloud Gateway) solutions for distributed environments.

Fixed Window Algorithm

The fixed‑window approach keeps a counter for a predefined time slice and rejects requests once the counter exceeds the allowed limit. Implementation parameters include window size (milliseconds) and maximum request count.

import java.util.concurrent.atomic.AtomicInteger;
import lombok.extern.slf4j.Slf4j;

@Slf4j
public class FixedWindowRateLimiter {
    // window size in ms
    private long windowSize;
    // allowed requests per window
    private int maxRequestCount;
    // current count
    private AtomicInteger count = new AtomicInteger(0);
    // window right boundary timestamp
    private long windowBorder;

    public FixedWindowRateLimiter(long windowSize, int maxRequestCount) {
        this.windowSize = windowSize;
        this.maxRequestCount = maxRequestCount;
        windowBorder = System.currentTimeMillis() + windowSize;
    }

    public synchronized boolean tryAcquire() {
        long currentTime = System.currentTimeMillis();
        if (windowBorder < currentTime) {
            log.info("window reset");
            do {
                windowBorder += windowSize;
            } while (windowBorder < currentTime);
            count.set(0);
        }
        if (count.intValue() < maxRequestCount) {
            count.incrementAndGet();
            log.info("tryAcquire success");
            return true;
        } else {
            log.info("tryAcquire fail");
            return false;
        }
    }
}

Test: allow 5 requests per 1000 ms. The result shows the first five calls succeed and the following ones are throttled until the next window begins and the counter resets.

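Pros: simple to implement. Cons: the counter resets abruptly at each window boundary, so a burst that straddles a boundary can let through up to twice the limit in a short span (for example, 5 requests at the end of one window and 5 more at the start of the next), which may overload the backend.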
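The boundary problem can be reproduced deterministically. The following is a minimal sketch rather than the article's test code: the class names and the injectable clock (a LongSupplier) are assumptions introduced here so the run does not depend on wall-clock time.

```java
import java.util.function.LongSupplier;

class FixedWindowBoundaryDemo {

    /** Fixed-window counter that reads time from an injectable clock (hypothetical, for repeatable runs). */
    static final class FixedWindow {
        private final long windowSizeMs;
        private final int maxRequestCount;
        private final LongSupplier clock;
        private long windowStart;
        private int count;

        FixedWindow(long windowSizeMs, int maxRequestCount, LongSupplier clock) {
            this.windowSizeMs = windowSizeMs;
            this.maxRequestCount = maxRequestCount;
            this.clock = clock;
            this.windowStart = clock.getAsLong();
        }

        synchronized boolean tryAcquire() {
            long now = clock.getAsLong();
            if (now - windowStart >= windowSizeMs) {
                // Jump to the window that contains "now" and reset the counter.
                windowStart += (now - windowStart) / windowSizeMs * windowSizeMs;
                count = 0;
            }
            if (count < maxRequestCount) {
                count++;
                return true;
            }
            return false;
        }
    }

    /** Sends 5 requests at 990 ms and 5 more at 1010 ms, then returns how many were admitted. */
    static int admittedAroundBoundary() {
        long[] now = {0L};
        FixedWindow limiter = new FixedWindow(1000, 5, () -> now[0]);
        int admitted = 0;
        now[0] = 990;   // just before the first window ends
        for (int i = 0; i < 5; i++) {
            if (limiter.tryAcquire()) admitted++;
        }
        now[0] = 1010;  // just after the second window starts
        for (int i = 0; i < 5; i++) {
            if (limiter.tryAcquire()) admitted++;
        }
        return admitted;
    }
}
```

With a limit of 5 per 1000 ms, admittedAroundBoundary() returns 10: all ten requests pass within 20 ms because the counter is reset between the two bursts.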

Sliding Window Algorithm

Sliding windows divide the fixed interval into smaller shards. Each shard maintains its own counter; when the window slides, the oldest shard is cleared and a new one is added. The sum of all shard counters determines whether to throttle.

import lombok.extern.slf4j.Slf4j;

@Slf4j
public class SlidingWindowRateLimiter {
    private long windowSize;          // total window size (ms)
    private int shardNum;             // number of shards
    private int maxRequestCount;      // allowed requests per total window
    private int[] shardRequestCount;  // counters per shard
    private int totalCount;           // sum of all shards
    private int shardId;              // current shard index
    private long tinyWindowSize;      // size of a shard (ms)
    private long windowBorder;        // right boundary of the current shard

    public SlidingWindowRateLimiter(long windowSize, int shardNum, int maxRequestCount) {
        this.windowSize = windowSize;
        this.shardNum = shardNum;
        this.maxRequestCount = maxRequestCount;
        shardRequestCount = new int[shardNum];
        tinyWindowSize = windowSize / shardNum;
        windowBorder = System.currentTimeMillis();
    }

    public synchronized boolean tryAcquire() {
        long currentTime = System.currentTimeMillis();
        if (currentTime > windowBorder) {
            do {
                shardId = (shardId + 1) % shardNum;
                totalCount -= shardRequestCount[shardId];
                shardRequestCount[shardId] = 0;
                windowBorder += tinyWindowSize;
            } while (windowBorder < currentTime);
        }
        if (totalCount < maxRequestCount) {
            log.info("tryAcquire success,{}", shardId);
            shardRequestCount[shardId]++;
            totalCount++;
            return true;
        } else {
            log.info("tryAcquire fail,{}", shardId);
            return false;
        }
    }
}

Test: 1000 ms window split into 10 shards of 100 ms each, allowing 100 requests per second. The run shows 6 requests passed in the 0.9‑1.0 s slice and 4 in the 1.0‑1.1 s slice, smoothing the burst compared with the fixed‑window method.

While more precise, sliding windows consume extra memory for per‑shard counters and still cannot fully eliminate burst spikes.
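Both halves of that statement can be checked with a deterministic run. This is a sketch under assumptions, not the class above: time is passed in as a parameter (assumed non-decreasing), and the class and method names are made up for illustration.

```java
class SlidingWindowDemo {

    /** Sharded sliding-window counter driven by an explicit timestamp (hypothetical variant). */
    static final class SlidingWindow {
        private final long shardSizeMs;
        private final int maxRequestCount;
        private final int[] shardCounts;
        private long currentShard;   // absolute index of the newest shard seen so far
        private int total;           // sum of all shard counters

        SlidingWindow(long windowSizeMs, int shardNum, int maxRequestCount) {
            this.shardSizeMs = windowSizeMs / shardNum;
            this.maxRequestCount = maxRequestCount;
            this.shardCounts = new int[shardNum];
        }

        synchronized boolean tryAcquire(long nowMs) {
            long shard = nowMs / shardSizeMs;
            // Clear every shard that has slid out of the window (at most all of them).
            long steps = Math.min(shard - currentShard, shardCounts.length);
            for (long i = 1; i <= steps; i++) {
                int slot = (int) ((currentShard + i) % shardCounts.length);
                total -= shardCounts[slot];
                shardCounts[slot] = 0;
            }
            currentShard = Math.max(currentShard, shard);
            if (total >= maxRequestCount) {
                return false;
            }
            shardCounts[(int) (currentShard % shardCounts.length)]++;
            total++;
            return true;
        }
    }

    /** Fires the given number of attempts at one timestamp and returns how many were admitted. */
    static int admit(SlidingWindow limiter, long nowMs, int attempts) {
        int admitted = 0;
        for (int i = 0; i < attempts; i++) {
            if (limiter.tryAcquire(nowMs)) admitted++;
        }
        return admitted;
    }
}
```

With a 1000 ms window, 10 shards, and a limit of 5, five attempts at 990 ms are all admitted, five more at 1010 ms are all rejected (the earlier requests are still inside the window), and five at 1990 ms pass again. The boundary burst is capped at the limit, yet the whole quota can still be spent in a single millisecond, which is the residual spike the text refers to.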

Leaky Bucket Algorithm

The leaky bucket shapes traffic by treating incoming requests as water poured into a bucket that leaks at a constant rate. Excess water (requests) beyond the bucket capacity is discarded.

import java.util.concurrent.atomic.AtomicInteger;
import lombok.extern.slf4j.Slf4j;

@Slf4j
public class LeakyBucketRateLimiter {
    // bucket capacity
    private int capacity;
    // current water level
    private AtomicInteger water = new AtomicInteger(0);
    // timestamp of the last leak
    private long leakTimeStamp;
    // leak rate (requests per second)
    private int leakRate;

    public LeakyBucketRateLimiter(int capacity, int leakRate) {
        this.capacity = capacity;
        this.leakRate = leakRate;
    }

    public synchronized boolean tryAcquire() {
        if (water.get() == 0) {
            log.info("start leaking");
            leakTimeStamp = System.currentTimeMillis();
            water.incrementAndGet();
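            // the request was already counted above, so compare inclusively
            // (otherwise a bucket with capacity 1 would reject the request it just stored)
            return water.get() <= capacity;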
        }
        long currentTime = System.currentTimeMillis();
        int leakedWater = (int) ((currentTime - leakTimeStamp) / 1000 * leakRate);
        log.info("lastTime:{}, currentTime:{}, LeakedWater:{}", leakTimeStamp, currentTime, leakedWater);
        if (leakedWater != 0) {
            int leftWater = water.get() - leakedWater;
            water.set(Math.max(0, leftWater));
            leakTimeStamp = System.currentTimeMillis();
        }
        log.info("remaining capacity:{}", capacity - water.get());
        if (water.get() < capacity) {
            log.info("tryAcquire success");
            water.incrementAndGet();
            return true;
        } else {
            log.info("tryAcquire fail");
            return false;
        }
    }
}

Test: bucket capacity 3, leak rate 1 req/s, request interval 500 ms. The output shows three requests succeed, then subsequent attempts are throttled until the bucket empties.

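Drawback: in its classic queue‑based form, the leaky bucket drains requests at a fixed rate regardless of current load, so requests wait in the queue even when the backend has spare capacity, and genuine bursts cannot be absorbed. For this reason the leaky bucket is less popular in real‑world services.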
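The queue‑based form mentioned in the drawback differs from the counter‑style class shown earlier, which rejects immediately instead of queuing. A minimal sketch of the queuing variant might look like the following; the class name QueueLeakyBucket and its methods are hypothetical, and the drain step is exposed as a plain method so that it can be called by hand or by a scheduler.

```java
import java.util.ArrayDeque;
import java.util.Queue;

class QueueLeakyBucket {
    private final int capacity;
    private final Queue<Runnable> queue = new ArrayDeque<>();

    QueueLeakyBucket(int capacity) {
        this.capacity = capacity;
    }

    /** Pours one request into the bucket; returns false (request discarded) when the bucket is full. */
    public synchronized boolean offer(Runnable request) {
        if (queue.size() >= capacity) {
            return false;
        }
        queue.add(request);
        return true;
    }

    /** Leaks exactly one queued request, if any, and runs it; meant to be invoked at a fixed interval. */
    public boolean leakOne() {
        Runnable next;
        synchronized (this) {
            next = queue.poll();
        }
        if (next == null) {
            return false;
        }
        next.run();   // run outside the lock so slow requests do not block offer()
        return true;
    }

    public synchronized int size() {
        return queue.size();
    }
}
```

One way to drive it would be a ScheduledExecutorService calling scheduleAtFixedRate(bucket::leakOne, 0, 1000, TimeUnit.MILLISECONDS) for a leak rate of 1 req/s. Because requests leave only on that schedule, an idle backend still serves each queued request no faster than the leak rate.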

Token Bucket Algorithm (Guava RateLimiter)

Token bucket improves on leaky bucket by allowing bursts: tokens are generated at a steady rate and stored up to a maximum capacity. A request consumes a token; if none are available, it is throttled.
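Before turning to a library, the refill‑and‑consume logic can be sketched in a few lines. This is an illustrative stand‑alone class, not Guava's implementation: the name SimpleTokenBucket is made up, the bucket is assumed to start full, and the caller passes the current time in milliseconds so the behavior is repeatable.

```java
class SimpleTokenBucket {
    private final long capacity;           // maximum number of stored tokens
    private final double tokensPerSecond;  // steady refill rate
    private double tokens;
    private long lastRefillMs;

    SimpleTokenBucket(long capacity, double tokensPerSecond, long nowMs) {
        this.capacity = capacity;
        this.tokensPerSecond = tokensPerSecond;
        this.tokens = capacity;   // assumption: start full so an initial burst is allowed
        this.lastRefillMs = nowMs;
    }

    /** Refills according to elapsed time, then tries to take one token. */
    public synchronized boolean tryAcquire(long nowMs) {
        long elapsedMs = Math.max(0, nowMs - lastRefillMs);
        // Tokens accumulate at the steady rate but never beyond the capacity.
        tokens = Math.min(capacity, tokens + elapsedMs * tokensPerSecond / 1000.0);
        lastRefillMs = Math.max(lastRefillMs, nowMs);
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

With a capacity of 3 and a rate of 1 token per second, three calls at time 0 succeed (the burst), a fourth fails, and one more call succeeds at 1000 ms once a token has been refilled.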

Guava’s RateLimiter implements this algorithm. Dependency:

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>29.0-jre</version>
</dependency>

Test: create a limiter with 5 permits per second and acquire permits in a loop. The log shows ~200 ms wait per permit, confirming the rate.

void acquireTest() {
    RateLimiter rateLimiter = RateLimiter.create(5);
    for (int i = 0; i < 10; i++) {
        double time = rateLimiter.acquire();
        log.info("wait time:{}s", time);
    }
}

Guava also supports pre‑acquire (requesting multiple permits) and warm‑up periods. The source comment explains that the number of permits requested does not affect the latency of the current acquire call, but it does affect the next request’s waiting time.

void acquireMultiTest() {
    RateLimiter rateLimiter = RateLimiter.create(1);
    for (int i = 0; i < 3; i++) {
        int num = 2 * i + 1;
        log.info("acquire {} permits", num);
        double cost = rateLimiter.acquire(num);
        // acquire() returns the time spent waiting in seconds, not milliseconds
        log.info("acquire {} permits finished, cost {}s", num, cost);
    }
}

Result: a request for many permits is granted without extra waiting, but the request that follows it must wait until the borrowed tokens have been “repaid” at the refill rate, which is how the limiter absorbs bursts.

Middleware Rate Limiting – Sentinel

For distributed systems, single‑JVM limiters are insufficient. Sentinel, a component of Spring Cloud Alibaba, provides cluster‑wide flow control. Annotate service methods with @SentinelResource, specify a block handler, and define rules programmatically.

@Service
public class QueryService {
    public static final String KEY = "query";

    @SentinelResource(value = KEY, blockHandler = "blockHandlerMethod")
    public String query(String name) {
        return "begin query,name=" + name;
    }

    public String blockHandlerMethod(String name, BlockException e) {
        e.printStackTrace();
        return "blockHandlerMethod for Query : " + name;
    }
}

Rule configuration (QPS = 1):

@Component
public class SentinelConfig {
    @PostConstruct
    private void init() {
        List<FlowRule> rules = new ArrayList<>();
        FlowRule rule = new FlowRule(QueryService.KEY);
        rule.setCount(1);
        rule.setGrade(RuleConstant.FLOW_GRADE_QPS);
        rule.setLimitApp("default");
        rules.add(rule);
        FlowRuleManager.loadRules(rules);
    }
}

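application.yml snippet that connects the application to a Sentinel dashboard running at localhost:8088. Port 8719 is the local port the Sentinel client opens so that the dashboard can communicate with the application: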

spring:
  application:
    name: sentinel-test
  cloud:
    sentinel:
      transport:
        port: 8719
        dashboard: localhost:8088

Running the dashboard shows the flow rule, and requests exceeding the QPS trigger the block handler.

Gateway Rate Limiting – Spring Cloud Gateway

Spring Cloud Gateway uses Redis + Lua to implement a token‑bucket limiter. Add the gateway and reactive Redis dependencies, then configure the limiter in application.yml with replenish rate, burst capacity, and a key resolver (e.g., request path).

spring:
  application:
    name: gateway-test
  cloud:
    gateway:
      routes:
        - id: limit_route
          uri: lb://sentinel-test
          predicates:
            - Path=/sentinel-test/**
          filters:
            - name: RequestRateLimiter
              args:
                redis-rate-limiter.replenishRate: 1
                redis-rate-limiter.burstCapacity: 2
                key-resolver: "#{@pathKeyResolver}"
            - StripPrefix=1
  redis:
    host: 127.0.0.1
    port: 6379

Key resolver implementation:

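import lombok.extern.slf4j.Slf4j;
import org.springframework.cloud.gateway.filter.ratelimit.KeyResolver;
import org.springframework.stereotype.Component;
import org.springframework.web.server.ServerWebExchange;
import reactor.core.publisher.Mono;

@Slf4j
@Component
public class PathKeyResolver implements KeyResolver {
    // the request path is the limiting key, so every path gets its own token bucket
    @Override
    public Mono<String> resolve(ServerWebExchange exchange) {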
        String path = exchange.getRequest().getPath().toString();
        log.info("Request path: {}", path);
        return Mono.just(path);
    }
}

Testing with JMeter (500 ms interval) shows roughly one of every two requests passing, and throttled requests receive HTTP 429. The same mechanism can be extended to other dimensions such as headers or query parameters by returning a different key from the resolver.
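To see why the choice of key matters, the idea can be mimicked in plain Java. This is an in‑memory sketch for illustration only, not Spring Cloud Gateway's Redis‑backed implementation: the class KeyedLimiterSketch, the header name X-User, and the whole‑second timestamps are assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class KeyedLimiterSketch {

    /** Per-key bucket state: remaining tokens and the second in which they were last refilled. */
    private static final class Bucket {
        long tokens;
        long lastSecond;

        Bucket(long tokens, long lastSecond) {
            this.tokens = tokens;
            this.lastSecond = lastSecond;
        }
    }

    private final int replenishRate;   // tokens added per second
    private final int burstCapacity;   // maximum tokens a bucket can hold
    private final Function<Map<String, String>, String> keyResolver;
    private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();

    KeyedLimiterSketch(int replenishRate, int burstCapacity,
                       Function<Map<String, String>, String> keyResolver) {
        this.replenishRate = replenishRate;
        this.burstCapacity = burstCapacity;
        this.keyResolver = keyResolver;
    }

    /** Resolves the key from the request headers and applies a token bucket kept for that key. */
    boolean allow(Map<String, String> headers, long nowSeconds) {
        String key = keyResolver.apply(headers);
        Bucket bucket = buckets.computeIfAbsent(key, k -> new Bucket(burstCapacity, nowSeconds));
        synchronized (bucket) {
            long elapsed = Math.max(0, nowSeconds - bucket.lastSecond);
            bucket.tokens = Math.min(burstCapacity, bucket.tokens + elapsed * replenishRate);
            bucket.lastSecond = Math.max(bucket.lastSecond, nowSeconds);
            if (bucket.tokens < 1) {
                return false;
            }
            bucket.tokens--;
            return true;
        }
    }
}
```

Constructed with a replenish rate of 1, a burst capacity of 2, and a resolver such as h -> h.getOrDefault("X-User", "anonymous"), the limiter throttles each user independently: a third request from one user in the same second is rejected while another user's first request still passes.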

Conclusion

Rate limiting is essential for maintaining system resilience under burst traffic. Fixed‑window is easy but coarse; sliding‑window refines granularity at memory cost; leaky bucket guarantees smooth outflow but queues all traffic; token bucket (e.g., Guava RateLimiter) balances steady rate with bursts; Sentinel and Gateway provide cluster‑wide enforcement for microservice and API‑gateway scenarios. Combining rate limiting with circuit breaking and degradation yields a robust defense against high‑load spikes.
