Backend Development 20 min read

Rate Limiting Algorithms and Implementations in Java Microservices

The article explains service rate limiting and demonstrates six Java implementations—Fixed Window, Sliding Window, Leaky Bucket, Token Bucket, Sentinel middleware, and Spring Cloud Gateway—detailing their algorithms, code examples, and configuration to protect microservices from overload.

Java Tech Enthusiast

Sep 12, 2024

Rate Limiting Algorithms and Implementations in Java Microservices

This article introduces the concept of service rate limiting, which controls request rates to protect services from overload. It covers six common algorithms—Fixed Window, Sliding Window, Leaky Bucket, Token Bucket, middleware‑based limiting (Sentinel), and gateway limiting (Spring Cloud Gateway)—and provides practical Java implementations.

Fixed Window Algorithm

The fixed window maintains a counter for a defined time slot and rejects requests once the limit is reached. The implementation uses a window size (ms) and a maximum request count.

@Slf4j
public class FixedWindowRateLimiter {
    // time window size in ms
    private long windowSize;
    // allowed request count per window
    private int maxRequestCount;
    // current count
    private AtomicInteger count = new AtomicInteger(0);
    // window right border
    private long windowBorder;

    public FixedWindowRateLimiter(long windowSize, int maxRequestCount) {
        this.windowSize = windowSize;
        this.maxRequestCount = maxRequestCount;
        windowBorder = System.currentTimeMillis() + windowSize;
    }

    public synchronized boolean tryAcquire() {
        long currentTime = System.currentTimeMillis();
        if (windowBorder < currentTime) {
            log.info("window reset");
            do {
                windowBorder += windowSize;
            } while (windowBorder < currentTime);
            count = new AtomicInteger(0);
        }
        if (count.intValue() < maxRequestCount) {
            count.incrementAndGet();
            log.info("tryAcquire success");
            return true;
        } else {
            log.info("tryAcquire fail");
            return false;
        }
    }
}

Test code shows that 5 requests are allowed per 1000 ms.

void test() throws InterruptedException {
    FixedWindowRateLimiter limiter = new FixedWindowRateLimiter(1000, 5);
    for (int i = 0; i < 10; i++) {
        if (limiter.tryAcquire()) {
            System.out.println("执行任务");
        } else {
            System.out.println("被限流");
            TimeUnit.MILLISECONDS.sleep(300);
        }
    }
}

The fixed window is simple but can cause traffic spikes at window boundaries.

Sliding Window Algorithm

Sliding window refines the fixed window by dividing the interval into smaller shards and sliding one shard at a time. It keeps an array of counters for each shard.

@Slf4j
public class SlidingWindowRateLimiter {
    private long windowSize;
    private int shardNum;
    private int maxRequestCount;
    private int[] shardRequestCount;
    private int totalCount;
    private int shardId;
    private long tinyWindowSize;
    private long windowBorder;

    public SlidingWindowRateLimiter(long windowSize, int shardNum, int maxRequestCount) {
        this.windowSize = windowSize;
        this.shardNum = shardNum;
        this.maxRequestCount = maxRequestCount;
        shardRequestCount = new int[shardNum];
        tinyWindowSize = windowSize / shardNum;
        windowBorder = System.currentTimeMillis();
    }

    public synchronized boolean tryAcquire() {
        long currentTime = System.currentTimeMillis();
        if (currentTime > windowBorder) {
            do {
                shardId = (++shardId) % shardNum;
                totalCount -= shardRequestCount[shardId];
                shardRequestCount[shardId] = 0;
                windowBorder += tinyWindowSize;
            } while (windowBorder < currentTime);
        }
        if (totalCount < maxRequestCount) {
            log.info("tryAcquire success,{}", shardId);
            shardRequestCount[shardId]++;
            totalCount++;
            return true;
        } else {
            log.info("tryAcquire fail,{}", shardId);
            return false;
        }
    }
}

Testing with a 1‑second window split into 10 shards (100 ms each) demonstrates smoother request distribution.

void test() throws InterruptedException {
    SlidingWindowRateLimiter limiter = new SlidingWindowRateLimiter(1000, 10, 10);
    TimeUnit.MILLISECONDS.sleep(800);
    for (int i = 0; i < 15; i++) {
        boolean acquire = limiter.tryAcquire();
        System.out.println(acquire ? "执行任务" : "被限流");
        TimeUnit.MILLISECONDS.sleep(10);
    }
}

Leaky Bucket Algorithm

The leaky bucket models a bucket that leaks at a constant rate. Requests fill the bucket; excess requests are dropped.

@Slf4j
public class LeakyBucketRateLimiter {
    private int capacity;
    private AtomicInteger water = new AtomicInteger(0);
    private long leakTimeStamp;
    private int leakRate;

    public LeakyBucketRateLimiter(int capacity, int leakRate) {
        this.capacity = capacity;
        this.leakRate = leakRate;
    }

    public synchronized boolean tryAcquire() {
        if (water.get() == 0) {
            log.info("start leaking");
            leakTimeStamp = System.currentTimeMillis();
            water.incrementAndGet();
            return water.get() < capacity;
        }
        long currentTime = System.currentTimeMillis();
        int leakedWater = (int) ((currentTime - leakTimeStamp) / 1000 * leakRate);
        if (leakedWater != 0) {
            int leftWater = water.get() - leakedWater;
            water.set(Math.max(0, leftWater));
            leakTimeStamp = System.currentTimeMillis();
        }
        if (water.get() < capacity) {
            log.info("tryAcquire success");
            water.incrementAndGet();
            return true;
        } else {
            log.info("tryAcquire fail");
            return false;
        }
    }
}

Test code with capacity 3 and leak rate 1 req/s shows alternating success and throttling.

void test() throws InterruptedException {
    LeakyBucketRateLimiter limiter = new LeakyBucketRateLimiter(3, 1);
    for (int i = 0; i < 15; i++) {
        if (limiter.tryAcquire()) {
            System.out.println("执行任务");
        } else {
            System.out.println("被限流");
        }
        TimeUnit.MILLISECONDS.sleep(500);
    }
}

Token Bucket Algorithm

Token bucket improves leaky bucket by allowing bursts. Tokens are generated at a steady rate and stored up to a maximum. Guava’s RateLimiter implements this.

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>29.0-jre</version>
</dependency>

Example that creates 5 tokens per second:

void acquireTest() {
    RateLimiter rateLimiter = RateLimiter.create(5);
    for (int i = 0; i < 10; i++) {
        double wait = rateLimiter.acquire();
        log.info("等待时间：{}s", wait);
    }
}

Guava also supports pre‑acquire (multiple permits) and warm‑up periods, which help handle sudden bursts without long delays.

void acquireMultiTest() {
    RateLimiter limiter = RateLimiter.create(1);
    for (int i = 0; i < 3; i++) {
        int num = 2 * i + 1;
        log.info("获取{}个令牌", num);
        double cost = limiter.acquire(num);
        log.info("获取{}个令牌结束，耗时{}ms", num, cost);
    }
}

Warm‑up example:

void acquireSmoothly() {
    RateLimiter limiter = RateLimiter.create(5, 3, TimeUnit.SECONDS);
    long start = System.currentTimeMillis();
    for (int i = 0; i < 15; i++) {
        double time = limiter.acquire();
        log.info("等待时间:{}s, 总时间:{}ms", time, System.currentTimeMillis() - start);
    }
}

Middleware Rate Limiting – Sentinel

For distributed microservices, Sentinel (Spring Cloud Alibaba) provides cluster‑wide flow control. Annotate service methods with @SentinelResource and define a block handler.

@Service
public class QueryService {
    public static final String KEY = "query";

    @SentinelResource(value = KEY, blockHandler = "blockHandlerMethod")
    public String query(String name) {
        return "begin query,name=" + name;
    }

    public String blockHandlerMethod(String name, BlockException e) {
        e.printStackTrace();
        return "blockHandlerMethod for Query : " + name;
    }
}

Rules are loaded programmatically (e.g., QPS = 1) and can be visualized in the Sentinel dashboard.

@Component
public class SentinelConfig {
    @PostConstruct
    private void init() {
        List<FlowRule> rules = new ArrayList<>();
        FlowRule rule = new FlowRule(QueryService.KEY);
        rule.setCount(1);
        rule.setGrade(RuleConstant.FLOW_GRADE_QPS);
        rule.setLimitApp("default");
        rules.add(rule);
        FlowRuleManager.loadRules(rules);
    }
}

Configuration in application.yml sets the Sentinel port and dashboard address.

spring:
  application:
    name: sentinel-test
  cloud:
    sentinel:
      transport:
        port: 8719
        dashboard: localhost:8088

Gateway Rate Limiting – Spring Cloud Gateway

Gateway uses Redis + Lua to implement a token‑bucket limiter. The configuration defines replenish rate, burst capacity, and a key resolver (e.g., request path).

spring:
  cloud:
    gateway:
      routes:
        - id: limit_route
          uri: lb://sentinel-test
          predicates:
            - Path=/sentinel-test/**
          filters:
            - name: RequestRateLimiter
              args:
                redis-rate-limiter.replenishRate: 1
                redis-rate-limiter.burstCapacity: 2
                key-resolver: "#{@pathKeyResolver}"
    redis:
      host: 127.0.0.1
      port: 6379

Key resolver example:

@Component
public class PathKeyResolver implements KeyResolver {
    public Mono<String> resolve(ServerWebExchange exchange) {
        String path = exchange.getRequest().getPath().toString();
        log.info("Request path: {}", path);
        return Mono.just(path);
    }
}

When the QPS limit is exceeded, the gateway returns HTTP 429. The same mechanism can be extended to other dimensions such as headers or query parameters.

Summary

Rate limiting is essential for protecting services against traffic spikes. Different algorithms trade off simplicity, precision, and resource consumption. Combining rate limiting with circuit breaking and degradation yields a robust strategy for high‑traffic microservice systems.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Java algorithm Microservices Sentinel gateway rate limiting spring cloud

Written by

Java Tech Enthusiast

Sharing computer programming language knowledge, focusing on Java fundamentals, data structures, related tools, Spring Cloud, IntelliJ IDEA... Book giveaways, red‑packet rewards and other perks await!

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.