Mastering Rate Limiting: From Fixed Windows to Redis Distributed Solutions

This article explains why rate limiting is essential for microservice stability, introduces basic concepts like thresholds and rejection strategies, and walks through multiple algorithms—including fixed‑window, sliding‑window, sliding‑log, leaky‑bucket, token‑bucket—and their Java implementations as well as Redis‑based distributed approaches, complete with code samples and performance considerations.

Su San Talks Tech
Su San Talks Tech
Su San Talks Tech
Mastering Rate Limiting: From Fixed Windows to Redis Distributed Solutions

Introduction

With the rise of micro‑services, inter‑service dependencies have become stronger and call relationships more complex, making service stability increasingly important. When faced with sudden traffic spikes, malicious requests, or high request frequencies that pressure downstream services, techniques such as caching, circuit breaking, load balancing, and especially rate limiting are required to maintain stability.

1. Rate Limiting

Rate limiting restricts the number of requests or concurrent calls within a time window to protect system resources. Two key concepts are:

Threshold : the allowed request count per unit time (e.g., QPS = 10 means at most 10 requests per second).

Rejection strategy : actions taken when the threshold is exceeded, such as immediate rejection or queuing.

2. Fixed‑Window Algorithm

The fixed‑window (counter) algorithm uses an atomic counter to count requests within a 1‑second window. When the counter reaches the limit, the rejection strategy is triggered, and the counter resets every second.

2.1. Code Example

/**
 * @author https://www.wdbyte.com
 */
public class RateLimiterSimpleWindow {
    // Threshold
    private static Integer QPS = 2;
    // Time window (ms)
    private static long TIME_WINDOWS = 1000;
    // Counter
    private static AtomicInteger REQ_COUNT = new AtomicInteger();
    private static long START_TIME = System.currentTimeMillis();

    public synchronized static boolean tryAcquire() {
        if ((System.currentTimeMillis() - START_TIME) > TIME_WINDOWS) {
            REQ_COUNT.set(0);
            START_TIME = System.currentTimeMillis();
        }
        return REQ_COUNT.incrementAndGet() <= QPS;
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 10; i++) {
            Thread.sleep(250);
            LocalTime now = LocalTime.now();
            if (!tryAcquire()) {
                System.out.println(now + " 被限流");
            } else {
                System.out.println(now + " 做点什么");
            }
        }
    }
}

The output shows that with QPS = 2, roughly one request per second is rejected, but the fixed‑window suffers from boundary‑condition issues when requests span two adjacent windows.

Fixed window algorithm illustration
Fixed window algorithm illustration

3. Sliding‑Window Algorithm

The sliding‑window improves on the fixed‑window by moving the window forward in smaller steps (e.g., every 500 ms), reducing the chance of boundary spikes.

Sliding window algorithm illustration
Sliding window algorithm illustration

3.1. Code Example

package com.wdbyte.rate.limiter;

import java.time.LocalTime;
import java.util.concurrent.atomic.AtomicInteger;

public class RateLimiterSlidingWindow {
    private int qps = 2;
    private long windowSize = 1000;
    private Integer windowCount = 10;
    private WindowInfo[] windowArray = new WindowInfo[windowCount];

    public RateLimiterSlidingWindow(int qps) {
        this.qps = qps;
        long currentTimeMillis = System.currentTimeMillis();
        for (int i = 0; i < windowArray.length; i++) {
            windowArray[i] = new WindowInfo(currentTimeMillis, new AtomicInteger(0));
        }
    }

    public synchronized boolean tryAcquire() {
        long currentTimeMillis = System.currentTimeMillis();
        int currentIndex = (int)(currentTimeMillis % windowSize / (windowSize / windowCount));
        int sum = 0;
        for (int i = 0; i < windowArray.length; i++) {
            WindowInfo windowInfo = windowArray[i];
            if ((currentTimeMillis - windowInfo.getTime()) > windowSize) {
                windowInfo.getNumber().set(0);
                windowInfo.setTime(currentTimeMillis);
            }
            if (currentIndex == i && windowInfo.getNumber().get() < qps) {
                windowInfo.getNumber().incrementAndGet();
            }
            sum += windowInfo.getNumber().get();
        }
        return sum <= qps;
    }

    private class WindowInfo {
        private Long time;
        private AtomicInteger number;
        public WindowInfo(long time, AtomicInteger number) {
            this.time = time;
            this.number = number;
        }
        public Long getTime() { return time; }
        public void setTime(Long time) { this.time = time; }
        public AtomicInteger getNumber() { return number; }
    }
}

4. Sliding‑Log Algorithm

This method records each request timestamp; a new request checks how many timestamps fall within the last second. It avoids window‑boundary problems but consumes more memory.

4.1. Code Example

package com.wdbyte.rate.limiter;

import java.time.LocalTime;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeMap;

public class RateLimiterSlidingLog {
    private Integer qps = 2;
    private TreeMap<Long, Long> treeMap = new TreeMap<>();
    private long clearTime = 60 * 1000;

    public RateLimiterSlidingLog(Integer qps) { this.qps = qps; }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (!treeMap.isEmpty() && (treeMap.firstKey() - now) > clearTime) {
            Set<Long> keySet = new HashSet<>(treeMap.subMap(0L, now - 1000).keySet());
            for (Long key : keySet) { treeMap.remove(key); }
        }
        int sum = 0;
        for (Long value : treeMap.subMap(now - 1000, now).values()) { sum += value; }
        if (sum + 1 > qps) { return false; }
        treeMap.compute(now, (k, v) -> v == null ? 1L : v + 1);
        return sum <= qps;
    }
}

5. Leaky‑Bucket Algorithm

The leaky‑bucket treats requests as water droplets entering a bucket; the bucket leaks at a constant rate (e.g., one token every 500 ms for QPS = 2). When the bucket overflows, excess requests are rejected.

Leaky bucket algorithm illustration
Leaky bucket algorithm illustration

6. Token‑Bucket Algorithm

Tokens are added to a bucket at a fixed rate (e.g., one token every 500 ms for QPS = 2). A request consumes a token; if none are available, the request is rejected. Google Guava’s RateLimiter implements this logic.

6.1. Code Example (Guava)

// qps 2
RateLimiter rateLimiter = RateLimiter.create(2);
for (int i = 0; i < 10; i++) {
    String time = LocalDateTime.now().format(DateTimeFormatter.ISO_LOCAL_TIME);
    System.out.println(time + ":" + rateLimiter.tryAcquire());
    Thread.sleep(250);
}

6.2. Implementation Insight

Guava calculates token availability lazily during each acquire call, storing the next token generation time and avoiding a dedicated replenishment thread.

void resync(long nowMicros) {
    if (nowMicros > this.nextFreeTicketMicros) {
        double newPermits = (double)(nowMicros - this.nextFreeTicketMicros) / this.coolDownIntervalMicros();
        this.storedPermits = Math.min(this.maxPermits, this.storedPermits + newPermits);
        this.nextFreeTicketMicros = nowMicros;
    }
}

7. Redis Distributed Rate Limiting

Redis can be used for distributed rate limiting because it is fast, in‑memory, and single‑threaded.

7.1. Fixed‑Window with Redis

Using INCR and EXPIRE in an atomic Lua script to count requests per second.

local count = redis.call("incr", KEYS[1])
if count == 1 then
  redis.call('expire', KEYS[1], ARGV[2])
end
if count > tonumber(ARGV[1]) then
  return 0
end
return 1

7.2. Sliding‑Window with Redis ZSET

Store request timestamps in a sorted set; remove entries older than the window, count current entries, and decide whether to allow the request.

-- Remove old entries
redis.call('zremrangeByScore', KEYS[1], 0, ARGV[1])
-- Count current entries
local res = redis.call('zcard', KEYS[1])
if (res == nil) or (res < tonumber(ARGV[3])) then
  redis.call('zadd', KEYS[1], ARGV[2], ARGV[4])
  return 1
else
  return 0
end

Both implementations demonstrate how to achieve per‑second QPS limits (e.g., QPS = 4) in a distributed environment.

8. Summary

The article compares window‑based and bucket‑based rate‑limiting algorithms. Window algorithms are simple and give clear QPS visibility but suffer from boundary spikes; bucket algorithms provide smoother traffic shaping—leaky bucket guarantees constant consumption, while token bucket handles bursts more gracefully. Single‑node implementations suit monolithic services, whereas Redis‑backed solutions enable distributed rate limiting. Additional tools such as Guava, Redisson, or Alibaba Sentinel can further simplify integration.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Distributed Systemsjavabackend-developmentrate limiting
Su San Talks Tech
Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.