Rate Limiting Algorithms: Fixed Window, Sliding Window, Leaky Bucket, Token Bucket, and Distributed Rate Limiting
This article explains the importance of rate limiting in microservice architectures, introduces four basic algorithms—fixed window, sliding window, leaky bucket, and token bucket—compares their advantages and disadvantages, and presents both single-machine and distributed implementations with Go code examples.
With the rise of microservice architectures, the dependency and call relationships between services grow increasingly complex, making service stability a critical concern. Sudden traffic spikes can cause request timeouts or even server crashes, so rate limiting is used to quickly reject requests that exceed configured thresholds, protecting both the service itself and its upstream and downstream dependencies.
A good rate‑limiting design should consider business characteristics and include six key points: multi‑level limiting, dynamic threshold adjustment, flexible dimensions, decoupling from business logic, fault tolerance, and real‑time monitoring and alerting.
Basic Rate‑Limiting Algorithms
1. Fixed Window Limiting
The fixed window algorithm divides time into equal intervals (e.g., one second) and counts requests within each window. If the count exceeds the configured limit, further requests are rejected until the window resets.
package main

import (
	"fmt"
	"sync"
	"time"
)

type FixedWindowLimiter struct {
	windowSize  time.Duration // window length
	maxRequests int           // maximum requests allowed per window
	requests    int           // requests counted in the current window
	lastReset   int64         // last window reset time (Unix seconds)
	resetMutex  sync.Mutex    // guards the counter and reset time
}
func NewFixedWindowLimiter(windowSize time.Duration, maxRequests int) *FixedWindowLimiter {
	return &FixedWindowLimiter{windowSize: windowSize, maxRequests: maxRequests, lastReset: time.Now().Unix()}
}

func (limiter *FixedWindowLimiter) AllowRequest() bool {
	limiter.resetMutex.Lock()
	defer limiter.resetMutex.Unlock()
	// start a new window if the current one has expired
	if time.Now().Unix()-limiter.lastReset >= int64(limiter.windowSize.Seconds()) {
		limiter.requests = 0
		limiter.lastReset = time.Now().Unix()
	}
	if limiter.requests >= limiter.maxRequests {
		return false
	}
	limiter.requests++
	return true
}
func main() {
	limiter := NewFixedWindowLimiter(1*time.Second, 3) // at most 3 requests per second
	for i := 0; i < 15; i++ {
		now := time.Now().Format("15:04:05")
		if limiter.AllowRequest() {
			fmt.Println(now + " request allowed")
		} else {
			fmt.Println(now + " request rejected (rate limited)")
		}
		time.Sleep(100 * time.Millisecond)
	}
}

Pros: simple to implement, easy to understand, suitable for stable traffic.
Cons: cannot handle short‑term bursts, uneven request distribution may cause unfairness.
2. Sliding Window Limiting
The sliding window improves on the fixed window by moving the window continuously, removing expired requests and providing finer‑grained control.
package main

import (
	"fmt"
	"sync"
	"time"
)

type SlidingWindowLimiter struct {
	windowSize   time.Duration // window length
	maxRequests  int           // maximum requests allowed within the window
	requests     []time.Time   // timestamps of requests inside the window
	requestsLock sync.Mutex    // guards the requests slice
}

func NewSlidingWindowLimiter(windowSize time.Duration, maxRequests int) *SlidingWindowLimiter {
	return &SlidingWindowLimiter{windowSize: windowSize, maxRequests: maxRequests, requests: make([]time.Time, 0)}
}
func (limiter *SlidingWindowLimiter) AllowRequest() bool {
	limiter.requestsLock.Lock()
	defer limiter.requestsLock.Unlock()
	now := time.Now()
	// drop requests that have fallen out of the window
	for len(limiter.requests) > 0 && now.Sub(limiter.requests[0]) > limiter.windowSize {
		limiter.requests = limiter.requests[1:]
	}
	if len(limiter.requests) >= limiter.maxRequests {
		return false
	}
	limiter.requests = append(limiter.requests, now)
	return true
}
func main() {
	limiter := NewSlidingWindowLimiter(500*time.Millisecond, 2) // at most 2 requests per 500ms
	for i := 0; i < 15; i++ {
		now := time.Now().Format("15:04:05")
		if limiter.AllowRequest() {
			fmt.Println(now + " request allowed")
		} else {
			fmt.Println(now + " request rejected (rate limited)")
		}
		time.Sleep(100 * time.Millisecond)
	}
}

Pros: smooth handling of bursts, higher precision.
Cons: higher memory consumption and more complex implementation.
3. Leaky Bucket Limiting
The leaky bucket treats incoming requests as water poured into a bucket that leaks at a constant rate; excess requests are dropped when the bucket is full.
package main

import (
	"fmt"
	"sync"
	"time"
)

type LeakyBucket struct {
	rate       float64    // leak rate (requests/second)
	capacity   int        // bucket capacity
	water      int        // current water level (requests in the bucket)
	lastLeakMs int64      // last leak time (Unix milliseconds)
	mu         sync.Mutex // guards the fields above
}

func NewLeakyBucket(rate float64, capacity int) *LeakyBucket {
	return &LeakyBucket{rate: rate, capacity: capacity, water: 0, lastLeakMs: time.Now().UnixMilli()}
}

func (lb *LeakyBucket) Allow() bool {
	lb.mu.Lock()
	defer lb.mu.Unlock()
	nowMs := time.Now().UnixMilli()
	// leak water according to the time elapsed since the last leak;
	// the leak timestamp advances whenever water is drained, so elapsed
	// time is never counted twice
	leaked := int(float64(nowMs-lb.lastLeakMs) / 1000 * lb.rate)
	if leaked > 0 {
		if leaked > lb.water {
			lb.water = 0
		} else {
			lb.water -= leaked
		}
		lb.lastLeakMs = nowMs
	}
	if lb.water >= lb.capacity {
		return false // bucket is full: drop the request
	}
	lb.water++
	return true
}
func main() {
	bucket := NewLeakyBucket(3, 4) // 3 requests/second, capacity 4
	for i := 1; i <= 15; i++ {
		now := time.Now().Format("15:04:05")
		if bucket.Allow() {
			fmt.Printf(now+" request %d allowed\n", i)
		} else {
			fmt.Printf(now+" request %d rejected (rate limited)\n", i)
		}
		time.Sleep(200 * time.Millisecond)
	}
}

Pros: smooths traffic, simple to understand.
Cons: less flexible for sudden bursts, may waste capacity when traffic is low.
4. Token Bucket Limiting
The token bucket generates tokens at a fixed rate; a request can proceed only when a token is available, allowing short bursts while enforcing an average rate.
package main

import (
	"fmt"
	"sync"
	"time"
)

type TokenBucket struct {
	rate       float64    // token generation rate (tokens/second)
	capacity   float64    // bucket capacity
	tokens     float64    // current token count
	lastUpdate time.Time  // last refill time
	mu         sync.Mutex // guards the fields above
}

func NewTokenBucket(rate, capacity float64) *TokenBucket {
	return &TokenBucket{rate: rate, capacity: capacity, tokens: capacity, lastUpdate: time.Now()}
}
func (tb *TokenBucket) Allow() bool {
	tb.mu.Lock()
	defer tb.mu.Unlock()
	now := time.Now()
	// refill tokens for the elapsed interval, capped at capacity;
	// lastUpdate must advance on every call, or a rejected request would
	// leave a stale timestamp and the same interval would be refilled twice
	elapsed := now.Sub(tb.lastUpdate).Seconds()
	tb.tokens += elapsed * tb.rate
	tb.lastUpdate = now
	if tb.tokens > tb.capacity {
		tb.tokens = tb.capacity
	}
	if tb.tokens >= 1.0 {
		tb.tokens--
		return true
	}
	return false
}
func main() {
	bucket := NewTokenBucket(2.0, 3.0) // 2 tokens/second, capacity 3
	for i := 1; i <= 10; i++ {
		now := time.Now().Format("15:04:05")
		if bucket.Allow() {
			fmt.Printf(now+" request %d allowed\n", i)
		} else {
			fmt.Printf(now+" request %d rejected (rate limited)\n", i)
		}
		time.Sleep(200 * time.Millisecond)
	}
}

Pros: smooths bursts, flexible rate control, supports occasional spikes.
Cons: slightly more complex implementation, requires precise time handling.
Distributed Rate Limiting
When services run on multiple machines, a single‑node limiter is insufficient. The article presents three distributed approaches:
Centralized Redis token bucket: All instances request a token from Redis via a Lua script. This guarantees global limits but introduces a single point of failure and bandwidth bottlenecks.
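The core of that Lua script is a read–refill–take sequence executed atomically inside Redis, so no two instances can interleave the read-modify-write. The sketch below models that sequence in plain Go for illustration only; in a real deployment the equivalent logic runs as a Lua script via EVAL/EVALSHA against per-key Redis state, and the type and function names here are assumptions, not from the article:

```go
package main

import (
	"fmt"
	"time"
)

// redisBucketState mirrors the two fields a Redis-backed token bucket
// keeps per key (e.g. in a hash): the token count and the last refill time.
type redisBucketState struct {
	tokens     float64
	lastRefill int64 // Unix milliseconds
}

// takeToken is a pure-Go stand-in for the Lua script body: refill tokens
// for the elapsed interval, cap at capacity, then try to consume one.
func takeToken(s *redisBucketState, rate, capacity float64, nowMs int64) bool {
	elapsed := float64(nowMs-s.lastRefill) / 1000.0
	s.tokens += elapsed * rate
	if s.tokens > capacity {
		s.tokens = capacity
	}
	s.lastRefill = nowMs
	if s.tokens >= 1.0 {
		s.tokens--
		return true
	}
	return false
}

func main() {
	// bucket starts with 2 tokens; rate 1 token/s, capacity 2
	state := &redisBucketState{tokens: 2, lastRefill: time.Now().UnixMilli()}
	now := time.Now().UnixMilli()
	for i := 1; i <= 4; i++ {
		fmt.Printf("request %d allowed=%v\n", i, takeToken(state, 1.0, 2.0, now))
	}
}
```

Because Redis executes the whole script atomically, this single round trip replaces the lock that the single-machine limiters above need.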
Load‑balancer‑based local limiting: Requests are evenly distributed by a load balancer; each node runs its own limiter (e.g., token bucket) and can adjust thresholds based on local resource usage. This reduces central contention but relies on the load balancer’s availability.
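Under even distribution, each node's local threshold is simply its share of the global limit. A minimal sketch of that split (the helper name and the headroom factor are illustrative assumptions, not from the article):

```go
package main

import "fmt"

// perNodeRate splits a global limit evenly across nodes behind a load
// balancer. headroom < 1.0 leaves slack for imperfect balancing; a node
// could further scale its share down based on local CPU or memory pressure.
func perNodeRate(globalRate float64, nodes int, headroom float64) float64 {
	return globalRate / float64(nodes) * headroom
}

func main() {
	// 3000 req/s across 10 nodes with 10% headroom -> 270 req/s per node
	fmt.Println(perNodeRate(3000, 10, 0.9))
}
```

Each node then feeds this rate into its own local limiter (e.g. the token bucket above), with no per-request coordination.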
Coordination‑service (ZooKeeper/etcd) based limiting: Nodes acquire a distributed lock or modify a shared counter to obtain tokens. It provides precise global limits without a single data store, yet adds operational complexity and requires a highly available coordination service.
Each method’s trade‑offs are discussed, emphasizing that the optimal solution depends on system requirements, existing technology stack, load characteristics, and tolerance for complexity.
Conclusion
There is no universally best rate‑limiting scheme; the choice must balance simplicity, precision, scalability, and fault tolerance. Combining rate limiting with other techniques such as caching, load balancing, and asynchronous processing yields a robust system capable of handling high concurrency while maintaining service quality.