Rate Limiting Algorithms: Fixed Window, Sliding Window, Leaky Bucket, Token Bucket, and Distributed Rate Limiting
This article explains the importance of rate limiting in microservice architectures, introduces four basic algorithms—fixed window, sliding window, leaky bucket, and token bucket—compares their advantages and disadvantages, and presents both single-machine and distributed implementations with Go code examples.
With the rise of microservice architectures, the dependency and call relationships between services grow increasingly complex, making service stability a critical concern. Sudden traffic spikes can cause request timeouts or even server crashes, so rate limiting is used to quickly reject requests that exceed configured thresholds, protecting both the service itself and its upstream and downstream dependencies.
A good rate‑limiting design should consider business characteristics and include six key points: multi‑level limiting, dynamic threshold adjustment, flexible dimensions, decoupling from business logic, fault tolerance, and real‑time monitoring and alerting.
Basic Rate‑Limiting Algorithms
1. Fixed Window Limiting
The fixed window algorithm divides time into equal intervals (e.g., one second) and counts requests within each window. If the count exceeds the configured limit, further requests are rejected until the window resets.
package main

import (
	"fmt"
	"sync"
	"time"
)

type FixedWindowLimiter struct {
	windowSize  time.Duration // window length
	maxRequests int           // maximum requests allowed per window
	requests    int           // requests counted in the current window
	lastReset   int64         // last window reset time (Unix seconds)
	resetMutex  sync.Mutex    // guards the counter and reset time
}
func NewFixedWindowLimiter(windowSize time.Duration, maxRequests int) *FixedWindowLimiter {
	return &FixedWindowLimiter{windowSize: windowSize, maxRequests: maxRequests, lastReset: time.Now().Unix()}
}

func (limiter *FixedWindowLimiter) AllowRequest() bool {
	limiter.resetMutex.Lock()
	defer limiter.resetMutex.Unlock()
	// start a new window if the current one has expired
	if time.Now().Unix()-limiter.lastReset >= int64(limiter.windowSize.Seconds()) {
		limiter.requests = 0
		limiter.lastReset = time.Now().Unix()
	}
	if limiter.requests >= limiter.maxRequests {
		return false
	}
	limiter.requests++
	return true
}
func main() {
	limiter := NewFixedWindowLimiter(1*time.Second, 3) // at most 3 requests per second
	for i := 0; i < 15; i++ {
		now := time.Now().Format("15:04:05")
		if limiter.AllowRequest() {
			fmt.Println(now + " request allowed")
		} else {
			fmt.Println(now + " request rejected (rate limited)")
		}
		time.Sleep(100 * time.Millisecond)
	}
}

Pros: simple to implement, easy to understand, suitable for stable traffic.
Cons: cannot handle short‑term bursts, uneven request distribution may cause unfairness.
2. Sliding Window Limiting
The sliding window improves on the fixed window by moving the window continuously, removing expired requests and providing finer‑grained control.
package main

import (
	"fmt"
	"sync"
	"time"
)

type SlidingWindowLimiter struct {
	windowSize   time.Duration // window length
	maxRequests  int           // maximum requests allowed within the window
	requests     []time.Time   // timestamps of requests inside the window
	requestsLock sync.Mutex    // guards the requests slice
}

func NewSlidingWindowLimiter(windowSize time.Duration, maxRequests int) *SlidingWindowLimiter {
	return &SlidingWindowLimiter{windowSize: windowSize, maxRequests: maxRequests, requests: make([]time.Time, 0)}
}
func (limiter *SlidingWindowLimiter) AllowRequest() bool {
	limiter.requestsLock.Lock()
	defer limiter.requestsLock.Unlock()
	now := time.Now()
	// drop requests that have fallen out of the window
	for len(limiter.requests) > 0 && now.Sub(limiter.requests[0]) > limiter.windowSize {
		limiter.requests = limiter.requests[1:]
	}
	if len(limiter.requests) >= limiter.maxRequests {
		return false
	}
	limiter.requests = append(limiter.requests, now)
	return true
}
func main() {
	limiter := NewSlidingWindowLimiter(500*time.Millisecond, 2) // at most 2 requests per 500ms
	for i := 0; i < 15; i++ {
		now := time.Now().Format("15:04:05")
		if limiter.AllowRequest() {
			fmt.Println(now + " request allowed")
		} else {
			fmt.Println(now + " request rejected (rate limited)")
		}
		time.Sleep(100 * time.Millisecond)
	}
}

Pros: smooth handling of bursts, higher precision.
Cons: higher memory consumption and more complex implementation.
3. Leaky Bucket Limiting
The leaky bucket treats incoming requests as water poured into a bucket that leaks at a constant rate; excess requests are dropped when the bucket is full.
package main

import (
	"fmt"
	"sync"
	"time"
)

type LeakyBucket struct {
	rate       float64    // leak rate (requests/second)
	capacity   int        // bucket capacity
	water      int        // current water level (requests in the bucket)
	lastLeakMs int64      // last leak time (Unix milliseconds)
	mu         sync.Mutex // guards the fields above
}

func NewLeakyBucket(rate float64, capacity int) *LeakyBucket {
	return &LeakyBucket{rate: rate, capacity: capacity, water: 0, lastLeakMs: time.Now().UnixMilli()}
}

func (lb *LeakyBucket) Allow() bool {
	lb.mu.Lock()
	defer lb.mu.Unlock()
	nowMs := time.Now().UnixMilli()
	// leak water according to the time elapsed since the last leak;
	// the leak timestamp advances whenever water is drained, so elapsed
	// time is never counted twice
	leaked := int(float64(nowMs-lb.lastLeakMs) / 1000 * lb.rate)
	if leaked > 0 {
		if leaked > lb.water {
			lb.water = 0
		} else {
			lb.water -= leaked
		}
		lb.lastLeakMs = nowMs
	}
	if lb.water >= lb.capacity {
		return false // bucket is full: drop the request
	}
	lb.water++
	return true
}
func main() {
	bucket := NewLeakyBucket(3, 4) // 3 requests/second, capacity 4
	for i := 1; i <= 15; i++ {
		now := time.Now().Format("15:04:05")
		if bucket.Allow() {
			fmt.Printf(now+" request %d allowed\n", i)
		} else {
			fmt.Printf(now+" request %d rejected (rate limited)\n", i)
		}
		time.Sleep(200 * time.Millisecond)
	}
}

Pros: smooths traffic, simple to understand.
Cons: less flexible for sudden bursts, may waste capacity when traffic is low.
4. Token Bucket Limiting
The token bucket generates tokens at a fixed rate; a request can proceed only when a token is available, allowing short bursts while enforcing an average rate.
package main

import (
	"fmt"
	"sync"
	"time"
)

type TokenBucket struct {
	rate       float64    // token generation rate (tokens/second)
	capacity   float64    // bucket capacity
	tokens     float64    // current token count
	lastUpdate time.Time  // last refill time
	mu         sync.Mutex // guards the fields above
}

func NewTokenBucket(rate, capacity float64) *TokenBucket {
	return &TokenBucket{rate: rate, capacity: capacity, tokens: capacity, lastUpdate: time.Now()}
}
func (tb *TokenBucket) Allow() bool {
	tb.mu.Lock()
	defer tb.mu.Unlock()
	now := time.Now()
	// refill tokens for the elapsed interval, capped at capacity;
	// lastUpdate must advance on every call, or a rejected request would
	// leave a stale timestamp and the same interval would be refilled twice
	elapsed := now.Sub(tb.lastUpdate).Seconds()
	tb.tokens += elapsed * tb.rate
	tb.lastUpdate = now
	if tb.tokens > tb.capacity {
		tb.tokens = tb.capacity
	}
	if tb.tokens >= 1.0 {
		tb.tokens--
		return true
	}
	return false
}
func main() {
	bucket := NewTokenBucket(2.0, 3.0) // 2 tokens/second, capacity 3
	for i := 1; i <= 10; i++ {
		now := time.Now().Format("15:04:05")
		if bucket.Allow() {
			fmt.Printf(now+" request %d allowed\n", i)
		} else {
			fmt.Printf(now+" request %d rejected (rate limited)\n", i)
		}
		time.Sleep(200 * time.Millisecond)
	}
}

Pros: smooths bursts, flexible rate control, supports occasional spikes.
Cons: slightly more complex implementation, requires precise time handling.
Distributed Rate Limiting
When services run on multiple machines, a single‑node limiter is insufficient. The article presents three distributed approaches:
Centralized Redis token bucket: All instances request a token from Redis via a Lua script. This guarantees global limits but introduces a single point of failure and bandwidth bottlenecks.
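The core of that Lua script is a read–refill–take sequence executed atomically inside Redis, so no two instances can interleave the read-modify-write. The sketch below models that sequence in plain Go for illustration only; in a real deployment the equivalent logic runs as a Lua script via EVAL/EVALSHA against per-key Redis state, and the type and function names here are assumptions, not from the article:

```go
package main

import (
	"fmt"
	"time"
)

// redisBucketState mirrors the two fields a Redis-backed token bucket
// keeps per key (e.g. in a hash): the token count and the last refill time.
type redisBucketState struct {
	tokens     float64
	lastRefill int64 // Unix milliseconds
}

// takeToken is a pure-Go stand-in for the Lua script body: refill tokens
// for the elapsed interval, cap at capacity, then try to consume one.
func takeToken(s *redisBucketState, rate, capacity float64, nowMs int64) bool {
	elapsed := float64(nowMs-s.lastRefill) / 1000.0
	s.tokens += elapsed * rate
	if s.tokens > capacity {
		s.tokens = capacity
	}
	s.lastRefill = nowMs
	if s.tokens >= 1.0 {
		s.tokens--
		return true
	}
	return false
}

func main() {
	// bucket starts with 2 tokens; rate 1 token/s, capacity 2
	state := &redisBucketState{tokens: 2, lastRefill: time.Now().UnixMilli()}
	now := time.Now().UnixMilli()
	for i := 1; i <= 4; i++ {
		fmt.Printf("request %d allowed=%v\n", i, takeToken(state, 1.0, 2.0, now))
	}
}
```

Because Redis executes the whole script atomically, this single round trip replaces the lock that the single-machine limiters above need.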
Load‑balancer‑based local limiting: Requests are evenly distributed by a load balancer; each node runs its own limiter (e.g., token bucket) and can adjust thresholds based on local resource usage. This reduces central contention but relies on the load balancer’s availability.
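Under even distribution, each node's local threshold is simply its share of the global limit. A minimal sketch of that split (the helper name and the headroom factor are illustrative assumptions, not from the article):

```go
package main

import "fmt"

// perNodeRate splits a global limit evenly across nodes behind a load
// balancer. headroom < 1.0 leaves slack for imperfect balancing; a node
// could further scale its share down based on local CPU or memory pressure.
func perNodeRate(globalRate float64, nodes int, headroom float64) float64 {
	return globalRate / float64(nodes) * headroom
}

func main() {
	// 3000 req/s across 10 nodes with 10% headroom -> 270 req/s per node
	fmt.Println(perNodeRate(3000, 10, 0.9))
}
```

Each node then feeds this rate into its own local limiter (e.g. the token bucket above), with no per-request coordination.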
Coordination‑service (ZooKeeper/etcd) based limiting: Nodes acquire a distributed lock or modify a shared counter to obtain tokens. It provides precise global limits without a single data store, yet adds operational complexity and requires a highly available coordination service.
Each method’s trade‑offs are discussed, emphasizing that the optimal solution depends on system requirements, existing technology stack, load characteristics, and tolerance for complexity.
Conclusion
There is no universally best rate‑limiting scheme; the choice must balance simplicity, precision, scalability, and fault tolerance. Combining rate limiting with other techniques such as caching, load balancing, and asynchronous processing yields a robust system capable of handling high concurrency while maintaining service quality.