Mastering Rate Limiting: Algorithms, Pros, Cons, and Distributed Solutions
This article explores why rate limiting is essential for high‑concurrency services, introduces four core algorithms with Go implementations, compares their strengths and weaknesses, and presents practical distributed limiting strategies using Redis, load balancers, and coordination services.
Introduction
With the rise of microservices, dependencies and call relationships become increasingly complex, making service stability crucial. Sudden traffic spikes can cause timeouts or server crashes, so rate limiting is applied to quickly reject requests that exceed configured limits, protecting both the system and its upstream/downstream services.
Key Design Principles
Multi‑level limiting: Apply limits at the application, service, and data layers.
Dynamic thresholds: Adjust limits based on real‑time load.
Flexible dimensions: Limit by interface, device, IP, account ID, or behavior pattern.
Decoupling: Keep limiting as a separate service from business logic.
Fault tolerance: Provide a fallback (circuit breaking, degradation) when the limiter fails.
Monitoring & alerts: Monitor in real time and alarm when limits are triggered.
Background
Three main tools protect high‑concurrency services: caching, degradation, and rate limiting.
Rate limiting controls the request rate to prevent overload when resources are limited.
Fundamental Concepts
A threshold defines the allowed number of requests per unit time (e.g., 500 QPS). A reject strategy determines how to handle excess requests, such as immediate rejection or queuing.
Rate limiting can be single‑machine or distributed. Single‑machine algorithms include fixed window, sliding window, leaky bucket, and token bucket.
Basic Algorithms
1. Fixed Window
Divides time into fixed windows and counts requests within each window.
package main

import (
	"fmt"
	"sync"
	"time"
)

type FixedWindowLimiter struct {
windowSize time.Duration // window size
maxRequests int // max requests per window
requests int // current count
lastReset int64 // last reset timestamp
resetMutex sync.Mutex // lock for reset
}
func NewFixedWindowLimiter(windowSize time.Duration, maxRequests int) *FixedWindowLimiter {
return &FixedWindowLimiter{windowSize: windowSize, maxRequests: maxRequests, lastReset: time.Now().Unix()}
}
func (l *FixedWindowLimiter) AllowRequest() bool {
l.resetMutex.Lock()
defer l.resetMutex.Unlock()
if time.Now().Unix()-l.lastReset >= int64(l.windowSize.Seconds()) {
l.requests = 0
l.lastReset = time.Now().Unix()
}
if l.requests >= l.maxRequests {
return false
}
l.requests++
return true
}
func main() {
limiter := NewFixedWindowLimiter(1*time.Second, 3)
for i := 0; i < 15; i++ {
now := time.Now().Format("15:04:05")
if limiter.AllowRequest() {
fmt.Println(now + " request passed")
} else {
fmt.Println(now + " request throttled")
}
time.Sleep(100 * time.Millisecond)
}
}

Advantages: Simple to implement, stable for steady traffic, easy rate control.
Disadvantages: Cannot handle bursts at window boundaries. For example, with a limit of 100 requests per second, 100 requests at the end of one window followed by 100 at the start of the next let 200 requests through in well under a second.
2. Sliding Window
Improves fixed window by sliding the window continuously, providing finer granularity.
package main

import (
	"fmt"
	"sync"
	"time"
)

type SlidingWindowLimiter struct {
windowSize time.Duration
maxRequests int
requests []time.Time
requestsLock sync.Mutex
}
func NewSlidingWindowLimiter(windowSize time.Duration, maxRequests int) *SlidingWindowLimiter {
return &SlidingWindowLimiter{windowSize: windowSize, maxRequests: maxRequests, requests: make([]time.Time, 0)}
}
func (l *SlidingWindowLimiter) AllowRequest() bool {
l.requestsLock.Lock()
defer l.requestsLock.Unlock()
now := time.Now()
for len(l.requests) > 0 && now.Sub(l.requests[0]) > l.windowSize {
l.requests = l.requests[1:]
}
if len(l.requests) >= l.maxRequests {
return false
}
l.requests = append(l.requests, now)
return true
}
func main() {
limiter := NewSlidingWindowLimiter(500*time.Millisecond, 2)
for i := 0; i < 15; i++ {
now := time.Now().Format("15:04:05")
if limiter.AllowRequest() {
fmt.Println(now + " request passed")
} else {
fmt.Println(now + " request throttled")
}
time.Sleep(100 * time.Millisecond)
}
}

Advantages: Handles bursts more smoothly, higher precision.
Disadvantages: Higher memory consumption (a timestamp is stored for every request in the window), more complex implementation.
3. Leaky Bucket
Models a bucket that leaks at a constant rate; excess requests are dropped.
package main

import (
	"fmt"
	"sync"
	"time"
)

type LeakyBucket struct {
	rate       float64    // leak rate in requests per second
	capacity   int        // bucket capacity
	water      int        // current water level
	lastLeakMs int64      // last leak timestamp in milliseconds
	mu         sync.Mutex // protects water and lastLeakMs
}

func NewLeakyBucket(rate float64, capacity int) *LeakyBucket {
	return &LeakyBucket{rate: rate, capacity: capacity, water: 0, lastLeakMs: time.Now().UnixMilli()}
}

func (lb *LeakyBucket) Allow() bool {
	lb.mu.Lock()
	defer lb.mu.Unlock()
	now := time.Now().UnixMilli()
	elapsed := now - lb.lastLeakMs
	// leak water at the configured constant rate
	leakAmount := int(float64(elapsed) / 1000 * lb.rate)
	if leakAmount > 0 {
		if leakAmount > lb.water {
			lb.water = 0
		} else {
			lb.water -= leakAmount
		}
		lb.lastLeakMs = now
	}
	if lb.water >= lb.capacity {
		return false // bucket is full: drop the request
	}
	lb.water++
	return true
}
func main() {
	bucket := NewLeakyBucket(3, 4)
	for i := 1; i <= 15; i++ {
		now := time.Now().Format("15:04:05")
		if bucket.Allow() {
			fmt.Printf(now+" request %d passed\n", i)
		} else {
			fmt.Printf(now+" request %d throttled\n", i)
		}
		time.Sleep(200 * time.Millisecond)
	}
}

Advantages: Smooths burst traffic, simple to implement, prevents overload.
Disadvantages: The fixed outflow rate leaves little flexibility for legitimate bursts.
4. Token Bucket
Tokens are added at a steady rate; each request consumes a token, allowing bursts up to the bucket capacity.
package main

import (
	"fmt"
	"sync"
	"time"
)

type TokenBucket struct {
rate float64
capacity float64
tokens float64
lastUpdate time.Time
mu sync.Mutex
}
func NewTokenBucket(rate, capacity float64) *TokenBucket {
return &TokenBucket{rate: rate, capacity: capacity, tokens: capacity, lastUpdate: time.Now()}
}
func (tb *TokenBucket) Allow() bool {
	tb.mu.Lock()
	defer tb.mu.Unlock()
	now := time.Now()
	elapsed := now.Sub(tb.lastUpdate).Seconds()
	// refill tokens accrued since the last update
	tb.tokens += elapsed * tb.rate
	if tb.tokens > tb.capacity {
		tb.tokens = tb.capacity
	}
	// always advance lastUpdate, otherwise the same elapsed
	// interval would be credited again on the next call
	tb.lastUpdate = now
	if tb.tokens >= 1.0 {
		tb.tokens--
		return true
	}
	return false
}
func main() {
	bucket := NewTokenBucket(2.0, 3.0)
	for i := 1; i <= 10; i++ {
		now := time.Now().Format("15:04:05")
		if bucket.Allow() {
			fmt.Printf(now+" request %d passed\n", i)
		} else {
			fmt.Printf(now+" request %d throttled\n", i)
		}
		time.Sleep(200 * time.Millisecond)
	}
}

Advantages: Handles bursts, flexible rate adjustment, adapts to varying traffic.
Disadvantages: Slightly more complex implementation; requires precise time handling.
Distributed Rate Limiting
1. Centralized (Redis) Token Bucket
Use a Redis instance to store token bucket state and execute a Lua script atomically.
type Client struct {
client RedisClient // Redis operations
script string // Lua script
}
func NewBucketClient(redis RedisClient) *Client {
return &Client{client: redis, script: `
-- Token bucket script
local bucket = KEYS[1]
local capacity = tonumber(ARGV[1])
local tokenRate = tonumber(ARGV[2])
local redisTime = redis.call('TIME')
local now = tonumber(redisTime[1])
local tokens, lastRefill = unpack(redis.call('hmget', bucket, 'tokens', 'lastRefill'))
tokens = tonumber(tokens)
lastRefill = tonumber(lastRefill)
if not tokens or not lastRefill then
tokens = capacity
lastRefill = now
else
local refilled = (now - lastRefill) * tokenRate
tokens = math.min(capacity, tokens + refilled)
end
if tokens < 1 then
return 0
else
redis.call('hmset', bucket, 'tokens', tokens - 1, 'lastRefill', now)
return 1
end
`}
}
func (c *Client) isAllowed(ctx context.Context, key string, capacity, tokenRate int64) (bool, error) {
result, err := redis.Int(c.client.Do(ctx, "eval", c.script, 1, key, capacity, tokenRate))
if err != nil {
return false, err
}
return result == 1, nil
}
func main() {
	// GetPoolByName stands in for whatever pool/connection helper
	// your Redis client library provides
	c := NewBucketClient(redis.GetPoolByName("redis://127.0.0.1:6379"))
	var wg sync.WaitGroup
	wg.Add(120)
	var count atomic.Int64
	for i := 0; i < 120; i++ {
		go func(i int) {
			defer wg.Done()
			ok, _ := c.isAllowed(context.Background(), "test", 100, 10)
			if ok {
				count.Add(1)
			}
			fmt.Printf("go %d status:%v\n", i, ok)
		}(i)
	}
	wg.Wait()
	fmt.Printf("allowed %d\n", count.Load())
}

Issues: Redis can become a performance bottleneck or single point of failure; network bandwidth and latency affect throughput.
2. Load‑Balancer Based Approach
Distribute requests evenly with a load balancer, then apply local token‑bucket limiting on each instance. Dynamically adjust per‑instance thresholds based on CPU/memory usage.
Issues: Requires a reliable local cache, global limiting precision can be uneven, and the load balancer itself can be a single point of failure.
3. Coordination‑Service Approach (ZooKeeper/etcd)
Store the token count in a coordination service. Each instance acquires a lock to decrement the count for a request and increments it after processing. A background task replenishes tokens.
Issues: Implementation complexity and high demand on the coordination service; if the service cannot handle the request volume, it becomes a bottleneck.
Conclusion
There is no universally best rate‑limiting solution; the optimal choice depends on system requirements, existing technology stack, load characteristics, and underlying infrastructure performance. Understanding each algorithm’s behavior and trade‑offs enables architects to combine rate limiting with other techniques such as caching, degradation, load balancing, and asynchronous processing to build resilient, high‑performance services.
Sanyou's Java Diary
Passionate about technology, though not great at solving problems; eager to share, never tire of learning!