Rate Limiting: Concepts, Algorithms, and Distributed Solutions
Rate limiting protects micro‑service stability by rejecting excess traffic. The classic algorithms are fixed‑window, sliding‑window, leaky‑bucket, and token‑bucket; a limiter can run locally on each machine or be distributed via Redis, load balancers, or coordination services, with different trade‑offs in precision, scalability, and complexity.
With the popularity of micro‑services, the dependency and call relationships between services become increasingly complex, making service stability a critical concern. Sudden traffic spikes can cause request time‑outs, server overload, or even service outages. To protect both the system itself and its upstream/downstream services, rate limiting is commonly applied to quickly reject requests that exceed configured thresholds, thereby ensuring stability and performance.
Key design principles for a good rate‑limiting solution include multi‑level limiting, dynamic threshold adjustment, flexible dimensions (e.g., per‑IP, per‑user, per‑device), decoupling from business logic, fault tolerance, and real‑time monitoring and alerting.
1. Background
High‑concurrency services are typically protected by three mechanisms: caching, degradation, and rate limiting. Rate limiting controls the request rate to prevent overload.
2. Overview of Rate‑Limiting Concepts
A rate‑limiting system defines a threshold (e.g., 500 QPS) and a rejection strategy (e.g., immediate reject or queue). Depending on the scope, rate limiting can be single‑machine or distributed. The four classic algorithms are:
Fixed‑window
Sliding‑window
Leaky‑bucket
Token‑bucket
3. Basic Algorithms
3.1 Fixed‑Window Limiting
The time axis is divided into fixed‑size windows (e.g., 1 second). A counter records the number of requests in the current window; if the counter exceeds the limit, the request is rejected. The counter resets when the window ends.
package main

import (
	"fmt"
	"sync"
	"time"
)

type FixedWindowLimiter struct {
	windowSize  time.Duration // window size
	maxRequests int           // max requests per window
	requests    int           // count in the current window
	lastReset   int64         // last reset timestamp (Unix seconds)
	resetMutex  sync.Mutex    // protects the fields above
}

func NewFixedWindowLimiter(windowSize time.Duration, maxRequests int) *FixedWindowLimiter {
	return &FixedWindowLimiter{windowSize: windowSize, maxRequests: maxRequests, lastReset: time.Now().Unix()}
}

func (l *FixedWindowLimiter) AllowRequest() bool {
	l.resetMutex.Lock()
	defer l.resetMutex.Unlock()
	// start a new window once the current one has elapsed
	if time.Now().Unix()-l.lastReset >= int64(l.windowSize.Seconds()) {
		l.requests = 0
		l.lastReset = time.Now().Unix()
	}
	if l.requests >= l.maxRequests {
		return false
	}
	l.requests++
	return true
}
func main() {
	limiter := NewFixedWindowLimiter(1*time.Second, 3)
	for i := 0; i < 15; i++ {
		now := time.Now().Format("15:04:05")
		if limiter.AllowRequest() {
			fmt.Println(now + " request allowed")
		} else {
			fmt.Println(now + " request limited")
		}
		time.Sleep(100 * time.Millisecond)
	}
}

Advantages: simple to implement, stable for steady traffic, easy rate control.
Disadvantages: cannot absorb short‑term bursts; requests clustered at a window boundary can briefly double the effective rate.
3.2 Sliding‑Window Limiting
The sliding window moves continuously, discarding expired timestamps to keep the request count within the limit. This provides smoother handling of bursts and finer granularity.
package main

import (
	"fmt"
	"sync"
	"time"
)

type SlidingWindowLimiter struct {
	windowSize   time.Duration // window size
	maxRequests  int           // max requests within the window
	requests     []time.Time   // timestamps of admitted requests
	requestsLock sync.Mutex    // protects the slice
}

func NewSlidingWindowLimiter(windowSize time.Duration, maxRequests int) *SlidingWindowLimiter {
	return &SlidingWindowLimiter{windowSize: windowSize, maxRequests: maxRequests, requests: make([]time.Time, 0)}
}

func (l *SlidingWindowLimiter) AllowRequest() bool {
	l.requestsLock.Lock()
	defer l.requestsLock.Unlock()
	now := time.Now()
	// drop timestamps that have slid out of the window
	for len(l.requests) > 0 && now.Sub(l.requests[0]) > l.windowSize {
		l.requests = l.requests[1:]
	}
	if len(l.requests) >= l.maxRequests {
		return false
	}
	l.requests = append(l.requests, now)
	return true
}
func main() {
	limiter := NewSlidingWindowLimiter(500*time.Millisecond, 2)
	for i := 0; i < 15; i++ {
		now := time.Now().Format("15:04:05")
		if limiter.AllowRequest() {
			fmt.Println(now + " request allowed")
		} else {
			fmt.Println(now + " request limited")
		}
		time.Sleep(100 * time.Millisecond)
	}
}

Advantages: flexible, responds to traffic changes in real time, higher precision.
Disadvantages: higher memory consumption (one timestamp is kept per admitted request), more complex implementation.
3.3 Leaky‑Bucket Limiting
The leaky bucket maintains a fixed‑capacity queue; incoming requests are enqueued, and the bucket leaks at a constant rate. If the bucket is full, new requests are dropped.
package main

import (
	"fmt"
	"sync"
	"time"
)

type LeakyBucket struct {
	rate     float64    // leak rate (requests per second)
	capacity int        // bucket capacity
	water    int        // current water level (pending requests)
	lastLeak int64      // last leak timestamp (Unix seconds)
	mu       sync.Mutex // protects the fields above
}

func NewLeakyBucket(rate float64, capacity int) *LeakyBucket {
	return &LeakyBucket{rate: rate, capacity: capacity, water: 0, lastLeak: time.Now().Unix()}
}

func (lb *LeakyBucket) Allow() bool {
	lb.mu.Lock()
	defer lb.mu.Unlock()
	now := time.Now().Unix()
	// drain water according to the elapsed time
	leak := int(float64(now-lb.lastLeak) * lb.rate)
	if leak > 0 {
		if leak > lb.water {
			lb.water = 0
		} else {
			lb.water -= leak
		}
		lb.lastLeak = now // advance the clock only when water actually drains
	}
	if lb.water >= lb.capacity {
		return false
	}
	lb.water++
	return true
}
func main() {
	bucket := NewLeakyBucket(3, 4)
	for i := 1; i <= 15; i++ {
		now := time.Now().Format("15:04:05")
		if bucket.Allow() {
			fmt.Printf(now+" request %d allowed\n", i)
		} else {
			fmt.Printf(now+" request %d limited\n", i)
		}
		time.Sleep(200 * time.Millisecond)
	}
}

Advantages: smooths burst traffic, simple to implement, effective against overload.
Disadvantages: the fixed outflow rate cannot exploit idle capacity, so bursts are served slowly even when the system could absorb them.
3.4 Token‑Bucket Limiting
The token bucket generates tokens at a fixed rate and stores them up to a capacity. A request consumes a token; if none are available, the request is rejected. This algorithm allows short bursts while enforcing an average rate.
package main

import (
	"fmt"
	"sync"
	"time"
)

type TokenBucket struct {
	rate       float64   // token generation rate (tokens per second)
	capacity   float64   // max tokens the bucket can hold
	tokens     float64   // current token count
	lastUpdate time.Time // last refill time
	mu         sync.Mutex
}

func NewTokenBucket(rate, capacity float64) *TokenBucket {
	return &TokenBucket{rate: rate, capacity: capacity, tokens: capacity, lastUpdate: time.Now()}
}

func (tb *TokenBucket) Allow() bool {
	tb.mu.Lock()
	defer tb.mu.Unlock()
	now := time.Now()
	// refill tokens for the elapsed interval, capped at capacity
	tb.tokens += now.Sub(tb.lastUpdate).Seconds() * tb.rate
	if tb.tokens > tb.capacity {
		tb.tokens = tb.capacity
	}
	// advance the refill clock unconditionally, otherwise rejected
	// requests would cause the same interval to be counted twice
	tb.lastUpdate = now
	if tb.tokens >= 1 {
		tb.tokens--
		return true
	}
	return false
}
func main() {
	bucket := NewTokenBucket(2.0, 3.0)
	for i := 1; i <= 10; i++ {
		now := time.Now().Format("15:04:05")
		if bucket.Allow() {
			fmt.Printf(now+" request %d allowed\n", i)
		} else {
			fmt.Printf(now+" request %d limited\n", i)
		}
		time.Sleep(200 * time.Millisecond)
	}
}

Advantages: smooths bursts, flexible rate adjustment, suitable for varying traffic patterns.
Disadvantages: slightly more complex implementation, requires careful time handling; tokens generated while the bucket is full are discarded.
4. Distributed Rate‑Limiting Solutions
4.1 Centralized Redis Token‑Bucket
A Redis instance stores the token bucket state. Each request executes a Lua script that atomically checks and updates the bucket. This provides a global limit across all service instances.
// RedisClient abstracts the author's Redis client library; any client
// exposing a Do method with this signature can be plugged in.
type RedisClient interface {
	Do(ctx context.Context, cmd string, args ...interface{}) (interface{}, error)
}

type Client struct {
	client RedisClient
	script string
}

func NewBucketClient(redis RedisClient) *Client {
	return &Client{client: redis, script: `-- token bucket script
local bucket = KEYS[1]
local capacity = tonumber(ARGV[1])
local tokenRate = tonumber(ARGV[2])
local now = tonumber(redis.call('TIME')[1])
-- HMGET returns strings, so convert before arithmetic and comparisons
local state = redis.call('HMGET', bucket, 'tokens', 'lastRefill')
local tokens = tonumber(state[1])
local lastRefill = tonumber(state[2])
if not tokens or not lastRefill then
    tokens, lastRefill = capacity, now
end
-- refill for the elapsed seconds, capped at capacity
tokens = math.min(capacity, tokens + (now - lastRefill) * tokenRate)
if tokens < 1 then
    return 0
end
redis.call('HMSET', bucket, 'tokens', tokens - 1, 'lastRefill', now)
return 1`}
}

func (c *Client) isAllowed(ctx context.Context, key string, capacity, tokenRate int64) (bool, error) {
	res, err := redis.Int(c.client.Do(ctx, "EVAL", c.script, 1, key, capacity, tokenRate))
	if err != nil {
		return false, err
	}
	return res == 1, nil
}
func main() {
	// redis.GetPoolByName comes from the author's Redis client wrapper;
	// substitute any client that satisfies the RedisClient interface.
	c := NewBucketClient(redis.GetPoolByName("redis://127.0.0.1:6379"))
	var wg sync.WaitGroup
	wg.Add(120)
	var allowed atomic.Int64
	for i := 0; i < 120; i++ {
		go func(i int) {
			defer wg.Done()
			ok, _ := c.isAllowed(context.Background(), "test", 100, 10)
			if ok {
				allowed.Add(1)
			}
			fmt.Printf("go %d status:%v\n", i, ok)
		}(i)
	}
	wg.Wait()
	fmt.Printf("allow %d\n", allowed.Load())
}

Problems: Redis can become a performance bottleneck and a single point of failure, and network bandwidth may limit throughput.
4.2 Load‑Balancer Based Distributed Limiting
Requests are evenly distributed by a load balancer (or service discovery). Each instance runs a local token‑bucket limiter. Dynamic adjustments can be made per instance based on CPU/memory usage, reducing the reliance on a central component.
Challenges include maintaining consistent limits across instances, handling load‑balancer failures, and ensuring cache eviction policies are appropriate.
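One common arrangement is to split a global budget evenly across instances, so each node decides locally with no network round trip. The sketch below divides a global QPS among N instances and runs a local token bucket at that share; `globalQPS` and `instanceCount` are illustrative parameters, and a production version would re-divide the budget when instances join or leave.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// localShareLimiter is a per-instance token bucket sized to this
// instance's share of a global limit. Assumes the load balancer
// spreads traffic evenly across instances.
type localShareLimiter struct {
	mu         sync.Mutex
	rate       float64 // this instance's share, tokens per second
	capacity   float64
	tokens     float64
	lastUpdate time.Time
}

func newLocalShareLimiter(globalQPS float64, instanceCount int) *localShareLimiter {
	share := globalQPS / float64(instanceCount)
	return &localShareLimiter{rate: share, capacity: share, tokens: share, lastUpdate: time.Now()}
}

func (l *localShareLimiter) Allow() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	now := time.Now()
	// refill this instance's bucket at its share of the global rate
	l.tokens += now.Sub(l.lastUpdate).Seconds() * l.rate
	if l.tokens > l.capacity {
		l.tokens = l.capacity
	}
	l.lastUpdate = now
	if l.tokens >= 1 {
		l.tokens--
		return true
	}
	return false
}

func main() {
	// 100 QPS globally across 4 instances -> 25 QPS each
	l := newLocalShareLimiter(100, 4)
	fmt.Printf("local rate: %.0f tokens/s\n", l.rate) // local rate: 25 tokens/s
	fmt.Println(l.Allow())
}
```

The precision loss mentioned above follows directly from this design: if traffic skews toward one instance, that node rejects requests while others sit below their share.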
4.3 Coordination‑Service Based Limiting (ZooKeeper / etcd)
Each service acquires a token by creating or updating a node in ZooKeeper/etcd. The node stores the remaining token count and last refill timestamp. Tokens are atomically decremented; when a request finishes, the token is released back.
Advantages: precise global limiting without a single data‑store bottleneck. Disadvantages: higher implementation complexity and heavy reliance on the coordination service’s performance.
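The atomic-decrement step can be sketched against any store with compare-and-swap semantics; in practice that would be an etcd transaction or ZooKeeper's versioned setData, but here an in-memory versioned store stands in for the coordination service so the retry loop is visible.

```go
package main

import (
	"fmt"
	"sync"
)

// casStore stands in for a coordination service node: a value with a
// version that only updates when the caller's expected version still
// matches. This in-memory version is purely illustrative.
type casStore struct {
	mu      sync.Mutex
	value   int
	version int
}

func (s *casStore) get() (value, version int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.value, s.version
}

// cas writes newValue only if the version has not changed since get.
func (s *casStore) cas(expectVersion, newValue int) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.version != expectVersion {
		return false
	}
	s.value = newValue
	s.version++
	return true
}

// acquireToken runs the read-check-swap loop until it either
// decrements the shared token count or finds the bucket empty.
func acquireToken(s *casStore) bool {
	for {
		tokens, ver := s.get()
		if tokens <= 0 {
			return false
		}
		if s.cas(ver, tokens-1) {
			return true
		}
		// another instance won the race; re-read and retry
	}
}

func main() {
	store := &casStore{value: 2, version: 0}
	fmt.Println(acquireToken(store)) // true
	fmt.Println(acquireToken(store)) // true
	fmt.Println(acquireToken(store)) // false: no tokens left
}
```

Releasing a token when a request finishes is the same loop with `tokens+1`, which is how the in-flight-request variant described above would be completed.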
5. Summary
There is no universally best rate‑limiting solution; the appropriate choice depends on system requirements, existing technology stack, load characteristics, and underlying infrastructure. Simple single‑machine algorithms (fixed or sliding windows) are easy to adopt for modest traffic. For distributed environments, centralized token buckets (Redis) provide global consistency but introduce a potential bottleneck, while load‑balancer‑based local limiters offer higher scalability at the cost of precision. Coordination‑service‑based approaches give accurate global limits but are more complex.
Rate limiting should be combined with other techniques such as load balancing, caching, asynchronous processing, and horizontal scaling to build robust, high‑performance systems.
Tencent Cloud Developer