
Rate Limiting: Concepts, Algorithms, and Distributed Solutions

Rate limiting protects micro‑service stability by rejecting excess traffic. Classic algorithms include fixed‑window, sliding‑window, leaky‑bucket, and token‑bucket; limits can be enforced locally on each instance or globally via Redis, load balancers, or coordination services, each with different trade‑offs in precision, scalability, and complexity.

Tencent Cloud Developer

With the popularity of micro‑services, the dependency and call relationships between services become increasingly complex, making service stability a critical concern. Sudden traffic spikes can cause request time‑outs, server overload, or even service outages. To protect both the system itself and its upstream/downstream services, rate limiting is commonly applied to quickly reject requests that exceed configured thresholds, thereby ensuring stability and performance.

Key design principles for a good rate‑limiting solution include multi‑level limiting, dynamic threshold adjustment, flexible dimensions (e.g., per‑IP, per‑user, per‑device), decoupling from business logic, fault tolerance, and real‑time monitoring and alerting.
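The "flexible dimensions" principle can be sketched as a registry that keeps an independent counter per key such as an IP, user ID, or device ID. The `PerKeyLimiter` type and the keys below are illustrative, not from the original text, and window rollover is omitted for brevity:

```go
package main

import (
	"fmt"
	"sync"
)

// PerKeyLimiter keeps an independent request counter per dimension key
// (e.g. "ip:1.2.3.4" or "user:42"). A real limiter would also reset or
// age these counters once the time window elapses.
type PerKeyLimiter struct {
	mu     sync.Mutex
	counts map[string]int
	max    int // max requests per key per window
}

func NewPerKeyLimiter(max int) *PerKeyLimiter {
	return &PerKeyLimiter{counts: make(map[string]int), max: max}
}

// Allow admits a request if the given key is still under its limit.
func (p *PerKeyLimiter) Allow(key string) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.counts[key] >= p.max {
		return false
	}
	p.counts[key]++
	return true
}

func main() {
	l := NewPerKeyLimiter(2)
	fmt.Println(l.Allow("ip:1.2.3.4")) // true
	fmt.Println(l.Allow("ip:1.2.3.4")) // true
	fmt.Println(l.Allow("ip:1.2.3.4")) // false: this IP hit its limit
	fmt.Println(l.Allow("ip:5.6.7.8")) // true: other keys are unaffected
}
```

Because each dimension has its own counter, one abusive client cannot exhaust the budget of well-behaved clients.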

1. Background

High‑concurrency services are typically protected by three mechanisms: caching, degradation, and rate limiting. Rate limiting controls the request rate to prevent overload.

2. Overview of Rate‑Limiting Concepts

A rate‑limiting system defines a threshold (e.g., 500 QPS) and a rejection strategy (e.g., immediate rejection or queuing). Depending on the scope, rate limiting can be single‑machine or distributed. The four classic algorithms are:

Fixed‑window

Sliding‑window

Leaky‑bucket

Token‑bucket

3. Basic Algorithms

3.1 Fixed‑Window Limiting

The time axis is divided into fixed‑size windows (e.g., 1 second). A counter records the number of requests in the current window; if the counter exceeds the limit, the request is rejected. The counter resets when the window ends.

package main

import (
	"fmt"
	"sync"
	"time"
)

type FixedWindowLimiter struct {
	windowSize  time.Duration // window size
	maxRequests int           // max requests per window
	requests    int           // count in the current window
	lastReset   int64         // last reset timestamp (seconds)
	resetMutex  sync.Mutex    // protects the counter and reset time
}

func NewFixedWindowLimiter(windowSize time.Duration, maxRequests int) *FixedWindowLimiter {
	return &FixedWindowLimiter{windowSize: windowSize, maxRequests: maxRequests, lastReset: time.Now().Unix()}
}

func (l *FixedWindowLimiter) AllowRequest() bool {
	l.resetMutex.Lock()
	defer l.resetMutex.Unlock()
	// start a new window once the current one has elapsed
	if time.Now().Unix()-l.lastReset >= int64(l.windowSize.Seconds()) {
		l.requests = 0
		l.lastReset = time.Now().Unix()
	}
	if l.requests >= l.maxRequests {
		return false
	}
	l.requests++
	return true
}

func main() {
	limiter := NewFixedWindowLimiter(1*time.Second, 3)
	for i := 0; i < 15; i++ {
		now := time.Now().Format("15:04:05")
		if limiter.AllowRequest() {
			fmt.Println(now + " request allowed")
		} else {
			fmt.Println(now + " request throttled")
		}
		time.Sleep(100 * time.Millisecond)
	}
}

Advantages: simple to implement, stable for steady traffic, easy rate control.

Disadvantages: cannot handle bursts at window boundaries; a burst straddling two adjacent windows can briefly pass up to twice the configured limit.

3.2 Sliding‑Window Limiting

The sliding window moves continuously, discarding expired timestamps to keep the request count within the limit. This provides smoother handling of bursts and finer granularity.

package main

import (
	"fmt"
	"sync"
	"time"
)

type SlidingWindowLimiter struct {
	windowSize   time.Duration // window size
	maxRequests  int           // max requests inside the window
	requests     []time.Time   // timestamps of accepted requests
	requestsLock sync.Mutex    // protects the slice
}

func NewSlidingWindowLimiter(windowSize time.Duration, maxRequests int) *SlidingWindowLimiter {
	return &SlidingWindowLimiter{windowSize: windowSize, maxRequests: maxRequests, requests: make([]time.Time, 0)}
}

func (l *SlidingWindowLimiter) AllowRequest() bool {
	l.requestsLock.Lock()
	defer l.requestsLock.Unlock()
	now := time.Now()
	// drop timestamps that have slid out of the window
	for len(l.requests) > 0 && now.Sub(l.requests[0]) > l.windowSize {
		l.requests = l.requests[1:]
	}
	if len(l.requests) >= l.maxRequests {
		return false
	}
	l.requests = append(l.requests, now)
	return true
}

func main() {
	limiter := NewSlidingWindowLimiter(500*time.Millisecond, 2)
	for i := 0; i < 15; i++ {
		now := time.Now().Format("15:04:05")
		if limiter.AllowRequest() {
			fmt.Println(now + " request allowed")
		} else {
			fmt.Println(now + " request throttled")
		}
		time.Sleep(100 * time.Millisecond)
	}
}

Advantages: flexible, responds to traffic changes in real time, higher precision.

Disadvantages: higher memory consumption (one timestamp per request), more complex implementation.

3.3 Leaky‑Bucket Limiting

The leaky bucket can be viewed as a fixed‑capacity queue that drains at a constant rate; if the bucket is full, new requests are dropped. The implementation below uses the equivalent counter form: a water level that rises by one per request and leaks over time.

package main

import (
	"fmt"
	"sync"
	"time"
)

type LeakyBucket struct {
	rate     float64    // leak rate (requests per second)
	capacity int        // bucket capacity
	water    int        // current water level
	lastLeak int64      // last leak timestamp (seconds)
	mu       sync.Mutex // protects the bucket state
}

func NewLeakyBucket(rate float64, capacity int) *LeakyBucket {
	return &LeakyBucket{rate: rate, capacity: capacity, water: 0, lastLeak: time.Now().Unix()}
}

func (lb *LeakyBucket) Allow() bool {
	lb.mu.Lock()
	defer lb.mu.Unlock()
	now := time.Now().Unix()
	// leak water for the elapsed time before admitting the request
	leak := int(float64(now-lb.lastLeak) * lb.rate)
	if leak > 0 {
		if leak > lb.water {
			lb.water = 0
		} else {
			lb.water -= leak
		}
		lb.lastLeak = now // advance only when water actually leaked
	}
	if lb.water >= lb.capacity {
		return false
	}
	lb.water++
	return true
}

func main() {
	bucket := NewLeakyBucket(3, 4)
	for i := 1; i <= 15; i++ {
		now := time.Now().Format("15:04:05")
		if bucket.Allow() {
			fmt.Printf("%s request %d allowed\n", now, i)
		} else {
			fmt.Printf("%s request %d throttled\n", now, i)
		}
		time.Sleep(200 * time.Millisecond)
	}
}

Advantages: smooths burst traffic into a constant outflow, simple to implement, effective against overload.

Disadvantages: the fixed outflow rate cannot absorb sudden bursts, and spare capacity goes unused during quiet periods.

3.4 Token‑Bucket Limiting

The token bucket generates tokens at a fixed rate and stores them up to a capacity. A request consumes a token; if none are available, the request is rejected. This algorithm allows short bursts while enforcing an average rate.

package main

import (
	"fmt"
	"sync"
	"time"
)

type TokenBucket struct {
	rate       float64    // token generation rate (tokens per second)
	capacity   float64    // max tokens the bucket can hold
	tokens     float64    // current token count
	lastUpdate time.Time  // last refill time
	mu         sync.Mutex // protects the bucket state
}

func NewTokenBucket(rate, capacity float64) *TokenBucket {
	return &TokenBucket{rate: rate, capacity: capacity, tokens: capacity, lastUpdate: time.Now()}
}

func (tb *TokenBucket) Allow() bool {
	tb.mu.Lock()
	defer tb.mu.Unlock()
	now := time.Now()
	// refill tokens for the elapsed time, capped at capacity
	tb.tokens += now.Sub(tb.lastUpdate).Seconds() * tb.rate
	if tb.tokens > tb.capacity {
		tb.tokens = tb.capacity
	}
	tb.lastUpdate = now // advance even on rejection so refill is not double-counted
	if tb.tokens >= 1 {
		tb.tokens--
		return true
	}
	return false
}

func main() {
	bucket := NewTokenBucket(2.0, 3.0)
	for i := 1; i <= 10; i++ {
		now := time.Now().Format("15:04:05")
		if bucket.Allow() {
			fmt.Printf("%s request %d allowed\n", now, i)
		} else {
			fmt.Printf("%s request %d throttled\n", now, i)
		}
		time.Sleep(200 * time.Millisecond)
	}
}

Advantages: absorbs short bursts up to the bucket capacity, flexible rate adjustment, suitable for varying traffic patterns.

Disadvantages: slightly more complex implementation, requires precise time handling, tokens accumulated during low traffic may go to waste.

4. Distributed Rate‑Limiting Solutions

4.1 Centralized Redis Token‑Bucket

A Redis instance stores the token bucket state. Each request executes a Lua script that atomically checks and updates the bucket. This provides a global limit across all service instances.

type RedisClient interface {
	Do(ctx context.Context, cmd string, args ...interface{}) (interface{}, error)
}

type Client struct {
	client RedisClient
	script string
}

func NewBucketClient(redis RedisClient) *Client {
	return &Client{client: redis, script: `
        -- token-bucket check, executed atomically inside Redis
        local bucket = KEYS[1]
        local capacity = tonumber(ARGV[1])
        local tokenRate = tonumber(ARGV[2])
        local now = tonumber(redis.call('TIME')[1])
        local tokens, lastRefill = unpack(redis.call('HMGET', bucket, 'tokens', 'lastRefill'))
        if not tokens or not lastRefill then
            tokens, lastRefill = capacity, now
        end
        -- refill for the elapsed seconds, capped at capacity
        tokens = math.min(capacity, tokens + (now - lastRefill) * tokenRate)
        if tokens < 1 then
            return 0
        else
            redis.call('HMSET', bucket, 'tokens', tokens - 1, 'lastRefill', now)
            return 1
        end`}
}

func (c *Client) isAllowed(ctx context.Context, key string, capacity, tokenRate int64) (bool, error) {
	res, err := redis.Int(c.client.Do(ctx, "EVAL", c.script, 1, key, capacity, tokenRate))
	if err != nil {
		return false, err
	}
	return res == 1, nil
}

func main() {
	c := NewBucketClient(redis.GetPoolByName("redis://127.0.0.1:6379"))
	var wg sync.WaitGroup
	wg.Add(120)
	var allowed atomic.Int64
	for i := 0; i < 120; i++ {
		go func(i int) {
			defer wg.Done()
			ok, _ := c.isAllowed(context.Background(), "test", 100, 10)
			if ok {
				allowed.Add(1)
			}
			fmt.Printf("goroutine %d allowed: %v\n", i, ok)
		}(i)
	}
	wg.Wait()
	fmt.Printf("allowed %d of 120 requests\n", allowed.Load())
}

Problems: Redis can become a performance bottleneck and a single point of failure, and network round‑trips to it may limit throughput.

4.2 Load‑Balancer Based Distributed Limiting

Requests are evenly distributed by a load balancer (or service discovery). Each instance runs a local token‑bucket limiter. Dynamic adjustments can be made per instance based on CPU/memory usage, reducing the reliance on a central component.
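One way to sketch this approach: divide the global budget by the current instance count and run a locally tunable token bucket. The `perInstanceRate` helper and `TunableBucket` type below are illustrative names, not from the original text; a real deployment would call `SetRate` from a service-discovery watch.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// perInstanceRate splits a global QPS budget evenly across instances,
// rounding down so the cluster as a whole never over-admits.
func perInstanceRate(globalQPS, instances int) int {
	if instances <= 0 {
		return globalQPS
	}
	return globalQPS / instances
}

// TunableBucket is a token bucket whose refill rate can be adjusted at
// runtime, e.g. when service discovery reports instances joining or leaving.
type TunableBucket struct {
	mu         sync.Mutex
	rate       float64 // tokens per second
	capacity   float64
	tokens     float64
	lastUpdate time.Time
}

func NewTunableBucket(rate, capacity float64) *TunableBucket {
	return &TunableBucket{rate: rate, capacity: capacity, tokens: capacity, lastUpdate: time.Now()}
}

// SetRate re-tunes the refill rate, e.g. to perInstanceRate(global, n).
func (b *TunableBucket) SetRate(rate float64) {
	b.mu.Lock()
	b.rate = rate
	b.mu.Unlock()
}

func (b *TunableBucket) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	now := time.Now()
	b.tokens += now.Sub(b.lastUpdate).Seconds() * b.rate
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.lastUpdate = now
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	// a 300 QPS global budget across 3 instances -> 100 QPS each
	fmt.Println(perInstanceRate(300, 3))
	b := NewTunableBucket(float64(perInstanceRate(300, 3)), 10)
	fmt.Println(b.Allow()) // the bucket starts full, so this passes
	// after a scale-out to 5 instances, shrink the local share
	b.SetRate(float64(perInstanceRate(300, 5)))
}
```

The rounding-down choice trades a little throughput for safety: the sum of local limits never exceeds the global budget, but the global limit is only approximate when the load balancer's distribution is uneven.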

Challenges include maintaining consistent limits across instances, handling load‑balancer failures, and ensuring cache eviction policies are appropriate.

4.3 Coordination‑Service Based Limiting (ZooKeeper / etcd)

Each service acquires a token by creating or updating a node in ZooKeeper/etcd. The node stores the remaining token count and last refill timestamp. Tokens are atomically decremented; when a request finishes, the token is released back.
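The atomic decrement described above rests on the compare‑and‑swap primitive that both ZooKeeper (node versions) and etcd (transactions) expose. The sketch below substitutes an in‑memory `casKV` store for the real coordination service to keep it self‑contained, and omits the refill‑timestamp logic; `acquireToken` shows only the CAS retry loop.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

// casKV mimics the compare-and-swap primitive of ZooKeeper/etcd with an
// in-memory map; a real implementation would issue a versioned update or
// transaction against the coordination service instead.
type casKV struct {
	mu   sync.Mutex
	vals map[string]string
}

func (kv *casKV) Get(key string) string {
	kv.mu.Lock()
	defer kv.mu.Unlock()
	return kv.vals[key]
}

// CAS writes newVal only if the stored value still equals oldVal.
func (kv *casKV) CAS(key, oldVal, newVal string) bool {
	kv.mu.Lock()
	defer kv.mu.Unlock()
	if kv.vals[key] != oldVal {
		return false
	}
	kv.vals[key] = newVal
	return true
}

// acquireToken decrements the shared token count with a CAS retry loop;
// a failed CAS means another instance raced us, so we re-read and retry.
func acquireToken(kv *casKV, key string) bool {
	for {
		cur := kv.Get(key)
		n, _ := strconv.Atoi(cur)
		if n <= 0 {
			return false
		}
		if kv.CAS(key, cur, strconv.Itoa(n-1)) {
			return true
		}
	}
}

func main() {
	kv := &casKV{vals: map[string]string{"svc/tokens": "3"}}
	for i := 1; i <= 5; i++ {
		fmt.Printf("request %d allowed=%v\n", i, acquireToken(kv, "svc/tokens"))
	}
	// the first 3 requests drain the shared count; the rest are rejected
}
```

Every acquisition is a read‑modify‑write round‑trip to the coordination service, which is why this scheme is precise but latency‑sensitive under contention.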

Advantages: precise global limiting without a single data‑store bottleneck. Disadvantages: higher implementation complexity and heavy reliance on the coordination service’s performance.

5. Summary

There is no universally best rate‑limiting solution; the appropriate choice depends on system requirements, existing technology stack, load characteristics, and underlying infrastructure. Simple single‑machine algorithms (fixed or sliding windows) are easy to adopt for modest traffic. For distributed environments, centralized token buckets (Redis) provide global consistency but introduce a potential bottleneck, while load‑balancer‑based local limiters offer higher scalability at the cost of precision. Coordination‑service‑based approaches give accurate global limits but are more complex.

Rate limiting should be combined with other techniques such as load balancing, caching, asynchronous processing, and horizontal scaling to build robust, high‑performance systems.

Tags: distributed systems, algorithms, microservices, Golang, rate limiting, token bucket
Written by

Tencent Cloud Developer

Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
