
Rate Limiting: Purpose, Algorithms, Implementation Methods, Strategies, and Considerations

Rate limiting safeguards system stability by capping request rates. Common algorithms include the fixed window, sliding window, leaky bucket, and token bucket; limits can be enforced at the application, proxy, or hardware layer, guided by strategies such as threshold setting, request classification, and feedback, and designed with fairness, flexibility, and transparency in mind.


In software architecture, rate limiting is a crucial mechanism for controlling resource usage and protecting system stability. It limits the number of requests processed within a certain time window to prevent overload.

1. Purpose of Rate Limiting

Prevent system overload: Ensure stable operation under high load.

Guarantee service quality: Provide fair service to all users and avoid resource monopolization.

2. Rate‑Limiting Algorithms

2.1 Fixed‑Window Counter

Counts requests in a fixed time window (e.g., per minute) and resets the counter when the window expires.

package main

import (
    "fmt"
    "sync"
    "time"
)

type FixedWindowCounter struct {
    mu       sync.Mutex
    count    int
    limit    int
    window   time.Time
    duration time.Duration
}

func NewFixedWindowCounter(limit int, duration time.Duration) *FixedWindowCounter {
    return &FixedWindowCounter{limit: limit, window: time.Now(), duration: duration}
}

func (f *FixedWindowCounter) Allow() bool {
    f.mu.Lock()
    defer f.mu.Unlock()
    now := time.Now()
    if now.After(f.window.Add(f.duration)) {
        f.count = 0
        f.window = now
    }
    if f.count < f.limit {
        f.count++
        return true
    }
    return false
}

func main() {
    limiter := NewFixedWindowCounter(10, time.Minute)
    for i := 0; i < 15; i++ {
        if limiter.Allow() {
            fmt.Println("Request", i+1, "allowed")
        } else {
            fmt.Println("Request", i+1, "rejected")
        }
    }
}

Advantages: Simple, intuitive, guarantees a hard limit per window.

Disadvantages: Can cause traffic spikes at window boundaries; not smooth for bursty traffic.

2.2 Sliding Window

Improves fixed‑window by sliding the window over time, smoothing request flow.

package main

import (
    "fmt"
    "sync"
    "time"
)

type SlidingWindowLimiter struct {
    mutex          sync.Mutex
    counters       []int
    limit          int
    windowStart    time.Time
    windowDuration time.Duration
    interval       time.Duration
}

func NewSlidingWindowLimiter(limit int, windowDuration, interval time.Duration) *SlidingWindowLimiter {
    buckets := int(windowDuration / interval)
    return &SlidingWindowLimiter{counters: make([]int, buckets), limit: limit, windowStart: time.Now(), windowDuration: windowDuration, interval: interval}
}

func (s *SlidingWindowLimiter) Allow() bool {
    s.mutex.Lock()
    defer s.mutex.Unlock()
    now := time.Now()
    // Slide the window forward one bucket per elapsed interval.
    for now.Sub(s.windowStart) >= s.windowDuration {
        s.slideWindow()
    }
    // The limit applies to the total across all buckets in the window,
    // not to any single bucket.
    total := 0
    for _, c := range s.counters {
        total += c
    }
    if total >= s.limit {
        return false
    }
    index := int(now.Sub(s.windowStart) / s.interval)
    s.counters[index]++
    return true
}

func (s *SlidingWindowLimiter) slideWindow() {
    // Drop the oldest bucket and advance the window start by one interval.
    copy(s.counters, s.counters[1:])
    s.counters[len(s.counters)-1] = 0
    s.windowStart = s.windowStart.Add(s.interval)
}

func main() {
    limiter := NewSlidingWindowLimiter(1, time.Second, 10*time.Millisecond)
    for i := 0; i < 100; i++ {
        if limiter.Allow() {
            fmt.Println("Request", i+1, "allowed")
        } else {
            fmt.Println("Request", i+1, "rejected")
        }
    }
}

Advantages: Smooths traffic, reduces instantaneous peaks.

Disadvantages: More complex, higher memory and CPU cost.

2.3 Leaky Bucket

Models a bucket that drains at a constant rate; excess requests are dropped.

package main

import (
    "fmt"
    "time"
)

type LeakyBucket struct {
    queue chan struct{}
}

func NewLeakyBucket(capacity int) *LeakyBucket {
    return &LeakyBucket{queue: make(chan struct{}, capacity)}
}

func (lb *LeakyBucket) push() bool {
    select {
    case lb.queue <- struct{}{}:
        return true
    default:
        return false
    }
}

func (lb *LeakyBucket) process() {
    for range lb.queue {
        fmt.Println("Request processed at", time.Now().Format("2006-01-02 15:04:05"))
        time.Sleep(100 * time.Millisecond)
    }
}

func main() {
    lb := NewLeakyBucket(5)
    go lb.process()
    for i := 0; i < 10; i++ {
        if lb.push() {
            fmt.Printf("Request %d accepted at %v\n", i+1, time.Now())
        } else {
            fmt.Printf("Request %d rejected at %v\n", i+1, time.Now())
        }
    }
    time.Sleep(2 * time.Second)
}

Advantages: Guarantees a fixed processing rate, smooths bursts.

Disadvantages: Less flexible for sudden spikes, may increase latency.

2.4 Token Bucket

Allows bursty traffic while maintaining an average rate by storing tokens.

package main

import (
    "fmt"
    "sync"
    "time"
)

type TokenBucket struct {
    mu         sync.Mutex
    capacity   int
    tokens     float64 // fractional tokens, so slow refills are not lost to truncation
    refillRate float64
    lastRefill time.Time
}

func NewTokenBucket(capacity int, refillRate float64) *TokenBucket {
    return &TokenBucket{capacity: capacity, tokens: float64(capacity), refillRate: refillRate, lastRefill: time.Now()}
}

func (t *TokenBucket) Allow() bool {
    t.mu.Lock()
    defer t.mu.Unlock()
    now := time.Now()
    // Refill based on elapsed time, capped at capacity. lastRefill is
    // advanced on every call, whether or not a token is consumed.
    elapsed := now.Sub(t.lastRefill).Seconds()
    t.tokens += t.refillRate * elapsed
    if t.tokens > float64(t.capacity) {
        t.tokens = float64(t.capacity)
    }
    t.lastRefill = now
    if t.tokens >= 1 {
        t.tokens--
        return true
    }
    return false
}

func main() {
    limiter := NewTokenBucket(10, 2)
    for i := 0; i < 15; i++ {
        if limiter.Allow() {
            fmt.Println("Request", i+1, "allowed")
        } else {
            fmt.Println("Request", i+1, "rejected")
        }
    }
}

Advantages: Supports bursts, flexible, smooth rate control.

Disadvantages: Slightly more complex, requires time‑based state management.

3. Implementation Approaches

3.1 Application‑Layer Rate Limiting – Implemented directly in application code, often via middleware. Example using Gin:

import (
    "net/http"
    "sync"
    "time"

    "github.com/gin-gonic/gin"
)

type TokenBucket struct {
    mu         sync.Mutex
    capacity   int
    tokens     float64
    refillRate float64
    lastRefill time.Time
}

func NewTokenBucket(capacity int, refillRate float64) *TokenBucket { /* … */ }

func (tb *TokenBucket) Allow() bool { /* … */ }

func Middleware(tb *TokenBucket) gin.HandlerFunc {
    return func(c *gin.Context) {
        if !tb.Allow() {
            c.JSON(http.StatusTooManyRequests, gin.H{"error": "too many requests"})
            c.Abort()
            return
        }
        c.Next()
    }
}

func main() {
    r := gin.Default()
    tb := NewTokenBucket(10, 1.0)
    r.Use(Middleware(tb))
    r.GET("/hello", func(c *gin.Context) { c.JSON(http.StatusOK, gin.H{"message": "hello world"}) })
    r.Run()
}

3.2 Proxy‑Layer Rate Limiting – Performed by reverse proxies such as Nginx or HAProxy before traffic reaches backend services.

http {
    limit_req_zone $binary_remote_addr zone=mylimit:10m rate=1r/s;
    server {
        listen 80;
        location /api/ {
            limit_req zone=mylimit burst=5 nodelay;
            proxy_pass http://backend/;
        }
    }
}

3.3 Hardware‑Layer Rate Limiting – Implemented on load balancers or dedicated network devices to filter traffic at the infrastructure level.

4. Rate‑Limiting Strategies

4.1 Threshold Setting – Define the maximum number of requests per time unit. Example pseudo‑code shows a RateLimiterV2 with configurable capacity, refill rate, and hard limit.

type RateLimiterV2 struct {
    mu         sync.Mutex
    tokens     int
    capacity   int
    refillRate float64
    limit      int // hard cap per time unit, enforced by the elided refill logic
}

func NewRateLimiterV2(capacity int, refillRate float64, limit int) *RateLimiterV2 { /* … */ }

func (r *RateLimiterV2) Allow() bool {
    r.mu.Lock()
    defer r.mu.Unlock()
    // token-bucket refill logic …
    if r.tokens <= 0 {
        return false
    }
    r.tokens--
    return true
}

4.2 Request Classification – Apply different limits to different API endpoints or user groups. Example maps routes to individual RateLimiterV2 instances.

var RouteLimiterMap = map[string]*RateLimiterV2{}

func SetRateLimiterForRoute(route string, capacity, limit int, refillRate float64) {
    RouteLimiterMap[route] = NewRateLimiterV2(capacity, refillRate, limit)
}

func MiddlewareWithRoute(route string) gin.HandlerFunc {
    return func(c *gin.Context) {
        // Routes without a registered limiter pass through unthrottled.
        limiter, ok := RouteLimiterMap[route]
        if ok && !limiter.Allow() {
            c.JSON(http.StatusTooManyRequests, gin.H{"error": "too many requests"})
            c.Abort()
            return
        }
        c.Next()
    }
}

4.3 Feedback Mechanism – Return informative error messages or retry‑after headers when a request is throttled. Example:

func (r *RateLimiterV2) AllowWithFeedback() (bool, string) {
    r.mu.Lock()
    defer r.mu.Unlock()
    // token-bucket refill logic …
    if r.tokens <= 0 {
        return false, "Too many requests. Please try again later."
    }
    r.tokens--
    return true, ""
}

5. Considerations for Designing Rate Limiting

5.1 Fairness – Ensure all users receive equitable access. A FairLimiter maintains a separate limiter per user/IP.

type FairLimiter struct {
    sync.Mutex
    limits     map[string]*RateLimiterV2
    capacity   int // per-user defaults
    refillRate float64
    limit      int
}

func (f *FairLimiter) Allow(userID string) (bool, string) {
    f.Lock()
    defer f.Unlock()
    if _, ok := f.limits[userID]; !ok {
        // Lazily create an independent limiter for each user/IP.
        f.limits[userID] = NewRateLimiterV2(f.capacity, f.refillRate, f.limit)
    }
    return f.limits[userID].AllowWithFeedback()
}

5.2 Flexibility – Ability to adjust limits at runtime. A FlexibleLimiter can change capacity, refill rate, and limit on the fly.

type FlexibleLimiter struct {
    sync.Mutex
    limiter *RateLimiterV2
}

func NewFlexibleLimiter(capacity int, refillRate float64, limit int) *FlexibleLimiter {
    return &FlexibleLimiter{limiter: NewRateLimiterV2(capacity, refillRate, limit)}
}

func (f *FlexibleLimiter) SetParams(capacity int, refillRate float64, limit int) {
    f.Lock()
    defer f.Unlock()
    // Adjust limits on the fly; the limiter's accumulated state is preserved
    // rather than recreating it, so bursts are not reset by a config change.
    f.limiter.capacity = capacity
    f.limiter.refillRate = refillRate
    f.limiter.limit = limit
}

func (f *FlexibleLimiter) Allow() (bool, string) {
    f.Lock()
    defer f.Unlock()
    return f.limiter.AllowWithFeedback()
}

5.3 Transparency – Expose current rate‑limit state to clients (e.g., remaining tokens). Example middleware adds an HTTP header with remaining tokens.

func MiddlewareWithTransparency(limiter *RateLimiterV2, next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // AllowWithStatus is a variant of AllowWithFeedback that also
        // reports the remaining token count.
        allowed, msg, tokens := limiter.AllowWithStatus()
        // Expose the remaining quota on every response, not just rejections.
        w.Header().Set("X-RateLimit-Remaining", fmt.Sprintf("%d", tokens))
        if !allowed {
            w.WriteHeader(http.StatusTooManyRequests)
            fmt.Fprintln(w, msg)
            return
        }
        next.ServeHTTP(w, r)
    })
}

Overall, proper rate limiting protects services from overload, improves stability, and provides a better user experience while maintaining fairness and flexibility across different deployment layers.

Tags: backend, distributed systems, algorithm, golang, traffic control, rate limiting
Written by Tencent Cloud Developer

Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.