Rate Limiting: Purpose, Algorithms, Implementation Methods, Strategies, and Considerations
Rate limiting safeguards system stability by capping the rate at which requests are processed. Common algorithms include the fixed‑window counter, sliding window, leaky bucket, and token bucket; limits can be enforced at the application, proxy, or hardware layer, combined with strategies such as threshold setting, request classification, and feedback mechanisms, all while preserving fairness, flexibility, and transparency.
In software architecture, rate limiting is a crucial mechanism for controlling resource usage and protecting system stability. It limits the number of requests processed within a certain time window to prevent overload.
1. Purpose of Rate Limiting
Prevent system overload: Ensure stable operation under high load.
Guarantee service quality: Provide fair service to all users and avoid resource monopolization.
2. Rate‑Limiting Algorithms
2.1 Fixed‑Window Counter
Counts requests in a fixed time window (e.g., per minute) and resets the counter when the window expires.
package main

import (
	"fmt"
	"sync"
	"time"
)

type FixedWindowCounter struct {
	mu       sync.Mutex
	count    int
	limit    int
	window   time.Time
	duration time.Duration
}

func NewFixedWindowCounter(limit int, duration time.Duration) *FixedWindowCounter {
	return &FixedWindowCounter{limit: limit, window: time.Now(), duration: duration}
}

func (f *FixedWindowCounter) Allow() bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	now := time.Now()
	if now.After(f.window.Add(f.duration)) {
		f.count = 0
		f.window = now
	}
	if f.count < f.limit {
		f.count++
		return true
	}
	return false
}

func main() {
	limiter := NewFixedWindowCounter(10, time.Minute)
	for i := 0; i < 15; i++ {
		if limiter.Allow() {
			fmt.Println("Request", i+1, "allowed")
		} else {
			fmt.Println("Request", i+1, "rejected")
		}
	}
}
Advantages: Simple, intuitive, guarantees a hard limit per window.
Disadvantages: Can cause traffic spikes at window boundaries; not smooth for bursty traffic.
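The boundary problem is easy to reproduce. The sketch below (which re-declares the counter from above so it is self-contained; the `burstAllowed` helper is just for the demo) uses a limit of 5 per 200 ms window: a burst at the end of one window plus a burst at the start of the next lets 10 requests through in barely more than 200 ms, twice the nominal rate.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Same fixed-window counter as above, reproduced so the demo is self-contained.
type FixedWindowCounter struct {
	mu       sync.Mutex
	count    int
	limit    int
	window   time.Time
	duration time.Duration
}

func NewFixedWindowCounter(limit int, duration time.Duration) *FixedWindowCounter {
	return &FixedWindowCounter{limit: limit, window: time.Now(), duration: duration}
}

func (f *FixedWindowCounter) Allow() bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	now := time.Now()
	if now.After(f.window.Add(f.duration)) {
		f.count = 0
		f.window = now
	}
	if f.count < f.limit {
		f.count++
		return true
	}
	return false
}

func burstAllowed() int {
	limiter := NewFixedWindowCounter(5, 200*time.Millisecond)
	allowed := 0
	// Burst just before the window boundary …
	for i := 0; i < 5; i++ {
		if limiter.Allow() {
			allowed++
		}
	}
	// … wait for the window to roll over …
	time.Sleep(210 * time.Millisecond)
	// … then burst again: 2x the limit passes in ~210 ms.
	for i := 0; i < 5; i++ {
		if limiter.Allow() {
			allowed++
		}
	}
	return allowed
}

func main() {
	fmt.Println("requests allowed across the boundary:", burstAllowed()) // prints 10
}
```

This is exactly the spike the sliding-window variant in the next section is designed to smooth out.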
2.2 Sliding Window
Improves fixed‑window by sliding the window over time, smoothing request flow.
package main

import (
	"fmt"
	"sync"
	"time"
)

type SlidingWindowLimiter struct {
	mutex          sync.Mutex
	counters       []int
	limit          int
	windowStart    time.Time
	windowDuration time.Duration
	interval       time.Duration
}

func NewSlidingWindowLimiter(limit int, windowDuration, interval time.Duration) *SlidingWindowLimiter {
	buckets := int(windowDuration / interval)
	return &SlidingWindowLimiter{counters: make([]int, buckets), limit: limit, windowStart: time.Now(), windowDuration: windowDuration, interval: interval}
}
func (s *SlidingWindowLimiter) Allow() bool {
	s.mutex.Lock()
	defer s.mutex.Unlock()
	now := time.Now()
	if now.Sub(s.windowStart) >= 2*s.windowDuration {
		// Long idle: every bucket has expired; reset in one step.
		for i := range s.counters {
			s.counters[i] = 0
		}
		s.windowStart = now
	}
	// Slide expired buckets out of the window, one interval at a time.
	for now.Sub(s.windowStart) >= s.windowDuration {
		s.slideWindow()
	}
	// The limit applies to the total across all buckets in the window,
	// not to any single bucket.
	total := 0
	for _, c := range s.counters {
		total += c
	}
	if total >= s.limit {
		return false
	}
	index := int(now.Sub(s.windowStart) / s.interval)
	if index >= len(s.counters) {
		index = len(s.counters) - 1
	}
	s.counters[index]++
	return true
}

func (s *SlidingWindowLimiter) slideWindow() {
	// Drop the oldest bucket and advance the window start by one interval.
	copy(s.counters, s.counters[1:])
	s.counters[len(s.counters)-1] = 0
	s.windowStart = s.windowStart.Add(s.interval)
}
func main() {
	limiter := NewSlidingWindowLimiter(1, time.Second, 10*time.Millisecond)
	for i := 0; i < 100; i++ {
		if limiter.Allow() {
			fmt.Println("Request", i+1, "allowed")
		} else {
			fmt.Println("Request", i+1, "rejected")
		}
	}
}
Advantages: Smooths traffic, reduces instantaneous peaks.
Disadvantages: More complex, higher memory and CPU cost.
2.3 Leaky Bucket
Models a bucket that drains at a constant rate; excess requests are dropped.
package main

import (
	"fmt"
	"time"
)

type LeakyBucket struct {
	queue chan struct{}
}

func NewLeakyBucket(capacity int) *LeakyBucket {
	return &LeakyBucket{queue: make(chan struct{}, capacity)}
}

func (lb *LeakyBucket) push() bool {
	select {
	case lb.queue <- struct{}{}:
		return true
	default:
		return false
	}
}

func (lb *LeakyBucket) process() {
	for range lb.queue {
		fmt.Println("Request processed at", time.Now().Format("2006-01-02 15:04:05"))
		time.Sleep(100 * time.Millisecond)
	}
}

func main() {
	lb := NewLeakyBucket(5)
	go lb.process()
	for i := 0; i < 10; i++ {
		if lb.push() {
			fmt.Printf("Request %d accepted at %v\n", i+1, time.Now())
		} else {
			fmt.Printf("Request %d rejected at %v\n", i+1, time.Now())
		}
	}
	time.Sleep(2 * time.Second)
}
Advantages: Guarantees a fixed processing rate, smooths bursts.
Disadvantages: Less flexible for sudden spikes, may increase latency.
2.4 Token Bucket
Allows bursty traffic while maintaining an average rate by storing tokens.
package main

import (
	"fmt"
	"sync"
	"time"
)
type TokenBucket struct {
	mu         sync.Mutex
	capacity   float64
	tokens     float64
	refillRate float64 // tokens per second
	lastRefill time.Time
}

func NewTokenBucket(capacity int, refillRate float64) *TokenBucket {
	return &TokenBucket{capacity: float64(capacity), tokens: float64(capacity), refillRate: refillRate, lastRefill: time.Now()}
}

func (t *TokenBucket) Allow() bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	now := time.Now()
	// Refill first; float64 tokens keep the fractional part that an integer
	// counter would silently drop between closely spaced calls.
	t.tokens += t.refillRate * now.Sub(t.lastRefill).Seconds()
	if t.tokens > t.capacity {
		t.tokens = t.capacity
	}
	t.lastRefill = now
	if t.tokens >= 1 {
		t.tokens--
		return true
	}
	return false
}
func main() {
	limiter := NewTokenBucket(10, 2)
	for i := 0; i < 15; i++ {
		if limiter.Allow() {
			fmt.Println("Request", i+1, "allowed")
		} else {
			fmt.Println("Request", i+1, "rejected")
		}
	}
}
Advantages: Supports bursts, flexible, smooth rate control.
Disadvantages: Slightly more complex, requires time‑based state management.
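The burst behaviour is worth seeing in isolation: after the bucket sits idle long enough to refill, a full capacity's worth of requests passes at once, and only then does the refill rate take over. The self-contained sketch below re-declares a token bucket with float64 tokens (so fractional refill accrues between calls); the `drain` helper is illustrative:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Token bucket with float64 tokens so fractional refill is not lost.
type TokenBucket struct {
	mu         sync.Mutex
	capacity   float64
	tokens     float64
	refillRate float64 // tokens per second
	lastRefill time.Time
}

func NewTokenBucket(capacity int, refillRate float64) *TokenBucket {
	return &TokenBucket{capacity: float64(capacity), tokens: float64(capacity), refillRate: refillRate, lastRefill: time.Now()}
}

func (t *TokenBucket) Allow() bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	now := time.Now()
	t.tokens += t.refillRate * now.Sub(t.lastRefill).Seconds()
	if t.tokens > t.capacity {
		t.tokens = t.capacity
	}
	t.lastRefill = now
	if t.tokens >= 1 {
		t.tokens--
		return true
	}
	return false
}

// drain fires n back-to-back requests and counts how many are allowed.
func drain(tb *TokenBucket, n int) int {
	allowed := 0
	for i := 0; i < n; i++ {
		if tb.Allow() {
			allowed++
		}
	}
	return allowed
}

func main() {
	tb := NewTokenBucket(5, 50) // capacity 5, refill 50 tokens/s
	fmt.Println("initial burst allowed:", drain(tb, 10)) // full capacity passes at once
	time.Sleep(200 * time.Millisecond)                   // idle: bucket refills back to capacity
	fmt.Println("burst after idle allowed:", drain(tb, 10))
}
```

Contrast this with the leaky bucket: the average rate is the same, but the token bucket lets a client "bank" unused capacity and spend it in one burst.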
3. Implementation Approaches
3.1 Application‑Layer Rate Limiting – Implemented directly in application code, often via middleware. Example using Gin:
type TokenBucket struct {
	mu         sync.Mutex
	capacity   int
	tokens     int
	refillRate float64
	lastRefill time.Time
}

func NewTokenBucket(capacity int, refillRate float64) *TokenBucket { /* … */ }

func (tb *TokenBucket) Allow() bool { /* … */ }

func Middleware(tb *TokenBucket) gin.HandlerFunc {
	return func(c *gin.Context) {
		if !tb.Allow() {
			c.JSON(http.StatusTooManyRequests, gin.H{"error": "too many requests"})
			c.Abort()
			return
		}
		c.Next()
	}
}

func main() {
	r := gin.Default()
	tb := NewTokenBucket(10, 1.0)
	r.Use(Middleware(tb))
	r.GET("/hello", func(c *gin.Context) { c.JSON(http.StatusOK, gin.H{"message": "hello world"}) })
	r.Run()
}
3.2 Proxy‑Layer Rate Limiting – Performed by reverse proxies such as Nginx or HAProxy before traffic reaches backend services.
http {
    limit_req_zone $binary_remote_addr zone=mylimit:10m rate=1r/s;
    server {
        listen 80;
        location /api/ {
            limit_req zone=mylimit burst=5 nodelay;
            proxy_pass http://backend/;
        }
    }
}
3.3 Hardware‑Layer Rate Limiting – Implemented on load balancers or dedicated network devices to filter traffic at the infrastructure level.
4. Rate‑Limiting Strategies
4.1 Threshold Setting – Define the maximum number of requests per time unit. Example pseudo‑code shows a RateLimiterV2 with configurable capacity, refill rate, and hard limit.
type RateLimiterV2 struct {
	mu         sync.Mutex
	tokens     int
	capacity   int
	refillRate float64
	limit      int
}

func NewRateLimiterV2(capacity int, refillRate float64, limit int) *RateLimiterV2 { /* … */ }
func (r *RateLimiterV2) Allow() bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	// token‑bucket refill logic …
	if r.tokens <= 0 {
		return false
	}
	r.tokens--
	return true
}
4.2 Request Classification – Apply different limits to different API endpoints or user groups. Example maps routes to individual RateLimiterV2 instances.
var RouteLimiterMap = map[string]*RateLimiterV2{}

func SetRateLimiterForRoute(route string, capacity, limit int, refillRate float64) {
	RouteLimiterMap[route] = NewRateLimiterV2(capacity, refillRate, limit)
}

func MiddlewareWithRoute(route string) gin.HandlerFunc {
	return func(c *gin.Context) {
		if !RouteLimiterMap[route].Allow() {
			c.JSON(http.StatusTooManyRequests, gin.H{"error": "too many requests"})
			c.Abort()
			return
		}
		c.Next()
	}
}
4.3 Feedback Mechanism – Return informative error messages or retry‑after headers when a request is throttled. Example:
func (r *RateLimiterV2) AllowWithFeedback() (bool, string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	// token‑bucket refill logic …
	if r.tokens <= 0 {
		return false, "Too many requests. Please try again later."
	}
	r.tokens--
	return true, ""
}
5. Considerations for Designing Rate Limiting
5.1 Fairness – Ensure all users receive equitable access. A FairLimiter maintains a separate limiter per user/IP.
type FairLimiter struct {
	sync.Mutex
	limits map[string]*RateLimiterV2
}

func (f *FairLimiter) Allow(userID string) (bool, string) {
	f.Lock()
	defer f.Unlock()
	if _, ok := f.limits[userID]; !ok {
		// capacity, refillRate, limit: package‑level defaults (not shown)
		f.limits[userID] = NewRateLimiterV2(capacity, refillRate, limit)
	}
	return f.limits[userID].AllowWithFeedback()
}
5.2 Flexibility – Ability to adjust limits at runtime. A FlexibleLimiter can change capacity, refill rate, and limit on the fly.
type FlexibleLimiter struct {
	sync.Mutex
	capacity   int
	refillRate float64
	limit      int
	inner      *RateLimiterV2
}

func (f *FlexibleLimiter) SetParams(capacity int, refillRate float64, limit int) {
	f.Lock()
	defer f.Unlock()
	f.capacity, f.refillRate, f.limit = capacity, refillRate, limit
	f.inner = nil // rebuild on the next Allow with the new parameters
}

func (f *FlexibleLimiter) Allow() (bool, string) {
	f.Lock()
	defer f.Unlock()
	// Keep a single inner limiter so token state persists across requests;
	// recreating it on every call would reset the bucket each time and
	// never actually limit anything.
	if f.inner == nil {
		f.inner = NewRateLimiterV2(f.capacity, f.refillRate, f.limit)
	}
	return f.inner.AllowWithFeedback()
}
5.3 Transparency – Expose current rate‑limit state to clients (e.g., remaining tokens). Example middleware adds an HTTP header with remaining tokens.
func MiddlewareWithTransparency(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// transparentLimiter is a package‑level limiter whose AllowWithStatus
		// also reports the number of remaining tokens.
		allowed, msg, tokens := transparentLimiter.AllowWithStatus()
		// Report the remaining quota on every response, not just rejections.
		w.Header().Set("X-RateLimit-Remaining", fmt.Sprintf("%d", tokens))
		if !allowed {
			w.WriteHeader(http.StatusTooManyRequests)
			fmt.Fprintln(w, msg)
			return
		}
		next.ServeHTTP(w, r)
	})
}
Overall, proper rate limiting protects services from overload, improves stability, and provides a better user experience while maintaining fairness and flexibility across different deployment layers.
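As a closing sketch, the per-user fairness idea and application-layer middleware can be combined into one runnable program using only the standard library. Everything here (`perIPLimiter`, `limitMiddleware`) is illustrative; a simple fixed quota per IP stands in for any of the algorithms above, and httptest exercises the middleware in-process:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// perIPLimiter keeps an independent quota per client IP, so one client
// cannot exhaust another's allowance. A plain counter stands in for the
// token/leaky-bucket algorithms above to keep the sketch short.
type perIPLimiter struct {
	mu    sync.Mutex
	used  map[string]int
	limit int
}

func newPerIPLimiter(limit int) *perIPLimiter {
	return &perIPLimiter{used: make(map[string]int), limit: limit}
}

func (p *perIPLimiter) allow(ip string) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.used[ip] >= p.limit {
		return false
	}
	p.used[ip]++
	return true
}

// limitMiddleware rejects over-quota clients with 429 before the handler runs.
func limitMiddleware(l *perIPLimiter, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !l.allow(r.RemoteAddr) {
			http.Error(w, "too many requests", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	handler := limitMiddleware(newPerIPLimiter(3), http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	}))
	// Exercise the middleware in-process; httptest requests share one RemoteAddr.
	for i := 1; i <= 5; i++ {
		req := httptest.NewRequest("GET", "/", nil)
		rec := httptest.NewRecorder()
		handler.ServeHTTP(rec, req)
		fmt.Printf("request %d -> %d\n", i, rec.Code)
	}
}
```

Because httptest gives every request the same RemoteAddr, the first three requests return 200 and the remaining two return 429, showing the per-client quota enforced at the application layer.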
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.