An Introduction to Rate Limiting: Concepts, Classifications, Algorithms, and Go Implementation
This article introduces rate limiting, explains its purpose and classifications, compares fixed and sliding windows, describes common algorithms such as token bucket, leaky bucket, and counter, and provides detailed Go code examples using the golang.org/x/time/rate library for practical implementation.
Rate Limiting Overview
Rate limiting is a protection mechanism for high‑availability systems that allows only a specified number of events to enter a service per unit of time. Requests exceeding the limit are rejected, queued, or degraded, ensuring that the admitted traffic can still be served normally and preventing a cascading failure (avalanche).
Why Rate Limiting Is Needed
Rate limiting is not only a response to an overloaded service; it is also a self‑protective measure against resource scarcity and security risks. By limiting traffic, a system can provide its maximum service capacity with limited resources, serving expected traffic while rejecting or delaying the excess.
Standard Support
HTTP RFC 6585 defines status code 429 Too Many Requests; the response may include a Retry-After header indicating when the client may retry.
Rate Limiting Classification
Rate limiting can be classified by granularity, object type, and algorithm.
Granularity
Single‑machine (single service node) limiting
Distributed limiting (multiple nodes, often using a central store such as Redis)
Distributed limiting requires handling data consistency, time synchronization, timeout handling, and performance/reliability of the central store.
Object Type
Request‑based limiting (e.g., QPS, total request count)
Resource‑based limiting (e.g., TCP connections, thread count, memory usage)
Algorithm Classification
Counter (fixed‑window)
Sliding‑window counter
Leaky bucket
Token bucket
Fixed‑Window Counter
The simplest algorithm maintains a counter for a fixed time window: each request increments the counter and is rejected once the counter exceeds the limit. When the window expires, the counter resets.
package limit

import (
	"sync"
	"time"
)

// Counter implements a fixed-window rate limiter.
type Counter struct {
	mu          sync.Mutex
	Count       uint64 // requests counted in the current window
	Limit       uint64 // max requests per window
	Interval    int64  // window size in ms
	RefreshTime int64  // window start timestamp in ms
}

func NewCounter(limit uint64, interval int64) *Counter {
	return &Counter{Limit: limit, Interval: interval, RefreshTime: time.Now().UnixMilli()}
}

func (c *Counter) RateLimit() bool {
	// A mutex keeps the check-and-increment atomic across goroutines.
	c.mu.Lock()
	defer c.mu.Unlock()
	now := time.Now().UnixMilli()
	if now >= c.RefreshTime+c.Interval {
		// The window has expired: start a new one and reset the counter.
		c.RefreshTime = now
		c.Count = 0
	}
	c.Count++
	return c.Count <= c.Limit
}

Drawbacks: the coarse granularity makes the algorithm burst-sensitive. Requests bunched at the end of one window and the start of the next all pass, so the effective rate can reach twice the configured limit across a window boundary.
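The boundary-doubling problem can be checked with simple arithmetic. Assuming a limit of 100 requests per 1 s window, 100 requests in the last 100 ms of one window plus 100 in the first 100 ms of the next all pass:

```go
package main

import "fmt"

func main() {
	limit := 100 // configured limit: 100 requests per 1 s window
	// 100 requests land in [0.9s, 1.0s) of window 1 and another 100 in
	// [1.0s, 1.1s) of window 2 -- every one of them is admitted.
	burst := limit + limit
	span := 0.2 // the two bursts fit inside 0.2 seconds
	fmt.Printf("%d requests in %.1fs = %.0f req/s, vs the intended %d req/s\n",
		burst, span, float64(burst)/span, limit)
}
```

The observed short-term rate is 1000 req/s, ten times the intended average, which motivates the sliding window below.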
Sliding‑Window Counter
The window is divided into many small sub‑intervals, each with its own counter. The window slides forward, discarding the oldest sub‑interval and adding a new one, providing smoother rate control at the cost of higher memory usage.
Leaky Bucket
Requests enter a fixed‑size queue (the bucket) and are released at a constant rate. Excess requests overflow and are dropped, smoothing traffic but introducing latency for bursts.
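The sketch below shows the "meter" variant of the leaky bucket, which rejects overflow immediately rather than queueing it as described above; the type and field names are illustrative. The water level rises by one per request and drains at a fixed rate:

```go
package main

import (
	"fmt"
	"time"
)

// LeakyBucket tracks a "water level" that rises by one unit per request
// and drains continuously at leakRate units per second. Requests that
// would overflow the bucket are rejected.
type LeakyBucket struct {
	capacity float64   // bucket size: how much pending work may accumulate
	leakRate float64   // units drained per second
	water    float64   // current level
	lastLeak time.Time // last time the level was drained
}

func NewLeakyBucket(capacity, leakRate float64) *LeakyBucket {
	return &LeakyBucket{capacity: capacity, leakRate: leakRate, lastLeak: time.Now()}
}

func (b *LeakyBucket) Allow() bool {
	now := time.Now()
	// Drain water for the elapsed interval, never below empty.
	b.water -= now.Sub(b.lastLeak).Seconds() * b.leakRate
	if b.water < 0 {
		b.water = 0
	}
	b.lastLeak = now
	if b.water+1 > b.capacity {
		return false // the bucket would overflow: reject
	}
	b.water++
	return true
}

func main() {
	// Capacity 3, draining 1 unit per second.
	lb := NewLeakyBucket(3, 1)
	for i := 0; i < 5; i++ {
		fmt.Println(i, lb.Allow())
	}
}
```

The queueing variant from the text would instead hold overflowing requests until the drain frees space, smoothing the output at the cost of latency.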
Token Bucket
Tokens are added to a bucket at a constant rate; a request consumes a token. If the bucket is empty, the request is rejected. This algorithm allows bursts while enforcing an average rate.
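A hand-rolled sketch of that bookkeeping follows; the names are illustrative, and the golang.org/x/time/rate limiter covered next implements the same idea far more completely:

```go
package main

import (
	"fmt"
	"time"
)

// TokenBucket accrues tokens at rate per second up to capacity; each
// request consumes one token, so capacity is the allowed burst size.
type TokenBucket struct {
	capacity float64   // max tokens, i.e. the burst size
	rate     float64   // tokens added per second
	tokens   float64   // currently available tokens
	last     time.Time // last refill time
}

func NewTokenBucket(capacity, rate float64) *TokenBucket {
	// Start full so an initial burst up to capacity is allowed.
	return &TokenBucket{capacity: capacity, rate: rate, tokens: capacity, last: time.Now()}
}

func (b *TokenBucket) Allow() bool {
	now := time.Now()
	// Refill tokens for the elapsed interval, capped at capacity.
	b.tokens += now.Sub(b.last).Seconds() * b.rate
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

func main() {
	// Burst of 2 tokens, refilled at 5 tokens/second.
	tb := NewTokenBucket(2, 5)
	fmt.Println(tb.Allow(), tb.Allow(), tb.Allow())
}
```

Note the contrast with the leaky bucket: because tokens accumulate while the service is idle, a burst up to the bucket capacity passes immediately instead of being smoothed out.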
Go Rate‑Limiting Library (golang.org/x/time/rate)
The official golang.org/x/time/rate package (part of the Go extended libraries, not the standard library) provides a token-bucket implementation via rate.NewLimiter. Example:
import "golang.org/x/time/rate"
limiter := rate.NewLimiter(10, 5) // 10 events/sec, burst up to 5

Key methods:
Allow/AllowN – non‑blocking check, returns false if the request would exceed the limit.
Wait/WaitN – blocks until enough tokens are available or the context expires.
Reserve/ReserveN – returns a Reservation that can be inspected, delayed, or cancelled.
Example of Allow usage:
func AllowDemo() {
limiter := rate.NewLimiter(rate.Every(200*time.Millisecond), 5)
for i := 0; i < 15; i++ {
if limiter.Allow() {
fmt.Println(i, "allowed", time.Now())
} else {
fmt.Println(i, "blocked", time.Now())
}
time.Sleep(80 * time.Millisecond)
}
}

Example of Wait usage with a timeout:
func WaitNDemo() {
limiter := rate.NewLimiter(10, 5)
for i := 0; i < 10; i++ {
ctx, cancel := context.WithTimeout(context.Background(), 400*time.Millisecond)
if err := limiter.WaitN(ctx, 4); err != nil {
fmt.Println("wait error:", err)
cancel()
continue
}
cancel() // cancel per iteration; defer here would leak contexts until the function returns
fmt.Println(i, "executed", time.Now())
}
}

Example of Reserve usage to obtain the exact delay before execution:
func ReserveNDemo() {
limiter := rate.NewLimiter(10, 5)
for i := 0; i < 10; i++ {
r := limiter.ReserveN(time.Now(), 4)
if !r.OK() {
return // n exceeds the burst size, so the reservation can never succeed
}
delay := r.Delay()
time.Sleep(delay)
fmt.Println("executed", time.Now(), delay)
}
}

The limiter also supports dynamic adjustment of the rate and burst size at runtime via the SetLimit and SetBurst methods.
Conclusion
Rate limiting is a fundamental component of service governance. Understanding its classifications, algorithms, and practical Go implementations helps developers design resilient, high‑availability systems while balancing resource utilization and user experience.
Code Ape Tech Column
Former Ant Group P8 engineer and pure technologist, sharing full-stack Java, interview, and career advice through this column. Site: java-family.cn