An Introduction to Rate Limiting: Concepts, Classifications, and Go Implementation
This article explains the fundamentals of rate limiting, its importance for high‑availability services, various classification dimensions, common algorithms such as fixed‑window, sliding‑window, leaky‑bucket and token‑bucket, and demonstrates practical usage with Go's golang.org/x/time/rate library including code examples and configuration tips.
Rate Limiting Overview
Rate limiting (or flow control) restricts the number of events that can enter a system within a given time window, protecting services from overload, keeping response times stable, and preventing cascading failures. It differs from circuit breaking: a circuit breaker is typically implemented on the client side to stop calls to an unhealthy dependency, whereas rate limiting is enforced on the server side to protect the service itself.
Classification of Rate Limiting
Granularity
Single‑node (per‑service) rate limiting
Distributed rate limiting (e.g., NGINX + Redis, gateway clusters)
Single‑node limits traffic at an individual service instance, while distributed approaches coordinate limits across multiple nodes using a shared store, usually Redis, to achieve global consistency.
Target Types
Request‑based limiting (e.g., QPS, total request count)
Resource‑based limiting (e.g., TCP connections, threads, memory usage)
Request‑based limits control the number of incoming calls, whereas resource‑based limits protect critical system resources.
Algorithm Types
Fixed‑window counter
Sliding‑window counter
Leaky‑bucket
Token‑bucket
Fixed‑Window Counter
The simplest algorithm maintains a counter for a fixed time interval; when the interval expires the counter resets.
package limit

import (
	"sync"
	"time"
)

type Counter struct {
	mu          sync.Mutex
	Count       uint64 // requests in the current window
	Limit       uint64 // max requests per window
	Interval    int64  // window length in ms
	RefreshTime int64  // start of current window in ms
}

func NewCounter(count, limit uint64, interval, rt int64) *Counter {
	return &Counter{Count: count, Limit: limit, Interval: interval, RefreshTime: rt}
}

func (c *Counter) RateLimit() bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	now := time.Now().UnixMilli()
	if now >= c.RefreshTime+c.Interval {
		// window expired: start a new one and reset the counter
		c.RefreshTime = now
		c.Count = 0
	}
	c.Count++
	return c.Count <= c.Limit
}

This method is simple, but it allows bursts at window boundaries and distributes traffic unevenly within each window: up to 2× the limit can pass in the instants straddling a reset.
Sliding‑Window Counter
The sliding window divides the interval into many small slots, each with its own counter, and slides the window forward, aggregating counts across slots to provide smoother limiting.
Leaky‑Bucket
Requests enter a fixed‑size queue (the bucket) and are released at a constant rate; excess requests are dropped, smoothing traffic but potentially increasing latency for bursts.
Token‑Bucket
Tokens are added to a bucket at a steady rate; each request consumes a token. If the bucket is empty, the request is rejected. This algorithm allows bursts up to the bucket capacity while enforcing an average rate.
Go Rate‑Limiting Library (golang.org/x/time/rate)
Go's extended library golang.org/x/time/rate (maintained by the Go team, but not part of the standard library) provides a token‑bucket implementation via rate.NewLimiter. It accepts a Limit (events per second) and a burst size (the maximum number of tokens the bucket can hold).
func NewLimiter(r Limit, b int) *Limiter

Example:

limiter := rate.NewLimiter(10, 5) // 10 events/sec, burst up to 5

The library offers three families of methods:
Allow / AllowN: non‑blocking checks; return false if the request would exceed the limit.
Wait / WaitN: block until enough tokens are available or the provided context expires.
Reserve / ReserveN: reserve tokens for future use, returning a Reservation that can be inspected, delayed, or cancelled.
Sample usage of Allow:

func AllowDemo() {
	// one token every 200ms (5 events/sec), burst of 5
	limiter := rate.NewLimiter(rate.Every(200*time.Millisecond), 5)
	for i := 1; i <= 15; i++ {
		if limiter.Allow() {
			fmt.Println(i, "====Allow====", time.Now())
		} else {
			fmt.Println(i, "====Disallow====", time.Now())
		}
		time.Sleep(80 * time.Millisecond)
	}
}

Sample usage of WaitN with a timeout context:
func WaitNDemo() {
	limiter := rate.NewLimiter(10, 5)
	for i := 1; i <= 10; i++ {
		ctx, cancel := context.WithTimeout(context.Background(), 400*time.Millisecond)
		err := limiter.WaitN(ctx, 4) // block until 4 tokens are available or the context expires
		if err != nil {
			fmt.Println("error:", err)
			cancel()
			continue
		}
		fmt.Println(i, "executed at", time.Now())
		cancel()
	}
}

Sample usage of ReserveN to obtain a delay before execution:
func ReserveNDemo() {
	limiter := rate.NewLimiter(10, 5)
	for i := 1; i <= 10; i++ {
		r := limiter.ReserveN(time.Now(), 4)
		if !r.OK() {
			return // the requested token count exceeds the limiter's burst size
		}
		time.Sleep(r.Delay()) // wait until the reservation can be fulfilled
		fmt.Println("executed:", time.Now())
	}
}

The limiter also supports dynamic adjustment via SetBurst, SetBurstAt, SetLimit, and SetLimitAt, allowing services to adapt limits based on real‑time metrics such as QPS, CPU usage, or latency.
Choosing the Right Strategy
Fixed‑window: simple, suitable for emergency stop‑gap measures.
Sliding‑window: handles moderate bursts with low implementation cost.
Leaky‑bucket: enforces smooth output, good for uniform traffic requirements.
Token‑bucket: best for systems expecting occasional spikes while maintaining high throughput.
Conclusion
Rate limiting is a crucial component of service governance. Understanding its classifications, algorithms, and practical Go implementations helps developers design resilient, high‑performance back‑end systems that can gracefully handle traffic surges and protect shared resources.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.