
An Introduction to Rate Limiting: Concepts, Classifications, Algorithms, and Go Implementation

This article introduces rate limiting, explains its purpose and classifications, compares fixed and sliding windows, describes common algorithms such as token bucket, leaky bucket, and counter, and provides detailed Go code examples using the golang.org/x/time/rate library for practical implementation.


Rate Limiting Overview

Rate limiting is a protection mechanism for high-availability systems that allows only a specified number of events to enter a service within a given period. Requests exceeding the limit are rejected, queued, or degraded, ensuring that at least part of the traffic is served normally and preventing a system avalanche.

Why Rate Limiting Is Needed

Rate limiting is not only a response to an overloaded service; it is also a self-protective measure against resource scarcity and security risks. By limiting traffic, a system can deliver its maximum service capacity with limited resources, serving expected traffic while rejecting or delaying the excess.

Standard Support

RFC 6585 defines the HTTP status code 429 Too Many Requests; the response may include a Retry-After header indicating when the client may retry.

Rate Limiting Classification

Rate limiting can be classified by granularity, object type, and algorithm.

Granularity

Single‑machine (single service node) limiting

Distributed limiting (multiple nodes, often using a central store such as Redis)

Distributed limiting requires handling data consistency, time synchronization, timeout handling, and performance/reliability of the central store.

Object Type

Request‑based limiting (e.g., QPS, total request count)

Resource‑based limiting (e.g., TCP connections, thread count, memory usage)

Algorithm Classification

Counter (fixed‑window)

Sliding‑window counter

Leaky bucket

Token bucket

Fixed‑Window Counter

The simplest algorithm maintains a counter for a fixed time window. When the window expires, the counter resets.

package limit

import (
    "sync/atomic"
    "time"
)

type Counter struct {
    Count       uint64 // current count
    Limit       uint64 // max requests per window
    Interval    int64  // window size in ms
    RefreshTime int64  // window start timestamp
}

func NewCounter(count, limit uint64, interval, rt int64) *Counter {
    return &Counter{Count: count, Limit: limit, Interval: interval, RefreshTime: rt}
}

func (c *Counter) RateLimit() bool {
    now := time.Now().UnixMilli()
    if now < atomic.LoadInt64(&c.RefreshTime)+c.Interval {
        // Inside the current window: count the request atomically and use
        // the returned value to avoid a read-after-add race.
        return atomic.AddUint64(&c.Count, 1) <= c.Limit
    }
    // Window expired: start a new window that includes this request.
    atomic.StoreInt64(&c.RefreshTime, now)
    atomic.StoreUint64(&c.Count, 1)
    return true
}

Drawbacks: the coarse granularity makes it burst-sensitive, and traffic straddling a window boundary can reach up to double the configured limit (a full quota at the end of one window followed immediately by another at the start of the next).

Sliding‑Window Counter

The window is divided into many small sub‑intervals, each with its own counter. The window slides forward, discarding the oldest sub‑interval and adding a new one, providing smoother rate control at the cost of higher memory usage.

Leaky Bucket

Requests enter a fixed‑size queue (the bucket) and are released at a constant rate. Excess requests overflow and are dropped, smoothing traffic but introducing latency for bursts.

Token Bucket

Tokens are added to a bucket at a constant rate; a request consumes a token. If the bucket is empty, the request is rejected. This algorithm allows bursts while enforcing an average rate.

Go Rate‑Limiting Library (golang.org/x/time/rate)

The golang.org/x/time/rate package (maintained by the Go team, though not part of the standard library) provides a token-bucket implementation via rate.NewLimiter. Example:

import "golang.org/x/time/rate"

limiter := rate.NewLimiter(10, 5) // 10 events/sec, burst up to 5

Key methods:

Allow/AllowN – non‑blocking check, returns false if the request would exceed the limit.

Wait/WaitN – blocks until enough tokens are available or the context expires.

Reserve/ReserveN – returns a Reservation that can be inspected, delayed, or cancelled.

Example of Allow usage (the demos below assume imports of context, fmt, time, and golang.org/x/time/rate):

func AllowDemo() {
    // rate.Every(200ms) means 5 tokens per second, with a burst of 5.
    limiter := rate.NewLimiter(rate.Every(200*time.Millisecond), 5)
    for i := 0; i < 15; i++ {
        if limiter.Allow() {
            fmt.Println(i, "allowed", time.Now())
        } else {
            fmt.Println(i, "blocked", time.Now())
        }
        time.Sleep(80 * time.Millisecond)
    }
}

Example of Wait usage with timeout:

func WaitNDemo() {
    limiter := rate.NewLimiter(10, 5)
    for i := 0; i < 10; i++ {
        ctx, cancel := context.WithTimeout(context.Background(), 400*time.Millisecond)
        if err := limiter.WaitN(ctx, 4); err != nil {
            fmt.Println("wait error:", err)
            cancel()
            continue
        }
        // Call cancel directly rather than defer: deferred calls would pile
        // up until the function returns, leaking contexts inside the loop.
        cancel()
        fmt.Println(i, "executed", time.Now())
    }
}

Example of Reserve usage to obtain the exact delay before execution:

func ReserveNDemo() {
    limiter := rate.NewLimiter(10, 5)
    for i := 0; i < 10; i++ {
        r := limiter.ReserveN(time.Now(), 4)
        if !r.OK() {
            // n exceeds the burst size; this request can never be served.
            return
        }
        delay := r.Delay()
        time.Sleep(delay) // wait exactly as long as the reservation requires
        fmt.Println(i, "executed", time.Now(), delay)
    }
}

The limiter also supports dynamic adjustment of burst size and rate via SetBurst and SetLimit methods.

Conclusion

Rate limiting is a fundamental component of service governance. Understanding its classifications, algorithms, and practical Go implementations helps developers design resilient, high‑availability systems while balancing resource utilization and user experience.

Tags: backend, distributed systems, golang, rate limiting, token bucket, leaky bucket
Written by

Code Ape Tech Column

Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn
