Backend Development 13 min read

Mastering Rate Limiting: Strategies, Algorithms, and Real-World Implementations

This article explains why rate limiting is essential for system stability, outlines common throttling patterns such as circuit breaking, degradation, delayed processing, and privilege handling, and dives into popular algorithms like counter, leaky bucket, and token bucket with concrete Java and Nginx examples.

Open Source Linux

Dec 2, 2021

Mastering Rate Limiting: Strategies, Algorithms, and Real-World Implementations

Rate Limiting Concepts

In everyday life, places like tourist attractions limit visitor numbers during holidays to avoid overcrowding and accidents; the same principle applies to web services where sudden traffic spikes can overwhelm a system, so throttling ensures availability.

Throttling Approaches

Circuit Breaking

Design systems to automatically open a circuit when errors persist, rejecting traffic to protect downstream services. When the issue resolves, the circuit closes and normal service resumes. Common tools include Hystrix and Alibaba Sentinel.

Service Degradation

When traffic surges, non‑essential features (e.g., product reviews, points) can be temporarily disabled to free resources for core functions like order processing. After recovery, degraded services are re‑enabled and any missed actions are compensated.

Delayed Processing

Requests are placed into a buffer (e.g., a queue) and processed asynchronously, reducing immediate load on back‑end services. This is the basis for leaky‑bucket and token‑bucket algorithms.

Privilege Handling

Users are classified so that high‑priority groups receive service first, while others may be delayed or rejected.

Difference Between Cache, Degradation, and Throttling

Cache increases throughput and response speed.

Degradation temporarily disables failing components while providing fallback data.

Throttling limits request rates when caching and degradation are insufficient, protecting services before they become unavailable.

Throttling Algorithms

Counter Algorithm

Simple implementation: maintain a counter for a time window (e.g., 100 requests per minute). Increment on each request; reset when the window expires.

Leaky Bucket Algorithm

Requests flow into a bucket that leaks at a constant rate; excess requests overflow and are dropped, smoothing traffic bursts.

Token Bucket Algorithm

Tokens are added to a bucket at a fixed rate; a request proceeds only if a token is available, allowing controlled bursts while enforcing an average rate.

Concurrent Throttling

Set overall QPS limits, e.g., Tomcat's acceptCount, maxConnections, and maxThreads. Other techniques include limiting total concurrency (thread pools, DB connections), instantaneous connections (nginx limit_conn), and average rate within a time window (Guava RateLimiter, nginx limit_req).

API Throttling

Two aspects: total request count within a period (counter algorithm) and sliding window algorithms for finer‑grained control.

Sliding Window

Divides a fixed interval into smaller sub‑windows, providing smoother and more accurate rate limiting compared to a simple fixed window.

Implementation Examples

Guava RateLimiter

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>28.1-jre</version>
</dependency>

Core code:

LoadingCache<Long, AtomicLong> counter = CacheBuilder.newBuilder()
    .expireAfterWrite(2, TimeUnit.SECONDS)
    .build(new CacheLoader<Long, AtomicLong>() {
        @Override
        public AtomicLong load(Long second) throws Exception {
            return new AtomicLong(0);
        }
    });
counter.get(1L).incrementAndGet();

Token Bucket with Guava

public static void main(String[] args) {
    // RateLimiter.create(2) generates 2 tokens per second
    RateLimiter limiter = RateLimiter.create(2);
    System.out.println(limiter.acquire());
    Thread.sleep(2000);
    System.out.println(limiter.acquire());
    System.out.println(limiter.acquire());
    System.out.println(limiter.acquire());
    System.out.println(limiter.acquire());
    System.out.println(limiter.acquire());
}

Use RateLimiter.create(2, 1000, TimeUnit.MILLISECONDS) for a warm‑up mode.

Distributed Throttling with Nginx + Lua

Lua script uses resty.lock for atomicity and ngx.shared.limit_counter to store counters.

local locks = require "resty.lock"
local function acquire()
    local lock = locks:new("locks")
    local elapsed, err = lock:lock("limit_key")
    local limit_counter = ngx.shared.limit_counter
    local key = "ip:" .. os.time()
    local limit = 5
    local current = limit_counter:get(key)
    if current ~= nil and current + 1 > limit then
        lock:unlock()
        return 0
    end
    if current == nil then
        limit_counter:set(key, 1, 1)
    else
        limit_counter:incr(key, 1)
    end
    lock:unlock()
    return 1
end
ngx.print(acquire())

https://github.com/openresty/lua-resty-lock

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

distributed systems algorithm concurrency rate limiting

Written by

Open Source Linux

Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.