Mastering Rate Limiting: Algorithms, Strategies, and Practical Implementations
This article explains why rate limiting is essential for both physical venues and online services, outlines common throttling strategies such as circuit breaking, service degradation, delay processing, and privilege handling, compares caching, degradation, and limiting, and walks through popular algorithms with concrete code examples for Java and Nginx/Lua environments.
Why Rate Limiting
When traffic exceeds a system’s capacity, requests must be throttled to keep the service available and avoid crashes. The principle is similar to limiting visitors at a tourist site during holidays to prevent overcrowding and accidents.
Rate‑Limiting Strategies
Circuit Breaking
If a downstream service fails and cannot recover quickly, a circuit breaker opens and automatically rejects traffic to it. Once the service stabilises, the breaker closes and normal traffic resumes. Common libraries include Hystrix and Alibaba Sentinel.
Service Degradation
Non‑critical features (e.g., product reviews, points) are temporarily disabled during overload, freeing resources for core functions. Degraded services can be restored later with compensation logic.
Delay Processing
Requests are placed into a buffer (often a message queue) and processed asynchronously, smoothing burst traffic. This is the basis of leaky‑bucket and token‑bucket algorithms.
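As a minimal sketch of this idea (class and method names here are illustrative, not from any specific library), requests can be buffered in a bounded queue and drained at a steady pace by a worker:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class DelayProcessingSketch {
    // Bounded buffer: absorbs bursts up to its capacity.
    private static final BlockingQueue<String> buffer = new ArrayBlockingQueue<>(100);

    // Producer side: offer() returns false when the buffer is full,
    // so the caller can reject or retry instead of overloading the system.
    static boolean submit(String request) {
        return buffer.offer(request);
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 5; i++) {
            submit("req-" + i);
        }
        // Consumer side: drain at a constant rate, smoothing the burst.
        while (!buffer.isEmpty()) {
            String req = buffer.take();
            System.out.println("processing " + req);
            Thread.sleep(100); // fixed service rate
        }
    }
}
```

The bounded queue plus constant-rate consumer is exactly the shape the leaky-bucket algorithm formalises.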
Privilege Processing
Users are classified into priority groups; high‑priority users receive service while lower‑priority users are delayed or rejected.
Cache vs. Degradation vs. Limiting
Cache increases throughput and reduces latency. Degradation disables failing or non-critical components to keep the system functional. Limiting restricts the request rate when caching and degradation are insufficient, sacrificing some users' requests so that the remaining users can still be served.
Rate‑Limiting Algorithms
Counter Algorithm
Count requests within a fixed window (e.g., no more than 100 calls per minute). When the counter exceeds the limit, further requests are rejected. Simple to implement but coarse‑grained.
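A minimal fixed-window counter might be sketched in Java as follows (the class and parameters are illustrative):

```java
public class FixedWindowCounter {
    private final int limit;          // max requests per window
    private final long windowMillis;  // window length
    private int count = 0;
    private long windowStart = System.currentTimeMillis();

    FixedWindowCounter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= windowMillis) {
            windowStart = now; // new window: reset the counter
            count = 0;
        }
        count++;
        return count <= limit;
    }

    public static void main(String[] args) {
        FixedWindowCounter counter = new FixedWindowCounter(3, 1000);
        for (int i = 1; i <= 5; i++) {
            // first 3 requests allowed, the rest rejected in this window
            System.out.println("request " + i + ": " + counter.tryAcquire());
        }
    }
}
```

The coarse granularity shows at window boundaries: a burst at the end of one window plus a burst at the start of the next can briefly double the effective rate, which is what the sliding-window variant below addresses.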
Leaky‑Bucket Algorithm
Incoming requests enter a bucket that leaks at a constant rate. If the bucket is full, excess requests are dropped, effectively smoothing bursts.
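A leaky bucket can be sketched without a real queue by tracking the current "water level" and leaking it proportionally to elapsed time (this is an illustrative implementation, not from any library):

```java
public class LeakyBucket {
    private final double capacity;     // max queued work the bucket holds
    private final double leakPerMilli; // constant outflow rate
    private double water = 0;
    private long lastLeak = System.currentTimeMillis();

    LeakyBucket(double capacity, double leakPerSecond) {
        this.capacity = capacity;
        this.leakPerMilli = leakPerSecond / 1000.0;
    }

    synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Leak water for the elapsed time, never going below empty.
        water = Math.max(0, water - (now - lastLeak) * leakPerMilli);
        lastLeak = now;
        if (water + 1 > capacity) {
            return false; // bucket full: drop the request
        }
        water += 1;
        return true;
    }

    public static void main(String[] args) {
        LeakyBucket bucket = new LeakyBucket(2, 1); // holds 2, leaks 1/s
        System.out.println(bucket.tryAcquire()); // true
        System.out.println(bucket.tryAcquire()); // true
        System.out.println(bucket.tryAcquire()); // false, bucket full
    }
}
```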
Token‑Bucket Algorithm
Tokens are added to a bucket at a fixed rate; a request proceeds only if a token is available. Tokens can accumulate, allowing short bursts while maintaining an average rate.
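Before turning to Guava's production-ready RateLimiter below, the core mechanics can be sketched by hand (an illustrative implementation; names and parameters are hypothetical):

```java
public class TokenBucket {
    private final double capacity;       // max tokens that can accumulate
    private final double tokensPerMilli; // refill rate
    private double tokens;
    private long lastRefill = System.currentTimeMillis();

    TokenBucket(double capacity, double tokensPerSecond) {
        this.capacity = capacity;
        this.tokensPerMilli = tokensPerSecond / 1000.0;
        this.tokens = capacity; // start full so an initial burst is allowed
    }

    synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Refill tokens for the elapsed time, capped at the bucket size.
        tokens = Math.min(capacity, tokens + (now - lastRefill) * tokensPerMilli);
        lastRefill = now;
        if (tokens < 1) {
            return false; // no token available: reject
        }
        tokens -= 1;
        return true;
    }

    public static void main(String[] args) {
        TokenBucket bucket = new TokenBucket(2, 1); // cap 2, refill 1/s
        System.out.println(bucket.tryAcquire()); // true
        System.out.println(bucket.tryAcquire()); // true
        System.out.println(bucket.tryAcquire()); // false, tokens exhausted
    }
}
```

The difference from the leaky bucket is the direction of accumulation: unused capacity is stored as tokens, so idle periods buy the right to a short burst later.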
Concurrent Limiting
System‑wide thresholds such as maximum QPS, connection limits, or thread‑pool sizes protect against traffic spikes. In Tomcat, the typical parameters are:
acceptCount – maximum pending connections
maxConnections – maximum simultaneous connections
maxThreads – maximum worker threads
Common approaches:
Limit total concurrency (e.g., database connection pool, thread pool)
Limit instantaneous connections (e.g., Nginx limit_conn)
Limit average rate within a time window (e.g., Guava RateLimiter, Nginx limit_req)
Limit remote‑API call rates or message‑queue consumption rates
Adjust limits based on CPU, memory, or network load
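The first approach, capping total concurrency, maps naturally onto a counting semaphore. A minimal sketch (illustrative class name, not a library API):

```java
import java.util.concurrent.Semaphore;

public class ConcurrencyLimiter {
    private final Semaphore permits;

    ConcurrencyLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Returns false immediately when the concurrency limit is reached,
    // instead of queuing the caller.
    boolean handle(Runnable work) {
        if (!permits.tryAcquire()) {
            return false; // over the limit: fail fast
        }
        try {
            work.run();
            return true;
        } finally {
            permits.release(); // always return the permit
        }
    }

    public static void main(String[] args) {
        ConcurrencyLimiter limiter = new ConcurrencyLimiter(2);
        boolean ok = limiter.handle(() -> System.out.println("served"));
        System.out.println(ok); // true, a permit was free
    }
}
```

This is the same principle behind a bounded thread pool or database connection pool: the semaphore bounds how much work is in flight, regardless of arrival rate.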
Interface Limiting
Two aspects are usually combined:
Total call count within a fixed window (counter algorithm)
Sliding‑window counting for finer granularity. Sliding windows divide the interval into smaller slots, providing more accurate throttling at the cost of higher memory usage.
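One common way to implement a sliding window is a timestamp log: record when each accepted request arrived and evict entries older than the window before counting. A sketch (illustrative, and the per-timestamp storage is the memory cost mentioned above):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SlidingWindowLimiter {
    private final int limit;
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    SlidingWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Evict timestamps that have slid out of the window.
        while (!timestamps.isEmpty() && now - timestamps.peekFirst() >= windowMillis) {
            timestamps.pollFirst();
        }
        if (timestamps.size() >= limit) {
            return false; // window already holds `limit` requests
        }
        timestamps.addLast(now);
        return true;
    }

    public static void main(String[] args) {
        SlidingWindowLimiter limiter = new SlidingWindowLimiter(2, 1000);
        System.out.println(limiter.tryAcquire()); // true
        System.out.println(limiter.tryAcquire()); // true
        System.out.println(limiter.tryAcquire()); // false within the window
    }
}
```

Unlike the fixed window, the boundary moves with each request, so two back-to-back bursts straddling a window edge can no longer double the effective rate.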
Practical Implementations
Guava RateLimiter (Java)
Dependency (Maven):
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>28.1-jre</version>
</dependency>
Core usage – smooth‑bursty mode (constant token generation):
RateLimiter limiter = RateLimiter.create(2); // 2 tokens per second
limiter.acquire(); // blocks until a token is available
// subsequent calls acquire additional tokens
Smooth‑warming‑up mode – token generation ramps up gradually:
RateLimiter limiter = RateLimiter.create(2, 1000, TimeUnit.MILLISECONDS); // 1 s warm-up period
limiter.acquire();
Try‑acquire with timeout:
boolean ok = limiter.tryAcquire(Duration.ofMillis(11)); // returns false if no token arrives within 11 ms
Distributed Limiting with Nginx + Lua
Use resty.lock for atomic operations and lua_shared_dict for shared counters. The lock library is available at https://github.com/openresty/lua-resty-lock.
local locks = require "resty.lock"

function acquire()
    local lock = locks:new("locks")
    local elapsed, err = lock:lock("limit_key") -- serialise check-and-increment
    local limit_counter = ngx.shared.limit_counter
    local key = "ip:" .. os.time()       -- one counter per second
    local limit = 5                      -- at most 5 requests per second
    local current = limit_counter:get(key)
    if current and current + 1 > limit then
        lock:unlock()
        return 0                         -- over the limit: reject
    end
    if not current then
        limit_counter:set(key, 1, 1)     -- first request, expire after 1 s
    else
        limit_counter:incr(key, 1)
    end
    lock:unlock()
    return 1                             -- request allowed
end
ngx.print(acquire())
Token‑Bucket Example (Guava)
public static void main(String[] args) throws InterruptedException {
    // 2 tokens per second
    RateLimiter limiter = RateLimiter.create(2);
    System.out.println(limiter.acquire()); // immediate token
    Thread.sleep(2000);
    // after the pause, accumulated tokens allow a short burst
    System.out.println(limiter.acquire());
    System.out.println(limiter.acquire());
    System.out.println(limiter.acquire());
    System.out.println(limiter.acquire());
}
In this example, RateLimiter.create(2) allows up to two requests per second on average, while unused tokens accumulate to permit short bursts.