Understanding Rate Limiting: Concepts, Algorithms, and Practical Implementations

This article explains why rate limiting is essential for both physical venues and online services, describes common strategies such as circuit breaking, service degradation, delay handling, and privilege handling, compares caching, degradation, and limiting, and details counter, leaky‑bucket, and token‑bucket algorithms with concrete Guava and Nginx‑Lua implementations.

Architect's Guide

Aug 1, 2022

Why Rate Limiting

In everyday life, places such as tourist attractions cap the number of visitors to avoid overcrowding, accidents, and a poor experience. The same principle applies to online systems: a sudden traffic spike can overwhelm servers, so limiting traffic preserves availability for the requests that are admitted.

Rate Limiting Approaches

Circuit Breaker

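When calls to a dependency keep failing or timing out, the circuit breaker opens and rejects further calls immediately, which prevents the failure from spreading and gives the backend room to recover. After a cool-down period it lets trial requests through, and the circuit closes again once they succeed. Common tools include Hystrix and Alibaba Sentinel.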
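The open and close behaviour can be sketched in a few lines of Java. This is a minimal illustration, not how Hystrix or Sentinel are implemented: the class name `SimpleCircuitBreaker`, the consecutive-failure threshold, and the explicit `nowMillis` argument (used so the sketch is deterministic) are all assumptions made for the example.

```java
import java.util.function.Supplier;

/** Sketch: opens after N consecutive failures, allows a retry after a cool-down. */
class SimpleCircuitBreaker {
    private final int failureThreshold; // consecutive failures before opening
    private final long openMillis;      // how long calls are rejected once open
    private int consecutiveFailures = 0;
    private long openedAt = -1;         // -1 means the circuit is closed

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /** Runs the action, or returns the fallback while the circuit is open. */
    synchronized <T> T call(Supplier<T> action, T fallback, long nowMillis) {
        if (isOpen(nowMillis)) {
            return fallback;            // open: reject without touching the backend
        }
        try {
            T result = action.get();    // closed, or cool-down elapsed: try the backend
            consecutiveFailures = 0;
            openedAt = -1;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = nowMillis;   // trip (or re-trip) the circuit
            }
            return fallback;
        }
    }

    synchronized boolean isOpen(long nowMillis) {
        return openedAt >= 0 && nowMillis - openedAt < openMillis;
    }
}
```

In real use the caller would pass `System.currentTimeMillis()` as `nowMillis`. Holding the lock while the action runs keeps the sketch short but serialises calls, so a production breaker would track state differently.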

Service Degradation

Non‑critical features are temporarily disabled during high load, freeing resources for core functions. For example, an e‑commerce site may disable comments or points during a traffic surge.

Delay Handling

Requests are buffered in a queue (a leaky‑bucket style) and processed sequentially. This smooths spikes at the cost of added latency while requests wait in the queue, and once the buffer is full, further requests must be rejected.
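A bounded queue is one simple way to express this. The sketch below assumes a hypothetical `RequestBuffer` class that holds requests as plain strings; a real service would queue richer request objects and run worker threads that drain the queue.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Sketch: buffers requests in arrival order and refuses new ones when full. */
class RequestBuffer {
    private final BlockingQueue<String> queue;

    RequestBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Returns false when the buffer is full, so the caller can reject the request. */
    boolean submit(String request) {
        return queue.offer(request);
    }

    /** Returns the oldest buffered request, or null if the buffer is empty. */
    String next() {
        return queue.poll();
    }
}
```

`offer` is used instead of `put` so that a full buffer produces an immediate rejection rather than blocking the calling thread.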

Privilege Handling

Users are classified, allowing high‑priority groups to receive service while others are delayed or rejected.
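One way to picture this is a two-level threshold on in-flight requests: normal traffic is turned away first, and high-priority traffic only when the system is completely full. The class name `PriorityAdmission`, the two tiers, and the thresholds are assumptions made for this sketch.

```java
/** Sketch: admits high-priority users longer than normal users as load rises. */
class PriorityAdmission {
    enum Tier { HIGH, NORMAL }

    private final int normalCutoff; // in-flight count at which NORMAL traffic is rejected
    private final int hardLimit;    // in-flight count at which all traffic is rejected

    PriorityAdmission(int normalCutoff, int hardLimit) {
        this.normalCutoff = normalCutoff;
        this.hardLimit = hardLimit;
    }

    /** Decides whether a request may proceed given the current in-flight count. */
    boolean admit(Tier tier, int inFlight) {
        if (inFlight >= hardLimit) {
            return false;           // system is full for everyone
        }
        return tier == Tier.HIGH || inFlight < normalCutoff;
    }
}
```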

Difference Between Cache, Degradation, and Rate Limiting

Cache increases throughput and speeds up access; Degradation temporarily shields failing components and returns fallback data; Rate Limiting restricts request frequency when caching and degradation are insufficient, protecting the service before it becomes unavailable.

Rate Limiting Algorithms

Counter Algorithm

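A simple method that counts requests within a fixed window (e.g., 100 requests per minute) and rejects anything beyond the limit until the next window starts. The same counting idea underlies simple caps on thread pools, database connections, or Nginx connections. Its weakness is the window boundary: a burst at the end of one window followed by a burst at the start of the next can briefly let through up to twice the limit.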
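A fixed-window counter might look like the following sketch. The class name `FixedWindowCounter` is hypothetical, and the current time is passed in as `nowMillis` purely to keep the example deterministic.

```java
/** Sketch: allows at most `limit` requests per fixed window of `windowMillis`. */
class FixedWindowCounter {
    private final int limit;
    private final long windowMillis;
    private long windowIndex = -1; // which window the count belongs to
    private int count = 0;

    FixedWindowCounter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        long currentWindow = nowMillis / windowMillis;
        if (currentWindow != windowIndex) {
            windowIndex = currentWindow; // a new window has started: reset the count
            count = 0;
        }
        if (count < limit) {
            count++;
            return true;
        }
        return false;
    }
}
```

With a limit of 2 per 1000 ms, two requests at 999 ms and two more at 1000 ms are all admitted, because they fall into different windows even though they arrive 1 ms apart.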

Leaky Bucket Algorithm

Requests enter a bucket and are released at a constant rate; if the bucket overflows, excess requests are dropped, effectively smoothing bursts and protecting downstream services.
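The sketch below models the bucket as a water level that drains at a constant rate. It is the "meter" form of the algorithm, which rejects overflow instead of queueing it; the class name `LeakyBucket` and the explicit `nowMillis` argument are assumptions for the example.

```java
/** Sketch: each request adds one unit of water; water drains at a constant rate. */
class LeakyBucket {
    private final double capacity;      // maximum water the bucket can hold
    private final double leakPerSecond; // constant outflow rate
    private double water = 0;
    private long lastLeakMillis;

    LeakyBucket(double capacity, double leakPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.leakPerSecond = leakPerSecond;
        this.lastLeakMillis = nowMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        // Drain the water that leaked out since the last call
        double leaked = (nowMillis - lastLeakMillis) * leakPerSecond / 1000.0;
        water = Math.max(0, water - leaked);
        lastLeakMillis = nowMillis;
        if (water + 1 <= capacity) {
            water += 1;                 // the request fits in the bucket
            return true;
        }
        return false;                   // the bucket would overflow: drop the request
    }
}
```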

Token Bucket Algorithm

Tokens are added to a bucket at a steady rate; a request proceeds only if a token is available, allowing controlled bursts while still enforcing an average rate.
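A minimal token bucket could be written as follows. The class name `TokenBucket` is hypothetical, the bucket is assumed to start full, and time is passed in as `nowMillis` so the behaviour is easy to follow.

```java
/** Sketch: tokens refill at a steady rate up to `capacity`; each request spends one. */
class TokenBucket {
    private final double capacity;        // maximum burst size
    private final double refillPerSecond; // steady token generation rate
    private double tokens;
    private long lastRefillMillis;

    TokenBucket(double capacity, double refillPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity;           // start full so an initial burst is allowed
        this.lastRefillMillis = nowMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        // Add the tokens generated since the last call, capped at capacity
        double added = (nowMillis - lastRefillMillis) * refillPerSecond / 1000.0;
        tokens = Math.min(capacity, tokens + added);
        lastRefillMillis = nowMillis;
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;
    }
}
```

The capacity bounds the burst size and the refill rate bounds the long-run average, which is the main difference from the leaky bucket's constant outflow.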

Concurrent Rate Limiting

Limits can be applied at several levels:

Limit total concurrency (database pool, thread pool)

Limit instantaneous connections (Nginx limit_conn)

Limit average QPS (Guava RateLimiter, Nginx limit_req)

Limit remote API or MQ consumption rates

Adjust limits based on CPU, memory, or network usage
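For the first item in the list, capping total concurrency inside a Java process, a `java.util.concurrent.Semaphore` is a common building block. The wrapper class `ConcurrencyLimiter` below is a hypothetical name used only for this sketch.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Sketch: lets at most `maxConcurrent` tasks run at the same time. */
class ConcurrencyLimiter {
    private final Semaphore permits;

    ConcurrencyLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Runs the task if a permit is free, otherwise returns the rejected value at once. */
    <T> T run(Supplier<T> task, T rejected) {
        if (!permits.tryAcquire()) {
            return rejected;       // too many tasks in flight
        }
        try {
            return task.get();
        } finally {
            permits.release();     // always give the permit back
        }
    }

    int available() {
        return permits.availablePermits();
    }
}
```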

Interface Rate Limiting

Total Calls

Restricts the number of times an API can be invoked within a given period, typically using the counter algorithm.

Sliding Window

Divides the time window into smaller slots to achieve finer‑grained counting, reducing the inaccuracy of fixed windows and handling bursty traffic more precisely.
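A slot-based sliding window might be sketched as below. The class name `SlidingWindowCounter` is hypothetical, the window length is assumed to be evenly divisible by the number of slots, and time is again passed in as `nowMillis`.

```java
import java.util.Arrays;

/** Sketch: counts requests over the most recent `slots` time slots. */
class SlidingWindowCounter {
    private final int limit;
    private final long slotMillis;
    private final int[] counts;   // request count per slot
    private final long[] slotIds; // absolute slot number each cell currently holds

    SlidingWindowCounter(int limit, long windowMillis, int slots) {
        this.limit = limit;
        this.slotMillis = windowMillis / slots;
        this.counts = new int[slots];
        this.slotIds = new long[slots];
        Arrays.fill(slotIds, Long.MIN_VALUE); // no cell holds a live slot yet
    }

    synchronized boolean tryAcquire(long nowMillis) {
        long currentSlot = nowMillis / slotMillis;
        int total = 0;
        for (int i = 0; i < counts.length; i++) {
            // Only cells whose slot lies inside the current window contribute
            if (slotIds[i] > currentSlot - counts.length && slotIds[i] <= currentSlot) {
                total += counts[i];
            }
        }
        if (total >= limit) {
            return false;
        }
        int idx = (int) (currentSlot % counts.length);
        if (slotIds[idx] != currentSlot) {
            slotIds[idx] = currentSlot;       // reuse the cell for the new slot
            counts[idx] = 0;
        }
        counts[idx]++;
        return true;
    }
}
```

With a limit of 2 per 1000 ms split into 10 slots, two requests at 900 ms and 950 ms cause a request at 1000 ms to be rejected, whereas a fixed window starting at 1000 ms would have admitted it.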

Implementation

Guava Implementation

Dependency:

<!-- https://mvnrepository.com/artifact/com.google.guava/guava -->
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>28.1-jre</version>
</dependency>

Core code:

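import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// One counter per second; each entry expires 2 seconds after it is created
LoadingCache<Long, AtomicLong> counter = CacheBuilder.newBuilder()
    .expireAfterWrite(2, TimeUnit.SECONDS)
    .build(new CacheLoader<Long, AtomicLong>() {
        @Override
        public AtomicLong load(Long second) {
            return new AtomicLong(0);
        }
    });
// Key by the current second rather than a fixed value; 1000 per second is an illustrative limit
long currentSecond = System.currentTimeMillis() / 1000;
boolean allowed = counter.getUnchecked(currentSecond).incrementAndGet() <= 1000;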

Token Bucket Implementation

SmoothBursty (constant token generation):

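// Requires: import com.google.common.util.concurrent.RateLimiter;
public static void main(String[] args) throws InterruptedException {
    // RateLimiter.create(2) generates 2 tokens per second
    RateLimiter limiter = RateLimiter.create(2);
    // acquire() blocks until a permit is available and returns the seconds spent waiting
    System.out.println(limiter.acquire());
    // Unused permits accumulate while idle, so the first calls after the pause do not wait
    Thread.sleep(2000);
    System.out.println(limiter.acquire());
    System.out.println(limiter.acquire());
    System.out.println(limiter.acquire());
    System.out.println(limiter.acquire());
    System.out.println(limiter.acquire());
    System.out.println(limiter.acquire());
}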

SmoothWarmingUp (gradual increase to stable rate):

// 2 permits per second at the stable rate, reached after a 1000 ms warm-up period
RateLimiter limiter = RateLimiter.create(2, 1000L, TimeUnit.MILLISECONDS);
System.out.println(limiter.acquire());
Thread.sleep(2000);
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());

Timeout check:

// Returns false immediately if a permit cannot be obtained within 11 ms (uses java.time.Duration)
boolean tryAcquire = limiter.tryAcquire(Duration.ofMillis(11));

Distributed Rate Limiting with Nginx + Lua

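Uses resty.lock for atomicity and lua_shared_dict for counters. Note that a lua_shared_dict is shared among the worker processes of one Nginx instance only; limiting across several Nginx nodes needs a store that all nodes can reach, such as Redis. Example Lua code: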

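-- Requires two shared dicts in nginx.conf (sizes are illustrative):
--   lua_shared_dict locks 100k;
--   lua_shared_dict limit_counter 10m;
local resty_lock = require "resty.lock"

local function acquire()
    local lock, err = resty_lock:new("locks")
    if not lock then
        return 0 -- could not create the lock: reject
    end
    local elapsed, lock_err = lock:lock("limit_key") -- mutex across workers
    if not elapsed then
        return 0 -- could not obtain the lock: reject
    end
    local limit_counter = ngx.shared.limit_counter
    -- One counter per client IP per second (per-IP keying is assumed from the "ip:" prefix)
    local key = "ip:" .. ngx.var.remote_addr .. ":" .. ngx.time()
    local limit = 5
    local current = limit_counter:get(key)
    if current ~= nil and current + 1 > limit then
        lock:unlock()
        return 0 -- over the limit for this second
    end
    if current == nil then
        limit_counter:set(key, 1, 1) -- first hit, set 1-second TTL
    else
        limit_counter:incr(key, 1)
    end
    lock:unlock()
    return 1
end
ngx.print(acquire())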

These snippets illustrate how to apply rate limiting inside a single Java process and at the Nginx layer with Lua.