Rate Limiting Strategies, Algorithms, and Implementations in Backend Systems

This article explains the concepts, strategies, and algorithms of rate limiting—including circuit breaking, service degradation, leaky‑bucket and token‑bucket methods—and provides practical Java, Guava, and Nginx + Lua implementations for controlling concurrency and protecting backend services.

Top Architect
Top Architect
Top Architect
Rate Limiting Strategies, Algorithms, and Implementations in Backend Systems

In daily life and online services, traffic spikes can overwhelm resources, so rate limiting is essential to maintain availability. The article discusses why limiting traffic is necessary and draws parallels between physical crowd control and network request throttling.

Rate Limiting Approaches

Circuit Breaking

When a system cannot recover quickly, a circuit breaker automatically rejects traffic to prevent overload. Tools like Hystrix and Alibaba Sentinel are mentioned.

Service Degradation

Non‑critical features (e.g., comments, points) can be temporarily disabled during traffic surges to free resources for core functions.

Delay Processing

Requests are buffered in a queue (e.g., a leaky‑bucket) and processed later, reducing immediate load at the cost of latency.

Privilege Processing

Requests are prioritized based on user classification, ensuring high‑priority users receive service first.

Rate Limiting Algorithms

Counter Algorithm

Simple fixed‑window counting, e.g., allowing at most 100 requests per minute for an API.

LoadingCache<Long, AtomicLong> counter = CacheBuilder.newBuilder()
    .expireAfterWrite(2, TimeUnit.SECONDS)
    .build(new CacheLoader<Long, AtomicLong>() {
        @Override
        public AtomicLong load(Long second) throws Exception {
            return new AtomicLong(0);
        }
    });
counter.get(1L).incrementAndGet();

Leaky Bucket Algorithm

Requests enter a bucket that leaks at a constant rate; excess requests overflow and are dropped, providing a smooth outflow.

Token Bucket Algorithm

Tokens are added to a bucket at a steady rate; a request proceeds only if a token is available, allowing bursts while enforcing an average rate.

public static void main(String[] args) {
    // RateLimiter.create(2) generates 2 tokens per second
    RateLimiter limiter = RateLimiter.create(2);
    System.out.println(limiter.acquire());
    try { Thread.sleep(2000); } catch (InterruptedException e) { e.printStackTrace(); }
    System.out.println(limiter.acquire());
    System.out.println(limiter.acquire());
    System.out.println(limiter.acquire());
    System.out.println(limiter.acquire());
    System.out.println(limiter.acquire());
}

Concurrency Limiting

Typical configuration parameters in servers (e.g., Tomcat) include acceptCount, maxConnections, and maxThreads to cap total or instantaneous connections.

Limit total concurrency (e.g., database connection pool, thread pool)

Limit instantaneous connections (e.g., Nginx limit_conn)

Limit average rate within a time window (e.g., Guava RateLimiter, Nginx limit_req)

Limit remote API call rate, MQ consumption rate, etc.

Interface Limiting

Two aspects: total number of calls within a period (counter algorithm) and sliding window algorithms for finer granularity.

Implementation Examples

Guava Implementation

Include the Guava dependency and use RateLimiter for smooth token‑bucket rate limiting.

<!-- https://mvnrepository.com/artifact/com.google.guava/guava -->
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>28.1-jre</version>
</dependency>

Distributed Limiting with Nginx + Lua

Use resty.lock for atomic operations and lua_shared_dict to store counters.

local locks = require "resty.lock"
local function acquire()
    local lock = locks:new("locks")
    local elapsed, err = lock:lock("limit_key")
    local limit_counter = ngx.shared.limit_counter
    local key = "ip:" .. os.time()
    local limit = 5
    local current = limit_counter:get(key)
    if current ~= nil and current + 1 > limit then
        lock:unlock()
        return 0
    end
    if current == nil then
        limit_counter:set(key, 1, 1)
    else
        limit_counter:incr(key, 1)
    end
    lock:unlock()
    return 1
end
ngx.print(acquire())

These examples illustrate how to protect services from overload by applying appropriate rate‑limiting techniques.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

BackendDistributed SystemsalgorithmGuavarate limiting
Top Architect
Written by

Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.