Why Rate Limiting Matters: Strategies, Algorithms, and Real-World Implementations

Rate limiting protects services from overload by controlling how much traffic they accept. This article covers related protective techniques (circuit breaking, degradation, buffering, and privilege handling), the main algorithms (counters, leaky bucket, and token bucket), and implementations ranging from Guava in Java to Nginx + Lua for distributed systems.

ITFLY8 Architecture Home

Oct 20, 2021

Why Rate Limiting Is Needed

In everyday life and online services alike, uncontrolled traffic can cause congestion, accidents, or system crashes. Limiting traffic keeps a system usable for the number of users it can actually serve, while excess requests wait in a queue or are rejected.

Rate‑Limiting Approaches

Circuit Breaking

When a service is failing and cannot recover quickly, a circuit breaker opens and rejects calls to it immediately, which prevents further overload. Once the service stabilizes, the breaker closes and normal traffic resumes. Common tools include Hystrix and Alibaba Sentinel.
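
The open/close cycle can be sketched as a small state machine. This is a minimal illustration, not how Hystrix or Sentinel are implemented: the class name SimpleCircuitBreaker, the consecutive-failure threshold, and the explicit nowMillis parameter (used so the behavior is easy to follow) are all assumptions made for this example.

```java
import java.util.function.Supplier;

/** Minimal circuit breaker sketch; names and thresholds are illustrative. */
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold; // consecutive failures before opening
    private final long openMillis;      // how long to reject before a trial call
    private State state = State.CLOSED;
    private int failures = 0;
    private long openedAt = 0;

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /** Runs the action, or returns the fallback while the breaker is open. */
    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback, long nowMillis) {
        if (state == State.OPEN) {
            if (nowMillis - openedAt < openMillis) {
                return fallback.get();   // still open: reject without calling the service
            }
            state = State.HALF_OPEN;     // cool-down elapsed: allow one trial call
        }
        try {
            T result = action.get();
            failures = 0;
            state = State.CLOSED;        // a success closes the breaker
            return result;
        } catch (RuntimeException e) {
            failures++;
            if (state == State.HALF_OPEN || failures >= failureThreshold) {
                state = State.OPEN;      // trip: reject subsequent calls
                openedAt = nowMillis;
            }
            return fallback.get();
        }
    }

    synchronized State state() {
        return state;
    }
}
```

For simplicity the sketch holds its lock while the protected call runs, which serializes callers; a production breaker would avoid that.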

Service Degradation

Non‑critical features are temporarily disabled during spikes, freeing resources for core functions. For example, an e‑commerce site may suspend comments or loyalty points during a flash sale.
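
One simple way to picture degradation is a runtime switch per feature. The sketch below is hypothetical (the DegradationSwitch class and the feature names are invented for illustration); real systems usually drive such switches from a configuration center rather than in-process calls.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical feature switch used to degrade non-critical features. */
class DegradationSwitch {
    private final Set<String> disabled = ConcurrentHashMap.newKeySet();

    void disable(String feature) {
        disabled.add(feature);
    }

    void enable(String feature) {
        disabled.remove(feature);
    }

    /** Runs the feature normally, or returns the fallback while it is switched off. */
    <T> T run(String feature, Supplier<T> normal, Supplier<T> fallback) {
        return disabled.contains(feature) ? fallback.get() : normal.get();
    }
}
```

During a flash sale, an operator might call disable("comments") so that comment requests return a cheap placeholder, then enable("comments") once the spike has passed.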

Buffering (Delay Handling)

Requests are placed in a buffer (e.g., a queue) and processed sequentially, reducing immediate load on the backend.
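
A minimal sketch of this idea, assuming a bounded in-memory queue (the RequestBuffer class is a name chosen for this example): requests that do not fit are refused up front, and the rest are processed one at a time in arrival order.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Illustrative bounded buffer that decouples request arrival from processing. */
class RequestBuffer {
    private final BlockingQueue<Runnable> queue;

    RequestBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Returns false when the buffer is full, so the caller can reject the request. */
    boolean submit(Runnable request) {
        return queue.offer(request);
    }

    /** Processes buffered requests sequentially, in arrival order; returns how many ran. */
    int drain() {
        int processed = 0;
        Runnable next;
        while ((next = queue.poll()) != null) {
            next.run();
            processed++;
        }
        return processed;
    }
}
```

In practice the buffer is often a message queue and drain() is replaced by a pool of consumers, but the trade-off is the same: higher latency for queued requests in exchange for a steady load on the backend.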

Privilege Handling

Users are classified, and high‑priority users receive service before others during congestion.
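
As a rough sketch, privilege handling can be modeled as a priority queue in front of the service. The PriorityDispatcher class and the convention that a larger number means higher priority are assumptions for this example; a sequence number keeps first-come, first-served order within the same priority class.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Illustrative dispatcher that serves higher-priority users first. */
class PriorityDispatcher {
    private static final class Request {
        final String user;
        final int priority; // larger value = served earlier (assumed convention)
        final long seq;     // arrival order, keeps FIFO within one priority

        Request(String user, int priority, long seq) {
            this.user = user;
            this.priority = priority;
            this.seq = seq;
        }
    }

    private final PriorityQueue<Request> queue = new PriorityQueue<>(
        Comparator.comparingInt((Request r) -> -r.priority)
                  .thenComparingLong(r -> r.seq));
    private long nextSeq = 0;

    synchronized void enqueue(String user, int priority) {
        queue.add(new Request(user, priority, nextSeq++));
    }

    /** Returns the next user to serve, or null when nobody is waiting. */
    synchronized String next() {
        Request r = queue.poll();
        return r == null ? null : r.user;
    }
}
```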

Difference Between Cache, Degradation, and Rate Limiting

Cache increases throughput and speeds up access.

Degradation temporarily disables failing components while providing fallback data.

Rate Limiting restricts request frequency when cache and degradation are insufficient, protecting the service before it becomes unavailable.

Rate‑Limiting Algorithms

Counter Algorithm

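The simplest approach is to count either the resources in use (threads, DB connections, etc.) or the requests received within a time window, and to reject further requests once the count reaches a threshold. Example: allow at most 100 requests per minute.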
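
A minimal fixed-window version of the "100 requests per minute" example might look like the sketch below. The FixedWindowCounter class is hypothetical, and the current time is passed in as a parameter (an assumption made so the behavior is deterministic and easy to test).

```java
/** Illustrative fixed-window request counter. */
class FixedWindowCounter {
    private final int limit;
    private final long windowMillis;
    private long windowStart = 0;
    private int count = 0;

    FixedWindowCounter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    /** Returns true if the request is admitted in the window containing nowMillis. */
    synchronized boolean tryAcquire(long nowMillis) {
        long currentWindow = nowMillis - (nowMillis % windowMillis);
        if (currentWindow != windowStart) {
            windowStart = currentWindow; // a new window begins: reset the count
            count = 0;
        }
        if (count >= limit) {
            return false;                // threshold reached for this window
        }
        count++;
        return true;
    }
}
```

With new FixedWindowCounter(100, 60_000), the 101st call inside the same minute returns false, and the count resets when the next minute starts.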

Leaky Bucket Algorithm

Requests enter a bucket that leaks at a constant rate; excess requests overflow and are dropped, smoothing traffic bursts.
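
The sketch below models the bucket as a water level that drains at a constant rate, which is one common way to implement the algorithm. The LeakyBucket class, its parameters, and the explicit nowMillis argument are assumptions for this example.

```java
/** Illustrative leaky bucket: admits a request only if it fits in the bucket. */
class LeakyBucket {
    private final double capacity;
    private final double leakPerMilli;
    private double water = 0; // requests currently held in the bucket
    private long lastLeak;

    LeakyBucket(int capacity, double leakPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.leakPerMilli = leakPerSecond / 1000.0;
        this.lastLeak = nowMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        // Drain whatever leaked out since the last call, at the constant rate.
        double leaked = (nowMillis - lastLeak) * leakPerMilli;
        water = Math.max(0, water - leaked);
        lastLeak = nowMillis;
        if (water + 1 > capacity) {
            return false; // bucket would overflow: drop the request
        }
        water += 1;
        return true;
    }
}
```

Because the outflow rate never changes, the backend sees at most leakPerSecond requests per second on average, no matter how bursty the input is.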

Token Bucket Algorithm

Tokens are added to a bucket at a steady rate; a request proceeds only if a token is available. This allows occasional bursts while still limiting overall rate.
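
A hand-rolled sketch of the idea follows. It is not Guava's implementation; the TokenBucket class, the lazy refill on each call, and the explicit nowMillis argument are assumptions chosen to keep the example short.

```java
/** Illustrative token bucket with lazy refill. */
class TokenBucket {
    private final double capacity;
    private final double refillPerMilli;
    private double tokens;
    private long lastRefill;

    TokenBucket(int capacity, double refillPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerMilli = refillPerSecond / 1000.0;
        this.tokens = capacity; // start full, so an initial burst is allowed
        this.lastRefill = nowMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        // Add the tokens earned since the last call, capped at the bucket size.
        tokens = Math.min(capacity, tokens + (nowMillis - lastRefill) * refillPerMilli);
        lastRefill = nowMillis;
        if (tokens < 1) {
            return false; // no token available: reject
        }
        tokens -= 1;
        return true;
    }
}
```

The bucket capacity bounds the burst size, while the refill rate bounds the long-run average. That is the key difference from the leaky bucket, which smooths output to a constant rate.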

Concurrency Limiting

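Most systems set overall limits on concurrency, connections, or request rate. In Tomcat, for example, maxThreads caps the request-processing threads, maxConnections caps the connections handled at one time, and acceptCount caps the queue of incoming connections waiting beyond that. Common forms of limiting include: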

Limit total concurrency (e.g., DB connection pool, thread pool)

Limit instantaneous concurrency (e.g., Nginx limit_conn)

Limit average rate within a time window (e.g., Guava RateLimiter, Nginx limit_req)

Limit remote API call rate or MQ consumption rate

Limit based on network, CPU, or memory load
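
For the "total concurrency" case, a semaphore is a common building block. The sketch below uses java.util.concurrent.Semaphore; the ConcurrencyLimiter wrapper and its fail-fast behavior (reject instead of wait) are choices made for this example.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Illustrative limiter on the number of tasks running at the same time. */
class ConcurrencyLimiter {
    private final Semaphore permits;

    ConcurrencyLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Runs the task if a slot is free; otherwise returns the rejection result at once. */
    <T> T execute(Supplier<T> task, Supplier<T> rejected) {
        if (!permits.tryAcquire()) {
            return rejected.get();
        }
        try {
            return task.get();
        } finally {
            permits.release(); // always free the slot, even if the task throws
        }
    }

    int available() {
        return permits.availablePermits();
    }
}
```

Unlike a rate limiter, this bounds how many requests are in flight, not how many arrive per second, so slow requests reduce throughput automatically.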

Interface Limiting

Limiting an individual interface usually combines two parts: a fixed-window counter that caps total calls, and a sliding-window algorithm for finer-grained control.

Fixed Window Issues

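A fixed window cannot see bursts that straddle a window boundary, which makes throttling inaccurate. With a limit of 100 requests per minute, a client can send 100 requests in the last second of one window and 100 more in the first second of the next: each window stays within its limit, yet 200 requests arrive within about two seconds.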
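
The boundary problem is easy to reproduce with a few lines of code. The helper below is hypothetical (BoundaryBurstDemo is a name invented for this example); it replays a list of request timestamps through fixed-window logic and counts how many are admitted.

```java
/** Illustrative replay of request timestamps through a fixed-window limiter. */
class BoundaryBurstDemo {
    /** Returns how many of the (ascending) timestamps a fixed window would admit. */
    static int admitted(long[] timestampsMillis, int limit, long windowMillis) {
        long currentWindow = -1;
        int count = 0;
        int admitted = 0;
        for (long t : timestampsMillis) {
            long window = t / windowMillis;
            if (window != currentWindow) {
                currentWindow = window; // crossing a boundary resets the counter
                count = 0;
            }
            if (count < limit) {
                count++;
                admitted++;
            }
        }
        return admitted;
    }
}
```

Feeding it 100 timestamps at 59,900 ms followed by 100 at 60,100 ms, with a limit of 100 per 60,000 ms, admits all 200 requests even though they arrive only 200 ms apart.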

Sliding Window

Divides the interval into smaller slots (e.g., milliseconds) for smoother, more precise rate limiting, at the cost of higher memory usage.
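
One way to sketch the slotted approach is a ring of per-slot counters. The SlidingWindowCounter class below is an illustration under several assumptions: the window length divides evenly into the number of slots, timestamps never go backwards, and the current time is passed in explicitly.

```java
import java.util.Arrays;

/** Illustrative sliding-window counter built from fixed-size slots. */
class SlidingWindowCounter {
    private final int limit;
    private final long slotMillis;
    private final long[] slotIds; // absolute slot number stored in each ring entry
    private final int[] counts;

    SlidingWindowCounter(int limit, long windowMillis, int slots) {
        this.limit = limit;
        this.slotMillis = windowMillis / slots; // assumes an even division
        this.slotIds = new long[slots];
        this.counts = new int[slots];
        Arrays.fill(slotIds, Long.MIN_VALUE);   // marks every entry as unused
    }

    synchronized boolean tryAcquire(long nowMillis) {
        long currentSlot = nowMillis / slotMillis;
        int total = 0;
        for (int i = 0; i < counts.length; i++) {
            // Count only slots that still fall inside the sliding window.
            if (slotIds[i] > currentSlot - counts.length) {
                total += counts[i];
            }
        }
        if (total >= limit) {
            return false;
        }
        int idx = (int) (currentSlot % counts.length);
        if (slotIds[idx] != currentSlot) {
            slotIds[idx] = currentSlot; // reuse an expired ring entry
            counts[idx] = 0;
        }
        counts[idx]++;
        return true;
    }
}
```

With a limit of 100 per 60,000 ms split into 6 slots, 100 requests at 59,900 ms cause a request at 60,100 ms to be rejected, which closes the boundary gap of the fixed window. More slots give finer precision at the cost of more counters to store and sum.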

Implementation Examples

Guava (Java) Counter

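import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

LoadingCache<Long, AtomicLong> counter = CacheBuilder.newBuilder()
    .expireAfterWrite(2, TimeUnit.SECONDS) // counters for past seconds expire on their own
    .build(new CacheLoader<Long, AtomicLong>() {
        @Override
        public AtomicLong load(Long second) {
            return new AtomicLong(0);
        }
    });
// Key by the current second so that each second gets its own counter.
long currentSecond = System.currentTimeMillis() / 1000;
long count = counter.getUnchecked(currentSecond).incrementAndGet();
// Reject the request when count exceeds the chosen per-second limit.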

Guava Token Bucket (SmoothBursty)

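import com.google.common.util.concurrent.RateLimiter;

// acquire() blocks until a permit is available and returns the seconds spent waiting.
RateLimiter limiter = RateLimiter.create(2); // 2 tokens per second
System.out.println(limiter.acquire());
Thread.sleep(2000); // while idle, unused permits accumulate up to a cap
// The first calls after the pause can use stored permits and return without waiting;
// once those are spent, each call waits roughly 0.5 s.
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());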

Guava Token Bucket (SmoothWarmingUp)

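import com.google.common.util.concurrent.RateLimiter;
import java.util.concurrent.TimeUnit;

// 2 permits per second at steady state, reached after a 1000 ms warm-up period.
RateLimiter limiter = RateLimiter.create(2, 1000L, TimeUnit.MILLISECONDS);
System.out.println(limiter.acquire());
Thread.sleep(2000); // after idling, the limiter is cold again
// While the limiter is cold, permits are handed out more slowly than the steady rate;
// the waits then shrink until they settle at roughly 0.5 s per call.
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());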

Distributed Limiting with Nginx + Lua

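-- Assumes nginx.conf declares two shared dictionaries, for example:
--   lua_shared_dict locks 100k;  lua_shared_dict limit_counter 10m;
local resty_lock = require "resty.lock"

local function acquire()
    local lock, err = resty_lock:new("locks")
    if not lock then
        ngx.log(ngx.ERR, "failed to create lock: ", err)
        return 0
    end
    -- Serialize the read-check-update sequence across worker processes.
    local elapsed, lock_err = lock:lock("limit_key")
    if not elapsed then
        ngx.log(ngx.ERR, "failed to acquire lock: ", lock_err)
        return 0
    end
    local limit_counter = ngx.shared.limit_counter
    -- One counter per second; append the client address to the key to limit per IP.
    local key = "ip:" .. os.time()
    local limit = 5 -- at most 5 requests per second
    local current = limit_counter:get(key)
    if current ~= nil and current + 1 > limit then
        lock:unlock()
        return 0 -- over the limit: reject
    end
    if current == nil then
        limit_counter:set(key, 1, 1) -- first request this second; entry expires after 1 s
    else
        limit_counter:incr(key, 1)
    end
    lock:unlock()
    return 1 -- admitted
end
ngx.print(acquire())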

These examples illustrate how rate limiting can be applied at various layers—from in‑process Java code to distributed Nginx/Lua scripts—to keep services responsive under heavy load.
