Backend Development 11 min read

Mastering Rate Limiting: Practical Algorithms and Real‑World Cases

This article explains why rate limiting is essential for high‑traffic services, compares four classic algorithms (fixed‑window, sliding‑window, leaky‑bucket, token‑bucket), provides Java and Redis implementations, shares production case studies, highlights common pitfalls, and offers performance‑tuning tips for robust backend systems.

Su San Talks Tech

Apr 8, 2025

Mastering Rate Limiting: Practical Algorithms and Real‑World Cases

Preface

Last summer, a financial platform reported a payment‑API error rate soaring to 35%. When I arrived at the data‑center, the database connection pool was exhausted and requests piled up – a classic case of missing rate‑limiting protection.

Rate limiting is not about denying service; it is about sacrificing controllable traffic to protect core pathways.

During a major e‑commerce promotion, a token‑bucket limited the flash‑sale API to 50 k QPS, losing 20 % of burst traffic but preserving 99 % of core transaction success.

1 Common Rate‑Limiting Solutions

1.1 Fixed Window Counter

Core principle: Use a fixed time window (e.g., 1 second) to count requests; reject any request that exceeds the threshold within the window.

Specific code implementation:

public class FixedWindowCounter {
    private final AtomicLong counter = new AtomicLong(0);
    private volatile long windowStart = System.currentTimeMillis();
    private final int maxRequests;
    private final long windowMillis;

    public boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart > windowMillis) {
            if (counter.compareAndSet(counter.get(), 0)) {
                windowStart = now;
            }
        }
        return counter.incrementAndGet() <= maxRequests;
    }
}

Fatal flaw: If a limit of 100 requests per second is set, a burst of 100 requests at 0.9 s and another 100 at 0.1 s of the next second will allow 200 requests in two seconds.

Applicable scenarios: Log collection, non‑critical interfaces with coarse‑grained limiting.

1.2 Sliding Window

Core principle: Divide the time window into smaller slices (e.g., 10 seconds) and sum requests of the most recent N slices.

Redis Lua script implementation:

// Redis Lua implementation of sliding window (millisecond precision)
String lua = "";
    local now = tonumber(ARGV
    local window = tonumber(ARGV
    local key = KEYS[1]

    redis.call('ZREMRANGEBYSCORE', key, '-inf', now - window)
    local count = redis.call('ZCARD', key)

    if count < tonumber(ARGV then
        redis.call('ZADD', key, now, now)
        redis.call('EXPIRE', key, window/1000)
        return 1
    end
    return 0
"";

Technical highlight: A securities trading system reduced API exception rate from 5 % to 0.3 % after adopting sliding windows, achieving ±10 ms accuracy with Redis ZSET.

Advantages comparison:

Metric | Fixed Window | Sliding Window --- | --- | --- Time precision | 1 second | 100 ms Critical burst issue | Exists | Eliminated Implementation complexity | Simple | Medium

2.3 Leaky Bucket Algorithm

Core principle: Requests flow into a bucket like water; the system processes them at a fixed rate. When the bucket is full, new requests are dropped.

Specific implementation:

// Leaky bucket dynamic implementation (Semaphore version)
public class LeakyBucket {
    private final Semaphore permits;
    private final ScheduledExecutorService scheduler;

    public LeakyBucket(int rate) {
        this.permits = new Semaphore(rate);
        this.scheduler = Executors.newScheduledThreadPool(1);
        scheduler.scheduleAtFixedRate(() -> permits.release(rate), 1, 1, TimeUnit.SECONDS);
    }

    public boolean tryAcquire() {
        return permits.tryAcquire();
    }
}

Technical pain point: An IoT platform used this scheme to handle 100 k devices reporting data, processing at a stable 500 req/s, but burst traffic caused queue buildup similar to a clogged milk‑tea funnel.

Applicable scenarios: IoT command dispatch, payment channel quotas, any situation requiring a steady processing rate.

1.4 Token Bucket Algorithm

Core principle: Tokens are generated at a fixed rate; a request must acquire a token to proceed.

Bursts can consume accumulated tokens.

Specific implementation (Guava RateLimiter):

// Guava RateLimiter advanced usage
RateLimiter limiter = RateLimiter.create(10.0, 1, TimeUnit.SECONDS); // warm‑up
limiter.acquire(5); // try to get 5 tokens
// Dynamically adjust rate via reflection (example)
Field field = RateLimiter.class.getDeclaredField("tokens");
field.setAccessible(true);
AtomicDouble tokens = (AtomicDouble) field.get(limiter);
tokens.set(20); // inject 20 tokens during burst

Real‑world case: A video platform limited normal traffic to 100 k QPS, but allowed a 50 % over‑limit for 3 seconds during hot events, preventing avalanche while preserving user experience.

Dynamic characteristics:

Normal QPS limit

Burst allows token over‑draw

Sustained burst depletes tokens

2 Production Environment Practices

2.1 Distributed Rate Limiting at the Gateway Layer

An e‑commerce Double‑11 solution used Redis + Lua for distributed counting combined with Nginx local cache, blocking 83 % of malicious requests at the gateway.

2.2 Adaptive Circuit‑Breaker Mechanism

A social platform automatically lowered the rate‑limit from 50 k to 30 k during traffic spikes and gradually restored it after recovery.

3 Pitfalls and Performance Optimization

3.1 Fatal Mistakes

Placing rate limiting before the database connection pool can cause connection leaks and crash the database.

Correct practice follows the three principles of circuit breaking:

Fast failure: intercept invalid requests at the entry layer.

Dynamic degradation: keep minimal resources for core services.

Automatic recovery: gradual ramp‑up after circuit opens.

3.2 Performance Optimization

A financial system’s JMH test showed that replacing AtomicLong with LongAdder increased rate‑limiting throughput by 220 %.

Optimization techniques: reduce CAS contention and use segmented lock bases.

Conclusion

The article listed the four most commonly used rate‑limiting schemes in practice.

Different business scenarios require selecting the appropriate algorithm.

The golden rule of rate limiting is akin to a high‑speed rail gate: it ensures traffic efficiency while safeguarding safety.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Backend Distributed Systems redis rate limiting

Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.