Cache, Degradation, and Rate Limiting: Concepts, Algorithms, and Practical Implementations

This article explains the three essential tools for building high‑concurrency systems—caching, service degradation, and rate limiting—detailing their purposes, common algorithms such as counter, leaky bucket and token bucket, and providing concrete Java and Nginx configuration examples.

Architecture Digest

Author nick hao introduces three essential mechanisms for protecting high‑concurrency systems: cache, degradation, and rate limiting, and then dives into each topic with concepts, algorithms, and practical implementations.

Cache

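Cache is crucial in large-scale systems to avoid overwhelming the database; it improves access speed, increases concurrent capacity, and protects the backend. Both read-heavy and write-heavy architectures benefit from caching. Write-heavy systems can accumulate data and write it in batches, or buffer it in in-memory producer/consumer queues. HBase's write path, which buffers writes in memory before flushing them to disk, relies on the same idea, and a message broker can be viewed as a form of distributed data buffer.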
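
The write-batching idea can be sketched as a small in-memory buffer that collects records and hands them to the backend in groups. This is only an illustration under stated assumptions: the class name BatchingWriter, the batchSize parameter, and the Consumer sink that stands in for the real database write are all hypothetical and do not come from the original article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical write buffer: accumulates records and flushes them in batches.
class BatchingWriter {
    private final int batchSize;
    private final Consumer<List<String>> sink; // stands in for the real backend write
    private final List<String> buffer = new ArrayList<>();

    BatchingWriter(int batchSize, Consumer<List<String>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Buffer one record; flush to the sink once the batch is full.
    synchronized void write(String record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Push whatever is buffered to the sink (for example on shutdown or on a timer).
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // copy so the sink owns its batch
        buffer.clear();
    }
}
```

With a batch size of 3, seven writes reach the sink as two full batches, and a final flush() delivers the remaining record. A production version would also need a time-based flush so that a partially filled batch does not wait indefinitely.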

Degradation

Service degradation reduces load during traffic spikes by selectively disabling features or modules, either refusing or delaying requests based on business priorities, ensuring core functionality remains available even when performance is compromised.
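
One simple way to picture selective disabling is a runtime switch that routes calls for a non-core feature to a cheap fallback. The sketch below is an assumption-laden illustration, not the article's implementation: the class DegradeSwitch, its method names, and the feature name used in the usage note are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical degradation switch: non-core features can be turned off at runtime.
class DegradeSwitch {
    // Names of the features that are currently degraded (thread-safe set).
    private final Set<String> degraded = ConcurrentHashMap.newKeySet();

    void degrade(String feature) {
        degraded.add(feature);
    }

    void restore(String feature) {
        degraded.remove(feature);
    }

    boolean isDegraded(String feature) {
        return degraded.contains(feature);
    }

    // Run the normal path unless the feature is degraded; otherwise use the fallback.
    <T> T call(String feature, Supplier<T> normal, Supplier<T> fallback) {
        return isDegraded(feature) ? fallback.get() : normal.get();
    }
}
```

For example, after degrade("recommendations"), a call such as call("recommendations", liveQuery, cachedDefault) returns the fallback value until restore("recommendations") is invoked. In practice the switch state would usually come from a configuration center or an operations console rather than from in-process calls.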

Rate Limiting

Rate limiting, a form of degradation, restricts the input and output flow to protect the system. When the measured throughput reaches a predefined threshold, the system can delay, reject, or partially reject requests.

Rate‑Limiting Algorithms

Common algorithms include Counter, Leaky Bucket, and Token Bucket.

Counter

The simplest method counts requests over a sliding window divided into fixed slots (e.g., ten 100-ms slots covering one second). A LinkedList<Long> stores one snapshot of the cumulative request counter per slot; when the difference between the newest and oldest snapshots exceeds the allowed limit (100 requests in the code below), the request is throttled.

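import java.util.LinkedList;
import java.util.concurrent.atomic.AtomicLong;

public class Counter {
    // Cumulative request count, incremented by request handlers via counter.incrementAndGet();
    // can be stored in Redis for distributed counting
    private final AtomicLong counter = new AtomicLong();
    // One snapshot of the counter per 100 ms slot
    private final LinkedList<Long> ll = new LinkedList<>();

    public static void main(String[] args) throws InterruptedException {
        Counter counter = new Counter();
        counter.doCheck();
    }

    private void doCheck() throws InterruptedException {
        while (true) {
            ll.addLast(counter.get());
            if (ll.size() > 10) {
                ll.removeFirst();
            }
            // Newest snapshot minus oldest = requests seen in the window; if > 100, limit rate
            if ((ll.peekLast() - ll.peekFirst()) > 100) {
                // To limit rate
            }
            Thread.sleep(100);
        }
    }
}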
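
A variant of the same idea that is easier to exercise in isolation keeps a separate count per slot and sums the slots that still fall inside the window. The sketch below is an assumption, not the author's code: the class name SlidingWindowCounter and its parameters are hypothetical, and the current time is passed in explicitly (non-negative milliseconds) so that the behaviour is deterministic.

```java
import java.util.Arrays;

// Hypothetical sliding-window counter with a fixed number of equal-length slots.
class SlidingWindowCounter {
    private final int slots;        // number of slots in the window, e.g. 10
    private final long slotMillis;  // length of one slot, e.g. 100 ms
    private final long limit;       // max requests allowed per window
    private final long[] counts;    // requests counted in each slot
    private final long[] slotIds;   // which time interval each slot currently holds

    SlidingWindowCounter(int slots, long slotMillis, long limit) {
        this.slots = slots;
        this.slotMillis = slotMillis;
        this.limit = limit;
        this.counts = new long[slots];
        this.slotIds = new long[slots];
        Arrays.fill(slotIds, Long.MIN_VALUE); // mark every slot as unused
    }

    // Returns true if a request at nowMillis is allowed, false if it should be throttled.
    synchronized boolean tryAcquire(long nowMillis) {
        long slotId = nowMillis / slotMillis;
        int idx = (int) (slotId % slots);
        if (slotIds[idx] != slotId) { // slot holds an old interval: reuse it
            slotIds[idx] = slotId;
            counts[idx] = 0;
        }
        long total = 0;
        for (int i = 0; i < slots; i++) {
            // Only slots from the last `slots` intervals belong to the window.
            if (slotIds[i] > slotId - slots && slotIds[i] <= slotId) {
                total += counts[i];
            }
        }
        if (total >= limit) {
            return false;
        }
        counts[idx]++;
        return true;
    }
}
```

With ten 100-ms slots and a limit of 5, five requests at time 0 are allowed and a sixth is rejected; a request at 950 ms is still rejected because the first slot is inside the window, while a request at 1000 ms is allowed because that slot has expired.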

Leaky Bucket

The leaky bucket algorithm enforces a fixed outflow rate; excess inflow is discarded when the bucket is full. It is easy to implement with a queue in a single‑node system or with message queues/Redis in distributed environments.
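
The single-node, queue-based form can be sketched as a bounded queue with a separate drain step. The code below is a minimal illustration under assumptions: the class name LeakyBucketQueue and its methods are hypothetical, and the fixed outflow rate is left to the caller, who is expected to invoke leakOne() on a timer.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical leaky bucket backed by a bounded queue.
class LeakyBucketQueue {
    private final BlockingQueue<Runnable> bucket;

    LeakyBucketQueue(int capacity) {
        this.bucket = new ArrayBlockingQueue<>(capacity);
    }

    // Inflow: returns false (the request is discarded) when the bucket is full.
    boolean offer(Runnable request) {
        return bucket.offer(request);
    }

    // Outflow: process at most one queued request; returns false if the bucket is empty.
    // Calling this at a fixed interval yields the fixed outflow rate.
    boolean leakOne() {
        Runnable next = bucket.poll();
        if (next == null) {
            return false;
        }
        next.run();
        return true;
    }

    // Number of requests currently waiting in the bucket.
    int size() {
        return bucket.size();
    }
}
```

One way to drive the outflow is a ScheduledExecutorService that calls scheduleAtFixedRate(bucket::leakOne, 0, 100, TimeUnit.MILLISECONDS), which would process at most ten requests per second regardless of how bursty the inflow is.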

Token Bucket

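A token bucket holds up to a fixed number of tokens (its capacity), and new tokens are added at a constant rate until the bucket is full. When a request arrives, it consumes tokens; if insufficient tokens exist, the request is delayed or dropped. Because unused tokens accumulate up to the capacity, this algorithm supports burst traffic, and some implementations can also be tuned for warm-up periods.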
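
The refill-and-consume logic can be sketched in a few lines. This is a simplified illustration, not Guava's implementation: the class name TokenBucket and its parameters are hypothetical, tokens are refilled lazily on each call instead of by a background thread, and the current time is passed in explicitly so the behaviour is easy to verify.

```java
// Hypothetical token bucket with lazy refill; rejects instead of delaying.
class TokenBucket {
    private final double capacity;        // maximum number of tokens the bucket can hold
    private final double refillPerMilli;  // tokens added per millisecond
    private double tokens;                // tokens currently available
    private long lastRefillMillis;        // time of the last refill

    TokenBucket(double capacity, double tokensPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerMilli = tokensPerSecond / 1000.0;
        this.tokens = capacity; // start full so an initial burst is allowed
        this.lastRefillMillis = nowMillis;
    }

    // Returns true and consumes the permits if enough tokens exist, false otherwise.
    synchronized boolean tryAcquire(int permits, long nowMillis) {
        long elapsed = Math.max(0, nowMillis - lastRefillMillis);
        // Add the tokens earned since the last call, capped at the capacity.
        tokens = Math.min(capacity, tokens + elapsed * refillPerMilli);
        lastRefillMillis = Math.max(lastRefillMillis, nowMillis);
        if (tokens < permits) {
            return false;
        }
        tokens -= permits;
        return true;
    }
}
```

With a capacity of 5 and a rate of 2 tokens per second, a burst of 5 permits succeeds immediately, the next request at the same instant fails, and one more permit becomes available after about half a second. However long the bucket sits idle, it never holds more than 5 tokens, which bounds the size of any burst.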

Implementation with Guava

Google Guava provides RateLimiter, which implements the token‑bucket algorithm. Two variants are available: SmoothBursty for handling bursts and SmoothWarmingUp for gradual ramp‑up.

Example of a regular rate limiter (2 permits per second):

import com.google.common.util.concurrent.RateLimiter;

public void test() {
    /**
     * Create a RateLimiter with 2 permits per second.
     * acquire() blocks until a permit is available and returns
     * the time spent waiting, in seconds.
     */
    RateLimiter r = RateLimiter.create(2);
    while (true) {
        System.out.println(r.acquire());
    }
}

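For burst handling, a single call can acquire several permits at once. The call itself is not slowed by the number of permits requested; the cost is paid by the next call, which waits longer: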

System.out.println(r.acquire(2));
System.out.println(r.acquire(1));
System.out.println(r.acquire(1));
System.out.println(r.acquire(1));

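The warm-up variant is created with RateLimiter.create(permitsPerSecond, warmupPeriod, unit), for example RateLimiter.create(2, 3, TimeUnit.SECONDS). The limiter then ramps up gradually over the warm-up period (here 3 seconds) before reaching the steady rate.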

Rate Limiting with Nginx

Nginx offers two built‑in modules: ngx_http_limit_conn_module for limiting concurrent connections and ngx_http_limit_req_module for limiting request rates using the leaky‑bucket algorithm.

Example configuration to limit each IP to one concurrent connection:

# Define a zone for connection limiting
limit_conn_zone $binary_remote_addr zone=one:10m;
limit_conn_log_level error;
limit_conn_status 503;
# Apply the limit in the server block
limit_conn one 1;

Example configuration to limit request rate to 1 request per second with a burst of 5:

# Define a zone for request limiting (1 r/s)
limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s;
# Apply the limit in the server block
limit_req zone=one burst=5;

Testing with ab (ApacheBench) demonstrates that connections over the limit, and requests beyond the configured rate plus burst allowance, receive a 503 response, confirming the effectiveness of the limits.

For further reading, the article lists several Chinese resources on high‑concurrency architecture and design.
