
Cache, Degradation, and Rate Limiting: Concepts, Algorithms, and Implementation in Java and Nginx

This article explains the role of caching, service degradation, and flow control in high‑concurrency systems, introduces common rate‑limiting algorithms such as counters, leaky bucket and token bucket, and provides practical Java and Nginx implementations with code examples.

Java Architect Essentials

Cache

Caching is easy to motivate: in a large high‑concurrency system, without a cache the database can be overwhelmed and the system may crash in an instant. A cache not only speeds up access and raises concurrent throughput, it also shields the database and the system behind it. Large websites are mostly read‑heavy, which makes caching an obvious choice.

Caching also plays a crucial role in write‑heavy systems: batching writes through an in‑memory queue (producer‑consumer), HBase's write path, and even message middleware can all be viewed as forms of distributed data caching.
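The read‑side idea can be sketched in a few lines. The following is a minimal read‑through cache, assuming the loader function stands in for a database query; the class and method names are illustrative, not from the article:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Minimal read-through cache sketch; the loader stands in for a database query. */
public class ReadThroughCache<K, V> {
    private final Map<K, V> store = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    public ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    public V get(K key) {
        // Hit: served from memory, the database is never touched.
        // Miss: loaded once, then cached for every subsequent reader.
        return store.computeIfAbsent(key, loader);
    }
}
```

Under heavy read traffic, repeated keys are answered from memory and only the first miss reaches the database.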

Degradation

Service degradation is a strategy to limit certain services or pages when server pressure spikes, releasing resources to keep core tasks running.

Degradation can have multiple levels, each handling a different grade of exception: rejecting requests outright, delaying responses, or serving only a random subset of requests.

Based on scope, it may cut off a specific feature or module. The goal is to keep the service partially functional rather than completely unavailable.
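The core mechanism is a switch that routes calls to a cheap fallback when pressure spikes. Below is a minimal sketch under that assumption; the names (DegradationSwitch, call) are hypothetical:

```java
import java.util.function.Supplier;

/** Minimal degradation-switch sketch (names are illustrative). */
public class DegradationSwitch {
    // volatile so the flag flipped by an ops thread is seen by workers
    private volatile boolean degraded = false;

    public void setDegraded(boolean degraded) {
        this.degraded = degraded;
    }

    /** Run the full service normally; fall back to a cheap default when degraded. */
    public <T> T call(Supplier<T> primary, Supplier<T> fallback) {
        return degraded ? fallback.get() : primary.get();
    }
}
```

In practice the fallback might return a cached page, a default recommendation list, or a "temporarily unavailable" notice for a non-core feature.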

Rate Limiting

Rate limiting is a form of service degradation that restricts input and output flow to protect the system.

System throughput can be measured; once a threshold is reached, incoming traffic is limited by delaying, rejecting, or partially rejecting requests.

Rate‑Limiting Algorithms

Counter

The counter algorithm uses a sliding window: for example, 1 second divided into 10 slots of 100 ms each. Every 100 ms it samples the cumulative request count, keeps the last 10 samples in a LinkedList, and compares the newest and oldest entries; if the difference exceeds the per‑second threshold, requests are limited.

public class Counter {
    // Cumulative service access count; could live in Redis for distributed counting
    private long counter = 0L;
    // LinkedList holds the counter samples for the 10 slots of the sliding window
    private final LinkedList<Long> ll = new LinkedList<>();

    public static void main(String[] args) throws InterruptedException {
        Counter c = new Counter();
        c.doCheck();
    }

    private void doCheck() throws InterruptedException {
        while (true) {
            ll.addLast(counter);
            if (ll.size() > 10) {
                ll.removeFirst();
            }
            // Compare newest and oldest samples: if more than 100 requests
            // arrived within the 1-second window, start limiting
            if ((ll.peekLast() - ll.peekFirst()) > 100) {
                // trigger rate limiting here
            }
            Thread.sleep(100);
        }
    }
}

Leaky Bucket

The leaky bucket algorithm is widely used for traffic shaping and traffic policing. It models a bucket of fixed capacity that leaks at a constant rate: incoming packets add water to the bucket, and when the bucket is full, excess packets are discarded.

A single‑node implementation can use an in‑memory queue; in a distributed environment, message middleware or Redis can play the same role.
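The single‑node case can be sketched without a real queue by tracking the "water level" directly. This is a minimal illustration, not a production implementation; the class and method names (LeakyBucket, tryAcquire) are assumptions:

```java
/** Minimal single-node leaky bucket sketch (names are illustrative). */
public class LeakyBucket {
    private final long capacity;        // maximum amount of queued "water"
    private final double leakRatePerMs; // constant outflow rate
    private double water = 0;
    private long lastLeakMs = System.currentTimeMillis();

    public LeakyBucket(long capacity, double leakRatePerMs) {
        this.capacity = capacity;
        this.leakRatePerMs = leakRatePerMs;
    }

    /** Returns true if the request fits in the bucket, false if it overflows. */
    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Drain water at the constant leak rate since the last call
        water = Math.max(0, water - (now - lastLeakMs) * leakRatePerMs);
        lastLeakMs = now;
        if (water + 1 <= capacity) {
            water += 1;
            return true;
        }
        return false; // bucket full: discard the request
    }
}
```

Whatever the incoming burst looks like, the outflow is capped at the leak rate, which is exactly the shaping property the algorithm is used for.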

Token Bucket

The token bucket holds up to a fixed number of tokens and is refilled at a constant rate (e.g., 10 tokens per second). When a request of n bytes arrives, n tokens are removed from the bucket; if there are not enough tokens, the request is delayed or dropped.

Because tokens accumulate while traffic is light, the token bucket can absorb bursts up to the bucket's capacity, unlike the fixed‑rate leaky bucket; the refill rate can also be adjusted at runtime for dynamic rate control.
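The refill‑on‑demand pattern makes this compact to sketch: instead of a timer thread, each acquisition first credits the tokens earned since the last call. A minimal single‑node version, with illustrative names (TokenBucket, tryAcquire), might look like:

```java
/** Minimal single-node token bucket sketch (names are illustrative). */
public class TokenBucket {
    private final long capacity;      // burst ceiling
    private final double refillPerMs; // tokens added per millisecond
    private double tokens;
    private long lastRefillMs = System.currentTimeMillis();

    public TokenBucket(long capacity, double refillPerMs) {
        this.capacity = capacity;
        this.refillPerMs = refillPerMs;
        this.tokens = capacity; // start full, so an initial burst is allowed
    }

    /** Take n tokens if available; otherwise reject (a caller could delay instead). */
    public synchronized boolean tryAcquire(int n) {
        long now = System.currentTimeMillis();
        // Credit tokens earned since the last call, capped at capacity
        tokens = Math.min(capacity, tokens + (now - lastRefillMs) * refillPerMs);
        lastRefillMs = now;
        if (tokens >= n) {
            tokens -= n;
            return true;
        }
        return false;
    }
}
```

Note the contrast with the leaky bucket: a full token bucket admits a burst of `capacity` requests at once, while the leaky bucket never exceeds its constant drain rate.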

Rate‑Limiting Implementations

Guava

Guava’s RateLimiter provides token‑bucket implementations: SmoothBursty and SmoothWarmingUp.

1. Regular Rate

import com.google.common.util.concurrent.RateLimiter;

public void test() {
    // Create a limiter that issues 2 permits per second
    RateLimiter r = RateLimiter.create(2);
    while (true) {
        // acquire() blocks until a permit is available and
        // returns the time spent waiting, in seconds
        System.out.println(r.acquire());
    }
}

The output shows roughly 0.5 seconds per token, achieving smooth output.

2. Burst Traffic

Acquiring multiple tokens demonstrates burst handling:

System.out.println(r.acquire(2));
System.out.println(r.acquire(1));
System.out.println(r.acquire(1));
System.out.println(r.acquire(1));

After a 2‑second pause, the bucket has accumulated permits, so the burst is served immediately; in Guava's design, the cost of an oversized acquire is charged to the following request rather than the current one.

Guava also supports a warm‑up mode via RateLimiter.create(permitsPerSecond, warmupPeriod, unit), which issues permits slowly at first and ramps up to the stable rate over the configured warm‑up period.

Nginx

Nginx provides two modules for rate limiting:

Connection‑limit module ngx_http_limit_conn_module

Request‑limit module ngx_http_limit_req_module (leaky‑bucket implementation)

1. ngx_http_limit_conn_module

# Zone "one": track concurrent connections per client IP in 10 MB of shared memory
limit_conn_zone $binary_remote_addr zone=one:10m;
limit_conn_log_level error;
limit_conn_status 503;

In server{} block:

# Allow only 1 concurrent connection per IP
limit_conn one 1;

Testing with ab shows excess requests receive 503.

2. ngx_http_limit_req_module

# In http{}: zone "one" limits each client IP to 1 request per second
limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s;
# In server{} or location{}: queue bursts of up to 5 extra requests
limit_req zone=one burst=5;

Sending 10 concurrent requests with ab shows the first request processed immediately, the next 5 queued and drained at 1 r/s, and the remaining 4 rejected because they exceed the burst queue.

Tags: Java, caching, nginx, rate limiting, token bucket, degradation, leaky bucket