Cache, Degradation, and Rate Limiting: Concepts, Algorithms, and Implementation in Java and Nginx
This article explains the role of caching, service degradation, and flow control in high‑concurrency systems, introduces common rate‑limiting algorithms such as counters, leaky bucket and token bucket, and provides practical Java and Nginx implementations with code examples.
Cache
Cache is easy to understand; in large high‑concurrency systems, without cache the database can be overwhelmed and the system may crash instantly. Using cache not only speeds up access and increases concurrent throughput, but also protects the database and the system. Large websites are mostly read‑oriented, making cache an obvious choice.
In write‑heavy systems, cache also plays a crucial role, e.g., batching data writes, in‑memory cache queues (producer‑consumer), HBase write mechanisms, and even message middleware can be viewed as distributed data caches.
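The producer‑consumer pattern mentioned above can be sketched with a bounded in‑memory queue. This is an illustrative sketch, not code from any particular library; the class and method names are hypothetical: request threads enqueue records cheaply, and a background consumer flushes them to storage in batches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Minimal write-behind sketch (illustrative): producers enqueue writes,
// a background consumer drains and flushes them in batches.
public class WriteBuffer {
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(10_000);

    // Producer side: called on the request path, returns immediately;
    // offer() returns false when the buffer is full (a point to degrade).
    public boolean submit(String record) {
        return queue.offer(record);
    }

    // Consumer side: drain one batch of up to 100 records without blocking;
    // in production a background thread would call this in a loop.
    public int drainOnce() {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch, 100);
        if (!batch.isEmpty()) {
            flush(batch);
        }
        return batch.size();
    }

    // Hypothetical flush: stands in for a bulk INSERT or an HBase put list.
    private void flush(List<String> batch) {
        System.out.println("flushing " + batch.size() + " records");
    }
}
```

Batching trades a small write delay for far fewer round trips to the storage layer, which is exactly the protective role cache plays on the write path.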
Degradation
Service degradation is a strategy to limit certain services or pages when server pressure spikes, releasing resources to keep core tasks running.
Degradation can have multiple levels, each handling a different grade of exception: denying service outright, delaying service, or serving only a random subset of requests.
Based on scope, it may cut off a specific feature or module. The goal is to keep the service partially functional rather than completely unavailable.
Rate Limiting
Rate limiting is a form of service degradation that restricts input and output flow to protect the system.
System throughput can be measured; when a threshold is reached, traffic is limited using measures such as delay, reject, or partial reject.
Rate‑Limiting Algorithms
Counter
The counter algorithm uses a sliding window: here, 1 second divided into 10 slots of 100 ms. A running request counter is sampled into a LinkedList every 100 ms, keeping only the last 10 samples, so the difference between the newest and oldest samples is the number of requests seen in the past second. When that difference exceeds the threshold (100 requests here), the limiter kicks in.
import java.util.LinkedList;

public class Counter {
    // Service access count; could be stored in Redis for distributed counting
    private long counter = 0L;
    // LinkedList records the last 10 samples of the sliding window
    private final LinkedList<Long> ll = new LinkedList<>();

    public static void main(String[] args) throws InterruptedException {
        Counter c = new Counter();
        c.doCheck();
    }

    private void doCheck() throws InterruptedException {
        while (true) {
            ll.addLast(counter);
            if (ll.size() > 10) {
                ll.removeFirst();
            }
            // Newest sample minus oldest sample = requests in the last second;
            // if more than 100 arrived in that window, start limiting
            if (ll.peekLast() - ll.peekFirst() > 100) {
                // To limit rate
            }
            Thread.sleep(100);
        }
    }
}

Leaky Bucket
The leaky bucket algorithm is widely used for traffic shaping and policing. It models a bucket of fixed capacity that leaks at a constant rate; incoming requests fill the bucket, and when the bucket is full, excess requests are discarded.
Implementation can use a queue in a single‑node system or message middleware/Redis in distributed environments.
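For a single node, the constant-leak idea can also be written as a small meter rather than an actual queue. This is a hedged sketch; the class and field names are mine, not a standard API:

```java
// Illustrative leaky-bucket meter: "water" accumulates per request and
// drains at a constant rate; requests that would overflow are rejected.
public class LeakyBucket {
    private final long capacity;     // maximum amount of queued water
    private final double leakPerMs;  // constant outflow rate
    private double water = 0;
    private long lastLeakMs = System.currentTimeMillis();

    public LeakyBucket(long capacity, double leakPerMs) {
        this.capacity = capacity;
        this.leakPerMs = leakPerMs;
    }

    // Returns true if the request fits in the bucket, false if it is dropped.
    public synchronized boolean tryAccept() {
        long now = System.currentTimeMillis();
        // Drain whatever leaked out since the last call
        water = Math.max(0, water - (now - lastLeakMs) * leakPerMs);
        lastLeakMs = now;
        if (water + 1 <= capacity) {
            water += 1;
            return true;
        }
        return false;
    }
}
```

Because the outflow rate is fixed, downstream traffic is perfectly smooth regardless of how bursty the input is.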
Token Bucket
The token bucket stores a fixed number of tokens, adding tokens at a constant rate (e.g., 10 per second). When a request of n bytes arrives, n tokens are removed; if insufficient tokens exist, the request is delayed or dropped.
Because unused tokens accumulate up to the bucket size, the token bucket can absorb burst traffic, and its refill rate can be tuned at runtime; the leaky bucket, by contrast, enforces a fixed output rate.
Rate‑Limiting Implementations
Guava
Guava’s RateLimiter provides two token‑bucket implementations: SmoothBursty and SmoothWarmingUp.
1. Regular Rate
public void test() {
    // Create a limiter that issues 2 tokens per second
    RateLimiter r = RateLimiter.create(2);
    while (true) {
        // acquire() blocks until a token is available and returns the wait time
        System.out.println(r.acquire());
    }
}

The output shows roughly 0.5 seconds per token, achieving smooth output.
2. Burst Traffic
Acquiring multiple tokens demonstrates burst handling:
System.out.println(r.acquire(2));
System.out.println(r.acquire(1));
System.out.println(r.acquire(1));
System.out.println(r.acquire(1));

After a 2‑second pause, the bucket accumulates tokens, so the first acquisitions return immediately.
Guava also supports a warm‑up mode (SmoothWarmingUp) via RateLimiter.create(permitsPerSecond, warmupPeriod, unit), which gradually ramps the issue rate up over the configured warm‑up period.
Nginx
Nginx provides two modules for rate limiting:
Connection‑limit module ngx_http_limit_conn_module
Request‑limit module ngx_http_limit_req_module (leaky‑bucket implementation)
1. ngx_http_limit_conn_module
# Shared zone named "one", keyed by client IP, for counting concurrent connections
limit_conn_zone $binary_remote_addr zone=one:10m;
limit_conn_log_level error;
limit_conn_status 503;

In the server{} block:

# Allow only 1 concurrent connection per IP
limit_conn one 1;

Testing with ab shows that requests beyond the limit receive 503.
2. ngx_http_limit_req_module
# Zone keyed by client IP, limiting each IP to 1 request per second
limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s;
# Allow a burst of up to 5 requests to queue
limit_req zone=one burst=5;

Sending 10 requests with ab shows the first processed immediately, the next 5 queued and released at 1 request per second, and the remaining 4 rejected once the burst queue is full.