
Rate Limiting: Concepts, Common Algorithms, and Practical Implementation Strategies

This article explains the fundamentals of rate limiting, describes widely used algorithms such as token bucket, leaky bucket, and sliding window, and details practical implementation methods ranging from single‑machine tools like Guava and Tomcat to distributed solutions using Nginx, Redis, and Sentinel.

Architect

Basic Concepts

Rate limiting typically involves two dimensions: time (a time window such as per second or per minute) and resources (maximum number of accesses or connections). Combining these, limits are set per time window on resources, often with multiple rules operating together.

QPS and Connection Control

Limits can be applied per IP, per server, or across a server group, e.g., IP < 10 QPS, connections < 5, each server QPS ≤ 1000, total connections ≤ 200.

Transmission Rate

Different user tiers may have different download speeds, e.g., 100 KB/s for regular users and 10 MB/s for premium members, implemented via user‑group based rate limiting.
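One way to enforce per-tier download speeds is to compute, after each chunk is sent, how long to pause so the average rate stays at the tier's cap. A minimal sketch (the function name `throttle_delay` is illustrative, not from any library):

```python
def throttle_delay(bytes_sent: int, elapsed: float, rate_bps: float) -> float:
    """Return how long to sleep so that the effective transfer rate
    bytes_sent / (elapsed + delay) does not exceed rate_bps."""
    required = bytes_sent / rate_bps   # minimum time this many bytes should take
    return max(0.0, required - elapsed)
```

A sender would call this after each chunk and `time.sleep()` for the returned duration; a premium tier simply passes a larger `rate_bps`.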

Blacklist/Whitelist

Dynamic blacklists block IPs that exceed request thresholds, while whitelists grant unrestricted access to trusted accounts.

Distributed Environment

In distributed systems, rate-limit state should be stored centrally, typically at the gateway, in middleware, or in a shared component such as Redis.

Common Rate‑Limiting Algorithms

Token Bucket

The token bucket has two key elements: tokens and a bucket. Tokens are generated at a fixed rate and stored in the bucket up to a capacity. A request proceeds only if it can acquire a token; otherwise it is queued or dropped.

Token Generation – Tokens are added continuously (e.g., 100 tokens per second) until the bucket is full; excess tokens are discarded.

Token Acquisition – Requests take a token; if none are available, they may be placed in an optional buffer queue, which can be configured with a size limit or priority ordering.
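The two elements above can be sketched in a few lines. This is an illustrative single-machine implementation (the class name `TokenBucket` is mine, not from any library); tokens are refilled lazily from the elapsed time rather than by a background thread:

```python
import time

class TokenBucket:
    """Minimal token bucket: tokens accrue at a fixed rate up to `capacity`;
    a request proceeds only if it can take a token."""
    def __init__(self, rate: float, capacity: float):
        self.rate = rate              # tokens generated per second
        self.capacity = capacity      # bucket size; excess tokens are discarded
        self.tokens = capacity        # start full, so an initial burst is allowed
        self.last = time.monotonic()

    def allow(self, n: int = 1) -> bool:
        now = time.monotonic()
        # Refill lazily based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```

Because the bucket starts full, a burst of up to `capacity` requests can pass immediately, which is exactly the burst tolerance that distinguishes token bucket from leaky bucket.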

Leaky Bucket

Leaky bucket stores incoming requests in a bucket and releases them at a constant rate, discarding excess requests when the bucket is full. It smooths traffic but cannot handle bursts as well as token bucket.

Leaky vs Token Bucket – Token bucket allows burst handling by pre‑storing tokens; leaky bucket enforces a steady output rate, preventing sudden spikes.
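The contrast can be seen in a sketch that tracks the bucket's "water level" instead of tokens: arrivals raise the level, time drains it at a constant rate, and overflow is rejected. The class name `LeakyBucket` is illustrative:

```python
import time

class LeakyBucket:
    """Minimal leaky bucket: requests fill the bucket, which drains at a
    constant rate; arrivals that would overflow are discarded."""
    def __init__(self, rate: float, capacity: float):
        self.rate = rate              # requests drained per second
        self.capacity = capacity      # maximum requests held in the bucket
        self.level = 0.0              # current water level
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain according to elapsed time, never below empty.
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False                  # bucket full: request is discarded
```

Note that unlike the token bucket, the bucket starts empty and a burst can never pass faster than the drain rate plus the remaining capacity.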

Sliding Window

Counts requests within a moving time window (e.g., last 5 seconds). As the window slides, old counts expire, providing smoother rate limiting over longer intervals.

Common Rate‑Limiting Solutions

Validity‑Check Limiting

Techniques such as CAPTCHAs and IP blacklists prevent abuse and crawling.

Guava RateLimiter

Guava provides RateLimiter for single‑machine limiting; it cannot coordinate across multiple JVMs or servers.

Gateway‑Level Limiting

Gateways (e.g., Nginx, Spring Cloud Gateway, Zuul) filter traffic before it reaches backend services. Nginx supports two main methods:

Rate control using limit_req_zone (e.g., 2 requests/s per IP), with an optional burst parameter to absorb short spikes.

Connection control using limit_conn_zone and limit_conn (e.g., limit_conn perip 10; limit_conn perserver 100).
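Both methods above can be sketched in one illustrative nginx.conf fragment (zone names and sizes are arbitrary examples matching the limits mentioned):

```nginx
http {
    # Rate control: at most 2 requests/s per client IP; short bursts are queued.
    limit_req_zone $binary_remote_addr zone=perip_req:10m rate=2r/s;

    # Connection control: shared-memory zones keyed by client IP and by server name.
    limit_conn_zone $binary_remote_addr zone=perip:10m;
    limit_conn_zone $server_name zone=perserver:10m;

    server {
        location / {
            limit_req zone=perip_req burst=5;
            limit_conn perip 10;        # <= 10 concurrent connections per IP
            limit_conn perserver 100;   # <= 100 concurrent connections per server
        }
    }
}
```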

Middleware Limiting

Distributed limiting often stores counters in a central store like Redis. Redis' key-expiration feature and Lua scripting make it possible to implement precise, atomic rate-limit logic. The Redis-Cell module implements the leaky-bucket algorithm, while Guava provides token-bucket limiting on a single node.

Sentinel

Alibaba’s Sentinel (part of Spring Cloud Alibaba) offers rich rate‑limit APIs and a visual dashboard for governance.

Architectural Design Considerations

Real‑world projects combine multiple limiting methods to form layered protection, applying coarse limits at the gateway and finer controls in middleware or service‑level components.

Specific Implementation Techniques

Tomcat: configure maxThreads in conf/server.xml to queue excess requests.

Nginx: use limit_req_zone with burst for rate limiting.

Nginx: use limit_conn_zone and limit_conn for concurrent connection limits.

Time‑window algorithm via Redis sorted sets.

Leaky bucket via Redis‑Cell.

Token bucket via Guava.

Note: Redis‑based limits work in distributed environments, while Guava is limited to single‑machine scenarios. Container‑level limits (Nginx, Tomcat) can be used without code changes if they meet business requirements.

Tomcat Limiting

Set maxThreads (default 150) in conf/server.xml; requests exceeding this limit are queued. Adjust the value based on server resources, remembering that each thread consumes roughly 1 MB of JVM memory and that operating systems cap thread counts per process (≈2000 on Windows, ≈1000 on Linux).
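An illustrative conf/server.xml fragment (attribute values other than maxThreads are arbitrary examples):

```xml
<!-- Cap concurrent worker threads; connections beyond maxThreads wait in
     the accept queue, whose depth is bounded by acceptCount. -->
<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="150"
           acceptCount="100"
           connectionTimeout="20000" />
```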
