Comprehensive Guide to Rate Limiting: Concepts, Algorithms, and Implementation Strategies

This article provides a comprehensive overview of rate limiting, covering basic concepts, common algorithms such as token bucket, leaky bucket, and sliding window, and practical implementation methods across Nginx, Tomcat, Guava, Redis, and Sentinel for distributed backend systems.

Top Architect

Mar 22, 2023

Article Directory

Basic concepts of rate limiting

Common algorithms used in rate limiting

Typical rate‑limiting solutions

Architectural considerations for rate limiting

Specific implementation techniques

Basic Concepts

Rate limiting usually involves two dimensions: time (a time window such as per second or per minute) and resources (maximum request count or concurrent connections). Combining these dimensions, a rule may limit, for example, 100 requests per second.
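To make the two dimensions concrete, here is a minimal sketch of a fixed-window counter that allows a maximum number of requests per time window. The class name FixedWindowLimiter and the injectable millisecond clock are assumptions made for illustration; they do not come from any particular library.

```java
import java.util.function.LongSupplier;

/** Fixed-window counter: at most `limit` requests per window of `windowMillis`. */
class FixedWindowLimiter {
    private final int limit;            // resource dimension: max requests per window
    private final long windowMillis;    // time dimension: window length
    private final LongSupplier clock;   // hypothetical injectable time source (milliseconds)
    private long windowStart;
    private int count;

    FixedWindowLimiter(int limit, long windowMillis, LongSupplier clock) {
        this.limit = limit;
        this.windowMillis = windowMillis;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    /** Returns true if the request fits in the current window, false if it should be rejected. */
    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        if (now - windowStart >= windowMillis) {
            // The previous window has ended: start a new one and reset the counter.
            windowStart = now;
            count = 0;
        }
        if (count < limit) {
            count++;
            return true;
        }
        return false;
    }
}
```

With new FixedWindowLimiter(100, 1000, System::currentTimeMillis) this approximates the "100 requests per second" rule above. A fixed window can let through up to twice the limit around a window boundary, which is one reason the algorithms in the later sections exist.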

QPS and Connection Control

Limits can be applied per IP, per server, or per server group, allowing multiple rules to work together (e.g., IP QPS < 10, connections < 5, server QPS < 1000, connections < 200).
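The connection side of such a rule can be sketched with one semaphore per key. This is only an illustration under stated assumptions: PerKeyConcurrencyLimiter, tryEnter, and leave are hypothetical names, and the key would typically be the client IP.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

/** Caps the number of simultaneous in-flight requests per key (for example, per IP). */
class PerKeyConcurrencyLimiter {
    private final int maxPerKey;
    // One semaphore per key. A real deployment would also evict idle keys;
    // this sketch lets the map grow without bound.
    private final ConcurrentHashMap<String, Semaphore> permits = new ConcurrentHashMap<>();

    PerKeyConcurrencyLimiter(int maxPerKey) {
        this.maxPerKey = maxPerKey;
    }

    /** Returns true if the key is below its limit; the caller must later call leave(key). */
    boolean tryEnter(String key) {
        return permits.computeIfAbsent(key, k -> new Semaphore(maxPerKey)).tryAcquire();
    }

    /** Releases one slot. Call only after a successful tryEnter for the same key. */
    void leave(String key) {
        Semaphore s = permits.get(key);
        if (s != null) {
            s.release();
        }
    }
}
```

A per-server limit would use a single shared Semaphore in the same way, and a request would have to pass both checks to be admitted.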

Transmission Rate

Different user groups may have different download speeds, e.g., regular users 100 KB/s, premium users 10 MB/s, which is another form of rate limiting.

Black/White List

IP addresses that exceed a threshold can be added to a blacklist ("blocking IP"), while trusted accounts can be placed on a whitelist to bypass limits.

Distributed Environment

In a distributed setup, the limit applies to the whole cluster, not just a single node. Centralized storage (e.g., Redis) is needed to share counters across nodes.

Common Algorithms

Token Bucket

The token bucket has two key elements: a bucket that holds tokens up to a fixed capacity, and a token generator that adds tokens at a fixed rate (tokens produced while the bucket is full are discarded). A request proceeds only if it obtains a token; otherwise it is dropped or, if a buffering queue is configured, held until new tokens become available.
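The mechanism can be sketched as follows. This is a minimal illustration rather than a production implementation: the class name TokenBucket and the injectable nanosecond clock are assumptions, tokens are refilled lazily on each call instead of by a background generator, and requests that find no token are rejected rather than queued.

```java
import java.util.function.LongSupplier;

/** Token bucket with lazy refill: tokens accrue at a fixed rate up to `capacity`. */
class TokenBucket {
    private final long capacity;
    private final double refillPerNano;
    private final LongSupplier nanoClock;   // hypothetical injectable clock (nanoseconds)
    private double tokens;
    private long lastRefill;

    TokenBucket(long capacity, double tokensPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerNano = tokensPerSecond / 1_000_000_000.0;
        this.nanoClock = nanoClock;
        this.tokens = capacity;             // start with a full bucket
        this.lastRefill = nanoClock.getAsLong();
    }

    /** Takes one token if available; returns false when the bucket is empty. */
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        // Add the tokens generated since the last call, never exceeding capacity.
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

Because the bucket starts full, a burst of up to capacity requests is admitted at once, after which throughput settles at the refill rate. In real code System::nanoTime would be a natural clock.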

Leaky Bucket

Requests are placed into a bucket and leak out at a constant rate, ensuring a steady output regardless of bursty input. If the bucket is full, new requests are discarded.
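A minimal sketch of this idea, under some assumptions: the class name LeakyBucket and the injectable nanosecond clock are hypothetical, and the sketch models the bucket as a meter (it tracks the water level and decides whether a request may enter) rather than as a queue that also dispatches the stored requests at the leak rate.

```java
import java.util.function.LongSupplier;

/** Leaky bucket as a meter: each request adds one unit; the level drains at a constant rate. */
class LeakyBucket {
    private final double capacity;
    private final double leakPerNano;
    private final LongSupplier nanoClock;   // hypothetical injectable clock (nanoseconds)
    private double water;                   // current level of the bucket
    private long lastLeak;

    LeakyBucket(double capacity, double leaksPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.leakPerNano = leaksPerSecond / 1_000_000_000.0;
        this.nanoClock = nanoClock;
        this.lastLeak = nanoClock.getAsLong();
    }

    /** Returns true if the request fits in the bucket, false if the bucket is full. */
    synchronized boolean tryAdd() {
        long now = nanoClock.getAsLong();
        // Drain whatever has leaked out since the last call.
        water = Math.max(0.0, water - (now - lastLeak) * leakPerNano);
        lastLeak = now;
        if (water + 1.0 <= capacity) {
            water += 1.0;
            return true;
        }
        return false;   // bucket full: discard the request
    }
}
```

A queue-based variant would store the accepted requests and hand them to the backend one by one at the leak rate, which is what produces the steady output described above.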

Sliding Window

A sliding window counts requests within the last *N* seconds; as the window moves forward, old counts expire, providing smoother throttling for varying traffic patterns.
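One way to sketch a sliding window is to keep a log of request timestamps and evict those that have fallen out of the window. The class name SlidingWindowLimiter and the injectable millisecond clock are assumptions for illustration; memory use grows with the limit, so high-volume systems often use bucketed counters instead.

```java
import java.util.ArrayDeque;
import java.util.function.LongSupplier;

/** Sliding-window log: at most `limit` requests within any trailing `windowMillis`. */
class SlidingWindowLimiter {
    private final int limit;
    private final long windowMillis;
    private final LongSupplier clock;       // hypothetical injectable clock (milliseconds)
    private final ArrayDeque<Long> timestamps = new ArrayDeque<>();

    SlidingWindowLimiter(int limit, long windowMillis, LongSupplier clock) {
        this.limit = limit;
        this.windowMillis = windowMillis;
        this.clock = clock;
    }

    /** Returns true and records the request if fewer than `limit` remain in the window. */
    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        // Expire entries that are older than the window.
        while (!timestamps.isEmpty() && now - timestamps.peekFirst() >= windowMillis) {
            timestamps.pollFirst();
        }
        if (timestamps.size() < limit) {
            timestamps.addLast(now);
            return true;
        }
        return false;
    }
}
```

Unlike a fixed window, the count here always covers the most recent windowMillis, so there is no boundary at which the counter resets all at once.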

Typical Rate‑Limiting Solutions

Legitimacy Verification

CAPTCHA, IP blacklists, and other verification methods help block malicious traffic.

Guava RateLimiter

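Guava provides RateLimiter for single‑machine throttling. Because each limiter only sees traffic inside its own JVM, a cluster‑wide limit has to be divided among the nodes. For example, with two servers, Server 1 and Server 2, each capped at 10 requests per second, the combined traffic stays at or below 20 requests per second.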

Gateway‑Level Limiting

Gateways (e.g., Nginx, Spring Cloud Gateway, Zuul) act as the first entry point. Traffic is first filtered here before reaching backend services.

Nginx Rate Limiting

Use limit_req_zone to define a rate‑limit zone and limit_req to enforce it. For example, rate=2r/s limits each IP to 2 requests per second, which Nginx enforces as one request every 500 ms. The burst=4 parameter lets up to four requests beyond that rate be queued instead of being rejected immediately.
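A configuration along these lines might look like the fragment below. The zone name perip_req, the 10m zone size, and the location are assumptions chosen for illustration; only the directives, the 2r/s rate, and burst=4 come from the description above.

```nginx
# http context: one shared-memory zone keyed by client IP, 2 requests per second
limit_req_zone $binary_remote_addr zone=perip_req:10m rate=2r/s;

server {
    location / {
        # Queue up to 4 requests above the rate; anything beyond that is rejected.
        # Adding "nodelay" would serve the queued requests immediately instead of pacing them.
        limit_req zone=perip_req burst=4;
    }
}
```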

Concurrency limiting uses limit_conn_zone and limit_conn. Example: limit_conn perip 10 restricts a single IP to 10 simultaneous connections, while limit_conn perserver 100 caps total concurrent connections per server.

Note: Not every connection is counted. A connection is counted only when it has a request being processed by the server and the whole request header has already been read.
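The two limits described above could be configured roughly as follows. The zone names perip and perserver match the directives quoted in the text; the 10m zone sizes are an assumption.

```nginx
# http context: one zone keyed by client IP, one keyed by virtual server name
limit_conn_zone $binary_remote_addr zone=perip:10m;
limit_conn_zone $server_name zone=perserver:10m;

server {
    limit_conn perip 10;        # at most 10 simultaneous connections per IP
    limit_conn perserver 100;   # at most 100 simultaneous connections for this server
}
```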

Middleware Limiting

Store counters in a central component such as Redis. Redis' expiration and Lua scripting enable precise distributed throttling. Redis sorted sets can implement sliding windows; Redis‑Cell can realize leaky‑bucket behavior.

Sentinel

Alibaba's Sentinel offers rich APIs and a visual console for flow control, circuit breaking, and isolation.

Architectural Design Considerations

Real‑world projects often combine multiple techniques (gateway, middleware, component‑level limits) to achieve layered protection and optimal resource utilization.

Specific Implementation Techniques

Tomcat: configure maxThreads in conf/server.xml to limit concurrent requests; excess requests are queued.

Nginx: use limit_req_zone with burst for rate limiting and limit_conn_zone / limit_conn for concurrency control.

Redis: implement time‑window algorithms with sorted sets; use Redis‑Cell for leaky bucket; use Lua scripts for atomic counter updates.

Guava: apply RateLimiter for single‑node throttling.

Redis‑based limits work across a distributed system, while Guava limits apply only within a single JVM.

Tomcat Specifics

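According to the Tomcat connector documentation, maxThreads defaults to 200 when the attribute is not set; the value 150 often quoted for Tomcat 8.5 appears in the sample Executor of the stock server.xml. Increasing the value consumes more memory (roughly 1 MB of stack per thread) and raises GC pressure. The operating system also caps the number of threads per process; rough rule‑of‑thumb figures are about 2000 on Windows and about 1000 on Linux, though the actual limits depend on system configuration.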
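As an illustration, the relevant Connector entry in conf/server.xml could look like the fragment below. The values are examples only, not recommendations; acceptCount (the length of the queue for connections that arrive when all threads are busy) is not mentioned above and is included here as an assumption about how the queueing is usually tuned.

```xml
<!-- conf/server.xml: HTTP connector with an explicit thread cap (illustrative values) -->
<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="300"
           acceptCount="100"
           connectionTimeout="20000"
           redirectPort="8443" />
```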

Important Notes

Guava cannot coordinate across multiple machines.

Container‑level limits (Nginx, Tomcat) are simple to configure but coarse; rely on them only where they actually satisfy the business requirements.

The original article is sourced from CSDN.
