
Comprehensive Guide to Rate Limiting: Concepts, Common Algorithms, and Practical Implementation Strategies

This article explains the fundamental concepts of rate limiting, reviews popular algorithms such as token bucket, leaky bucket, and sliding window, and details practical implementation methods across single‑machine and distributed environments using tools like Guava, Nginx, Redis, and Sentinel.

Top Architect

Author: liuec1002 Source: blog.csdn.net/liuerchong/article/details/118882053

Rate Limiting Basic Concepts

Rate limiting typically involves two dimensions: time (a time window such as per second or per minute) and resources (e.g., maximum request count or concurrent connections). Combining these dimensions, rate limiting restricts resource access within a specific time window.

Time: limits based on a time window.

Resource: limits based on available resources such as request count or connection count.

In real scenarios, multiple rules usually work together: for example, per‑IP QPS < 10 and per‑IP connections < 5, combined with per‑server QPS ≤ 1000 and total connections ≤ 200.

QPS and Connection Control

IP‑level and server‑level limits can be combined; a whole server group or data center can also be treated as a single entity for high‑level limits.

Transmission Rate

Different user groups may have different download speeds, e.g., 100 KB/s for regular users and 10 MB/s for premium users, implemented via user‑group based rate limiting.

Black/White Lists

Dynamic blacklists block abusive IPs, while whitelists grant privileged access to trusted accounts or services.

Distributed Environment

Three common distributed rate‑limiting approaches:

Gateway‑level limiting (apply rules at the entry point).

Middleware‑level limiting (store limits in a shared component such as Redis).

Sentinel‑based limiting (the Spring Cloud Alibaba component provides distributed rate limiting and circuit breaking).

Common Rate‑Limiting Algorithms

Token Bucket Algorithm

The token bucket uses two key elements:

Token: a request must acquire a token to be processed.

Bucket: stores tokens with a fixed capacity.

Token Generation – Tokens are added to the bucket at a steady rate (e.g., 100 tokens per second). If the bucket is full, excess tokens are discarded.

Token Acquisition – A request proceeds only after obtaining a token. If tokens are exhausted, requests may be queued (optional buffer queue) or dropped.
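The two steps above can be sketched as a lazily refilled counter. This is a minimal single‑machine illustration; the class and method names are my own, not from the article, and refill happens on each acquisition attempt rather than via a timer thread:

```java
// Minimal token-bucket sketch: tokens accumulate at a fixed rate up to
// a capacity; each request consumes one token or is rejected.
class TokenBucket {
    private final long capacity;       // maximum tokens the bucket can hold
    private final double refillPerSec; // token generation rate
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(long capacity, double refillPerSec) {
        this.capacity = capacity;
        this.refillPerSec = refillPerSec;
        this.tokens = capacity;
        this.lastRefillNanos = System.nanoTime();
    }

    // Returns true if a token was available; false means the request is rejected
    // (a real system might instead queue it in an optional buffer).
    synchronized boolean tryAcquire() {
        refill();
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;
    }

    private void refill() {
        long now = System.nanoTime();
        double newTokens = (now - lastRefillNanos) / 1_000_000_000.0 * refillPerSec;
        // Excess tokens beyond capacity are discarded, as described above.
        tokens = Math.min(capacity, tokens + newTokens);
        lastRefillNanos = now;
    }
}
```

Because unused tokens accumulate up to the capacity, a quiet period lets the bucket absorb a later burst, which is the defining property of this algorithm.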

Leaky Bucket Algorithm

Requests are placed into a bucket and are emitted at a constant rate, regardless of the incoming burst, ensuring a steady outflow.

Difference from Token Bucket – Token bucket allows bursts by pre‑storing tokens; leaky bucket smooths traffic by enforcing a fixed outflow rate.
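A minimal sketch of the leaky bucket as a bounded queue (illustrative names, not from the article): requests enter the bucket until it overflows, and a scheduler drains them at the fixed outflow rate.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal leaky-bucket sketch: incoming requests queue in the bucket;
// leak() drains one request per tick regardless of how bursty arrivals are.
class LeakyBucket {
    private final int capacity; // bucket size; overflow is rejected
    private final Deque<Runnable> queue = new ArrayDeque<>();

    LeakyBucket(int capacity) {
        this.capacity = capacity;
    }

    // Returns false when the bucket overflows (request dropped).
    synchronized boolean offer(Runnable request) {
        if (queue.size() >= capacity) {
            return false;
        }
        queue.addLast(request);
        return true;
    }

    // Called by a scheduler at the fixed outflow rate (e.g. every 10 ms),
    // which is what enforces the steady emission the text describes.
    synchronized void leak() {
        Runnable r = queue.pollFirst();
        if (r != null) {
            r.run();
        }
    }
}
```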

Sliding Window

Counts requests within a moving time window; the more finely the window is subdivided, the smoother and more accurate the limiting becomes.
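One simple variant is the sliding‑window log, sketched below (illustrative names; the timestamp parameter is injected to keep the example deterministic): keep the timestamps of recent requests and count only those still inside the moving window.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sliding-window-log sketch: a request is allowed only if fewer than
// `limit` requests fall inside the last `windowMillis` milliseconds.
class SlidingWindowLimiter {
    private final int limit;         // max requests per window
    private final long windowMillis; // window length
    private final Deque<Long> timestamps = new ArrayDeque<>();

    SlidingWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        // Evict timestamps that have slid out of the window.
        while (!timestamps.isEmpty()
                && timestamps.peekFirst() <= nowMillis - windowMillis) {
            timestamps.pollFirst();
        }
        if (timestamps.size() >= limit) {
            return false;
        }
        timestamps.addLast(nowMillis);
        return true;
    }
}
```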

Typical Rate‑Limiting Solutions

Legality Verification

CAPTCHA, IP blacklists, etc., to block malicious traffic.

Guava RateLimiter

Guava provides RateLimiter for single‑machine limiting. Example:

RateLimiter limiter = RateLimiter.create(100); // 100 permits per second
limiter.acquire(); // blocks until a permit is available

It cannot coordinate across multiple JVMs or servers.

Gateway‑Level Limiting

Using Nginx or Spring Cloud Gateway to filter traffic before it reaches backend services.

Rate control: limit_req_zone with the burst parameter.

Connection control: limit_conn_zone and limit_conn.
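A minimal sketch of how these directives fit together in nginx.conf (zone names, rates, and the location path are illustrative, not from the article):

```nginx
http {
    # 10 requests/second per client IP, state kept in a 10 MB shared zone
    limit_req_zone $binary_remote_addr zone=req_per_ip:10m rate=10r/s;
    # Track concurrent connections per client IP
    limit_conn_zone $binary_remote_addr zone=conn_per_ip:10m;

    server {
        location /api/ {
            # Allow short bursts of up to 20 queued requests; drop the rest
            limit_req zone=req_per_ip burst=20 nodelay;
            # At most 5 concurrent connections per IP
            limit_conn conn_per_ip 5;
        }
    }
}
```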

Middleware‑Level Limiting

Store counters in Redis (or Redis‑Cell) and use Lua scripts for atomic operations.
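As a sketch of that approach, a fixed‑window counter can be made atomic with a single Lua script (key and argument names are illustrative; Redis‑Cell implements a more sophisticated bucket natively):

```lua
-- KEYS[1]: counter key, ARGV[1]: request limit, ARGV[2]: window in seconds
local current = redis.call('INCR', KEYS[1])
if current == 1 then
  -- First request in this window: start the window timer.
  redis.call('EXPIRE', KEYS[1], ARGV[2])
end
if current > tonumber(ARGV[1]) then
  return 0  -- rejected
end
return 1    -- allowed
```

Running INCR, EXPIRE, and the comparison inside one script is what prevents race conditions when many application instances share the same counter.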

Sentinel

Alibaba's open‑source component integrated in Spring Cloud Alibaba, offering rich APIs and a visual console for rate limiting.

Architectural Considerations for Rate Limiting

In production, multiple limiting mechanisms are combined to achieve layered protection, from gateway to middleware to service‑level controls.

Concrete Implementation Techniques

Tomcat: configure maxThreads in conf/server.xml to limit concurrent requests.

Nginx: use limit_req_zone with burst for rate limiting.

Nginx: use limit_conn_zone and limit_conn for concurrent connection limiting.

Time‑window algorithm implemented with Redis sorted sets.

Leaky bucket via Redis‑Cell.

Token bucket via Guava RateLimiter.

Note: Redis‑based limits work across a distributed system, while Guava's RateLimiter applies only to a single machine.

Tomcat Limiting Details

Set maxThreads in conf/server.xml. The default is 150; adjust it based on server resources, since each thread consumes roughly 1 MB of JVM memory.
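A sketch of the relevant connector entry in conf/server.xml (the attribute values are illustrative; acceptCount bounds the queue of requests waiting for a free thread):

```xml
<!-- HTTP connector with an explicit cap on concurrent request threads -->
<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="150"
           acceptCount="100"
           connectionTimeout="20000" />
```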

Operating system limits also apply: Windows supports roughly 2000 threads per process, Linux roughly 1000.

