
Comprehensive Guide to Rate Limiting: Concepts, Algorithms, and Implementation Strategies

This article explains the fundamental concepts of rate limiting, compares common algorithms such as token bucket, leaky bucket and sliding window, and details practical implementations using Nginx, Tomcat, Redis, Guava, and Sentinel for both single‑node and distributed backend systems.


Table of Contents

- Basic Concepts of Rate Limiting
  - QPS and Connection Control
  - Transmission Rate
  - Blacklist / Whitelist
  - Distributed Environment
- Common Rate‑Limiting Algorithms
  - Token Bucket
  - Leaky Bucket
  - Sliding Window
- Typical Rate‑Limiting Solutions
  - Legality Verification (CAPTCHA, IP blacklist)
  - Guava RateLimiter
  - Gateway‑Level Limiting
  - Middleware Limiting (Redis)
  - Sentinel Component
- Architectural Design Considerations
- Specific Implementation Techniques

Basic Concepts of Rate Limiting

Rate limiting is usually described by two dimensions: a time window (e.g., per second, per minute) and a resource limit (e.g., maximum request count or concurrent connections). Combining these dimensions, a rule such as "no more than 100 requests per second" can be enforced, and multiple rules can coexist.

QPS and Connection Control

Limits can be applied per IP, per server, or per server group, allowing rules like "each IP < 10 QPS, connections < 5" and "each machine QPS ≤ 1000, connections ≤ 200" to work together.

Transmission Rate

Different user tiers may receive different download speeds (e.g., 100 KB/s for regular users, 10 MB/s for premium members), which is another form of rate limiting based on user groups.

Blacklist / Whitelist

Dynamic blacklists block IPs that exceed request thresholds, while whitelists grant privileged accounts unrestricted access.

Distributed Environment

In a cluster, rate‑limiting data should be stored centrally so that every node shares the same limits. Typical approaches include gateway‑level limiting, middleware‑level limiting (e.g., Redis), and using components like Sentinel.

Common Rate‑Limiting Algorithms

Token Bucket

The token bucket algorithm uses two key elements: a bucket that holds tokens and a token generator that refills the bucket at a fixed rate. A request can proceed only if it obtains a token; otherwise it is queued or dropped. The bucket has a finite capacity, and excess tokens are discarded.
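The refill-on-demand variant of this algorithm can be sketched in a few lines of Java (a minimal single-process illustration; the class name and parameters are ours, not from any library):

```java
import java.util.concurrent.TimeUnit;

// Minimal token-bucket sketch: tokens are refilled lazily based on elapsed time.
class TokenBucket {
    private final long capacity;        // maximum tokens the bucket can hold
    private final double refillPerNano; // token generation rate
    private double tokens;              // current token count
    private long lastRefill;            // nanoTime of the last refill

    TokenBucket(long capacity, double tokensPerSecond) {
        this.capacity = capacity;
        this.refillPerNano = tokensPerSecond / TimeUnit.SECONDS.toNanos(1);
        this.tokens = capacity;
        this.lastRefill = System.nanoTime();
    }

    // Returns true if a token was available; false means throttle the request.
    synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        // Refill proportionally to elapsed time, capped at capacity
        // (excess tokens are discarded, as described above).
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;
    }
}
```

Because tokens accumulate while traffic is quiet, a token bucket tolerates short bursts up to its capacity, which is the main behavioral difference from a leaky bucket.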

Leaky Bucket

Leaky bucket stores incoming requests in a bucket and releases them at a constant rate, regardless of the arrival burst. If the bucket is full, new requests are dropped, guaranteeing a steady outflow.
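A common way to code the admission side of this is the "water level" variant, where the level drains at a constant rate and a full bucket rejects new arrivals (a sketch with illustrative names; smoothing the outflow itself would additionally require a queue and a worker thread):

```java
// Minimal leaky-bucket sketch ("water level" variant): the level drains at a
// constant rate, and a request is admitted only if it does not overflow the bucket.
class LeakyBucket {
    private final long capacity;      // maximum water the bucket can hold
    private final double leakPerNano; // constant outflow rate
    private double water;             // current level
    private long lastLeak;            // nanoTime of the last drain

    LeakyBucket(long capacity, double leaksPerSecond) {
        this.capacity = capacity;
        this.leakPerNano = leaksPerSecond / 1_000_000_000.0;
        this.lastLeak = System.nanoTime();
    }

    synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        // Drain proportionally to elapsed time, never below empty.
        water = Math.max(0, water - (now - lastLeak) * leakPerNano);
        lastLeak = now;
        if (water + 1 <= capacity) {
            water += 1;
            return true;
        }
        return false; // bucket full: drop the request
    }
}
```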

Sliding Window

A sliding window counts requests within the most recent time interval (e.g., the last 5 seconds). When the window moves forward, old counts expire, providing smoother throttling for variable traffic patterns.
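For a single JVM, a sliding-window log can be sketched with a deque of request timestamps (illustrative class; a distributed version would keep the timestamps in shared storage instead):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sliding-window log sketch: keep timestamps of recent requests and
// expire those that fall out of the window before checking the limit.
class SlidingWindowLimiter {
    private final int limit;        // max requests per window
    private final long windowNanos; // window length
    private final Deque<Long> timestamps = new ArrayDeque<>();

    SlidingWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowNanos = windowMillis * 1_000_000L;
    }

    synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        // Old counts expire as the window slides forward.
        while (!timestamps.isEmpty() && now - timestamps.peekFirst() > windowNanos) {
            timestamps.pollFirst();
        }
        if (timestamps.size() < limit) {
            timestamps.addLast(now);
            return true;
        }
        return false;
    }
}
```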

Typical Rate‑Limiting Solutions

Legality Verification

CAPTCHA, IP blacklists, and similar techniques prevent malicious bots and crawlers.

Guava RateLimiter

Guava provides a simple client‑side limiter scoped to a single JVM. For example, if Server 1 and Server 2 each run their own RateLimiter capped at 10 QPS, the cluster as a whole can still serve up to 20 QPS; enforcing a true global limit requires shared state, such as Redis.
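Typical usage looks like the following sketch (assumes Guava is on the classpath; the class name and rates are illustrative):

```java
import com.google.common.util.concurrent.RateLimiter;

public class GuavaRateLimiterDemo {
    public static void main(String[] args) {
        // A smooth limiter that issues permits at 10 per second (one every ~100 ms).
        RateLimiter limiter = RateLimiter.create(10.0);

        for (int i = 1; i <= 3; i++) {
            double waitedSeconds = limiter.acquire(); // blocks until a permit is free
            System.out.printf("request %d waited %.3f s%n", i, waitedSeconds);
        }

        // Non-blocking variant: fail fast instead of waiting.
        if (!limiter.tryAcquire()) {
            System.out.println("throttled");
        }
    }
}
```

acquire() smooths traffic by making callers wait for the next permit, while tryAcquire() is the drop-on-reject style; both apply only within the JVM that created the limiter.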

Gateway‑Level Limiting

Placing limits at the entry point (e.g., Nginx, Spring Cloud Gateway, Zuul) filters traffic before it reaches backend services.

Middleware Limiting (Redis)

Redis can store counters with expiration or run Lua scripts to implement token‑bucket, leaky‑bucket, or sliding‑window logic across a distributed cluster.
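As a concrete sketch, a fixed-window counter can be run atomically as a Redis Lua script (key name, window, and limit are illustrative; a running Redis deployment is assumed):

```lua
-- Fixed-window counter: INCR the key, set its expiry on the first hit in the window.
-- KEYS[1] = counter key (e.g. "rl:" .. userId)
-- ARGV[1] = window length in seconds, ARGV[2] = request limit per window
local current = redis.call('INCR', KEYS[1])
if current == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[1])
end
if current > tonumber(ARGV[2]) then
    return 0  -- over the limit: reject
end
return 1      -- allowed
```

Running the check as one script avoids the race between INCR and EXPIRE that separate commands would have, and every node in the cluster sees the same counter.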

Sentinel Component

Sentinel, an open‑source Alibaba project, offers rich APIs and a visual console for rate limiting, circuit breaking, and degradation.

Architectural Design Considerations

In real projects, multiple limiting mechanisms are combined to form a layered defense, from coarse gateway limits to fine‑grained middleware or component limits, ensuring high resource utilization while protecting services.

Specific Implementation Techniques

Tomcat: set maxThreads on the Connector in conf/server.xml to cap concurrent request‑processing threads.
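For example (an illustrative conf/server.xml fragment; the numbers depend on hardware and workload):

```xml
<!-- Cap concurrent worker threads; requests beyond maxThreads queue up to acceptCount. -->
<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="200"
           acceptCount="100"
           connectionTimeout="20000" />
```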

Nginx request‑rate limiting: use limit_req_zone to define the steady rate and limit_req with a burst parameter (e.g., burst=4) to queue short spikes.

Nginx connection limiting: use limit_conn_zone and limit_conn (e.g., limit_conn perip 10, limit_conn perserver 100).
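The two Nginx techniques can be combined into one configuration sketch (zone names, rates, and sizes are illustrative):

```nginx
# http block: define shared-memory zones keyed by client IP and by server name
limit_req_zone  $binary_remote_addr zone=perip_req:10m rate=10r/s;
limit_conn_zone $binary_remote_addr zone=perip:10m;
limit_conn_zone $server_name       zone=perserver:10m;

server {
    location /api/ {
        # allow short spikes of up to 4 queued requests beyond the steady rate
        limit_req  zone=perip_req burst=4;
        limit_conn perip 10;       # at most 10 concurrent connections per client IP
        limit_conn perserver 100;  # at most 100 concurrent connections per server
    }
}
```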

Redis sorted‑set for sliding‑window algorithm.

Redis‑Cell for leaky‑bucket implementation.

Guava RateLimiter for single‑node token bucket.

Note: Redis‑based limits work across a distributed system, while Guava's limiter applies only within a single JVM.

When a project cannot modify code, container‑level limiting (Nginx or Tomcat) can be applied directly, provided it satisfies the business requirements.

Tags: distributed systems, algorithms, backend development, Redis, Nginx, rate limiting, token bucket
Written by

Selected Java Interview Questions

A professional Java tech channel sharing common knowledge to help developers fill gaps. Follow us!
