
Comprehensive Guide to Rate Limiting: Concepts, Algorithms, and Implementation Strategies

This article provides a thorough overview of rate limiting, covering its basic concepts, common algorithms such as token bucket, leaky bucket and sliding window, and practical implementation methods across Nginx, Tomcat, Guava, Redis, Sentinel and other middleware for both single‑machine and distributed systems.

Top Architect

Rate Limiting Basics

Rate limiting restricts access to a resource by capping usage over a time window or by resource capacity. Common dimensions include QPS, connection count, and transmission speed; it is often combined with black/white lists, and rules may apply to a single machine or span a distributed cluster.

QPS and Connection Control

Limits can be set per IP, per server, or globally, and multiple rules can coexist; for example, at most 10 requests/s per IP and at most 1,000 QPS per server.

Transmission Speed

Different user groups may receive different download speeds, e.g., regular users 100 KB/s versus premium users 10 MB/s.

Black/White Lists

Dynamic blacklists block abusive IPs, while whitelists grant unrestricted access to trusted accounts.

Distributed Environment

In a cluster, rate‑limit rules apply across all nodes; a centralized store (e.g., Redis) is needed to share counters.

Common Rate‑Limiting Algorithms

Token Bucket

A bucket holds a configurable number of tokens generated at a steady rate; each request must acquire a token, optionally queuing in a buffer when tokens are exhausted.
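As a minimal single-JVM sketch of this idea (class and parameter names are illustrative, not from any particular library), a refill-on-demand token bucket computes how many tokens have accumulated since the last request instead of running a background refill thread:

```java
// Hypothetical minimal token bucket: tokens accumulate at a fixed rate up to
// a capacity; each request consumes one token or is rejected immediately.
class TokenBucket {
    private final long capacity;        // max tokens the bucket can hold
    private final double refillPerNano; // tokens added per nanosecond
    private double tokens;              // current token count
    private long lastRefill;            // nanoTime of the last refill

    TokenBucket(long capacity, double tokensPerSecond) {
        this.capacity = capacity;
        this.refillPerNano = tokensPerSecond / 1_000_000_000.0;
        this.tokens = capacity;         // start full so bursts are allowed
        this.lastRefill = System.nanoTime();
    }

    synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        // Refill lazily based on elapsed time, capped at capacity.
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

Because the bucket starts full, a burst up to `capacity` passes immediately; sustained traffic is then held to the refill rate. A queued/blocking variant would wait for the next token instead of returning `false`.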

Leaky Bucket

Requests are placed in a bucket and drained at a constant rate, smoothing bursts and guaranteeing a fixed output rate.
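A comparable sketch of the leaky bucket (again with illustrative names) tracks the current "water level" and drains it at a constant rate; a request is rejected when the bucket would overflow:

```java
// Hypothetical leaky-bucket sketch: requests fill the bucket, which drains
// at a constant rate; arrivals that would overflow the bucket are rejected.
class LeakyBucket {
    private final long capacity;       // max queued requests
    private final double leakPerNano;  // requests drained per nanosecond
    private double water;              // current bucket level
    private long lastLeak;             // nanoTime of the last drain

    LeakyBucket(long capacity, double leaksPerSecond) {
        this.capacity = capacity;
        this.leakPerNano = leaksPerSecond / 1_000_000_000.0;
        this.water = 0;
        this.lastLeak = System.nanoTime();
    }

    // Returns true if the request fits in the bucket; false means overflow.
    synchronized boolean tryAccept() {
        long now = System.nanoTime();
        // Drain lazily based on elapsed time, never below empty.
        water = Math.max(0, water - (now - lastLeak) * leakPerNano);
        lastLeak = now;
        if (water + 1 <= capacity) {
            water += 1;
            return true;
        }
        return false;
    }
}
```

The contrast with the token bucket: here even accepted requests are effectively paced at the drain rate, so output is perfectly smooth, whereas a token bucket lets a saved-up burst through at once.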

Sliding Window

Counts requests within a moving time window, providing smoother limits as the window slides forward.
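One common realization is the sliding-window log, sketched below with illustrative names: keep the timestamps of recent requests and admit a new one only if fewer than the limit fall inside the trailing window.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sliding-window log: admit a request only if fewer than
// `limit` requests occurred within the trailing window.
class SlidingWindowLimiter {
    private final int limit;
    private final long windowNanos;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    SlidingWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowNanos = windowMillis * 1_000_000L;
    }

    synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        // Evict timestamps that have slid out of the window.
        while (!timestamps.isEmpty() && now - timestamps.peekFirst() >= windowNanos) {
            timestamps.pollFirst();
        }
        if (timestamps.size() < limit) {
            timestamps.addLast(now);
            return true;
        }
        return false;
    }
}
```

Unlike a fixed window, this cannot admit a double burst straddling a window boundary; the cost is O(limit) memory per key, which is why large-scale systems often approximate it with weighted counters instead.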

Typical Rate‑Limiting Solutions

Legality Verification

CAPTCHA, IP blacklists, and other checks prevent malicious traffic.

Guava RateLimiter

Guava offers RateLimiter for single‑machine throttling; it cannot coordinate across multiple servers.

Gateway‑Level Limiting

Nginx can limit request rate with limit_req_zone and burst settings, and limit concurrent connections with limit_conn_zone / limit_conn.
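A configuration sketch showing both directives together (zone names, rates, and the upstream are made-up examples, not recommendations):

```nginx
# Hypothetical limits: 10 req/s per client IP with a burst queue of 20,
# and at most 50 concurrent connections per IP.
http {
    limit_req_zone  $binary_remote_addr zone=per_ip:10m rate=10r/s;
    limit_conn_zone $binary_remote_addr zone=conn_per_ip:10m;

    server {
        location /api/ {
            limit_req  zone=per_ip burst=20 nodelay;
            limit_conn conn_per_ip 50;
            proxy_pass http://backend;
        }
    }
}
```

With `nodelay`, burst requests are served immediately while still counting against the rate; without it, they are delayed to match the configured rate.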

Middleware Limiting

Redis can store counters with expiration; Lua scripts or Redis‑Cell implement token‑bucket or leaky‑bucket logic for distributed throttling.
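As a minimal illustration of the counter-with-expiration idea, a fixed-window limiter can be written as a Lua script run atomically via Redis's EVAL (key name and arguments are illustrative; token-bucket and leaky-bucket scripts follow the same atomic pattern but track more state):

```lua
-- Fixed-window counter, run via EVAL so INCR + EXPIRE are atomic.
-- KEYS[1] = counter key (e.g. "rl:{user}:{window}")
-- ARGV[1] = request limit per window, ARGV[2] = window length in seconds
local current = redis.call('INCR', KEYS[1])
if current == 1 then
    -- First request in this window: start the window's expiry clock.
    redis.call('EXPIRE', KEYS[1], ARGV[2])
end
if current > tonumber(ARGV[1]) then
    return 0  -- over limit, caller should reject
end
return 1      -- allowed
```

Because the whole script executes atomically on the Redis server, every node in the cluster sees the same counter, which is what makes this approach work for distributed throttling.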

Sentinel

Alibaba's Sentinel provides rich rate‑limiting APIs and a visual console, suitable for Spring Cloud micro‑services.

Architecture‑Level Design

Real projects combine multiple techniques (gateway, middleware, component‑level limits) to achieve layered protection and high resource utilization.

Specific Implementation Examples

Tomcat: configure maxThreads in conf/server.xml to cap concurrent requests.
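A sketch of the relevant Connector element in conf/server.xml (the values shown are illustrative, not tuning advice):

```xml
<!-- conf/server.xml: cap concurrent request-processing threads on the
     HTTP connector; excess connections queue up to acceptCount. -->
<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="200"
           acceptCount="100"
           connectionTimeout="20000"
           redirectPort="8443" />
```

Requests beyond `maxThreads` wait in the accept queue; once `acceptCount` is also exhausted, new connections are refused, which is the throttling effect.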

Nginx: use limit_req_zone with burst for rate limiting, and limit_conn_zone / limit_conn for concurrent connections.

Redis: use sorted sets for sliding‑window counters, Redis‑Cell for leaky‑bucket, and Lua scripts for token‑bucket.

Guava: RateLimiter for single‑node throttling.

When deploying, consider OS-level thread limits (on Linux, per-process thread counts are bounded by ulimit settings and kernel parameters) and adjust thread-pool and connector parameters accordingly.

Tags: backend, distributed systems, middleware, Redis, Nginx, rate limiting, token bucket
Written by

Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
