
Understanding Service Rate Limiting: Concepts, Algorithms, and Distributed Implementation

The article explains why service rate limiting is essential for high‑concurrency systems, compares it with service degradation, describes common algorithms such as counter, leaky bucket and token bucket, and shows how to implement distributed rate limiting using Redis.

IT Architects Alliance

When building high‑concurrency systems, three main techniques are used to keep services stable: service degradation, rate limiting, and caching. This article focuses on service rate limiting, distinguishing it from service degradation.

Service degradation is the practice of disabling certain interfaces or pages under sudden server pressure to free resources for core tasks.
Rate limiting, by contrast, caps the volume of incoming traffic while leaving the service's processing capability unchanged; it sheds excess requests rather than lowering what the service can do.

Note that the rate limiting discussed here is implemented in business code, not at the Nginx layer.

Why Do We Need Service Rate Limiting?

From the caller's perspective, services fall into two categories. User‑facing services (web pages, external APIs) may face rapid user growth, hot events, malicious attacks, or crawlers. Internal RPC services face a different risk: a traffic surge in one service can overload the services it depends on.

Common Rate‑Limiting Algorithms

The article reviews three widely used algorithms: counter algorithm, leaky bucket algorithm, and token bucket algorithm.

Counter Algorithm

The simplest approach counts the number of requests within a time window and rejects requests that exceed a predefined threshold (e.g., 100 QPS). It suffers from burst problems at the window boundaries, which can be mitigated by using a sliding window technique that divides the large window into smaller sub‑windows.
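A minimal single-process sketch of the fixed-window counter (the limit and window values are illustrative, and the injectable clock exists only to make the behavior easy to test):

```python
import time

class FixedWindowCounter:
    """Allow at most `limit` requests per `window` seconds."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock            # injectable clock, useful for testing
        self.window_start = clock()
        self.count = 0

    def allow(self):
        now = self.clock()
        # Reset the counter whenever a new window begins.
        if now - self.window_start >= self.window:
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

The boundary-burst problem mentioned above arises because up to `2 * limit` requests can land just before and just after a window reset; the sliding-window refinement splits the window into sub-windows and sums their counts instead of resetting to zero.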

Leaky Bucket Algorithm

This classic algorithm models a bucket that leaks at a constant rate; excess requests overflow and are rejected. It ensures a steady processing rate but cannot handle short‑term traffic spikes.
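The leak can be sketched without a real timer by tracking the "water level" and draining it lazily on each request (the capacity and leak rate below are illustrative):

```python
import time

class LeakyBucket:
    """Requests fill a bucket of size `capacity` that drains at
    `leak_rate` requests per second; overflow is rejected."""

    def __init__(self, capacity, leak_rate, clock=time.monotonic):
        self.capacity = capacity
        self.leak_rate = leak_rate
        self.clock = clock
        self.water = 0.0              # current level of the bucket
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Drain at the constant leak rate for the time elapsed since last check.
        self.water = max(0.0, self.water - (now - self.last) * self.leak_rate)
        self.last = now
        if self.water + 1 <= self.capacity:
            self.water += 1
            return True
        return False
```

Because the drain rate is fixed, a burst that arrives faster than `leak_rate` is rejected once the bucket fills, which is exactly the spike-handling limitation noted above.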

Token Bucket Algorithm

A bucket holds a fixed number of tokens that are replenished at a constant rate. Each request must acquire a token; if none are available, the request is delayed or rejected. When the bucket is full, excess tokens are discarded, allowing bursts up to the current token count. This algorithm combines the steady rate of the leaky bucket with the ability to absorb short bursts.

Implementation can involve a queue storing tokens and a thread pool that periodically generates tokens; each incoming request consumes a token before proceeding.
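Instead of a dedicated refill thread, the same behavior can be sketched by accruing tokens lazily on each request (capacity and rate values are illustrative):

```python
import time

class TokenBucket:
    """Tokens refill at `rate` per second up to `capacity`; each request
    consumes one token, so bursts up to `capacity` tokens are allowed."""

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = float(capacity)   # start with a full bucket
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Add tokens accrued since the last request; excess tokens are
        # discarded by capping at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The lazy-refill form is equivalent to the queue-plus-thread-pool design described above, but avoids a background thread; production libraries such as Guava's RateLimiter take a similar approach.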

Distributed Rate Limiting

Single‑machine rate limiting cannot satisfy complex business requirements such as limiting a user to three accesses per 10 seconds or 100 accesses per day. Distributed rate limiting solves this by using a shared counter, typically backed by Redis, which offers native support for distributed operations.

Each request increments a Redis key (e.g., composed of user ID and API name) with an expiration time, allowing simple enforcement of access limits across a cluster.
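A minimal sketch of this pattern, assuming a redis-py-style client (the user ID, API name, and limits below are hypothetical):

```python
import time

def allow_request(redis_client, user_id, api_name, limit, window_seconds):
    """Distributed fixed-window limiter: INCR a per-user, per-API key and
    start the expiry clock on the window's first request.

    `redis_client` is any object exposing Redis-style incr/expire,
    e.g. redis-py's `redis.Redis`.
    """
    # Bucket the key by window so a fresh counter starts each window.
    key = f"rate:{user_id}:{api_name}:{int(time.time() // window_seconds)}"
    count = redis_client.incr(key)
    if count == 1:
        # First request in this window: set the TTL so the key cleans itself up.
        redis_client.expire(key, window_seconds)
    return count <= limit
```

One caveat: INCR and EXPIRE are two round trips, so a crash between them can leave a key without a TTL; wrapping both in a Lua script (EVAL) makes the check atomic on the Redis side.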

Backend · distributed systems · Redis · service degradation · rate limiting
Written by

IT Architects Alliance

Discussion and exchange on system, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and architecture adjustments with internet technologies. Includes real‑world large‑scale architecture case studies. Open to architects who have ideas and enjoy sharing.
