
Rate Limiting in Microservices: Why It’s Needed and Common Techniques

Rate limiting is essential for microservice resilience: it prevents overload and protects business continuity. It can be implemented in several ways, including semaphores, thread-pool isolation, fixed or sliding windows (backed by Redis ZSets or local memory), and token-bucket or leaky-bucket algorithms. Each has trade-offs in accuracy, performance, clock synchronization, and deployment location.

Tencent Cloud Developer

May 23, 2019


In the previous article "The Architect’s Journey – Service Governance Overview", we discussed high‑availability governance. In complex microservice topologies, rate limiting is a key measure to ensure service elasticity and topology robustness.

Consider a flash‑sale event without any rate‑limiting measures, or an account platform that does not allocate traffic quotas to dozens of business units. Such scenarios can cause massive business loss and damage to reputation.

Product teams often focus on delivering business functionality for both forward and reverse flows (for example, placing orders and processing refunds). Meanwhile, they neglect defensive technical safeguards. When traffic grows rapidly, the lack of rate limiting becomes a serious hidden risk.

Not every system needs rate limiting; it depends on the architect’s forecast of business growth.

2.1 Semaphore Counting

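Competing for a semaphore is a common way to control concurrency. Each request must acquire a permit before it runs and release it when done, so at most N requests run at once. Standard implementations exist on most platforms, for example POSIX semaphores in C and java.util.concurrent.Semaphore in Java. The Hystrix framework offers semaphore isolation as one of its concurrency-control strategies. The approach is simple and reliable, but the count lives in a single process, so it only works in a single-machine environment.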
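The following is a minimal single-process sketch in Python, assuming requests are handled on threads. `threading.BoundedSemaphore` caps the number of requests in flight. A caller that cannot get a permit is rejected immediately rather than queued. The names `SemaphoreLimiter` and `handle`, and the `"rejected"` fallback, are hypothetical.

```python
import threading
from typing import Callable


class SemaphoreLimiter:
    """Caps the number of requests in flight within one process."""

    def __init__(self, max_concurrent: int):
        self._sem = threading.BoundedSemaphore(max_concurrent)

    def try_acquire(self) -> bool:
        # Non-blocking: fail fast instead of waiting for a permit.
        return self._sem.acquire(blocking=False)

    def release(self) -> None:
        self._sem.release()


def handle(limiter: SemaphoreLimiter, work: Callable[[], str]) -> str:
    if not limiter.try_acquire():
        return "rejected"  # hypothetical fallback response
    try:
        return work()
    finally:
        limiter.release()  # always return the permit, even if work() raises


limiter = SemaphoreLimiter(max_concurrent=2)
print(handle(limiter, lambda: "ok"))  # ok
```

Rejecting instead of blocking keeps a saturated service from piling up waiting threads. A caller that prefers to wait briefly could pass a `timeout` to `acquire` instead.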

2.2 Thread‑Pool Isolation

Thread‑pool isolation limits the number of threads used for processing requests. A blocking queue is often combined with the pool. When the pool and queue are saturated, a rejection strategy must be designed. Like semaphores, this approach is limited to single‑machine deployments.
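Python's `ThreadPoolExecutor` uses an unbounded internal queue. This sketch therefore bounds it with a semaphore sized to workers plus queue slots. When both are full, it applies an abort-style rejection policy and raises an error. `BoundedPool` and `RejectedError` are hypothetical names. Other policies would work equally well, such as running the task on the caller's thread or discarding the oldest queued task.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class RejectedError(Exception):
    """Raised when both the worker threads and the queue are full."""


class BoundedPool:
    def __init__(self, workers: int, queue_size: int):
        self._executor = ThreadPoolExecutor(max_workers=workers)
        # One permit per running or queued task; the executor's own queue is unbounded.
        self._slots = threading.BoundedSemaphore(workers + queue_size)

    def submit(self, fn, *args) -> Future:
        if not self._slots.acquire(blocking=False):
            raise RejectedError("pool and queue saturated")  # abort-style rejection policy
        try:
            future = self._executor.submit(fn, *args)
        except BaseException:
            self._slots.release()  # submission failed, so give the slot back
            raise
        # Free the slot once the task finishes, whether it succeeded or failed.
        future.add_done_callback(lambda _f: self._slots.release())
        return future

    def shutdown(self) -> None:
        self._executor.shutdown(wait=True)


gate = threading.Event()
pool = BoundedPool(workers=1, queue_size=1)
running = pool.submit(gate.wait)  # occupies the only worker thread
queued = pool.submit(gate.wait)   # waits in the queue
rejected = False
try:
    pool.submit(gate.wait)
except RejectedError as exc:
    rejected = True
    print("rejected:", exc)       # rejected: pool and queue saturated
gate.set()                        # let both tasks finish
pool.shutdown()
```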

2.3 Fixed‑Window Counting

Counting starts from the first request and does not strictly follow calendar time. A typical implementation uses Redis INCR and EXPIRE:

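```python
import redis

r = redis.Redis()  # redis-py client; assumes a Redis server on localhost:6379

def allow(key: str, threshold: int, window_seconds: int = 3600) -> bool:
    count = r.incr(key)                # atomic; the first call creates the key with value 1
    if count == 1:
        r.expire(key, window_seconds)  # the window starts at the first request
    return count <= threshold          # count includes the current request
```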

This method is simple and solves most distributed rate‑limiting problems, but it has drawbacks:

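Counting is coarse. When the window expires, all previous counts are discarded at once, so traffic that straddles a window boundary can be under- or over-limited. For example, up to twice the threshold can pass within a short span around the boundary.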

Every request hits Redis, which can become a bottleneck under high concurrency.

2.4 Natural‑Window Counting

Natural-window counting aligns windows to calendar boundaries. It uses time-bucketed keys (for example, one key per day or per second) to allocate quotas. Because each bucket is still reset all at once, it has the same accuracy problems as fixed-window counting.
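The following is a minimal in-memory sketch of natural-window counting, assuming the caller passes the current time in seconds. `NaturalWindowCounter` is a hypothetical name. In Redis, the `(key, bucket)` pair could instead become a key such as `key:bucket` with an expiry. The usage lines illustrate the boundary problem. With a limit of 100 per 60-second window, 200 requests pass within one second when they straddle a window boundary.

```python
from collections import defaultdict


class NaturalWindowCounter:
    """In-memory natural-window limiter; old buckets are never evicted in this sketch."""

    def __init__(self, threshold: int, window_seconds: int):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.counts = defaultdict(int)  # (key, bucket index) -> requests seen

    def allow(self, key: str, now: float) -> bool:
        bucket = int(now // self.window_seconds)  # windows align to multiples of window_seconds
        self.counts[(key, bucket)] += 1
        return self.counts[(key, bucket)] <= self.threshold


limiter = NaturalWindowCounter(threshold=100, window_seconds=60)
late = [limiter.allow("user-1", 59.5) for _ in range(100)]   # end of window 0
early = [limiter.allow("user-1", 60.5) for _ in range(100)]  # start of window 1
print(sum(late) + sum(early))  # 200 admitted within one second
```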

2.5 Sliding‑Window Counting

Sliding windows count accurately because the window moves continuously with time. At any moment, the count covers exactly the most recent window length. Two common implementations are:

2.5.1 Based on Shared Distributed Memory (Redis ZSet)

Store request IDs in a Redis sorted set (ZSet) where the score is the timestamp. The process consists of adding a record, setting expiration, counting, and removing expired entries.

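```python
import time
import uuid

import redis

r = redis.Redis()  # redis-py client; assumes a Redis server on localhost:6379

def allow(key: str, threshold: int, window_ms: int = 3_600_000) -> bool:
    now = int(time.time() * 1000)      # request timestamp in ms, used as the ZSet score
    window_start = now - window_ms     # oldest timestamp still inside the sliding window
    pipe = r.pipeline()                # one round trip, wrapped in MULTI/EXEC by default
    pipe.zadd(key, {str(uuid.uuid4()): now})         # add this request (rejected ones are recorded too)
    pipe.expire(key, window_ms // 1000)              # reset expiration so idle keys disappear
    pipe.zcount(key, window_start, now)              # count requests in the sliding window
    pipe.zremrangebyscore(key, 0, window_start - 1)  # delete expired records
    _, _, count, _ = pipe.execute()
    return count <= threshold          # count includes the current request
```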

This approach can become a performance bottleneck under high QPS. Every request performs several Redis operations. In addition, the ZSet holds one entry per request in the window, so memory use grows with traffic.

2.5.2 Based on Local Memory

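Local-memory solutions avoid the heavy Redis load. For in-memory counts to be complete, all requests for a given rate-limit key must reach the same process. Two typical patterns achieve this: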

Using Storm's fields grouping (fieldsGrouping) to route each key to a fixed bolt instance for processing.

Using RPC load‑balancing with consistent hashing to route requests to instances that perform in‑process counting.
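Both patterns send every request for the same key to the same processing instance. The routing side of the second pattern can be sketched as a small consistent-hash ring with virtual nodes. This sketch assumes instance names are known strings; `ConsistentHashRing` and the `instance-a`/`instance-b`/`instance-c` names are hypothetical. It does not reproduce the load balancer of any particular RPC framework.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Consistent-hash ring with virtual nodes for routing rate-limit keys to instances."""

    def __init__(self, nodes, vnodes: int = 100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def route(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's hash.
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = ConsistentHashRing(["instance-a", "instance-b", "instance-c"])
keys = [f"user-{i}" for i in range(1000)]
before = {k: ring.route(k) for k in keys}
ring.remove("instance-c")
after = {k: ring.route(k) for k in keys}
moved = {k for k in keys if before[k] != after[k]}
print(moved == {k for k in keys if before[k] == "instance-c"})  # True
```

Removing an instance remaps only the keys it owned. The in-memory counts held by the other instances therefore stay valid when the cluster scales in or out.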

Processing can be implemented with:

Esper's EPL (Event Processing Language), which can express sliding windows directly.

Storm’s built‑in sliding‑window support (since 1.0).

A custom implementation: a circular queue of natural-window buckets whose counts are summed to approximate the sliding window.
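The circular-queue approach can be sketched as follows, assuming a single key, a single thread, and caller-supplied timestamps in seconds. The window is split into `buckets` natural sub-windows. Each slot of a fixed-size array is reused once its sub-window falls out of range, and the sliding count is the sum of the slots still inside the window. `RingBufferSlidingWindow` is a hypothetical name, and its accuracy is limited to the width of one sub-window.

```python
class RingBufferSlidingWindow:
    """Sliding-window counter built from a circular array of natural sub-windows."""

    def __init__(self, threshold: int, window_seconds: int, buckets: int):
        if window_seconds % buckets:
            raise ValueError("window_seconds must be divisible by buckets")
        self.threshold = threshold
        self.bucket_seconds = window_seconds // buckets
        self.counts = [0] * buckets  # requests counted in each sub-window
        self.ids = [-1] * buckets    # which natural sub-window each slot currently holds

    def allow(self, now: float) -> bool:
        current = int(now // self.bucket_seconds)
        n = len(self.counts)
        slot = current % n
        if self.ids[slot] != current:  # slot still holds an old sub-window: recycle it
            self.ids[slot] = current
            self.counts[slot] = 0
        # Sum only the sub-windows among the last n (the current one included).
        total = sum(c for c, i in zip(self.counts, self.ids) if current - i < n)
        if total >= self.threshold:
            return False               # rejected requests are not counted
        self.counts[slot] += 1
        return True


w = RingBufferSlidingWindow(threshold=3, window_seconds=10, buckets=5)
results = [w.allow(t) for t in (0.5, 1.0, 3.0, 4.0, 10.5)]
print(results)  # [True, True, True, False, True]
```

Unlike the Redis ZSet approach, this sketch does not record rejected requests. A client that is over the limit therefore regains capacity as soon as old sub-windows expire. Either choice is defensible, depending on whether retries should be penalized.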

2.6 Token‑Bucket and Leaky‑Bucket Algorithms

Token‑bucket handles burst traffic, while leaky‑bucket smooths traffic flow. Both are classic rate‑limiting algorithms. In single‑machine scenarios they are common; in distributed environments they are rarely used directly.
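Below are minimal single-threaded sketches of both algorithms, assuming the caller supplies the current time in seconds. `TokenBucket` and `LeakyBucketQueue` are hypothetical names. The token bucket admits a burst up to its capacity at once. The leaky bucket, shown here in its queue form, admits the same burst but releases requests at a fixed interval, returning how long each must wait.

```python
from typing import Optional


class TokenBucket:
    """Refills `rate` tokens per second up to `capacity`; a full bucket allows a burst."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full
        self.last = now

    def allow(self, now: float) -> bool:
        # Add the tokens accrued since the last call, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class LeakyBucketQueue:
    """Queue form: admitted requests leave at a fixed interval, smoothing bursts."""

    def __init__(self, rate: float, capacity: int):
        self.interval = 1.0 / rate
        self.capacity = capacity  # how many requests may be scheduled ahead of a new one
        self.next_free = 0.0      # earliest time the next request may leave

    def allow(self, now: float) -> Optional[float]:
        """Return the delay before the request may proceed, or None if the bucket is full."""
        start = max(now, self.next_free)
        ahead = (start - now) / self.interval  # requests already scheduled ahead of this one
        if ahead >= self.capacity:
            return None
        self.next_free = start + self.interval
        return start - now


tb = TokenBucket(rate=2, capacity=2)
burst = [tb.allow(0.0) for _ in range(3)]
print(burst, tb.allow(0.5))  # [True, True, False] True

lb = LeakyBucketQueue(rate=2, capacity=2)
print([lb.allow(0.0) for _ in range(3)])  # [0.0, 0.5, None]
```

Both structures keep only a few numbers of local state, which is why they are natural in a single process. Sharing that state across machines needs an external store, and that store reintroduces the round-trip cost discussed for the window-based approaches.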

Guava's RateLimiter is based on a token bucket that also borrows the leaky bucket's smoothing. It allows a bounded burst of stored permits while spacing out further requests, which protects against uncontrolled spikes.

3.1 Clock Skew or Backward Time

Inconsistent machine clocks can corrupt rate‑limiting calculations. Operations teams must keep clocks synchronized and detect large drifts or rollbacks, excluding affected nodes from calculations.
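One way to detect the problem is sketched below. It assumes that each node periodically reports its local time and that a trusted reference time is available, for example from an NTP-synchronized monitor. `ClockGuard` and the one-second tolerance are hypothetical. This sketch re-admits a node as soon as its reports look healthy again; a stricter policy might require a cooldown period first.

```python
class ClockGuard:
    """Flags nodes whose clocks move backward or drift too far from a reference."""

    def __init__(self, max_drift_seconds: float):
        self.max_drift = max_drift_seconds
        self.last_seen = {}    # node -> last timestamp it reported
        self.excluded = set()  # nodes currently left out of rate-limit calculations

    def check(self, node: str, node_time: float, reference_time: float) -> bool:
        """Return True if the node's counts can be trusted for this report."""
        previous = self.last_seen.get(node)
        went_backward = previous is not None and node_time < previous
        drifted = abs(node_time - reference_time) > self.max_drift
        self.last_seen[node] = node_time
        if went_backward or drifted:
            self.excluded.add(node)
        else:
            self.excluded.discard(node)  # healthy again: re-admit immediately
        return node not in self.excluded


guard = ClockGuard(max_drift_seconds=1.0)
print(guard.check("node-a", 100.0, 100.2))  # True: small drift
print(guard.check("node-a", 99.8, 100.3))   # False: clock went backward
print(guard.check("node-b", 105.0, 100.0))  # False: drifted 5 s ahead
```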

3.2 SDK vs. Server‑Side Rate Limiting

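Choosing where to place the rate-limiting logic depends on update frequency, integration cost, language heterogeneity, and the need for cross-system coordination. SDKs are common, but every service must upgrade to pick up changes, and each language needs its own implementation. Server-side implementations (e.g., Istio's Mixer) avoid this but face performance issues, because each request incurs an extra remote check.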

3.3 Impact on System Controllability

Extensive rate limiting across a complex topology can make scaling and feature rollout risky, potentially causing cascading failures.

3.4 Topology‑Related Performance

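Placing limits deliberately, upstream or downstream, can improve user experience and reduce cascade pressure. A request rejected at the entry point fails fast and consumes no downstream resources. By contrast, one rejected deep in the call chain has already used capacity in every service it passed through.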

3.5 Accuracy vs. Real‑Time Trade‑off

Accurate, real‑time sliding‑window implementations (e.g., Redis ZSet) provide precision but may degrade performance. Trade‑offs include simplifying to fixed windows for speed or decoupling calculation and enforcement into asynchronous stages.
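The following is a minimal single-process sketch of decoupling, assuming a periodic background job calls `aggregate()`. The hot path only appends an event and checks a local set, while counting happens later. `AsyncLimiter` is a hypothetical name, and the in-memory `deque` stands in for whatever transport (such as a message queue) a real deployment would use. The price is lag: keys over the limit are admitted until the next aggregation run.

```python
from collections import Counter, deque


class AsyncLimiter:
    """Hot path checks a local set; counting runs later in a separate aggregation step."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.events = deque()       # stands in for a message queue or log pipeline
        self.blocked = frozenset()  # published by the aggregator, read on every request

    def allow(self, key: str) -> bool:
        self.events.append(key)     # record the request for later counting
        return key not in self.blocked

    def aggregate(self) -> None:
        """Meant to run periodically (e.g. once per window) in a background worker."""
        counts = Counter()
        while self.events:
            counts[self.events.popleft()] += 1
        self.blocked = frozenset(k for k, c in counts.items() if c > self.threshold)


limiter = AsyncLimiter(threshold=2)
print([limiter.allow("k") for _ in range(4)])  # [True, True, True, True]: not yet counted
limiter.aggregate()
print(limiter.allow("k"))                      # False: blocked until the next aggregation
```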

Summary

Rate limiting is a core component of high‑availability governance. Numerous techniques exist, each with its own trade‑offs. As Service Mesh, AIOps, and related technologies evolve, the ways we implement and think about rate limiting will continue to expand.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us, and we will review it promptly.

