Thoughts on High‑Concurrency Traffic Control and Rate‑Limiting Techniques

This article shares practical insights on handling high‑concurrency traffic: what counts as large traffic, common mitigation strategies such as caching, service downgrade, and rate limiting, and the main rate‑limiting algorithms (counter, sliding window, leaky bucket, and token bucket). It closes with Guava’s RateLimiter as a ready‑made option for Java applications.

Java Captain

Sep 1, 2018

Introduction

In real projects the author has encountered peaks of over 50,000 QPS and stress‑test peaks of over 100,000 QPS. This post records personal thoughts on controlling high‑concurrency traffic.

Ideas for Handling Large Traffic

Large traffic is not defined by a fixed number; any request volume that stresses the system and degrades performance can be considered large. Common mitigation methods include:

Cache: keep frequently read data closer to the program to reduce DB accesses.

Downgrade: degrade or temporarily switch off non‑core services during spikes so that core services keep their resources.

Rate limiting: restrict the number of requests accepted within a time window, similar to limiting passenger flow in a subway station during rush hour.

When the core write path (e.g., e‑commerce checkout) cannot be downgraded, rate limiting becomes crucial.

Common Rate‑Limiting Approaches

The usual techniques are counters, sliding windows, leaky bucket, and token bucket.

Counter

The counter is the simplest algorithm: count the requests in a fixed interval, reject those beyond a threshold, and reset the counter at the interval boundary. Its notable weakness is the “time‑boundary” problem: a burst that straddles a boundary is counted against two separate windows, so up to twice the threshold can get through in a span much shorter than one interval and overwhelm the system.
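A minimal sketch of such a counter may help make the boundary problem concrete. The class name FixedWindowCounter and its parameters are illustrative assumptions, not code from the original article; the current time is passed in explicitly so the behaviour is easy to reproduce.

```java
/** Illustrative fixed-window counter (hypothetical class, not from the article). */
final class FixedWindowCounter {
    private final int limit;          // maximum requests accepted per window
    private final long windowMillis;  // window length in milliseconds
    private long windowStart;         // start of the window currently being counted
    private int count;                // requests accepted in the current window

    FixedWindowCounter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    /** Production code would typically pass System.currentTimeMillis() as nowMillis. */
    synchronized boolean tryAcquire(long nowMillis) {
        if (nowMillis - windowStart >= windowMillis) {
            // A new window has begun: align its start to the boundary and reset the count.
            windowStart = nowMillis - (nowMillis % windowMillis);
            count = 0;
        }
        if (count < limit) {
            count++;
            return true;
        }
        return false; // threshold reached for this window
    }
}
```

With a limit of 2 per 1000 ms, calls at 900 and 950 are accepted and a call at 990 is rejected, but calls at 1000 and 1050 are accepted again because the counter has just reset. Four requests pass within 150 ms, twice the intended threshold.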

Sliding Window

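The sliding window improves on the counter by dividing the interval into small slices and moving the window forward one slice at a time, so the count always covers the most recent interval and the boundary effect is smoothed out. The number of slices determines the precision of the algorithm: the more slices, the finer the control.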
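One way to sketch the slicing idea is a ring of per‑slice counters whose sum is compared with the threshold. The class SlidingWindowLimiter below is a hypothetical illustration; it assumes that windowMillis is divisible by the slice count and that the supplied timestamps are non‑negative and never go backwards.

```java
import java.util.Arrays;

/** Illustrative sliding-window limiter built from per-slice counters (hypothetical class). */
final class SlidingWindowLimiter {
    private final int limit;         // maximum requests accepted per window
    private final long sliceMillis;  // length of one slice
    private final int[] counts;      // accepted requests per slot
    private final long[] sliceIds;   // absolute slice number each slot currently holds

    SlidingWindowLimiter(int limit, long windowMillis, int slices) {
        this.limit = limit;
        this.sliceMillis = windowMillis / slices; // assumes exact division
        this.counts = new int[slices];
        this.sliceIds = new long[slices];
        Arrays.fill(sliceIds, -1L); // no slot holds a real slice yet
    }

    synchronized boolean tryAcquire(long nowMillis) {
        long currentSlice = nowMillis / sliceMillis;
        int slot = (int) (currentSlice % counts.length);
        if (sliceIds[slot] != currentSlice) {
            // The slot still holds an expired slice: recycle it for the current one.
            sliceIds[slot] = currentSlice;
            counts[slot] = 0;
        }
        int total = 0;
        for (int i = 0; i < counts.length; i++) {
            // Only slices that still fall inside the window contribute to the total.
            if (sliceIds[i] > currentSlice - counts.length) {
                total += counts[i];
            }
        }
        if (total < limit) {
            counts[slot]++;
            return true;
        }
        return false;
    }
}
```

With a limit of 2 per 1000 ms and 10 slices, calls at 900 and 950 are accepted, and calls at 1000 and 1050 are now rejected because the earlier two still fall inside the window. A call at 1900 is accepted once that slice has expired.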

Leaky Bucket

The leaky bucket uses a fixed‑size bucket: incoming requests fill it at an unpredictable rate, while the outflow rate is constant. When the bucket is full, excess requests overflow and are dropped.
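The sketch below shows one common way to approximate this in code: it tracks only the water level and uses it to accept or drop requests. It does not queue accepted requests and release them at the constant rate, which a fuller implementation would do. The class LeakyBucket and the leakIntervalMillis parameter (one unit leaks per interval) are assumptions made for illustration.

```java
/** Illustrative leaky bucket that tracks only the water level (hypothetical class). */
final class LeakyBucket {
    private final long capacity;           // bucket size, in requests
    private final long leakIntervalMillis; // one request leaks out per interval
    private long water;                    // requests currently held in the bucket
    private long lastLeak;                 // time up to which leaking has been accounted for

    LeakyBucket(long capacity, long leakIntervalMillis) {
        this.capacity = capacity;
        this.leakIntervalMillis = leakIntervalMillis;
    }

    /** Production code would typically pass System.currentTimeMillis() as nowMillis. */
    synchronized boolean tryAcquire(long nowMillis) {
        // Drain at the constant rate for the time elapsed since the last update.
        long leaked = (nowMillis - lastLeak) / leakIntervalMillis;
        if (leaked > 0) {
            water = Math.max(0L, water - leaked);
            lastLeak += leaked * leakIntervalMillis;
        }
        if (water < capacity) {
            water++;      // the request fits into the bucket
            return true;
        }
        return false;     // bucket is full: the request overflows and is dropped
    }
}
```

With a capacity of 2 and one leak per 1000 ms, two calls at time 0 are accepted and a third is dropped. At 1000 ms one unit has leaked out, so exactly one more request fits.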

Token Bucket

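The token bucket enhances the leaky bucket: tokens are generated at a constant rate and stored up to the bucket’s capacity, and each request must take a token before it proceeds. Because requests can drain the stored tokens as fast as they arrive, short‑term bursts are handled, while the constant refill rate still protects the system from sustained overload.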
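A minimal sketch of the idea follows. The class TokenBucket, the choice to start with a full bucket, and the refillIntervalMillis parameter (one token added per interval) are illustrative assumptions rather than details from the original article.

```java
/** Illustrative token bucket (hypothetical class, not from the article). */
final class TokenBucket {
    private final long capacity;             // maximum number of stored tokens
    private final long refillIntervalMillis; // one token is added per interval
    private long tokens;                     // tokens currently available
    private long lastRefill;                 // time up to which refills have been accounted for

    TokenBucket(long capacity, long refillIntervalMillis) {
        this.capacity = capacity;
        this.refillIntervalMillis = refillIntervalMillis;
        this.tokens = capacity; // assumption: start full so an initial burst is allowed
    }

    /** Production code would typically pass System.currentTimeMillis() as nowMillis. */
    synchronized boolean tryAcquire(long nowMillis) {
        // Add the tokens generated since the last update, capped at the capacity.
        long newTokens = (nowMillis - lastRefill) / refillIntervalMillis;
        if (newTokens > 0) {
            tokens = Math.min(capacity, tokens + newTokens);
            lastRefill += newTokens * refillIntervalMillis;
        }
        if (tokens > 0) {
            tokens--;     // consume one token for this request
            return true;
        }
        return false;     // no token available
    }
}
```

With a capacity of 3 and one token per 100 ms, a burst of three requests at time 0 is accepted at once and a fourth is rejected. After a long idle period the bucket refills only up to 3, so the burst size stays bounded.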

Rate‑Limiting Tool: Guava RateLimiter

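Guava provides a ready‑made API based on the token‑bucket algorithm. A limiter is created with the desired QPS via RateLimiter.create(); it then issues permits at that rate, and callers obtain permission via tryAcquire(), which returns immediately, or acquire(), which blocks until a permit is available.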
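A usage sketch might look like the following. It requires the Guava library on the classpath. The class GuavaRateLimiterDemo, the handle() method, and the budget of 5 permits per second are assumptions chosen for illustration; only RateLimiter.create() and tryAcquire() come from Guava.

```java
import com.google.common.util.concurrent.RateLimiter;

/** Illustrative wrapper around Guava's RateLimiter (hypothetical class). */
final class GuavaRateLimiterDemo {
    // Assumed budget: roughly 5 requests per second.
    private final RateLimiter limiter = RateLimiter.create(5.0);

    /** Handles the request if a permit is available right now, otherwise rejects it. */
    String handle(String request) {
        if (!limiter.tryAcquire()) {
            // No permit available: fail fast instead of letting the request reach the core system.
            return "rejected: " + request;
        }
        return "handled: " + request;
    }
}
```

Using tryAcquire() rejects excess requests immediately; replacing it with acquire() would make callers wait for a permit instead, which suits background work better than user‑facing requests.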

Distributed Rate Limiting (Brief Note)

The discussed techniques apply to single‑machine scenarios. In distributed environments, solutions often combine technologies such as Nginx+Lua or Redis+Lua, but this article focuses on the single‑node case.

One‑line advice: Let traffic queue up and be rate‑limited before it reaches the core system.
