
Understanding High Concurrency: TPS Thresholds and Architectural Solutions

The article explains what high concurrency means, defines TPS thresholds for moderate, high, and ultra‑high traffic, and outlines key backend techniques such as distributed and microservice architectures, caching, load balancing, traffic shaping, and rate‑limiting/circuit‑breaking to handle massive request volumes.

Mike Chen's Internet Architecture

High concurrency refers to a system's ability to process a large number of requests or tasks simultaneously, such as during massive sales events where thousands of users place orders at the same moment.

The primary metric for measuring concurrency is TPS (Transactions Per Second). TPS values between 1,000 and 5,000 indicate moderate concurrency, values above 5,000 are considered high, and values exceeding 50,000 (e.g., Alibaba's 600,000 TPS during Double 11) represent ultra‑high concurrency.
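These thresholds fit in a few lines of code. The sketch below (the class and tier names are our own, not from any standard) classifies a measured TPS value into the tiers described above:

```java
public class TpsTier {
    // Thresholds follow the tiers described in the text.
    public static String classify(long tps) {
        if (tps > 50_000) return "ultra-high";
        if (tps > 5_000) return "high";
        if (tps >= 1_000) return "moderate";
        return "low";
    }

    public static void main(String[] args) {
        System.out.println(classify(600_000)); // Alibaba Double 11 scale → "ultra-high"
    }
}
```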

To support such loads, several backend techniques are recommended:

Distributed Architecture – Spreads load across multiple servers, improving scalability and reliability.

Microservice Architecture – Breaks a monolithic application into independent services (e.g., Spring Cloud, Spring Cloud Alibaba) that can be scaled horizontally.

Caching – Uses distributed caches like Redis (and local caches such as Guava) to reduce database pressure by storing hot data in memory.
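As a minimal sketch of the cache-aside pattern behind those tools, the JDK-only class below keeps hot entries in an in-memory LRU map and falls back to a loader function on a miss (the loader is a hypothetical stand-in for a database query; Redis or Guava would replace this in a real deployment):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class HotDataCache {
    private final Map<String, String> cache;
    private final Function<String, String> loader; // stands in for a DB query

    public HotDataCache(int maxEntries, Function<String, String> loader) {
        // An access-ordered LinkedHashMap gives simple LRU eviction.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> e) {
                return size() > maxEntries;
            }
        };
        this.loader = loader;
    }

    // Cache-aside: check the cache first, fall back to the loader on a miss.
    public synchronized String get(String key) {
        return cache.computeIfAbsent(key, loader);
    }

    public static void main(String[] args) {
        HotDataCache cache = new HotDataCache(100, key -> "value-for-" + key);
        System.out.println(cache.get("hot-item")); // loaded once, then served from memory
    }
}
```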

Load Balancing – Distributes requests across servers using tools like Nginx, HAProxy, LVS, or F5, employing strategies such as round‑robin, weighted, or IP‑hash.
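Those selection strategies live inside Nginx, HAProxy, or LVS, but the core logic is small. A sketch of plain round-robin and IP-hash selection, assuming the server addresses are placeholders:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinBalancer {
    private final List<String> servers;
    private final AtomicInteger counter = new AtomicInteger();

    public RoundRobinBalancer(List<String> servers) {
        this.servers = List.copyOf(servers);
    }

    // Round-robin: each request goes to the next server in turn.
    public String next() {
        int i = Math.floorMod(counter.getAndIncrement(), servers.size());
        return servers.get(i);
    }

    // IP-hash: the same client IP always maps to the same server,
    // which keeps session affinity without shared state.
    public String forIp(String clientIp) {
        return servers.get(Math.floorMod(clientIp.hashCode(), servers.size()));
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb = new RoundRobinBalancer(List.of("10.0.0.1", "10.0.0.2"));
        System.out.println(lb.next()); // first request → 10.0.0.1
    }
}
```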

Traffic Shaping (Peak‑Smoothing) – Queues incoming requests in message queues (Kafka, RocketMQ) to smooth bursts and prevent overload.
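A JDK-only sketch of the same idea, with a bounded in-process queue standing in for Kafka or RocketMQ: the burst lands in the queue all at once, while a single worker drains it at the backend's own pace.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PeakSmoothing {
    // Buffer a burst of orders through a bounded queue (stand-in for a
    // message queue) and drain them with one worker thread.
    public static int process(int burst) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(burst + 1);
        int[] processed = {0};

        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String msg = queue.take();
                    if (msg.equals("STOP")) return;
                    processed[0]++; // handle the order at a steady rate
                }
            } catch (InterruptedException ignored) { }
        });
        worker.start();

        // The burst arrives at once but is absorbed by the queue,
        // so the consumer never sees more than one message at a time.
        for (int i = 0; i < burst; i++) queue.put("order-" + i);
        queue.put("STOP");
        worker.join();
        return processed[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("processed " + process(500)); // prints "processed 500"
    }
}
```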

Rate Limiting and Circuit Breaking – Protect the system during traffic spikes using algorithms such as token bucket or leaky bucket (implemented via Nginx, Guava RateLimiter, or Sentinel), plus circuit‑breaker mechanisms that isolate failing services so faults do not cascade.
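The token bucket fits in a few lines. The class below is a simplified in-process sketch (Guava RateLimiter or Sentinel would be the production choice): tokens refill continuously at a fixed rate, and a request that finds the bucket empty is rejected rather than queued.

```java
public class TokenBucket {
    private final long capacity;
    private final double refillPerNano;
    private double tokens;
    private long lastRefill;

    public TokenBucket(long capacity, double tokensPerSecond) {
        this.capacity = capacity;
        this.refillPerNano = tokensPerSecond / 1_000_000_000.0;
        this.tokens = capacity;           // start with a full bucket
        this.lastRefill = System.nanoTime();
    }

    // Each request consumes one token; without a token it is shed immediately.
    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        TokenBucket limiter = new TokenBucket(100, 100); // ~100 requests/second
        System.out.println(limiter.tryAcquire() ? "allowed" : "rejected");
    }
}
```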

These combined techniques significantly improve performance and stability under high‑concurrency scenarios.

The article also promotes the author’s extensive resources, including a 300,000‑word Alibaba architecture collection and a comprehensive Java interview question set, which readers can obtain by following the provided links or contacting the author.

Tags: microservices, distributed architecture, load balancing, caching, high concurrency, rate limiting, TPS
Written by

Mike Chen's Internet Architecture

Over ten years of BAT architecture experience, shared generously!
