What QPS Defines High Concurrency? Strategies & Architecture Explained

This article defines high concurrency, outlines QPS thresholds for small, medium, high, and ultra‑high traffic, and presents practical solutions such as multi‑level caching, load balancing, database sharding, and message‑queue traffic shaping to build robust backend systems.

Mike Chen's Internet Architecture

What is High Concurrency

High concurrency refers to a system's ability to handle a large number of user requests within a unit of time, typically measured by QPS (queries per second).

What counts as high concurrency depends on the business scenario. At the extreme end, Alibaba's Double 11 sale peaked at 583,000 orders per second.
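To make the metric concrete, here is a minimal sketch of measuring QPS over a sliding window. The class name `QpsMeter`, its one-second default window, and the explicit `now` timestamps are assumptions made for illustration; they do not come from the original article.

```python
from collections import deque
from typing import Deque


class QpsMeter:
    """Counts requests seen in the most recent window and reports them as QPS."""

    def __init__(self, window_seconds: float = 1.0) -> None:
        self._window = window_seconds
        self._stamps: Deque[float] = deque()  # arrival times, oldest first

    def record(self, now: float) -> None:
        """Register one request that arrived at time `now` (in seconds)."""
        self._stamps.append(now)

    def qps(self, now: float) -> float:
        """Drop arrivals older than the window, then average over the window."""
        cutoff = now - self._window
        while self._stamps and self._stamps[0] <= cutoff:
            self._stamps.popleft()
        return len(self._stamps) / self._window


meter = QpsMeter(window_seconds=1.0)
for i in range(100):
    meter.record(i * 0.01)  # 100 requests spread over roughly one second
print(meter.qps(now=0.99))
```

In a real service the timestamps would come from a clock such as `time.monotonic()`; passing them in explicitly keeps the sketch deterministic.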

QPS Thresholds for High Concurrency

QPS is the usual yardstick for judging a system's concurrency level. Systems can be roughly grouped into four tiers:

Small scale : QPS < 100 – low concurrency.

Medium scale : 100 ≤ QPS < 1,000 – requires some optimization and a distributed architecture.

High concurrency : 1,000 ≤ QPS ≤ 10,000 – needs caching, asynchronous processing, database sharding, etc.

Ultra‑high concurrency : QPS > 10,000, sometimes exceeding 100,000 – typical for large internet platforms.

As a rule of thumb, a QPS of 1,000 or more counts as high concurrency, and anything above 10,000 is ultra‑high.
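The tiers above can be expressed as a small lookup. This is only a sketch: the function name `concurrency_tier` is hypothetical, and treating exactly 1,000 QPS as "high" rather than "medium" is a choice made here to keep the ranges from overlapping.

```python
def concurrency_tier(qps: float) -> str:
    """Map a QPS figure to one of the four rough tiers described above."""
    if qps < 100:
        return "small"
    if qps < 1_000:
        return "medium"
    if qps <= 10_000:
        return "high"
    return "ultra-high"


for sample in (50, 500, 1_000, 583_000):
    print(sample, concurrency_tier(sample))
```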

Multi‑Level Caching

Caching reduces direct database access and speeds up responses. With multiple levels, each request is served from the closest layer that holds the data:

Local cache : In‑process memory cache.

Distributed cache : Redis, Memcached, etc., shared across services.

CDN cache : Global nodes cache static resources.
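The lookup order across the first two layers can be sketched as a read‑through cache. Everything named here is an assumption for illustration: the `TwoLevelCache` class, the plain dict standing in for Redis or Memcached, the `load_from_db` callback, and the 5‑second local TTL. The CDN layer sits in front of the application and is not modeled.

```python
import time
from typing import Callable, Dict, Tuple


class TwoLevelCache:
    """Read-through cache: local memory first, then a shared store, then the database."""

    def __init__(
        self,
        shared: Dict[str, str],
        load_from_db: Callable[[str], str],
        local_ttl_seconds: float = 5.0,
    ) -> None:
        self._local: Dict[str, Tuple[str, float]] = {}  # key -> (value, expiry time)
        self._shared = shared  # stand-in for a distributed cache such as Redis
        self._load_from_db = load_from_db
        self._local_ttl = local_ttl_seconds

    def get(self, key: str) -> str:
        entry = self._local.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]  # level 1 hit: no network call at all
        value = self._shared.get(key)
        if value is None:
            value = self._load_from_db(key)  # missed both levels: hit the database
            self._shared[key] = value
        # Refresh the local copy so the next read stays in-process.
        self._local[key] = (value, time.monotonic() + self._local_ttl)
        return value


db_calls = []


def fake_db(key: str) -> str:
    """Hypothetical database loader that records how often it is called."""
    db_calls.append(key)
    return f"row-for-{key}"


shared_store: Dict[str, str] = {}
cache = TwoLevelCache(shared_store, fake_db)
first = cache.get("user:1")   # goes all the way to the database
second = cache.get("user:1")  # served from the local cache
print(first, second, len(db_calls))
```

A short local TTL is one way to limit staleness, since in‑process copies cannot easily be invalidated from other servers.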

Load Balancing

Load balancing distributes user requests across multiple servers so that no single server bears the full load.

Hardware load balancers : High‑performance devices (e.g., F5, A10) for extreme traffic.

Software load balancers : Open‑source solutions such as Nginx, HAProxy, and Apache HTTP Server.
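As an illustration of how requests get spread out, the sketch below implements round‑robin selection, one common balancing policy. The article does not prescribe a policy, so round‑robin is a choice made here, and the class name and backend addresses are hypothetical.

```python
import itertools
from typing import Iterator, List


class RoundRobinBalancer:
    """Hands out backends in a fixed rotation so each gets an equal share."""

    def __init__(self, backends: List[str]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle: Iterator[str] = itertools.cycle(backends)

    def pick(self) -> str:
        """Return the backend that should receive the next request."""
        return next(self._cycle)


balancer = RoundRobinBalancer(["app-1:8080", "app-2:8080", "app-3:8080"])
assignments = [balancer.pick() for _ in range(6)]
print(assignments)
```

Production balancers typically add health checks and weighting on top of this, which the sketch leaves out.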

Database Sharding (Splitting Databases and Tables)

Sharding splits data across multiple databases or tables to reduce the load on any single database.

Vertical sharding : Separate databases by business module (e.g., users vs. orders).

Horizontal sharding : Split the rows of one table across multiple tables or databases by a shard key (e.g., user ID or order ID), typically by range or by hash.
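A minimal sketch of horizontal routing by user ID follows. The function names, the one‑million‑IDs‑per‑shard range size, the four‑shard modulo layout, and the `orders_NNNN` table naming are all assumptions for illustration.

```python
def range_shard(user_id: int, ids_per_shard: int = 1_000_000) -> int:
    """Range sharding: IDs 0..999,999 go to shard 0, the next million to shard 1, and so on."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    return user_id // ids_per_shard


def modulo_shard(user_id: int, shard_count: int = 4) -> int:
    """Alternative: spread IDs evenly over a fixed number of shards."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    return user_id % shard_count


def orders_table(user_id: int) -> str:
    """Name of the physical table holding this user's orders (range-based)."""
    return f"orders_{range_shard(user_id):04d}"


print(orders_table(42))
print(orders_table(2_500_000))
print(modulo_shard(2_500_001))
```

Range sharding keeps neighboring IDs together and makes adding shards simple, but new users tend to pile onto the newest shard. Modulo spreads load evenly, at the cost of moving data when the shard count changes.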

Message Middleware for Traffic Shaping

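Message queues such as Kafka, RabbitMQ, and RocketMQ decouple services and smooth out traffic spikes: producers enqueue requests as fast as they arrive, while consumers drain the queue at a rate the downstream system can sustain.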
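The buffering effect can be sketched with Python's standard `queue` and `threading` modules standing in for a real broker. The function name `shape_burst`, the burst size, and the processing rate are hypothetical values chosen to keep the demo short.

```python
import queue
import threading
import time
from typing import List, Tuple


def shape_burst(burst_size: int = 20, per_second: float = 200.0) -> List[Tuple[int, float]]:
    """Enqueue a burst at once; a single consumer drains it at a capped rate.

    Returns (request_id, handled_at) pairs in the order they were processed.
    """
    buffer: queue.Queue = queue.Queue()  # in-process stand-in for a message queue
    handled: List[Tuple[int, float]] = []
    interval = 1.0 / per_second

    def consumer() -> None:
        while True:
            item = buffer.get()
            if item is None:  # sentinel: no more work
                break
            handled.append((item, time.monotonic()))
            time.sleep(interval)  # simulate a downstream that handles one item per interval

    worker = threading.Thread(target=consumer)
    worker.start()
    for request_id in range(burst_size):
        buffer.put(request_id)  # the producer never waits on the slow consumer
    buffer.put(None)
    worker.join()
    return handled


results = shape_burst(burst_size=10, per_second=500.0)
spread = results[-1][1] - results[0][1]
print(f"{len(results)} requests handled over {spread:.3f}s")
```

The producer finishes almost instantly while processing is stretched over time, which is the peak‑shaving behavior a message queue provides. A real broker adds persistence and delivery guarantees that this sketch does not attempt.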

These architectural techniques are often combined to handle high‑concurrency scenarios.
