15 Essential Strategies for Designing High‑Concurrency Systems
This article outlines fifteen practical techniques—including horizontal scaling, microservice decomposition, database sharding, connection pooling, caching, CDN, message queues, circuit breaking, rate limiting, and load testing—to help engineers design backend systems that remain reliable and performant under extreme traffic spikes.
Understanding High‑Concurrency Systems
Designing a high‑concurrency system means building an application that stays available while handling a massive number of simultaneous user requests and absorbing large traffic bursts. Doing so requires careful mitigation of common bottlenecks: memory, disk I/O, connection limits, and network bandwidth.
1. Horizontal Scaling (Divide and Conquer)
Deploying a single instance limits the traffic a system can handle and creates a single point of failure; distributing the workload across multiple servers increases overall concurrency and eliminates the single‑point risk.
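As a rough illustration of spreading requests across instances, the sketch below implements round-robin selection over a fixed list of backends. The backend addresses and the class name RoundRobinBalancer are hypothetical; in practice this job is usually done by a load balancer such as Nginx rather than by application code.

```python
import itertools
import threading


class RoundRobinBalancer:
    """Cycles through a fixed list of backend addresses (hypothetical names)."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(list(backends))
        # Guard the shared iterator so concurrent callers get distinct turns.
        self._lock = threading.Lock()

    def pick(self):
        """Return the next backend in rotation."""
        with self._lock:
            return next(self._cycle)


balancer = RoundRobinBalancer(["app-1:8080", "app-2:8080", "app-3:8080"])
picks = [balancer.pick() for _ in range(6)]
```

Each instance receives every third request, so losing one instance removes a third of the capacity instead of the whole service.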
2. Microservice Decomposition
Breaking a monolithic application into independent services—e.g., separating user, order, and product modules—allows traffic to be spread across services, improving throughput and simplifying scaling.
3. Database Sharding and Partitioning
When a single MySQL instance cannot sustain the load (for example, it starts returning "too many connections" errors), splitting data across multiple databases and tables reduces per‑node pressure. Tables that grow to many millions of rows typically need to be split or partitioned to maintain query performance; the exact threshold depends on row size, indexes, and hardware.
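One common way to decide where a row lives is to hash the sharding key and take it modulo the number of slots. The sketch below assumes 4 databases with 8 tables each and the naming scheme order_db_N / orders_N; all of these are illustrative values, not something the article specifies.

```python
import zlib

NUM_DATABASES = 4   # assumed number of physical databases
TABLES_PER_DB = 8   # assumed number of tables per database


def route(user_id: int) -> tuple[str, str]:
    """Map a sharding key to a (database, table) pair with a stable hash."""
    key = str(user_id).encode("utf-8")
    # crc32 is stable across processes, unlike Python's built-in hash().
    slot = zlib.crc32(key) % (NUM_DATABASES * TABLES_PER_DB)
    db_index, table_index = divmod(slot, TABLES_PER_DB)
    return f"order_db_{db_index}", f"orders_{table_index}"
```

Modulo routing is simple, but changing the number of shards later moves most keys, so the shard count is usually chosen with headroom.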
4. Connection Pooling
Creating a new database, HTTP, or Redis connection for each request is costly; using connection pools reuses existing connections, dramatically improving request handling speed. Thread pools provide similar benefits for parallel task execution.
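The idea of reusing connections can be sketched with a bounded queue of pre-created connections. This is a minimal illustration, not a production pool: the factory argument stands in for whatever actually opens a database, HTTP, or Redis connection, and health checks and reconnects are omitted.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Holds a fixed number of connections and lends them out one at a time."""

    def __init__(self, factory, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())  # pay the connection cost once, up front

    @contextmanager
    def connection(self, timeout=1.0):
        # Blocks up to `timeout` seconds; raises queue.Empty if none is free.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always hand the connection back


created = []


def fake_connect():
    """Hypothetical stand-in for an expensive connect call."""
    conn = object()
    created.append(conn)
    return conn


pool = ConnectionPool(fake_connect, size=2)
for _ in range(5):
    with pool.connection() as conn:
        pass  # run a query with `conn` here
```

Five requests are served while only two connections are ever created, which is where the savings come from.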
5. Master‑Slave Replication
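As a rough order of magnitude, a single MySQL server supports about 500 TPS and 10k QPS; actual figures vary widely with hardware, schema, and query mix. Adding read replicas offloads read‑heavy traffic, preserving master capacity for writes and for time‑critical reads that cannot tolerate replication lag.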
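Read/write splitting can be illustrated with a small router that sends plain reads to replicas and everything else to the master. The class ReadWriteRouter and the require_fresh flag are hypothetical names for this sketch; the statement check is deliberately naive and would, for example, misroute SELECT ... FOR UPDATE.

```python
import itertools


class ReadWriteRouter:
    """Chooses a target node for a SQL statement (naive prefix check)."""

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas) if replicas else None

    def target_for(self, sql, require_fresh=False):
        is_read = sql.lstrip().lower().startswith("select")
        # Reads that must see the latest write bypass the replicas,
        # because replication is asynchronous and may lag.
        if is_read and not require_fresh and self._replicas is not None:
            return next(self._replicas)
        return self.master


router = ReadWriteRouter("master", ["replica-1", "replica-2"])
```

Routing a read to the master right after a write is a simple way to avoid showing users stale data.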
6. Caching
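Introducing caches such as Redis, JVM local cache, or Memcached reduces backend load and speeds up responses. However, several failure modes must be carefully managed: cache/database consistency, cache avalanche (many keys expiring at the same time), cache penetration (repeated lookups for keys that do not exist), and cache stampede (many concurrent requests rebuilding the same hot key after it expires).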
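A minimal cache-aside sketch is shown below, assuming an in-process dictionary rather than Redis or Memcached. The class TTLCache and its loader callback are hypothetical. Misses that load None are cached too, which is one common mitigation for cache penetration.

```python
import time


class TTLCache:
    """In-process cache-aside helper with a per-entry time to live."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock       # injectable so expiry can be tested
        self._store = {}          # key -> (value, expires_at)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]       # fresh hit, backend is not touched
        value = loader(key)       # miss or expired: go to the backend
        # None results are stored as well, so lookups for nonexistent
        # keys do not hit the backend on every request.
        self._store[key] = (value, now + self._ttl)
        return value
```

This sketch is not thread-safe and uses one fixed TTL; adding a small random offset to each TTL is a common way to keep keys from expiring together.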
7. CDN for Static Assets
Static resources (images, icons, etc.) should be served via a Content Delivery Network, allowing users to fetch content from geographically close edge nodes, reducing origin server load.
8. Message Queues for Traffic Spikes
During events like Double‑11, a queue can absorb bursts (e.g., 5k requests/s) while the application processes a sustainable rate (e.g., 2k requests/s). Queue overflow can be handled by dropping excess requests or returning error pages.
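The buffering and overflow behaviour described above can be sketched with a bounded in-process queue. A real deployment would use a message broker; the class BurstBuffer and its capacity are assumptions made for illustration only.

```python
import queue


class BurstBuffer:
    """Bounded buffer: producers are rejected when it is full."""

    def __init__(self, capacity):
        self._queue = queue.Queue(maxsize=capacity)

    def offer(self, request):
        """Enqueue without blocking; return False if the request is dropped."""
        try:
            self._queue.put_nowait(request)
            return True
        except queue.Full:
            return False  # caller can return an error page here

    def drain(self, max_items):
        """Take up to max_items requests, i.e. one tick of sustainable work."""
        batch = []
        while len(batch) < max_items:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return batch
```

Producers call offer at whatever rate traffic arrives, while a worker calls drain at the rate the application can sustain.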
9. Elasticsearch for Search
Elasticsearch is a distributed search engine that scales horizontally: capacity for larger data volumes is added by adding nodes rather than by repeatedly upgrading hardware, which makes it suitable for high‑concurrency query workloads.
10. Circuit Breaking and Degradation
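When a downstream service fails or slows down (e.g., because of slow SQL), the failure propagates upstream and can cause a cascade (service avalanche). Circuit breakers, implemented with manual switches or libraries like Hystrix, stop calls to the failing dependency so that it cannot collapse the whole system. Degradation means returning a simplified or default response while the dependency is unavailable.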
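The breaker logic can be sketched as a small state machine: closed while calls succeed, open after repeated failures, and half-open after a cooldown to let one trial call through. This is a single-threaded illustration with hypothetical names and thresholds; it does not reproduce the Hystrix API.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and no fallback was given."""


class CircuitBreaker:
    def __init__(self, failure_threshold, reset_timeout, clock=time.monotonic):
        self._threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, fallback=None):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                # Open: fail fast without touching the dependency.
                if fallback is not None:
                    return fallback()
                raise CircuitOpenError("circuit is open")
            # Cooldown elapsed: half-open, let this one call through.
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self._threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            if fallback is not None:
                return fallback()  # degraded response
            raise
        self._failures = 0
        self._opened_at = None  # success closes the circuit
        return result
```

The fallback argument is where degradation happens: the caller gets a default value instead of waiting on a dependency that is known to be failing.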
11. Rate Limiting
To protect limited CPU, memory, network, and thread resources, rate limiting discards excess requests during traffic spikes. Implementations include Guava RateLimiter for single‑node limits, Redis‑based distributed limits, or Alibaba’s Sentinel.
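A single-node limiter is often built on the token bucket algorithm, sketched below. The class TokenBucket and its parameters are hypothetical and are not the Guava RateLimiter or Sentinel API; they only illustrate how excess requests end up being rejected.

```python
import time


class TokenBucket:
    """Allows `rate` requests per second with bursts of up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self._rate = rate
        self._capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start full
        self._last = clock()

    def try_acquire(self, tokens=1):
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to elapsed time, never above capacity.
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        if self._tokens >= tokens:
            self._tokens -= tokens
            return True
        return False  # over the limit: discard or reject the request
```

This version is not thread-safe; a shared limiter would need a lock, and a distributed one would keep the counters in a shared store such as Redis.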
12. Asynchronous Processing
Asynchronous calls avoid blocking the caller, improving overall throughput. Message queues can buffer massive requests (e.g., flash‑sale spikes), allowing the system to acknowledge receipt quickly and process results later.
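The "acknowledge now, process later" pattern can be sketched with a thread pool standing in for the queue and its consumers. The class AsyncOrderService, the ticket scheme, and the handler are assumptions for illustration.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class AsyncOrderService:
    """Accepts work immediately and lets callers fetch the result later."""

    def __init__(self, handler, workers=4):
        self._handler = handler
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._pending = {}  # ticket -> Future

    def submit(self, order):
        """Return a ticket right away; the handler runs in the background."""
        ticket = uuid.uuid4().hex
        self._pending[ticket] = self._pool.submit(self._handler, order)
        return ticket

    def result(self, ticket, timeout=None):
        """Wait for and return the outcome of a previously submitted order."""
        return self._pending.pop(ticket).result(timeout=timeout)

    def close(self):
        self._pool.shutdown(wait=True)
```

The caller is blocked only for the time it takes to enqueue the work, not for the time the handler needs to finish it.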
13. API Optimization Techniques
Optimizing API performance—through compression, pagination, efficient serialization, and other tactics—enables the system to serve more requests in the same time window.
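Two of these tactics, pagination and compression, are easy to show together. The helper names paginate and encode_response and the response shape are hypothetical; the sketch uses only the standard json and gzip modules.

```python
import gzip
import json


def paginate(items, page, page_size):
    """Return one page of results plus the metadata a client needs."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size
    return {
        "page": page,
        "page_size": page_size,
        "total": len(items),
        "items": items[start:start + page_size],
    }


def encode_response(payload):
    """Serialize compactly, then gzip the bytes for the wire."""
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)
```

Returning a bounded page keeps each response small, and compression reduces the bytes sent per response at the cost of some CPU time.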
14. Load Testing to Identify Bottlenecks
Before release, conduct load tests with tools such as LoadRunner or JMeter to determine maximum concurrent capacity and pinpoint bottlenecks across network, Nginx, services, or caches.
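To make the measurements concrete, the sketch below drives a callable from several threads and reports throughput, error count, and 99th percentile latency. It is a toy stand-in for tools like JMeter, and the function run_load and its report fields are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(target, total_requests, concurrency):
    """Call `target` total_requests times from `concurrency` threads."""

    def timed_call(_):
        start = time.perf_counter()
        try:
            target()
            ok = True
        except Exception:
            ok = False  # count failures instead of aborting the run
        return time.perf_counter() - start, ok

    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        outcomes = list(pool.map(timed_call, range(total_requests)))
    elapsed = time.perf_counter() - started

    latencies = sorted(latency for latency, _ in outcomes)
    p99_index = min(len(latencies) - 1, int(len(latencies) * 0.99))
    return {
        "requests": total_requests,
        "errors": sum(1 for _, ok in outcomes if not ok),
        "throughput_rps": total_requests / elapsed,
        "p99_seconds": latencies[p99_index],
    }
```

Repeating the run while raising concurrency shows where throughput stops growing and tail latency starts climbing, which is usually where the bottleneck sits.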
15. Scaling and Traffic Switching
For sudden traffic peaks, combine horizontal scaling (adding MySQL/Redis replicas) with traffic routing across multiple data centers to distribute load.
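Traffic switching between data centers is often expressed as routing weights that operators adjust during a peak or an outage. The sketch below assumes two hypothetical data centers and a function named pick_datacenter; real systems typically do this at the DNS or gateway layer.

```python
import random


def pick_datacenter(weights, rng=random):
    """Choose a data center name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[name] for name in names], k=1)[0]


# Hypothetical split: send roughly 80% of requests to dc-a and 20% to dc-b.
routing = {"dc-a": 80, "dc-b": 20}
chosen = pick_datacenter(routing)
```

Setting a weight to zero drains a data center without changing any other configuration, as long as at least one weight stays positive.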
References
GeekTime "High‑Concurrency System Design 40 Questions" – https://time.geekbang.org/column/article/192203
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
