15 Proven Strategies to Design High‑Concurrency Systems
This article outlines fifteen practical techniques—including horizontal scaling, microservice decomposition, database sharding, connection pooling, caching, CDN, message queues, Elasticsearch, circuit breaking, rate limiting, and load testing—to help engineers build robust, high‑concurrency systems that can handle massive traffic spikes.
Overview
Designing a high‑concurrency system means guaranteeing overall availability while handling a large number of simultaneous user requests and sudden traffic spikes. The following fifteen techniques are frequently discussed in technical interviews and constitute a practical checklist for building scalable, fault‑tolerant services.
1. Horizontal Scaling (Divide and Conquer)
Deploy multiple stateless instances behind a load balancer (e.g., Nginx or a Layer 4 load balancer). Each node processes a fraction of the traffic, which removes the single point of failure of a single‑machine deployment and raises aggregate request‑handling capacity.
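As a minimal sketch of how a load balancer spreads requests across stateless instances, the snippet below implements plain round‑robin selection. The backend addresses and the class name are hypothetical; real balancers such as Nginx also handle health checks and weights, which are omitted here.

```python
import itertools
from collections import Counter


class RoundRobinBalancer:
    """Cycles through a fixed list of backend addresses."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(list(backends))

    def pick(self):
        # Each call returns the next backend in order, wrapping around.
        return next(self._cycle)


# Hypothetical backend addresses, for illustration only.
balancer = RoundRobinBalancer(["app-1:8080", "app-2:8080", "app-3:8080"])
hits = Counter(balancer.pick() for _ in range(9))
print(hits)  # each backend receives 3 of the 9 requests
```

Because the instances hold no session state, any of them can serve any request, which is what makes this even split safe.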
2. Microservice Decomposition
Split a monolithic application into independent services based on business domains (e.g., user, order, product). Each service runs in its own process/container, allowing independent scaling, isolated failures, and clearer ownership of resources.
3. Database Sharding and Partitioning
When a single MySQL instance reaches limits on disk, memory, or connections ("too many connections"), distribute data across multiple databases (sharding) and further divide large tables (partitioning). As rough rules of thumb rather than hard limits: a table approaching 10 million rows may warrant partitioning, and a total data volume around 10 TB often justifies sharding. The right threshold depends on row size, indexes, and hardware. Splitting reduces per‑node load and keeps query latency low.
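One common way to route a row to its shard is to hash the sharding key and take the result modulo the number of slots. The sketch below assumes 4 databases with 8 tables each and a user ID as the sharding key; the database and table names are hypothetical.

```python
import zlib

NUM_DATABASES = 4   # assumed number of database shards
TABLES_PER_DB = 8   # assumed number of tables per database


def route(user_id: int):
    """Map a user ID to a (database, table) pair."""
    # crc32 is stable across processes and restarts.
    h = zlib.crc32(str(user_id).encode("utf-8"))
    slot = h % (NUM_DATABASES * TABLES_PER_DB)
    db_index = slot // TABLES_PER_DB
    table_index = slot % TABLES_PER_DB
    return f"order_db_{db_index}", f"orders_{table_index}"


print(route(10086))
```

Modulo routing is simple but makes later resharding expensive, because changing the slot count moves most keys. Consistent hashing or a lookup table are the usual alternatives when growth is expected.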
4. Connection Pooling
Reuse existing connections instead of creating a new one for each request. Apply pooling to:
Database connections (e.g., HikariCP, DBCP)
HTTP client connections (e.g., Apache HttpClient pool)
Redis clients (e.g., JedisPool)
Thread pools (e.g., java.util.concurrent.ExecutorService) similarly limit thread‑creation overhead and improve parallel task execution.
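The idea behind all of these pools is the same: create a fixed number of expensive objects up front, lend them out, and take them back. The following sketch shows that mechanism with a generic pool; `fake_connect` is a stand‑in for a real connection factory, and production pools such as HikariCP add validation, eviction, and metrics on top.

```python
import queue
from contextlib import contextmanager


class SimplePool:
    """Fixed-size pool that lends out pre-created connections."""

    def __init__(self, factory, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout=1.0):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always return the connection to the pool


created = []


def fake_connect():
    # Stand-in for an expensive connect call (TCP handshake, auth, ...).
    conn = object()
    created.append(conn)
    return conn


pool = SimplePool(fake_connect, size=2)
for _ in range(100):
    with pool.connection() as conn:
        pass  # run a query here

print(len(created))  # 2: one hundred requests reused two connections
```

The timeout on `get` also acts as back‑pressure: when every connection is busy, callers wait briefly and then fail instead of opening more connections against the database.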
5. Master‑Slave Replication
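As a rough rule of thumb, a single MySQL master handles on the order of 500 TPS and 10 k QPS; actual figures depend heavily on hardware, schema, and query mix. Adding read‑only slaves offloads read‑heavy traffic, protecting the master from overload. Be aware of replication lag, which is often sub‑second but can grow to seconds or minutes under heavy write load, and of the resulting eventual consistency when routing reads: a read that must see a just‑committed write should go to the master.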
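Read/write splitting can be sketched as a small router that sends writes to the master and spreads reads across the slaves. The node names and the `require_fresh` flag are assumptions for illustration; the statement check is deliberately naive and would, for example, misroute `SELECT ... FOR UPDATE`.

```python
import random


class ReadWriteRouter:
    """Routes SELECTs to a slave and everything else to the master."""

    def __init__(self, master, slaves, rng=None):
        self.master = master
        self.slaves = list(slaves)
        self._rng = rng or random.Random()

    def target_for(self, sql, require_fresh=False):
        is_read = sql.lstrip().upper().startswith("SELECT")
        # Reads that must see the latest write bypass the slaves,
        # because replication lag may hide recent changes.
        if is_read and self.slaves and not require_fresh:
            return self._rng.choice(self.slaves)
        return self.master


# Hypothetical node names.
router = ReadWriteRouter("master-db", ["slave-1", "slave-2"])
print(router.target_for("INSERT INTO orders VALUES (1)"))
print(router.target_for("SELECT * FROM orders WHERE id = 1"))
print(router.target_for("SELECT * FROM orders WHERE id = 1", require_fresh=True))
```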
6. Caching
Store frequently accessed data in memory to reduce backend load and latency. Common caches:
Redis (single‑node can serve tens of thousands of QPS)
Local JVM caches (Caffeine, Guava Cache)
Memcached
Key cache pitfalls to handle:
Cache‑DB consistency
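Cache avalanche (many keys expiring at once, sending a burst of traffic to the database)
Cache penetration (requests for keys that exist in neither the cache nor the database)
Cache stampede (many concurrent requests rebuilding the same expired hot key; also called thundering herd)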
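Three of these pitfalls have well‑known mitigations that fit in a short cache‑aside sketch: random TTL jitter against avalanche, caching a "missing" marker against penetration, and a lock with a re‑check against stampede. The class below uses an in‑process dictionary in place of Redis, and all names and TTL values are illustrative assumptions. Cache‑DB consistency on writes is not addressed here.

```python
import random
import threading
import time

_MISSING = object()  # marker cached for keys the database does not have


class CacheAside:
    def __init__(self, loader, base_ttl=60.0, jitter=10.0, clock=time.monotonic):
        self._loader = loader          # function: key -> value, or None if absent
        self._base_ttl = base_ttl
        self._jitter = jitter
        self._clock = clock
        self._data = {}                # key -> (value, expires_at)
        self._lock = threading.Lock()  # one global lock keeps the sketch short

    def _fresh(self, key):
        entry = self._data.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry
        return None

    def get(self, key):
        entry = self._fresh(key)
        if entry is None:
            with self._lock:
                # Re-check: another thread may have reloaded while we waited,
                # so only one caller hits the database per expired key.
                entry = self._fresh(key)
                if entry is None:
                    value = self._loader(key)
                    # Jitter spreads expirations so keys do not all expire together.
                    ttl = self._base_ttl + random.uniform(0, self._jitter)
                    stored = _MISSING if value is None else value
                    entry = (stored, self._clock() + ttl)
                    self._data[key] = entry
        return None if entry[0] is _MISSING else entry[0]


db = {"user:1": "alice"}
calls = []


def load(key):
    calls.append(key)  # record every trip to the "database"
    return db.get(key)


cache = CacheAside(load)
cache.get("user:1"); cache.get("user:1")
cache.get("user:404"); cache.get("user:404")
print(calls)  # ['user:1', 'user:404']: repeats were served from the cache
```

A single global lock serializes all reloads; a per‑key lock would allow different keys to reload in parallel at the cost of more bookkeeping.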
7. CDN for Static Assets
Serve images, CSS, JavaScript, and other static files via a Content Delivery Network. Edge nodes deliver content close to users, reducing latency and offloading backend servers.
8. Message Queues for Traffic Smoothing
Introduce a queue (e.g., Kafka, RabbitMQ, RocketMQ) to buffer bursts. If the application can process 2 k requests/s but receives 5 k, the queue absorbs the excess and releases work at a controlled rate. This only works while the burst is temporary: under sustained overload the backlog keeps growing until the queue is full. Overflow policies then include dropping messages or returning error responses.
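The arithmetic behind the 2 k versus 5 k example can be checked with a small second‑by‑second simulation. The queue limit of 10,000 messages is an assumed value, and the drop‑when‑full policy is one of the overflow options named above.

```python
def simulate(arrivals_per_s, capacity_per_s, queue_limit, seconds):
    """Return (backlog, dropped) after the given number of seconds."""
    backlog = 0
    dropped = 0
    for _ in range(seconds):
        backlog += arrivals_per_s
        if backlog > queue_limit:
            # Queue is full: everything beyond the limit is dropped.
            dropped += backlog - queue_limit
            backlog = queue_limit
        # Consumers drain at most their processing capacity each second.
        backlog -= min(backlog, capacity_per_s)
    return backlog, dropped


print(simulate(5000, 2000, queue_limit=10000, seconds=2))  # (6000, 0)
print(simulate(5000, 2000, queue_limit=10000, seconds=5))  # (8000, 7000)
```

A two‑second burst is absorbed without loss and then drained at 2 k/s. By the fifth second of sustained overload the queue is full and 3 k messages are dropped every second, which is why a queue smooths peaks but does not replace capacity.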
9. Elasticsearch for Search‑Heavy Loads
Use Elasticsearch as a distributed, horizontally scalable search engine. It handles large data volumes and high query concurrency without the need to scale relational databases for search‑specific workloads.
10. Circuit Breaker and Degradation
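Wrap downstream calls in a circuit breaker (e.g., Resilience4j, or Hystrix, which is now in maintenance mode). When a service becomes slow or fails repeatedly, the breaker opens and returns a fallback response immediately instead of calling the service, preventing cascading failures (service avalanche). After a cool‑down period it lets a trial request through and closes again if that request succeeds.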
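The state machine of a circuit breaker is small enough to sketch directly. The version below is a simplified, single‑threaded illustration with assumed threshold and timeout values; it is not how Hystrix or Resilience4j are implemented, and those libraries track failure rates over sliding windows rather than a simple counter.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self._threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, fallback):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                return fallback()  # open: fail fast without calling downstream
            # Timeout elapsed: half-open, let this one trial call through.
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self._threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            return fallback()
        # Success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result


now = [0.0]  # fake clock so the example does not need to sleep
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])
attempts = []


def flaky():
    attempts.append(now[0])
    raise RuntimeError("downstream timeout")


for _ in range(5):
    breaker.call(flaky, fallback=lambda: "cached default")

print(len(attempts))  # 2: the last three calls never reached the failing service
```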
11. Rate Limiting
Protect limited resources (CPU, memory, network) by discarding excess requests during spikes. Implementations:
Guava RateLimiter (local token‑bucket)
Redis‑based distributed token bucket
Alibaba Sentinel (distributed flow control)
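To show what a local token bucket does, here is a minimal sketch with an injectable clock. The rate and capacity are arbitrary example values, and the class is an illustration of the algorithm rather than the Guava or Sentinel implementation.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained `rate` per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self._rate = rate
        self._capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def allow(self, cost=1.0):
        now = self._clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self._tokens = min(self._capacity, self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False  # caller should reject or delay the request


now = [0.0]  # fake clock for a deterministic example
bucket = TokenBucket(rate=2, capacity=5, clock=lambda: now[0])
burst = [bucket.allow() for _ in range(7)]
now[0] = 1.0
later = [bucket.allow() for _ in range(3)]
print(burst)  # five True, then two False
print(later)  # [True, True, False]
```

A distributed variant keeps the token count in a shared store such as Redis and performs the refill and decrement atomically, so that all instances draw from the same budget.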
12. Asynchronous Processing
Replace synchronous calls with asynchronous workflows, typically via a message queue. For example, a flash‑sale request is placed on a queue, the user receives an immediate "processing" response, and the order is finalized later, freeing threads for new requests.
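The flash‑sale flow can be sketched with an in‑process queue and a worker thread standing in for a message broker and its consumer. The function names, the status values, and the dictionary used as an order store are all hypothetical.

```python
import queue
import threading
import uuid

order_queue = queue.Queue()
order_status = {}  # stands in for an orders table


def submit_order(user_id, item_id):
    """Request handler: enqueue the work and return immediately."""
    order_id = str(uuid.uuid4())
    order_status[order_id] = "processing"
    order_queue.put((order_id, user_id, item_id))
    return {"order_id": order_id, "status": "processing"}


def worker():
    """Background consumer that finalizes orders."""
    while True:
        task = order_queue.get()
        try:
            if task is None:  # shutdown signal
                return
            order_id, user_id, item_id = task
            # Real code would reserve stock, create the order row, and so on.
            order_status[order_id] = "confirmed"
        finally:
            order_queue.task_done()


consumer = threading.Thread(target=worker, daemon=True)
consumer.start()

response = submit_order(user_id=1, item_id=42)
print(response["status"])  # processing

order_queue.join()  # wait until the worker has handled the order
print(order_status[response["order_id"]])  # confirmed

order_queue.put(None)
consumer.join(timeout=1)
```

The client would later poll for the order status or receive a notification, since the immediate response only confirms that the request was accepted.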
13. API Optimizations
Reduce payload size (e.g., protobuf, or JSON with compression), use efficient serialization, and avoid unnecessary fields. Smaller payloads cut bandwidth and serialization time per request, which raises the number of requests a node can serve per second.
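The effect of trimming fields and compressing is easy to measure. The sketch below builds a made‑up product list, then compares the full JSON, a trimmed compact JSON, and the gzip‑compressed result; field names and record count are invented for the example.

```python
import gzip
import json

# Made-up records with two fields the client never uses.
records = [
    {
        "id": i,
        "name": f"product-{i}",
        "description": "n/a",
        "internal_notes": "",
        "price_cents": 1999,
    }
    for i in range(200)
]

full = json.dumps(records).encode("utf-8")

# Keep only what the client needs and drop whitespace between tokens.
trimmed = json.dumps(
    [{"id": r["id"], "name": r["name"], "price_cents": r["price_cents"]} for r in records],
    separators=(",", ":"),
).encode("utf-8")

compressed = gzip.compress(trimmed)

print(len(full), len(trimmed), len(compressed))
```

Compression costs CPU on both ends, so it tends to pay off for larger, repetitive responses and less so for tiny ones.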
14. Load Testing to Identify Bottlenecks
Before release, run stress tests with tools such as JMeter or LoadRunner. Measure:
Maximum concurrent users
Response time distribution
Resource utilization (CPU, memory, network, I/O)
Identify whether bottlenecks reside in the network, reverse proxy (Nginx), application code, database, or cache layers, then apply targeted mitigations.
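When reading the response time distribution, percentiles are usually more informative than the average. As a small illustration with invented latency samples, the snippet below summarizes a run in which 5% of requests are slow.

```python
import statistics


def summarize(latencies_ms):
    """Return common percentiles for a list of latency samples."""
    # quantiles(n=100) returns 99 cut points; index 49 is the median.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "mean": statistics.mean(latencies_ms),
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
        "max": max(latencies_ms),
    }


# Invented samples: 95 fast requests and 5 slow ones.
samples = [20] * 95 + [900] * 5
summary = summarize(samples)
print(summary)
```

Here the median stays at 20 ms while the 99th percentile is 900 ms, a tail that the mean of 64 ms hides. Tools such as JMeter report these percentiles directly.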
15. Scaling Out and Traffic Switching
For sudden spikes, add more nodes (e.g., extra MySQL or Redis replicas) and optionally shift traffic between data centers or availability zones. Traffic routing can be controlled via DNS, load‑balancer weights, or service‑mesh policies.
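Weight‑based traffic switching can be sketched as a weighted random choice over zones. The zone names and weights below are hypothetical; in practice the weights would live in DNS records, load‑balancer configuration, or service‑mesh routing rules rather than in application code.

```python
import random
from collections import Counter


def pick_zone(weights, rng=random):
    """Choose a zone with probability proportional to its weight."""
    zones = list(weights)
    return rng.choices(zones, weights=[weights[z] for z in zones], k=1)[0]


rng = random.Random(7)  # seeded only to make the example repeatable

# Normal operation: most traffic goes to zone-a.
normal = {"zone-a": 90, "zone-b": 10}
print(Counter(pick_zone(normal, rng) for _ in range(1000)))

# zone-a is overloaded or failing: shift all traffic to zone-b.
failover = {"zone-a": 0, "zone-b": 100}
shifted = Counter(pick_zone(failover, rng) for _ in range(1000))
print(shifted)
```

Changing weights gradually, for example 90/10, then 50/50, then 0/100, lets the receiving zone warm its caches and connection pools before it takes the full load.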
References
Geek Time (极客时间), "40 Questions on High‑Concurrency System Design" – https://time.geekbang.org/column/article/192203
dbaplus Community
