15 Essential Strategies for Designing High‑Concurrency Systems

This article outlines fifteen practical techniques—including horizontal scaling, microservice decomposition, database sharding, connection pooling, caching, CDN, message queues, circuit breaking, rate limiting, and load testing—to help engineers design backend systems that remain reliable and performant under extreme traffic spikes.

ITPUB

Jan 23, 2023

Understanding High‑Concurrency Systems

Designing a high‑concurrency system means building an application that stays available while handling a massive number of simultaneous requests and absorbing large traffic bursts. Doing so requires mitigating the common bottlenecks: memory, disk, connection limits, and network bandwidth.

1. Horizontal Scaling (Divide and Conquer)

Deploying a single instance limits the traffic a system can handle and creates a single point of failure; distributing the workload across multiple servers increases overall concurrency and eliminates the single‑point risk.
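As an illustration of spreading requests over several instances, here is a minimal round-robin selector. It is only a sketch: the backend addresses are made up, and a real deployment would normally rely on a load balancer such as Nginx together with health checks rather than on application code like this.

```python
import itertools


class RoundRobinBalancer:
    """Hands out backend addresses in turn so each gets an equal share."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = list(backends)
        self._cycle = itertools.cycle(self._backends)

    def next_backend(self):
        # Each call returns the next address, wrapping around at the end.
        return next(self._cycle)


# Hypothetical instance addresses, used only for the demonstration.
balancer = RoundRobinBalancer(["app-1:8080", "app-2:8080", "app-3:8080"])
picks = [balancer.next_backend() for _ in range(6)]
```

With three instances, six consecutive requests land twice on each instance, and removing one instance from the list leaves the other two serving traffic.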

2. Microservice Decomposition

Breaking a monolithic application into independent services—e.g., separating user, order, and product modules—allows traffic to be spread across services, improving throughput and simplifying scaling.

3. Database Sharding and Partitioning

When a single MySQL instance can no longer sustain the load (for example, it starts returning "too many connections" errors), splitting data across multiple databases and tables reduces the pressure on each node. Once a table holds many millions of rows, it usually needs to be split or partitioned to keep queries fast, although the exact threshold depends on row size, indexes, and hardware.
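One common way to decide where a row lives is to hash a sharding key. The sketch below assumes a user ID as the key, 4 databases, and 8 tables per database; the counts and the `order_db_N` / `orders_N` names are hypothetical and chosen only for illustration.

```python
import zlib

DB_COUNT = 4        # assumed number of databases
TABLES_PER_DB = 8   # assumed number of tables in each database


def route(user_id):
    """Map a user ID to a (database, table) pair using a stable hash."""
    # crc32 gives the same result on every process and machine,
    # unlike Python's built-in hash() for strings.
    digest = zlib.crc32(str(user_id).encode("utf-8"))
    slot = digest % (DB_COUNT * TABLES_PER_DB)
    db_index = slot // TABLES_PER_DB
    table_index = slot % TABLES_PER_DB
    return f"order_db_{db_index}", f"orders_{table_index}"
```

Plain modulo routing is simple, but changing `DB_COUNT` later remaps most keys, so a design like this usually fixes the slot count up front or plans for data migration.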

4. Connection Pooling

Creating a new database, HTTP, or Redis connection for each request is costly; using connection pools reuses existing connections, dramatically improving request handling speed. Thread pools provide similar benefits for parallel task execution.
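The reuse idea can be shown with a small pool built on a thread-safe queue. This is a sketch, not a production pool: `fake_connect` stands in for a real driver's connect call, and there is no connection validation or reconnection logic.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Creates a fixed number of connections once and lends them out."""

    def __init__(self, factory, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout=1.0):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always return the connection, even if the caller raised.
            self._idle.put(conn)


created = []


def fake_connect():
    """Placeholder for an expensive connect call (hypothetical)."""
    conn = object()
    created.append(conn)
    return conn


pool = ConnectionPool(fake_connect, size=2)
for _ in range(100):
    with pool.connection() as conn:
        pass  # run a query with conn here
```

A hundred requests are served with only two connections ever being opened, which is where the saving comes from.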

5. Master‑Slave Replication

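As a rough order of magnitude, a single MySQL server handles about 500 TPS and 10k QPS; actual figures vary widely with hardware, schema, and query mix. Adding read replicas offloads read‑heavy traffic, preserving the master's capacity for writes and for time‑critical operations that cannot tolerate replication lag.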
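Read/write splitting can be sketched as a router that sends writes to the master and spreads reads over the replicas. The host names and the `require_fresh` flag are assumptions made for this example; detecting reads by the `SELECT` prefix is a simplification.

```python
import itertools


class ReadWriteRouter:
    """Chooses the master for writes and a replica for ordinary reads."""

    def __init__(self, master, replicas):
        self._master = master
        self._replicas = itertools.cycle(replicas) if replicas else None

    def target_for(self, sql, require_fresh=False):
        is_read = sql.lstrip().lower().startswith("select")
        # Reads that must see the latest write go to the master,
        # because replicas may lag behind it.
        if is_read and not require_fresh and self._replicas is not None:
            return next(self._replicas)
        return self._master


# Hypothetical host names.
router = ReadWriteRouter("mysql-master", ["mysql-replica-1", "mysql-replica-2"])
```

For example, `router.target_for("SELECT * FROM orders")` alternates between the two replicas, while an `INSERT` or a read with `require_fresh=True` goes to `mysql-master`.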

6. Caching

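Introducing caches such as Redis, JVM local cache, or Memcached reduces backend load and speeds up responses. However, several problems must be carefully managed: consistency between cache and database, cache avalanche (many keys expiring at the same moment), cache penetration (requests for keys that do not exist bypassing the cache every time), and cache stampede (many concurrent requests rebuilding the same hot key when it expires).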
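The cache-aside pattern with two of the usual mitigations can be sketched in memory as follows. It assumes a `loader` callable that stands in for the database query, adds random jitter to expiry times so keys do not all expire together, and caches "not found" results briefly so repeated lookups of missing keys do not reach the database. The TTL values are arbitrary, and a real system would use Redis or a similar store instead of a dict.

```python
import random
import time

_MISSING = object()  # marker stored for keys the loader could not find


class CacheAside:
    def __init__(self, loader, ttl=60.0, null_ttl=5.0, jitter=0.1,
                 clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl
        self._null_ttl = null_ttl
        self._jitter = jitter
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            value = entry[1]
            return None if value is _MISSING else value
        # Miss or expired: fall back to the source of truth.
        value = self._loader(key)
        if value is None:
            # Cache the absence for a short time (limits penetration).
            self._store[key] = (now + self._null_ttl, _MISSING)
        else:
            # Spread expiry times by up to `jitter` (limits avalanche).
            ttl = self._ttl * (1 + random.uniform(0, self._jitter))
            self._store[key] = (now + ttl, value)
        return value
```

This sketch does not address stampedes on a single hot key; that typically needs a per-key lock or single-flight mechanism so only one caller rebuilds the entry.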

7. CDN for Static Assets

Static resources (images, icons, etc.) should be served via a Content Delivery Network, allowing users to fetch content from geographically close edge nodes, reducing origin server load.

8. Message Queues for Traffic Spikes

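During events like Double‑11, a queue can absorb bursts (e.g., 5k requests/s) while the application processes a sustainable rate (e.g., 2k requests/s). At those rates the backlog grows by about 3k requests every second, so the queue only helps if bursts are short or its size is capped. Queue overflow can be handled by dropping excess requests or returning error pages.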
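The buffering and overflow behaviour can be sketched with a bounded in-process queue. In practice this role is played by a message broker; the class and method names here are hypothetical.

```python
import queue


class BurstBuffer:
    """Bounded buffer: accepts requests until full, then rejects them."""

    def __init__(self, capacity):
        self._queue = queue.Queue(maxsize=capacity)
        self.rejected = 0

    def offer(self, request):
        try:
            self._queue.put_nowait(request)
            return True
        except queue.Full:
            # Overflow: the caller should return an error or "busy" page.
            self.rejected += 1
            return False

    def drain(self, max_items):
        """Take up to max_items requests, i.e. one tick of consumer work."""
        batch = []
        for _ in range(max_items):
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return batch
```

The consumer calls `drain` at whatever rate it can sustain, so the downstream service sees a steady load regardless of how spiky the arrivals are.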

9. Elasticsearch for Search

Elasticsearch is a distributed search engine that scales horizontally: capacity grows by adding nodes and spreading index shards across them rather than by repeatedly upgrading a single machine. This makes it suitable for high‑concurrency query workloads over large data volumes.

10. Circuit Breaking and Degradation

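When a downstream service fails or slows down (for example, because of a slow SQL query), the failure propagates upstream and can cascade into a service avalanche. A circuit breaker, implemented with a simple switch or a library such as Hystrix, stops calling the failing dependency for a while so that the rest of the system stays up. Degradation complements it: non‑essential features return a fallback or default response so that core functionality keeps working.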
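The state machine behind a circuit breaker can be sketched in a few lines. This is not the Hystrix API; the threshold, timeout, and the `call(func, fallback)` shape are assumptions for illustration, and the class is not thread-safe.

```python
import time


class CircuitBreaker:
    """Closed -> open after repeated failures -> half-open after a timeout."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self._threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, fallback):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                # Open: fail fast without touching the dependency.
                return fallback()
            # Timeout elapsed: half-open, let this one call through.
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self._threshold:
                # Trip (or re-trip after a failed trial call).
                self._opened_at = self._clock()
            return fallback()
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

The `fallback` callable is where degradation happens: it can return cached data, a default value, or a "try again later" response.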

11. Rate Limiting

To protect limited CPU, memory, network, and thread resources, rate limiting discards excess requests during traffic spikes. Implementations include Guava RateLimiter for single‑node limits, Redis‑based distributed limits, or Alibaba’s Sentinel.
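A single-node limiter is often built on the token bucket algorithm, sketched below. This is a generic illustration rather than the Guava or Sentinel API; the `rate` and `capacity` parameters and the injectable `clock` are assumptions, and the class is not thread-safe.

```python
import time


class TokenBucket:
    """Allows `rate` requests per second with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self._rate = rate
        self._capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def try_acquire(self, tokens=1):
        now = self._clock()
        # Refill according to elapsed time, never above capacity.
        refill = (now - self._last) * self._rate
        self._tokens = min(self._capacity, self._tokens + refill)
        self._last = now
        if self._tokens >= tokens:
            self._tokens -= tokens
            return True
        return False  # caller should reject or queue the request
```

A request handler would call `try_acquire()` first and return an error such as HTTP 429 when it yields `False`. A distributed limit needs shared state, which is why Redis is commonly used for that case.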

12. Asynchronous Processing

Asynchronous calls avoid blocking the caller, improving overall throughput. Message queues can buffer massive requests (e.g., flash‑sale spikes), allowing the system to acknowledge receipt quickly and process results later.
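The "acknowledge now, process later" flow can be sketched with a thread pool standing in for the message queue and its consumers. The service name, the ticket format, and the status strings are hypothetical.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class AsyncOrderService:
    """Accepts work immediately and lets callers poll for the result."""

    def __init__(self, handler, workers=4):
        self._handler = handler
        self._executor = ThreadPoolExecutor(max_workers=workers)
        self._futures = {}  # ticket -> Future (never evicted in this sketch)

    def submit(self, payload):
        # Returns right away with a ticket instead of waiting for the work.
        ticket = uuid.uuid4().hex
        self._futures[ticket] = self._executor.submit(self._handler, payload)
        return ticket

    def status(self, ticket):
        future = self._futures[ticket]
        if not future.done():
            return "PROCESSING", None
        if future.exception() is not None:
            return "FAILED", None
        return "DONE", future.result()

    def shutdown(self):
        # Waits for all submitted work to finish.
        self._executor.shutdown(wait=True)
```

The caller receives the ticket quickly and checks `status(ticket)` later, which mirrors a flash-sale page that shows "order received" and then updates once processing completes.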

13. API Optimization Techniques

Optimizing API performance—through compression, pagination, efficient serialization, and other tactics—enables the system to serve more requests in the same time window.
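Two of these tactics, pagination and compression, are sketched below. The pagination is keyset-style over an in-memory list sorted by `id`; against a database the same idea is a `WHERE id > ? ORDER BY id LIMIT ?` query. Function names and the row shape are assumptions for illustration.

```python
import gzip
import json


def paginate(rows, after_id=None, limit=20):
    """Return one page of rows with id > after_id, plus the next cursor."""
    start = 0
    if after_id is not None:
        start = next(
            (i for i, row in enumerate(rows) if row["id"] > after_id),
            len(rows),
        )
    page = rows[start:start + limit]
    has_more = start + limit < len(rows)
    next_cursor = page[-1]["id"] if page and has_more else None
    return page, next_cursor


def encode_response(payload, accept_gzip):
    """Serialize compactly and gzip the body when the client allows it."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    if accept_gzip:
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}
```

Returning a cursor instead of an offset keeps each page cheap to fetch even deep into a large result set, and compression mostly pays off for larger, repetitive bodies such as JSON lists.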

14. Load Testing to Identify Bottlenecks

Before release, conduct load tests with tools such as LoadRunner or JMeter to determine maximum concurrent capacity and pinpoint bottlenecks across network, Nginx, services, or caches.
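To make the measured quantities concrete, here is a toy load generator that reports throughput, error count, and 99th-percentile latency. It is a sketch for understanding the metrics, not a substitute for JMeter or LoadRunner; `target` is any callable you supply, and the report keys are made up for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(target, total_requests, concurrency):
    """Call `target` total_requests times using `concurrency` threads."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")

    def timed_call(_):
        start = time.perf_counter()
        try:
            target()
            ok = True
        except Exception:
            ok = False
        return ok, time.perf_counter() - start

    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))
    elapsed = time.perf_counter() - started

    latencies = sorted(duration for _, duration in results)
    p99_index = min(len(latencies) - 1, int(len(latencies) * 0.99))
    return {
        "requests": total_requests,
        "errors": sum(1 for ok, _ in results if not ok),
        "throughput_rps": total_requests / elapsed if elapsed > 0 else float("inf"),
        "p99_seconds": latencies[p99_index],
    }
```

Running it at increasing `concurrency` values and watching where throughput flattens or p99 latency climbs is the basic way to locate the saturation point.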

15. Scaling and Traffic Switching

For sudden traffic peaks, combine horizontal scaling (adding MySQL/Redis replicas) with traffic routing across multiple data centers to distribute load.
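Traffic switching between data centers is often expressed as adjustable weights. The sketch below assumes two hypothetical data centers and picks one per request in proportion to its weight; setting a weight to zero drains that site. Real setups usually do this at the DNS or gateway layer rather than in application code.

```python
import random


class TrafficSwitch:
    """Weighted random choice of data center, with adjustable weights."""

    def __init__(self, weights, rng=None):
        self._weights = dict(weights)
        self._rng = rng if rng is not None else random.Random()

    def set_weight(self, data_center, weight):
        if weight < 0:
            raise ValueError("weight must be non-negative")
        self._weights[data_center] = weight

    def pick(self):
        # Only data centers with a positive weight receive traffic.
        active = [(dc, w) for dc, w in self._weights.items() if w > 0]
        if not active:
            raise RuntimeError("no data center is accepting traffic")
        names, weights = zip(*active)
        return self._rng.choices(names, weights=weights, k=1)[0]


# Hypothetical data center names with an even split.
switch = TrafficSwitch({"dc-east": 50, "dc-west": 50})
```

Calling `switch.set_weight("dc-east", 0)` moves all new requests to `dc-west`, which is the kind of change an operator would make when one site is overloaded or failing.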

References

GeekTime "High‑Concurrency System Design 40 Questions" – https://time.geekbang.org/column/article/192203
