How to Optimize High‑Concurrency Services (QPS > 200k)

This article outlines practical strategies for handling online services with extremely high request rates—over 200,000 QPS—by avoiding relational databases, employing multi‑level caching, leveraging multithreading, implementing circuit‑breaker and downgrade mechanisms, optimizing I/O, controlling retries, handling edge cases, and logging efficiently.

Cloud Native Technology Community

Feb 16, 2023

1: Say No to Relational Databases

A large‑scale consumer‑facing (C‑end) internet service should not use a relational database as its primary store on the request path. Serve reads and writes from a NoSQL cache such as Redis or Memcached, treat it as the main "database", and keep MySQL or Oracle only as an asynchronously updated backup.

Example: During JD.com’s Double‑11 event, product data is first written to Redis and later persisted to MySQL asynchronously. Consumer‑facing queries read directly from Redis, while business‑side (B‑end) queries can still use the database.
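The write path described above can be sketched as follows. This is a minimal illustration, not JD.com's implementation: plain dictionaries stand in for Redis and MySQL, and `WriteBehindStore` and its methods are hypothetical names chosen for the example.

```python
import queue
import threading


class WriteBehindStore:
    """Cache-first store: writes hit the cache at once and reach the database later."""

    def __init__(self, cache, database):
        self.cache = cache          # stand-in for Redis (any dict-like object)
        self.database = database    # stand-in for MySQL (any dict-like object)
        self._pending = queue.Queue()
        worker = threading.Thread(target=self._persist_loop, daemon=True)
        worker.start()

    def write(self, key, value):
        self.cache[key] = value           # consumer-facing reads see this immediately
        self._pending.put((key, value))   # persistence happens off the request path

    def read(self, key):
        return self.cache.get(key)        # the hot path never touches the database

    def _persist_loop(self):
        while True:
            key, value = self._pending.get()
            try:
                self.database[key] = value
            finally:
                self._pending.task_done()

    def flush(self):
        """Block until every queued write has been persisted."""
        self._pending.join()


store = WriteBehindStore(cache={}, database={})
store.write("sku:1001", {"price": 199})
print(store.read("sku:1001"))      # served from the cache
store.flush()
print(store.database["sku:1001"])  # the same record, persisted asynchronously
```

An in‑memory queue loses pending writes if the process crashes, so a production system would likely put a durable message queue between the cache and the database.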

2: Multi‑Level Caching

Caching is essential for high concurrency. A single Redis node handles roughly 60‑80k QPS, but because it executes commands on a single thread and hot keys concentrate load on one node, it needs additional cache layers in front of it.

Typical multi‑level cache stack: local in‑process cache → Memcached (multithreaded) → Redis. This hierarchy can absorb millions of QPS in flash‑sale scenarios.
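A lookup through such a stack might look like the sketch below. It assumes each tier exposes a dict‑like interface; `TieredCache` is a hypothetical name, and real tiers would also need expiry and eviction, which are left out here.

```python
class TieredCache:
    """Look up a key tier by tier, fastest first, and backfill faster tiers on a hit."""

    def __init__(self, tiers):
        # Ordered list of dict-like caches, e.g. [local, memcached, redis].
        self.tiers = tiers

    def get(self, key):
        for depth, tier in enumerate(self.tiers):
            value = tier.get(key)
            if value is not None:
                # Copy the value into every faster tier so the next read stops earlier.
                for faster in self.tiers[:depth]:
                    faster[key] = value
                return value
        return None


local, shared, redis_like = {}, {}, {"sku:1001": "phone"}
cache = TieredCache([local, shared, redis_like])
first = cache.get("sku:1001")    # found in the last tier, then backfilled
second = cache.get("sku:1001")   # now answered by the local tier
```

Backfilling on a hit is what lets the local tier absorb repeated reads of a hot key instead of sending each one to Redis.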

3: Multithreading

In one case, an interface read Redis in a synchronous loop (≈3 ms per call) over a list of 300‑400k entries. Moving the reads to a thread pool cut the response time from more than 30 s to about 3 s, which shows how much multithreading helps on multi‑core servers when the work is dominated by waiting on I/O.

However, thread pool size and queue length must be tuned and monitored to avoid resource waste.
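The difference between the two approaches can be reproduced on a small scale. In this sketch `read_item` is a hypothetical stand‑in that sleeps for 3 ms instead of calling Redis, and the pool size of 16 is an arbitrary assumption, not a recommendation.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def read_item(key):
    """Hypothetical stand-in for one Redis read that takes about 3 ms."""
    time.sleep(0.003)
    return key, f"value-{key}"


def read_all_sequential(keys):
    return dict(read_item(k) for k in keys)


def read_all_pooled(keys, max_workers=16):
    # The calls are I/O-bound, so threads overlap the waiting time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(read_item, keys))


keys = list(range(200))

start = time.perf_counter()
sequential = read_all_sequential(keys)
sequential_seconds = time.perf_counter() - start

start = time.perf_counter()
pooled = read_all_pooled(keys)
pooled_seconds = time.perf_counter() - start

print(f"sequential: {sequential_seconds:.2f}s, pooled: {pooled_seconds:.2f}s")
```

`max_workers` is the knob that the tuning advice above refers to: too few workers leave latency on the table, too many waste memory and add scheduling overhead.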

4: Degradation and Circuit‑Breaker

Both mechanisms protect services from overload. Degradation disables non‑essential features while keeping the main flow alive; circuit‑breaker cuts off calls to an overloaded downstream service and returns failures immediately.

Choosing between them depends on business scenarios.
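Both ideas can be illustrated together. The sketch below is a simplified circuit breaker that counts consecutive failures, plus a helper that degrades to a default value; the class name, thresholds, and timeout are assumptions made for the example, and mature libraries handle more states and concurrency.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the downstream service."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("downstream marked unhealthy; failing fast")
            # Timeout elapsed: let one trial call through (half-open state).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the breaker and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def with_degradation(breaker, func, fallback):
    """Degradation: serve a cheaper default when the optional call is unavailable."""
    try:
        return breaker.call(func)
    except Exception:
        return fallback
```

For example, `with_degradation(breaker, load_recommendations, [])` would keep a product page rendering with an empty recommendation list while the breaker is open, where `load_recommendations` is a hypothetical non‑essential call.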

5: I/O Optimization

Frequent connection creation and teardown increase I/O load. Batch requests whenever possible to reduce the number of downstream calls, especially in high‑traffic product detail queries.
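Batching can be as simple as splitting the requested IDs into fixed‑size groups. In this sketch `fetch_batch` is a hypothetical downstream call that accepts a list of product IDs, and the batch size of 100 is an assumed value that would need tuning.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_details(product_ids, fetch_batch, batch_size=100):
    """Call the downstream service once per batch instead of once per product."""
    details = {}
    for batch in chunked(product_ids, batch_size):
        details.update(fetch_batch(batch))
    return details


calls = []


def fake_fetch_batch(ids):
    # Hypothetical downstream call; records each round trip for the demonstration.
    calls.append(list(ids))
    return {pid: {"id": pid} for pid in ids}


result = fetch_details(list(range(250)), fake_fetch_batch, batch_size=100)
print(len(result), len(calls))   # 250 products fetched in 3 round trips
```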

6: Use Retries Wisely

Retry can mitigate transient failures but must be limited in count, spaced appropriately, and configurable; otherwise it can cause cascading failures (e.g., Kafka consumer lag caused by excessive retries).

Control retry count

Set proper retry intervals

Make retry behavior configurable
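The three rules above map directly onto a small helper. This is a sketch under assumed defaults: `RetryPolicy` and its field values are illustrative, and a real service would load them from configuration and retry only on errors known to be transient.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    max_attempts: int = 3     # hard cap on total attempts
    base_delay: float = 0.1   # seconds before the first retry
    backoff: float = 2.0      # multiplier applied to the delay after each failure
    max_delay: float = 2.0    # upper bound on any single delay


def call_with_retry(func, policy=RetryPolicy(), sleep=time.sleep):
    delay = policy.base_delay
    for attempt in range(1, policy.max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == policy.max_attempts:
                raise                            # give up instead of retrying forever
            sleep(min(delay, policy.max_delay))  # space the attempts out
            delay *= policy.backoff
```

Growing the delay between attempts gives a struggling downstream service time to recover, and the attempt cap keeps one failing call from blocking the consumer indefinitely.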

7: Guard Edge Cases and Provide Fallbacks

A missing check for an edge case such as an empty array can lead to a massive data leak that affects millions of users. Simple input validation, combined with a safe fallback for each abnormal case, can prevent such catastrophic incidents.
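One way an empty array can go wrong is when it is used as a filter and "no IDs given" silently becomes "match everyone". The scenario and the function below are hypothetical, chosen only to illustrate the kind of guard meant here.

```python
def select_recipients(users, target_ids):
    """Return only the users whose id is in target_ids.

    An empty or missing target list is treated as "nobody", never as "everybody".
    """
    if not target_ids:
        return []   # fallback: do nothing rather than fan out to all users
    wanted = set(target_ids)
    return [user for user in users if user["id"] in wanted]


users = [{"id": 1}, {"id": 2}, {"id": 3}]
print(select_recipients(users, [2]))   # [{'id': 2}]
print(select_recipients(users, []))    # []
```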

8: Log Elegantly

Full‑volume logging at 200k QPS can consume terabytes of disk space daily and increase response latency. Use rate‑limited logging (token bucket) or whitelist‑based logging to reduce noise and resource consumption.
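Rate‑limited logging can be sketched with a token bucket attached to Python's standard `logging` module as a filter. The class name and the rate and capacity values are assumptions for illustration; dropped records are simply discarded.

```python
import logging
import threading
import time


class TokenBucketFilter(logging.Filter):
    """Let at most `rate` records per second through, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        super().__init__()
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.last_refill = clock()
        self._lock = threading.Lock()   # filters may run on many request threads

    def filter(self, record):
        with self._lock:
            now = self.clock()
            # Refill in proportion to elapsed time, never beyond the bucket capacity.
            elapsed = now - self.last_refill
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
            self.last_refill = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True    # emit the record
            return False       # drop the record


logger = logging.getLogger("request")
logger.addFilter(TokenBucketFilter(rate=100, capacity=200))
```

With these example values the logger emits about 100 records per second no matter how many requests arrive, so disk usage stays bounded while a sample of traffic remains visible.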

Conclusion

This article summarizes the essential considerations for high‑concurrency services and offers practical advice for maintaining reliability and performance. Real‑world scenarios can be more complex than the cases covered here.
