
How to Scale Services Beyond 200k QPS: Practical Strategies for High‑Concurrency Systems

This article explores practical techniques for optimizing online services handling over 200,000 QPS, covering why relational databases fall short, multi‑level caching, multithreading, circuit breaking, I/O reduction, controlled retries, edge‑case handling, and efficient logging to maintain sub‑300 ms response times.

MaGe Linux Operations

Feb 7, 2023


This article looks at how to optimize high-concurrency services, specifically online services with QPS above 200,000. Offline services are out of scope. What challenges do online services face?

Offline caching cannot be used; all data must be read in real time.

A massive number of requests hit the online service, requiring high response speed, typically constrained to within 300 ms; exceeding this dramatically degrades user experience.

Data volume is large; if the service handles 500,000 QPS and each record is 1 KB, that is about 500 MB per second, or roughly 30 GB per minute, which puts huge pressure on underlying storage and queries.

The rest of this article discusses how to address these problems.

1. Say No to Relational Databases

A truly large-scale internet service for end users never uses a relational database as its primary storage, regardless of sharding or connection-pool optimizations. MySQL and Oracle have inherent disadvantages under massive online load, and even heavy tuning cannot make them withstand traffic above 500k QPS. The solution is to adopt NoSQL cache systems such as Redis or Memcached as the primary "database", while the relational database serves only as an asynchronously updated backup for data queries.

Example: During JD.com’s Double‑11 main event, newly listed products were written directly to Redis at launch, then asynchronously persisted to MySQL. End‑user queries read from Redis, while back‑office queries could still use the database, which comfortably handled the relatively lower traffic.
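The write path in this example can be sketched as follows. This is a minimal illustration, not JD.com's implementation: two plain dicts stand in for Redis and MySQL, and the names publish_product, get_product, and persist_worker are hypothetical.

```python
import queue
import threading

# Hypothetical stand-ins: one dict plays the role of Redis, another the role
# of MySQL. A real service would call actual client libraries here.
cache_store = {}
database = {}
persist_queue = queue.Queue()


def publish_product(product_id, data):
    """Write to the cache first, then enqueue the record for persistence."""
    cache_store[product_id] = data          # end-user reads can see it at once
    persist_queue.put((product_id, data))   # the database write happens later


def get_product(product_id):
    """End-user read path: served from the cache only."""
    return cache_store.get(product_id)


def persist_worker():
    """Background worker that drains the queue into the database."""
    while True:
        item = persist_queue.get()
        if item is None:                    # sentinel: stop the worker
            persist_queue.task_done()
            break
        product_id, data = item
        database[product_id] = data
        persist_queue.task_done()


worker = threading.Thread(target=persist_worker, daemon=True)
worker.start()

publish_product("sku-1", {"name": "phone", "price": 1999})
print(get_product("sku-1"))   # readable from the cache right after the write

persist_queue.join()          # wait until the backlog has been persisted
print(database["sku-1"])      # now also present in the backing store

persist_queue.put(None)
worker.join()
```

A production version would also need to decide what happens when persistence fails or the process crashes with items still queued, which this sketch does not cover.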

2. Multi‑Level Caching

Caching is a key weapon for high‑concurrency performance, and using it effectively across multiple layers requires careful design.

Redis is the first-choice cache, capable of 60-80k QPS on a single node. Under extreme load Redis can be scaled horizontally, but this approach has drawbacks: Redis executes commands on a single thread, suffers from hot-key problems, and is still exposed to cache breakdown and cache penetration, especially in flash-sale scenarios. Multi-level caching then becomes necessary. A request first checks a local in-memory cache, then a fast, multi-threaded Memcached layer that sits in front of Redis to absorb hotspots, and only then Redis itself. This three-tier flow (local → Memcached → Redis) can handle millions of QPS.
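The lookup order can be sketched as below. It is an illustration under simplifying assumptions: three dicts stand in for the local cache, Memcached, and Redis, and get_with_tiers is a hypothetical helper name. Expiry and invalidation are deliberately left out.

```python
# Hypothetical stand-ins: three dicts play the local cache, Memcached, and Redis.
local_cache = {}
memcached = {}
redis_store = {"sku-1": "phone"}

MISSING = object()  # distinguishes "key absent" from a stored None


def get_with_tiers(key):
    """Look up key in local -> Memcached -> Redis order.

    On a hit in a slower tier, the value is copied into every faster tier
    so that the next request for the same key is served closer to the caller.
    Returns (value, name_of_tier_that_answered).
    """
    tiers = [
        ("local", local_cache),
        ("memcached", memcached),
        ("redis", redis_store),
    ]
    for index, (name, tier) in enumerate(tiers):
        value = tier.get(key, MISSING)
        if value is not MISSING:
            for _, faster_tier in tiers[:index]:
                faster_tier[key] = value   # backfill the faster tiers
            return value, name
    return None, "miss"


print(get_with_tiers("sku-1"))    # ('phone', 'redis')
print(get_with_tiers("sku-1"))    # ('phone', 'local')
print(get_with_tiers("unknown"))  # (None, 'miss')
```

In practice each tier needs its own expiry policy, and the local tier in particular should use short lifetimes so that stale values do not linger on individual instances.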

3. Multithreading

Early in my career, interviewers often asked about multithreading, and I was skeptical about why. A concrete example proved its value: an API originally iterated over a list of 300-400k items, synchronously reading Redis for each one (≈3 ms per read), which took more than 30 seconds and timed out. Replacing the loop with a thread pool, and tuning the thread count and queue size, reduced the response time to 3 seconds. On modern multi-core machines a service that avoids multithreading wastes resources, but thread pools must be monitored and sized correctly, because improper settings can degrade performance. Apply multithreading judiciously: too many threads cause excessive context switching and can have the opposite effect.
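The change from a sequential loop to a thread pool might look like the sketch below. It assumes the per-item work is a blocking network read; read_from_cache is a hypothetical stand-in that sleeps for about 3 ms instead of calling Redis, and the pool size of 32 is an arbitrary starting point rather than a recommendation.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def read_from_cache(key):
    """Stand-in for a blocking Redis read; the sleep simulates network latency."""
    time.sleep(0.003)
    return f"value-for-{key}"


def read_sequential(keys):
    """Original shape of the code: one blocking read after another."""
    return [read_from_cache(key) for key in keys]


def read_parallel(keys, max_workers=32):
    """Same reads, issued through a bounded thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the input keys.
        return list(pool.map(read_from_cache, keys))


keys = [f"item-{i}" for i in range(200)]

start = time.perf_counter()
sequential = read_sequential(keys)
sequential_seconds = time.perf_counter() - start

start = time.perf_counter()
parallel = read_parallel(keys)
parallel_seconds = time.perf_counter() - start

print(f"sequential: {sequential_seconds:.3f}s, parallel: {parallel_seconds:.3f}s")
```

The gain comes from overlapping the time spent waiting on I/O, so the right max_workers value depends on downstream capacity and should be found by measurement.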

4. Degradation and Circuit Breaking

Both are self-protection mechanisms, similar to electrical fuses, that keep overload from crashing databases or Redis. Degradation disables non-essential front-end functions without affecting the main path, while circuit breaking stops calls to an overloaded downstream service and returns an immediate failure instead. Which one to use depends on the business scenario.
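A minimal sketch of the two ideas working together is shown below. The CircuitBreaker class, its thresholds, and fetch_with_fallback are hypothetical names chosen for illustration; production systems usually rely on an established library rather than a hand-rolled breaker.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the downstream."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None   # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open, downstream call skipped")
            # Timeout elapsed: allow one trial call through (half-open state).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        self.opened_at = None
        return result


def fetch_with_fallback(breaker, func, fallback):
    """Degradation: return a fallback when the call fails or the circuit is open."""
    try:
        return breaker.call(func)
    except Exception:
        return fallback


def overloaded_downstream():
    raise TimeoutError("downstream overloaded")


breaker = CircuitBreaker(failure_threshold=3, reset_timeout=5.0)
for _ in range(5):
    print(fetch_with_fallback(breaker, overloaded_downstream, fallback=[]))
print("circuit open:", breaker.opened_at is not None)
```

After the third failure the breaker stops calling the downstream function at all, so the last two iterations return the fallback without adding load to the struggling service.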

5. Optimizing I/O

I/O is often overlooked; frequent connection creation and teardown burden the system. Under high concurrency, a single request can multiply into many I/O operations. For example, fetching detailed product info may require multiple downstream calls; during a flash sale, thousands of products can trigger millions of downstream requests, saturating I/O and sharply increasing response times. Batch downstream calls so that many requests collapse into a single I/O operation.
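The effect of batching can be illustrated with a small sketch. FakeDownstream is a hypothetical client that only counts round trips; with a real store the batched call would map to something like a multi-key read (Redis MGET, for instance) or a bulk RPC endpoint.

```python
class FakeDownstream:
    """Hypothetical downstream service that counts network round trips."""

    def __init__(self, data):
        self.data = data
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1            # one round trip per key
        return self.data.get(key)

    def get_many(self, keys):
        self.round_trips += 1            # one round trip for the whole batch
        return {key: self.data.get(key) for key in keys}


def fetch_one_by_one(client, keys):
    return {key: client.get(key) for key in keys}


def fetch_in_batches(client, keys, batch_size=100):
    """Split keys into fixed-size chunks and issue one call per chunk."""
    result = {}
    for start in range(0, len(keys), batch_size):
        result.update(client.get_many(keys[start:start + batch_size]))
    return result


data = {f"sku-{i}": {"price": i} for i in range(1000)}
keys = list(data)

single_client = FakeDownstream(data)
fetch_one_by_one(single_client, keys)
print(single_client.round_trips)   # 1000

batch_client = FakeDownstream(data)
fetch_in_batches(batch_client, keys, batch_size=100)
print(batch_client.round_trips)    # 10
```

The batch size is a trade-off: larger batches mean fewer round trips but bigger payloads and longer individual calls, so it is worth capping rather than sending everything at once.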

6. Use Retries Cautiously

Retry is a common technique for transient failures, such as re‑sending a failed service request or database write. When using retries, keep in mind:

Control the number of retries.

Balance the interval between retries.

Make retry behavior configurable.

In a past incident, excessive Kafka consumer retries caused severe lag and long delays, and because the retry count was not configurable, fixing it required a code change. Retries can greatly improve success rates, but only when the points above are observed.
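The three points above can be combined in a small helper, sketched here under assumptions: the function name retry, the exponential backoff with jitter, and the RETRY_MAX_ATTEMPTS environment variable are all illustrative choices, not taken from the incident described.

```python
import os
import random
import time


def load_max_attempts(default=3):
    """Read the retry limit from configuration (a hypothetical env variable)."""
    return int(os.environ.get("RETRY_MAX_ATTEMPTS", default))


def retry(func, max_attempts=3, base_delay=0.1, max_delay=2.0, sleep=time.sleep):
    """Call func until it succeeds or max_attempts calls have been made.

    The wait between attempts doubles each time, is capped at max_delay,
    and gets a little random jitter so that many clients do not retry in
    lockstep.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise                       # retry budget used up: give up
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))


attempts = {"count": 0}


def flaky_write():
    """Fails twice with a transient error, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


print(retry(flaky_write, max_attempts=load_max_attempts(), base_delay=0.01))
```

Because the limit comes from configuration, it can be lowered during an incident without shipping new code. A real helper would also retry only on errors known to be transient instead of on every exception.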

7. Guard Edge Cases and Provide Fallbacks

Even experienced engineers can overlook simple edge-case checks, leading to major outages. In one review, a missing check for an empty array caused an RPC to return the full set of business data to millions of users. The fix was trivial, but the incident highlighted the importance of handling edge cases.
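One plausible way such a bug arises is a downstream call that treats an empty filter as "no filter". The sketch below assumes exactly that; rpc_query_orders and its behavior are hypothetical and are not the RPC from the incident.

```python
ORDERS = {"u1": ["order-1"], "u2": ["order-2"], "u3": ["order-3"]}


def rpc_query_orders(user_ids):
    """Hypothetical downstream RPC: an empty filter list matches every record."""
    if not user_ids:
        return dict(ORDERS)
    return {uid: ORDERS[uid] for uid in user_ids if uid in ORDERS}


def query_orders_guarded(user_ids):
    """Caller-side guard: never forward an empty or missing filter."""
    if not user_ids:          # covers both None and an empty list
        return {}             # safe fallback: return nothing, not everything
    return rpc_query_orders(user_ids)


print(len(rpc_query_orders([])))      # 3
print(query_orders_guarded([]))       # {}
print(query_orders_guarded(["u2"]))   # {'u2': ['order-2']}
```

The guard is a single line, which is the point of the anecdote: the fallback for the empty case should be chosen deliberately instead of inherited from whatever the downstream happens to do.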

8. Log Elegantly

Logging is essential for troubleshooting, but in high‑traffic environments full‑volume logging becomes disastrous:

It consumes massive disk space; at 200k QPS, logs can reach hundreds of megabytes per second (for example, roughly 200 MB per second if each request writes about 1 KB of logs), totaling thousands of gigabytes per day.

Excessive logging adds I/O overhead, increasing response time.

Implement rate-limited logging (e.g., a token bucket that allows one log line per second) and whitelist-based logging that restricts full output to critical users. Together these dramatically reduce unnecessary log volume.
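Both techniques can be sketched with the standard logging module. TokenBucket and SampledLogger are hypothetical helper classes written for this illustration, and the one-per-second rate simply mirrors the example in the text.

```python
import logging
import time


class TokenBucket:
    """Refills at rate_per_sec tokens per second, holding at most capacity tokens."""

    def __init__(self, rate_per_sec=1.0, capacity=1.0, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self):
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class SampledLogger:
    """Always logs for whitelisted users; everyone else shares the token bucket."""

    def __init__(self, logger, bucket, whitelist=()):
        self.logger = logger
        self.bucket = bucket
        self.whitelist = set(whitelist)

    def info(self, user_id, message):
        # Whitelisted users bypass the bucket and do not consume tokens.
        if user_id in self.whitelist or self.bucket.allow():
            self.logger.info("user=%s %s", user_id, message)
            return True
        return False


logging.basicConfig(level=logging.INFO)
sampled = SampledLogger(
    logging.getLogger("orders"),
    TokenBucket(rate_per_sec=1.0, capacity=1.0),
    whitelist={"vip-user"},
)

emitted = sum(sampled.info(f"user-{i}", "request handled") for i in range(1000))
print(f"emitted {emitted} of 1000 log lines")
print(sampled.info("vip-user", "request handled"))   # True: whitelisted
```

Dropped lines are lost for good, so it is common to keep error-level logs outside the rate limit and to sample only high-volume informational output.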

Conclusion

This blog discussed fundamental considerations and mitigation strategies for high‑concurrency services under massive traffic. Real‑world systems are more complex, but these suggestions provide a starting point for building robust, scalable end‑user services. Keep a respectful attitude toward high concurrency, continue exploring, and strive for better internet applications.

