
How to Build a High‑Performance Java Service for 200k+ QPS

This article explains how to design and optimize a Java‑based online service that handles over 200,000 queries per second by avoiding relational databases, applying multi‑level caching, leveraging multithreading, using degradation and circuit‑breaker patterns, reducing I/O, managing retries, handling edge cases, and logging efficiently.

Java High-Performance Architecture

Jun 9, 2022

Preface

How do you optimize a high-concurrency service that handles more than 200,000 QPS, and what challenges does such an online service face? First, the data is real-time: it cannot be precomputed and cached offline, so every read is live. Second, the request volume is massive, yet responses must come back in under 300 ms or the user experience degrades sharply. Third, the data volume is huge: 500 k QPS at 1 KB per request is roughly 500 MB per second, which puts extreme pressure on the storage and access layers.

1. Say No to Relational Databases

Large-scale consumer-facing services should not use a relational database as the primary store. Even with sharding or connection-pool tuning, MySQL or Oracle cannot realistically sustain more than 500 k QPS. Instead, use an in-memory NoSQL store such as Redis or Memcached as the main "database", and keep the relational database only for asynchronous backup and query support.

Example: during JD's Double-11 event, product data is written directly to Redis when the sale starts and then persisted to MySQL asynchronously. Consumer-facing (C-end) queries read from Redis, while back-office (B-end) queries can still go to the database.
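The write path described above can be sketched as follows. This is a minimal illustration, not JD's implementation: the class name `ProductStore` and its methods are hypothetical, and two in-memory maps stand in for Redis and MySQL so the example runs on its own.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: in-memory maps stand in for Redis and MySQL.
class ProductStore {
    private final Map<String, String> cache = new ConcurrentHashMap<>();    // stands in for Redis
    private final Map<String, String> database = new ConcurrentHashMap<>(); // stands in for MySQL
    private final ExecutorService persister = Executors.newSingleThreadExecutor();

    // Write path: update the cache synchronously, persist to the database in the background.
    void save(String productId, String payload) {
        cache.put(productId, payload);
        persister.execute(() -> database.put(productId, payload));
    }

    // Consumer-facing read path: served from the cache only.
    String readForConsumer(String productId) {
        return cache.get(productId);
    }

    // Back-office read path: may go to the database.
    String readForBackOffice(String productId) {
        return database.get(productId);
    }

    // Stop accepting writes and wait for pending persistence tasks to finish.
    boolean shutdownAndFlush(long timeoutMillis) {
        persister.shutdown();
        try {
            return persister.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The trade-off is that a back-office read may briefly lag behind the cache, and a real system would also need a durable queue or retry path so that a crash does not lose writes that have not yet been persisted.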

2. Multi‑Level Caching

Redis is the first-choice cache, capable of 60-80 k QPS per node. Horizontal scaling can absorb growing traffic, but Redis executes commands on a single thread, so a hot key concentrates load on one node. It also remains exposed to cache penetration and cache breakdown, especially in flash-sale scenarios.

Introduce a second cache layer (e.g., Memcached) that is multi-threaded and copes better with hotspots, and put a local in-memory cache in front of it. The request flow becomes: local cache → Memcached → Redis, which allows the system to absorb millions of QPS.
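The lookup order above can be sketched as a chain of tiers. The class `TieredCache` is hypothetical, and each remote tier is modeled as a plain function rather than a real Memcached or Redis client, so the example is about the fall-through and backfill logic only.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of a tiered lookup: local map first, then remote tiers in order.
class TieredCache {
    private final Map<String, String> local = new ConcurrentHashMap<>();
    // Remote tiers in lookup order, e.g. a Memcached-style tier followed by Redis (assumed).
    private final List<Function<String, Optional<String>>> remoteTiers;

    TieredCache(List<Function<String, Optional<String>>> remoteTiers) {
        this.remoteTiers = remoteTiers;
    }

    Optional<String> get(String key) {
        String hit = local.get(key);
        if (hit != null) {
            return Optional.of(hit);            // served without any network call
        }
        for (Function<String, Optional<String>> tier : remoteTiers) {
            Optional<String> value = tier.apply(key);
            if (value.isPresent()) {
                local.put(key, value.get());    // backfill so the next read stays local
                return value;
            }
        }
        return Optional.empty();                // miss in every tier
    }
}
```

A production local cache would also need a size bound and expiry so that stale or cold entries do not accumulate; that part is left out here.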

3. Multithreading

In one case, an endpoint looped synchronously over a list of 300-400 k items, reading Redis once per item at about 3 ms per call. Moving those reads to a thread pool reduced the endpoint's response time from more than 30 s to about 3 s, which shows how much parallelism can help.

However, thread pools must be tuned (core size, queue length) and monitored; improper settings can degrade performance or cause resource exhaustion.
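A minimal sketch of the idea, using only `java.util.concurrent`. The class `ParallelReader` and its parameters are hypothetical, the `reader` function stands in for the Redis read, and the pool sizes are placeholders to be tuned, not recommendations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical sketch: fan out per-key reads to a bounded thread pool.
class ParallelReader {
    // Fixed-size pool with a bounded queue; when the queue is full, the submitting
    // thread runs the task itself, which acts as simple back-pressure.
    static ThreadPoolExecutor newBoundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads, 60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    // Submits one read per key and collects the results in the original key order.
    static <K, V> List<V> readAll(List<K> keys, Function<K, V> reader, ThreadPoolExecutor pool) {
        List<CompletableFuture<V>> futures = new ArrayList<>(keys.size());
        for (K key : keys) {
            futures.add(CompletableFuture.supplyAsync(() -> reader.apply(key), pool));
        }
        List<V> results = new ArrayList<>(keys.size());
        for (CompletableFuture<V> future : futures) {
            results.add(future.join());   // rethrows a failed read as CompletionException
        }
        return results;
    }
}
```

The bounded queue is the important part: an unbounded queue hides overload until memory runs out, while a bounded one with an explicit rejection policy makes saturation visible and easier to monitor.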

4. Degradation and Circuit Breaker

Degradation disables non‑essential upstream functions to protect the core service, while a circuit breaker stops forwarding requests to an overloaded downstream service, returning immediate failures instead of exhausting resources.

Choosing between them depends on business impact and failure isolation requirements.
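To make the circuit-breaker behavior concrete, here is a deliberately small sketch. `SimpleCircuitBreaker` is a hypothetical class, not a library API; the fallback it returns while the circuit is open is where a degraded response would go. The clock is injected so the behavior can be exercised without waiting.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch: opens after N consecutive failures, fails fast during a
// cool-down period, then lets requests through again to probe the downstream.
class SimpleCircuitBreaker {
    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clockMillis;   // e.g. System::currentTimeMillis
    private int consecutiveFailures = 0;
    private long openedAt = -1L;              // -1 means the breaker is closed

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clockMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clockMillis = clockMillis;
    }

    <T> T call(Supplier<T> downstream, Supplier<T> fallback) {
        if (!allowRequest()) {
            return fallback.get();            // open: skip the downstream call entirely
        }
        try {
            T result = downstream.get();
            recordSuccess();
            return result;
        } catch (RuntimeException e) {
            recordFailure();
            return fallback.get();            // degrade instead of propagating the error
        }
    }

    // Simplification: after the cool-down every request passes, not just one trial.
    private synchronized boolean allowRequest() {
        return openedAt < 0 || clockMillis.getAsLong() - openedAt >= openMillis;
    }

    private synchronized void recordSuccess() {
        consecutiveFailures = 0;
        openedAt = -1L;
    }

    private synchronized void recordFailure() {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            openedAt = clockMillis.getAsLong();   // (re)open and restart the cool-down
        }
    }
}
```

Production systems usually rely on an established resilience library rather than hand-rolled state like this; the sketch only shows the state transitions.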

5. Optimize I/O

Frequently creating and tearing down connections adds heavy I/O load. Batching remote calls reduces the number of I/O operations and can dramatically improve throughput, especially when a single user request triggers many downstream calls.
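As a rough illustration of batching, the sketch below splits a key list into fixed-size chunks and issues one call per chunk instead of one call per key. `BatchFetcher` and `batchCall` are hypothetical names; `batchCall` stands in for whatever multi-key operation the downstream service offers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: replace N single-key calls with ceil(N / batchSize) batch calls.
class BatchFetcher {
    // Splits the list into consecutive chunks of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    // Calls the downstream once per chunk and merges the partial results.
    static <K, V> Map<K, V> fetchInBatches(List<K> keys, int batchSize,
                                           Function<List<K>, Map<K, V>> batchCall) {
        Map<K, V> merged = new HashMap<>();
        for (List<K> batch : partition(keys, batchSize)) {
            merged.putAll(batchCall.apply(batch));
        }
        return merged;
    }
}
```

With 250 keys and a batch size of 100, this makes 3 remote calls instead of 250. The batch size itself is a tuning knob: very large batches trade fewer round trips for bigger payloads and longer single-call latency.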

6. Use Retries Wisely

Retries should be limited in count, spaced at appropriate intervals, and configurable. Excessive retries can cause cascading failures; in one incident, too many retry attempts led to Kafka consumer lag.
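A bounded retry with a growing delay might look like the sketch below. `Retrier` is a hypothetical helper; the attempt count and initial delay are passed in so they can come from configuration, and the doubling backoff is one common choice rather than something the original prescribes.

```java
import java.util.function.Supplier;

// Hypothetical sketch: retry a call a limited number of times with exponential backoff.
class Retrier {
    static <T> T callWithRetry(Supplier<T> action, int maxAttempts, long initialDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt == maxAttempts) {
                    break;                       // budget exhausted: stop retrying
                }
                try {
                    Thread.sleep(delay);         // wait before the next attempt
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", ie);
                }
                delay *= 2;                      // double the wait after each failure
            }
        }
        throw last;                              // surface the final failure to the caller
    }
}
```

Rethrowing after the last attempt matters: a consumer that swallows the failure and keeps retrying forever is exactly the pattern that builds up lag.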

7. Boundary Cases and Fallbacks

Neglecting edge‑case checks (e.g., empty arrays) can lead to massive data leaks or service crashes. Proper validation and fallback logic are essential for robust production systems.
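One way to picture the empty-array case is a lookup by ID list. The scenario here is an assumption for illustration: a downstream query that might treat an empty filter as "match everything". `SafeQuery` and `query` are hypothetical names.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: validate the input before the call, fall back after a failure.
class SafeQuery {
    static <K, V> List<V> findByIds(List<K> ids, Function<List<K>, List<V>> query) {
        // Boundary check: never forward a null or empty filter to the downstream query.
        if (ids == null || ids.isEmpty()) {
            return Collections.emptyList();
        }
        try {
            List<V> rows = query.apply(ids);
            return rows != null ? rows : Collections.<V>emptyList();
        } catch (RuntimeException e) {
            // Fallback: return an empty result instead of failing the whole request.
            return Collections.emptyList();
        }
    }
}
```

Whether an empty result is the right fallback depends on the business case; for some callers a cached value or an explicit error is safer than silently returning nothing.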

8. Graceful Logging

Full‑volume logging at 200 k QPS can consume terabytes of disk space per day and increase I/O latency. Apply rate‑limiting (e.g., token‑bucket) or whitelist‑based logging to keep only valuable logs.
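The token-bucket idea can be sketched in a few lines. `LogRateLimiter` is a hypothetical class, not part of any logging framework; the clock is injected (for example `System::nanoTime`) so the refill logic can be checked deterministically.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: a token bucket that decides whether a log line may be written.
class LogRateLimiter {
    private final long capacity;             // maximum burst size
    private final double tokensPerSecond;    // steady-state log lines per second
    private final LongSupplier nanoClock;    // e.g. System::nanoTime
    private double tokens;
    private long lastRefillNanos;

    LogRateLimiter(long capacity, double tokensPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.tokensPerSecond = tokensPerSecond;
        this.nanoClock = nanoClock;
        this.tokens = capacity;              // start with a full bucket
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    // Returns true if the caller may log now; false means the line should be dropped.
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        double refill = (now - lastRefillNanos) * tokensPerSecond / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + refill);
        lastRefillNanos = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

A call site would wrap the log statement in `if (limiter.tryAcquire()) { ... }`. A whitelist approach is complementary: always log for a small set of user or request IDs, and apply the limiter to everything else.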

In summary, the article outlines practical strategies for handling massive traffic in high‑concurrency Java services, emphasizing cache hierarchy, multithreading, protective patterns, I/O reduction, controlled retries, edge‑case handling, and efficient logging.
