How I Boosted a Web Service to 50,000 QPS: A Real-World Performance Tuning Guide
This article documents a step‑by‑step performance optimization of a high‑traffic web module, covering requirement analysis, bottleneck identification, caching, message‑queue offloading, Linux TCP tuning, and the resulting capacity increase to 50,000 QPS with sub‑70 ms latency.
Preface
This article records a real performance-optimization case: the problems encountered and how they were solved. It offers one practical way of thinking about optimization; the approach shown here is not the only possible solution.
How to Optimize
Optimization must be driven by real requirements; claiming millions of concurrent users without context is meaningless. Set clear performance goals, identify bottlenecks, and avoid random tweaks.
Requirements Description
The module was split from the main site due to high concurrency. Targets: QPS ≥ 30,000, database usage ≤ 50%, server load ≤ 70%, request latency ≤ 70 ms, error rate ≤ 5%.
Environment: 4‑core, 8 GB RAM CentOS 7 server with SSD, MySQL 5.7 (max connections 800), Redis 1 GB, Tencent Cloud services, and Locust for distributed load testing.
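For reference, the Locust side of such a test can be as small as the sketch below; the /popup path, host, and think-time range are illustrative assumptions, not values from the original setup.

```python
# locustfile.py - minimal sketch of the load-test driver.
# The host, /popup path, and wait_time range are assumptions.
from locust import HttpUser, task, between

class PopupUser(HttpUser):
    host = "http://127.0.0.1:8000"   # target service (assumed address)
    wait_time = between(0.1, 0.5)    # simulated think time per user

    @task
    def fetch_popup(self):
        # Each simulated user requests a popup configuration.
        self.client.get("/popup")
```

For distributed runs, Locust starts one process with --master and any number of load generators with --worker pointed at it.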
Workflow: Users request the homepage; the system checks the database for a suitable popup configuration and returns it if found, otherwise it returns nothing and the check simply runs again on the user's next request. Further branches handle clicks, display timing, and configuration updates.
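In code, the unoptimized read path looks roughly like the sketch below; the table name popup_config and the next_show_at column are hypothetical stand-ins for the real schema.

```python
# Sketch of the pre-optimization flow: every homepage request opens a
# MySQL query. Table and column names are illustrative assumptions.
import time
import pymysql

def get_popup(user_id):
    conn = pymysql.connect(host="127.0.0.1", user="app", password="secret",
                           database="popup",
                           cursorclass=pymysql.cursors.DictCursor)
    try:
        with conn.cursor() as cur:
            # Find a configuration whose next return time has arrived.
            cur.execute(
                "SELECT id, payload FROM popup_config "
                "WHERE user_id = %s AND next_show_at <= %s LIMIT 1",
                (user_id, int(time.time())),
            )
            return cur.fetchone()  # None: nothing to show on this request
    finally:
        conn.close()
```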
Key Analysis
Three critical points emerge: 1) selecting appropriate popup configurations, 2) recording the next return time in the database, and 3) logging user actions on the returned configuration.
Tuning
All three points involve database reads and writes, which would saturate database connections without caching. The first step was to offload write operations to a FIFO queue built on a Redis list.
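A minimal sketch of that offload using redis-py is shown below; the key name write_queue and the batch size are assumptions. The request path enqueues a job instead of touching MySQL, and a background worker drains the queue and flushes jobs in batches.

```python
# Sketch of write offloading: the request path LPUSHes a job onto a
# Redis list; a background worker BRPOPs jobs and flushes them to MySQL
# in batches. The key name "write_queue" is an assumption.
import json
import redis

r = redis.Redis(host="127.0.0.1", port=6379)

def enqueue_write(op):
    # Called from the request path instead of hitting MySQL directly.
    r.lpush("write_queue", json.dumps(op))

def worker_loop(flush_batch, batch_size=100):
    # LPUSH at one end plus BRPOP at the other gives FIFO ordering.
    while True:
        first = r.brpop("write_queue", timeout=1)  # blocks up to 1 s
        if first is None:
            continue  # queue idle, poll again
        batch = [json.loads(first[1])]
        while len(batch) < batch_size:
            nxt = r.rpop("write_queue")
            if nxt is None:
                break
            batch.append(json.loads(nxt))
        flush_batch(batch)  # e.g. one multi-row INSERT/UPDATE per batch
```

Batching turns many per-request writes into a single database round trip, which is what keeps connection usage under the target.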
Initial load test showed QPS around 6,000 with 30% 502 errors, CPU 60‑70%, and database connections maxed out, confirming the database as the bottleneck.
After redesigning the architecture to cache all configurations and only query the database on cache miss, the system reached about 20,000 QPS, CPU 60‑80%, and ~300 database connections.
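One way to realize this read path is the classic cache-aside pattern; a sketch follows, with the key scheme and TTL as assumptions.

```python
# Sketch of the cache-aside read path: serve from Redis, fall back to
# MySQL only on a miss, then repopulate. Key scheme and TTL are assumptions.
import json
import redis

r = redis.Redis(host="127.0.0.1", port=6379)
CACHE_TTL = 300  # seconds before a cached config must be reloaded

def get_config(config_id, load_from_db):
    key = f"popup:config:{config_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)      # cache hit: no MySQL involved
    row = load_from_db(config_id)      # cache miss: one MySQL query
    if row is not None:
        r.set(key, json.dumps(row), ex=CACHE_TTL)
    return row
```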
Further investigation revealed that TCP throughput was being capped by large numbers of sockets stuck in the TIME-WAIT state, which ties up local ports.
After the four-way close that tears down a TCP connection, the side that closes first keeps the socket in TIME-WAIT for a period (2×MSL), so that the final ACK can be retransmitted if lost and stray segments from the old connection expire before the port is reused.
Linux does not expose a parameter to shorten the TIME-WAIT interval itself, but the number of sockets held in TIME-WAIT can be capped with net.ipv4.tcp_max_tw_buckets (default 180,000), and reuse of TIME-WAIT sockets can be enabled. Fast recycling (tcp_tw_recycle) also exists on older kernels, but it is known to break clients behind NAT and was removed in Linux 4.12, so tcp_tw_reuse plus a wider local port range is the safer lever.
# cap on sockets held in TIME-WAIT, default 180000
net.ipv4.tcp_max_tw_buckets = 6000
# widen the ephemeral port range for outgoing connections
net.ipv4.ip_local_port_range = 1024 65000
# enable fast recycling of TIME-WAIT sockets
# (unsafe behind NAT; removed entirely in Linux 4.12+)
net.ipv4.tcp_tw_recycle = 1
# allow reuse of TIME-WAIT sockets for new outbound connections
net.ipv4.tcp_tw_reuse = 1
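These lines go in /etc/sysctl.conf and take effect with sysctl -p, no reboot required; during a load test, the size of the TIME-WAIT population can be watched with ss -s.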
After applying these kernel tweaks and re-testing, the service reached 50,000 QPS at roughly 70% CPU, with database and TCP connection counts back in the normal range, an average response time of 60 ms, and a 0% error rate.
Conclusion
The whole development, tuning, and load-testing cycle is a reminder that web development is an interdisciplinary engineering practice: diagnosing and resolving performance issues takes solid fundamentals in networking, databases, programming languages, and operating systems.