How I Boosted a Python Service to 50k QPS: Real‑World Performance Tuning Steps

This article details a step‑by‑step performance optimization of a Python backend service, covering requirement analysis, architecture redesign with caching and Redis queues, load‑testing results, TCP TIME_WAIT issues, and kernel parameter tweaks that ultimately raised throughput to 50,000 QPS with zero errors.

Efficient Ops

Preface

This article documents the performance optimization of a Python program: the problems encountered and how each one was solved. It is meant to illustrate a practical way of thinking about optimization; the approach shown here is not the only possible solution.

How to Optimize

Optimization must be goal-driven; chasing a concurrency number without a target is meaningless. Define clear performance targets first, then locate the bottlenecks, rather than tinkering at random.

Requirement Description

The module was split out of the main site because of its high concurrency. After the split it must meet these targets: QPS ≥ 30,000, database load ≤ 50%, server load ≤ 70%, request latency ≤ 70 ms, error rate ≤ 5%.
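
One way to keep tuning goal-driven is to encode the targets above as data and check each load-test run against them. The sketch below is only an illustration: the metric names (`qps`, `db_load_pct`, and so on) and the `failed_targets` helper are hypothetical, not part of the original service.

```python
# Targets from the requirement description; the key names are hypothetical.
TARGETS = {
    "qps": ("min", 30000),
    "db_load_pct": ("max", 50),
    "server_load_pct": ("max", 70),
    "latency_ms": ("max", 70),
    "error_rate_pct": ("max", 5),
}


def failed_targets(measured, targets=TARGETS):
    """Return the names of the targets that a load-test run did not meet."""
    failures = []
    for name, (kind, limit) in targets.items():
        if name not in measured:
            continue  # metric was not reported for this run
        value = measured[name]
        ok = value >= limit if kind == "min" else value <= limit
        if not ok:
            failures.append(name)
    return failures
```

Called with the figures from the first load test, `failed_targets({"qps": 6000, "error_rate_pct": 30})` would flag both `qps` and `error_rate_pct`.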

Environment configuration:

Server: 4-core CPU, 8 GB RAM, CentOS 7, SSD
Database: MySQL 5.7, max connections 800
Cache: Redis, 1 GB capacity

All services are purchased from Tencent Cloud.

Load‑testing tool: Locust, using Tencent auto‑scaling for distributed testing.

Requirement details: The homepage checks the database for suitable popup configurations. If found, it returns them; otherwise it waits for the next request. User actions (click, ignore) affect whether the configuration is returned again, with timing logic for re‑display.

Key Analysis

Three critical points: 1) Find appropriate popup configuration for the user; 2) Record the next return time in the database; 3) Record user actions on the returned configuration.

Tuning

All three points involve database reads and writes. Without caching, every request hits the database, which exhausts the connection limit and slows SQL execution. The first step is therefore to take write operations off the request path and reduce pressure on database connections.

In the initial architecture, write operations are placed into a FIFO message queue implemented with a Redis list.
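
A minimal sketch of such a queue follows. The article does not show its actual code, so the key name `popup:writes`, the payload fields, and both helper functions are assumptions. The in-memory class is a stand-in so the example runs without a server; a redis-py client, whose `lpush` and `rpop` methods behave the same way for this purpose, could be passed in instead.

```python
import json
from collections import deque


class InMemoryListClient:
    """Hypothetical stand-in exposing the two list commands used below."""

    def __init__(self):
        self._lists = {}

    def lpush(self, name, *values):
        queue = self._lists.setdefault(name, deque())
        for value in values:
            queue.appendleft(value)  # newest entry goes to the head
        return len(queue)

    def rpop(self, name):
        queue = self._lists.get(name)
        return queue.pop() if queue else None  # oldest entry leaves from the tail


QUEUE_KEY = "popup:writes"  # hypothetical key name


def enqueue_write(client, user_id, action):
    """Request path: queue the write instead of touching the database."""
    payload = json.dumps({"user_id": user_id, "action": action})
    client.lpush(QUEUE_KEY, payload)


def drain_writes(client, apply_write, batch_size=100):
    """Worker path: pop up to batch_size entries in FIFO order and apply them."""
    handled = 0
    while handled < batch_size:
        raw = client.rpop(QUEUE_KEY)
        if raw is None:  # queue is empty
            break
        apply_write(json.loads(raw))
        handled += 1
    return handled
```

Pushing on the left and popping on the right is what makes the list behave as a FIFO queue: requests only pay for one cheap list operation, and a separate worker applies the writes to the database at its own pace.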

Initial load test results: QPS of about 6,000, the rate of 502 errors rising to 30%, CPU at 60-70%, and database connections saturated (about 6,000 TCP connections). This pointed to a database bottleneck caused by reading the configuration from the database on every request.

After loading all configurations into the Redis cache and querying the database only on a cache miss, a second load test showed QPS of about 20,000, CPU at 60-80%, about 300 database connections, and about 15,000 TCP connections per second.
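
The read path described above is the cache-aside pattern, which might look roughly like this. The key name, the TTL, and the `load_from_db` callback are assumptions made for illustration; the in-memory class stands in for a Redis client and mirrors only the `get` and `set(name, value, ex=...)` calls that redis-py also provides.

```python
import json


class InMemoryCache:
    """Hypothetical stand-in with the get/set calls used below."""

    def __init__(self):
        self._data = {}

    def get(self, name):
        return self._data.get(name)

    def set(self, name, value, ex=None):
        # A real Redis server would expire the key after `ex` seconds;
        # this stand-in ignores the TTL.
        self._data[name] = value
        return True


def get_popup_configs(cache, load_from_db, key="popup:configs", ttl_seconds=60):
    """Return popup configurations, querying the database only on a cache miss."""
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no database access
    configs = load_from_db()  # cache miss: fall back to the database
    cache.set(key, json.dumps(configs), ex=ttl_seconds)
    return configs
```

With this shape, only the first request after a miss reaches the database, which is consistent with the drop in database connections reported for the second load test.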

Although QPS increased, the TCP connection count stayed below 20,000, which suggested a limit on the number of sockets the process could open. Checking ulimit -n showed 65,535; raising it to 100,001 gave only a slight improvement (QPS ≈ 22,000).
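
The same limit can be inspected from inside a Python process with the standard `resource` module (Unix only). This is a small sketch; the `socket_headroom` helper is hypothetical and simply compares an expected connection count with the soft limit.

```python
import resource


def fd_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def socket_headroom(expected_connections):
    """Return descriptors left under the soft limit, or None if it is unlimited."""
    soft, _hard = fd_limits()
    if soft == resource.RLIM_INFINITY:
        return None
    return soft - expected_connections
```

Every open socket consumes one file descriptor, so a negative result from `socket_headroom` would mean the process cannot hold that many connections at once.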

The deeper issue was that closed TCP connections lingered in TIME_WAIT, so their ports could not be reused immediately. Linux does not expose a parameter that directly shortens the TIME_WAIT period, but the following kernel settings can mitigate the problem:

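# Maximum number of TIME_WAIT sockets held at once (default 180000)
net.ipv4.tcp_max_tw_buckets = 6000
# Local port range available for outgoing connections
net.ipv4.ip_local_port_range = 1024 65000
# Enable fast recycling of TIME_WAIT sockets.
# Caution: this can drop connections from clients behind NAT,
# and the option was removed in Linux 4.12.
net.ipv4.tcp_tw_recycle = 1
# Allow reuse of TIME_WAIT sockets for new outgoing connections
net.ipv4.tcp_tw_reuse = 1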
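
To see whether settings like these have an effect, one could count sockets per TCP state before and after applying them. The sketch below parses the text format of /proc/net/tcp, where the fourth column is the state in hex and 06 means TIME_WAIT. The function names are hypothetical, and only IPv4 sockets are covered (IPv6 sockets are listed in /proc/net/tcp6).

```python
from collections import Counter

# Subset of the kernel's TCP state codes as they appear in /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED",
    "06": "TIME_WAIT",
    "08": "CLOSE_WAIT",
    "0A": "LISTEN",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per state from the contents of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated lines
        state_hex = fields[3].upper()
        counts[TCP_STATES.get(state_hex, state_hex)] += 1
    return counts


def read_tcp_states(path="/proc/net/tcp"):
    """Read the live socket table (Linux only) and count states."""
    with open(path) as handle:
        return count_tcp_states(handle.read())
```

On a Linux host, `read_tcp_states()["TIME_WAIT"]` gives the current number of IPv4 sockets in TIME_WAIT.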

After applying these settings and re‑testing, the service achieved QPS of 50,000, CPU around 70%, normal database and TCP connections, average response time 60 ms, and 0% error rate.

Conclusion

The development, tuning, and load‑testing process highlighted that web development is a multidisciplinary engineering practice involving networking, databases, programming languages, and operating systems. Solid fundamentals are essential for effective analysis and optimization.

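Note: after the four-way close (the FIN/ACK exchange), the side that closes a TCP connection first keeps the socket in TIME_WAIT for a period, so that delayed packets from the old connection are not misinterpreted by a later connection using the same addresses and ports.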
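
A rough calculation shows why lingering TIME_WAIT sockets can cap the connection rate. It assumes that Linux holds TIME_WAIT for 60 seconds, that every closed outgoing connection leaves a TIME_WAIT entry on the local side, and that no reuse or bucket limit applies; it covers outgoing connections to a single destination address and port. The helper names are hypothetical.

```python
TIME_WAIT_SECONDS = 60  # assumption: fixed TIME_WAIT duration on Linux


def ephemeral_ports(low, high):
    """Number of local ports in an inclusive ip_local_port_range."""
    return high - low + 1


def max_new_connections_per_second(low, high, time_wait=TIME_WAIT_SECONDS):
    """Upper bound on new outgoing connections per second to one destination
    when each closed connection keeps its local port busy for `time_wait` seconds."""
    return ephemeral_ports(low, high) / time_wait
```

With the range 1024 to 65000 used in this article, that is 63,977 ports, or roughly 1,066 new connections per second to one destination, which is why allowing TIME_WAIT sockets to be reused matters at this load.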
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
