How I Boosted a Python Service to 50k QPS: Real‑World Performance Tuning
This article documents a step‑by‑step performance optimization of a Python web module, covering requirement analysis, environment setup, load‑testing results, database and TCP bottleneck identification, caching strategies, kernel tuning, and the final achievement of 50,000 QPS with low latency.
Preface
This article records the performance optimization of a Python program: the problems encountered and how they were solved. It offers one practical optimization mindset; the approach described here is not the only possible solution.
How to Optimize
Optimization must be goal‑driven; vague claims of "million concurrent users" are meaningless without clear objectives. Before optimizing, define target metrics and identify the performance bottleneck rather than making random changes.
Requirement Description
The project was a standalone module split from the main site due to high concurrency. The split required a stress‑test QPS of at least 30,000, database load under 50%, server load under 70%, request latency under 70 ms, and error rate below 5%.
Environment configuration:
Server: 4‑core, 8 GB RAM, CentOS 7, SSD
Database: MySQL 5.7, max connections 800
Cache: Redis, 1 GB
All services were purchased from Tencent Cloud.
Load‑testing tool: Locust with Tencent auto‑scaling for distributed testing.
Requirement details:
When a user visits the homepage, the system queries the database for suitable popup configurations. If none exist, it waits for the next request; otherwise it returns the configuration to the frontend. Various branches handle user clicks, timing, and subsequent returns.
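The branching described above can be sketched as a single handler. The config schema (`segment`, `wait_hours`) and function names below are illustrative assumptions, not the project's actual code:

```python
from datetime import datetime, timedelta

# Hypothetical schema: each config targets a user segment and carries
# a wait interval before the popup may be shown again.
CONFIGS = [
    {"id": 1, "segment": "new_user", "wait_hours": 24},
    {"id": 2, "segment": "returning", "wait_hours": 72},
]

def handle_homepage_visit(user, next_show_at, now):
    """Return a popup config for this user, or None if they must wait."""
    if next_show_at is not None and now < next_show_at:
        return None  # inside the wait window: nothing to show yet
    for cfg in CONFIGS:
        if cfg["segment"] == user["segment"]:
            return cfg  # frontend renders this popup
    return None  # no suitable config: wait for the next request

now = datetime(2024, 1, 1, 12, 0)
user = {"id": 42, "segment": "new_user"}
print(handle_homepage_visit(user, None, now))                      # config 1
print(handle_homepage_visit(user, now + timedelta(hours=1), now))  # None
```

In the real service the `CONFIGS` lookup is a database query (and later a cache read), and `next_show_at` is the "next return time" recorded per user.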
Key Analysis
The three critical points are: 1) locating appropriate popup configurations for users, 2) recording the next return time in the database, and 3) logging user actions on the returned configuration.
Tuning
All three points involve database operations, both reads and writes. Without caching, every request hits the database, exhausting the connection pool and slowing SQL execution. The first step was to separate write operations from the request path and improve database connection handling. The initial architecture is shown below:
Write operations were moved to a FIFO message queue implemented with a Redis list.
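In production this queue is a Redis list (`LPUSH` on the web side, `BRPOP` in a worker). The sketch below mimics that FIFO contract with an in-memory deque so it runs standalone; the payload schema is an assumption for illustration:

```python
import json
from collections import deque

# Stand-in for the Redis list; with redis-py this would be
# r.lpush("write_queue", payload) in the request handler and
# r.brpop("write_queue") in a separate worker process.
write_queue = deque()

def enqueue_write(user_id, next_return_at):
    """Web handler side: queue the DB write instead of executing it inline."""
    write_queue.appendleft(json.dumps(
        {"user_id": user_id, "next_return_at": next_return_at}))  # LPUSH

def drain_one(db):
    """Worker side: pop the oldest item and apply it to the database."""
    if not write_queue:
        return False
    op = json.loads(write_queue.pop())  # BRPOP end: oldest item first (FIFO)
    db[op["user_id"]] = op["next_return_at"]
    return True

db = {}
enqueue_write(42, "2024-01-01T00:00:00Z")
enqueue_write(7, "2024-01-02T00:00:00Z")
while drain_one(db):
    pass
print(list(db))  # [42, 7] -- writes applied in arrival order
```

The point of the design is that the request handler returns as soon as the payload is queued; the worker applies writes to MySQL at a rate the database can sustain.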
Initial load‑test results: QPS peaked around 6,000, the HTTP 502 error rate rose to 30%, CPU fluctuated between 60% and 70%, and database connections were saturated (roughly 6,000 TCP connections), pointing to a database bottleneck caused by repeated reads of user configurations.
After loading all configurations into Redis cache (reading from DB only on cache miss), a second test showed QPS up to ~20,000, CPU 60‑80%, DB connections around 300, and TCP connections about 15,000 per second.
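The read path described above is the classic cache-aside pattern. The sketch below stands in Redis and MySQL with dictionaries so it runs anywhere; with redis-py the cache calls would be `r.get`/`r.set`:

```python
import json

FAKE_DB = {"popup:homepage": {"id": 3, "segment": "all"}}  # stands in for MySQL
cache = {}       # stands in for Redis (r.get / r.set in redis-py)
db_reads = 0

def get_config(key):
    global db_reads
    cached = cache.get(key)
    if cached is not None:           # cache hit: no database round trip
        return json.loads(cached)
    db_reads += 1                    # cache miss: one DB read...
    row = FAKE_DB.get(key)
    cache[key] = json.dumps(row)     # ...then populate the cache
    return row

get_config("popup:homepage")
get_config("popup:homepage")
print(db_reads)  # 1 -- only the first request touched the database
```

This is why DB connections dropped from saturation to around 300: after warm-up, nearly every read is served from Redis.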
Despite ample file‑descriptor limits (`ulimit -n` reported 65,535, later raised to 100,001), QPS plateaued around 22,000. Investigation revealed that closed TCP connections lingered in the TIME_WAIT state after the four‑packet connection teardown, so their ports could not be reused immediately.
A TCP connection stays in TIME_WAIT after termination so that delayed packets from the old connection cannot be misinterpreted as belonging to a new connection on the same port pair.
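One way to confirm this diagnosis is to tally socket states. On Linux, `/proc/net/tcp` lists one connection per line with the state as a hex code in the fourth column (`06` is TIME_WAIT); `ss -s` gives the same summary from the shell. The parser below works on that format, using a synthetic sample so it runs anywhere:

```python
from collections import Counter

# A few kernel TCP state codes (from include/net/tcp_states.h)
STATES = {"01": "ESTABLISHED", "06": "TIME_WAIT", "0A": "LISTEN"}

def count_tcp_states(proc_net_tcp_text):
    """Tally connection states from /proc/net/tcp-formatted text."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) > 3:
            code = fields[3]
            counts[STATES.get(code, code)] += 1
    return counts

# Synthetic sample in /proc/net/tcp layout (columns truncated)
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue
   0: 0100007F:1F90 00000000:0000 0A 00000000:00000000
   1: 0100007F:1F90 A29F87C0:D2E4 01 00000000:00000000
   2: 0100007F:1F90 A29F87C0:D2E5 06 00000000:00000000
   3: 0100007F:1F90 A29F87C0:D2E6 06 00000000:00000000
"""
print(count_tcp_states(SAMPLE))  # 2 TIME_WAIT, 1 ESTABLISHED, 1 LISTEN
```

On the server itself, `count_tcp_states(open("/proc/net/tcp").read())` gives live numbers; a large TIME_WAIT count relative to ESTABLISHED matches the plateau observed here.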
Since Linux does not expose a direct kernel parameter to shorten TIME_WAIT, the focus shifted to adjusting related settings:
# maximum number of TIME_WAIT sockets held at once (default 180000)
net.ipv4.tcp_max_tw_buckets = 6000
# widen the ephemeral port range available for new connections
net.ipv4.ip_local_port_range = 1024 65000
# enable fast recycling of TIME_WAIT sockets
net.ipv4.tcp_tw_recycle = 1
# allow reusing TIME_WAIT sockets for new connections
net.ipv4.tcp_tw_reuse = 1

After applying these kernel tweaks, the final load test achieved a QPS of 50,000, CPU at 70%, normal database and TCP connection counts, an average response time of 60 ms, and a 0% error rate.
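To verify that such settings took effect, note that each sysctl key maps to a file under `/proc/sys` with dots replaced by slashes. A small helper, hedged to return a default on non-Linux systems:

```python
import os

def sysctl_path(name):
    """Map a sysctl key like net.ipv4.tcp_tw_reuse to its /proc/sys file."""
    return "/proc/sys/" + name.replace(".", "/")

def read_sysctl(name, default=None):
    """Return the current value as a string, or default if unavailable."""
    path = sysctl_path(name)
    if not os.path.exists(path):
        return default  # e.g. not running on Linux
    with open(path) as fh:
        return fh.read().strip()

# After `sysctl -p`, read_sysctl("net.ipv4.tcp_tw_reuse") should return "1".
print(sysctl_path("net.ipv4.tcp_max_tw_buckets"))
```

The same values can of course be checked with `sysctl net.ipv4.tcp_tw_reuse` directly; the helper is just convenient inside monitoring scripts.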
Conclusion
The entire development, tuning, and testing cycle highlighted that web development is a multidisciplinary engineering practice involving networking, databases, programming languages, and operating systems, requiring a solid foundational knowledge base. Enabling tcp_tw_recycle and tcp_tw_reuse involves trade‑offs: tcp_tw_recycle in particular can break clients behind NAT and was removed entirely in Linux 4.12. The risks were accepted here to achieve the performance goals.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
