
Performance Optimization of a High-Concurrency Python Service: Architecture, Tuning, and Load Testing

This article documents a performance optimization case for a Python service: the initial requirements, environment setup, bottleneck analysis, an architectural redesign with caching and a message queue, Linux TCP TIME‑WAIT tuning, and load-test results that reached 50 k QPS with low latency and a zero error rate.

Top Architect

Introduction: The author, a senior architect, shares a case study of optimizing a Python program's performance, emphasizing that optimization must be driven by concrete requirements.

Requirements: The module must handle QPS ≥30,000, database load ≤50%, server load ≤70%, request latency ≤70 ms, error rate ≤5%.
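
The targets above can be written down as explicit thresholds so that every load-test run is judged the same way. The following is a minimal sketch under that assumption; the names `REQUIREMENTS` and `check_run` and the metric keys are hypothetical and do not come from the original project.

```python
# Hypothetical encoding of the stated requirements; key names are illustrative.
REQUIREMENTS = {
    "qps": (">=", 30_000),          # requests per second
    "db_load_pct": ("<=", 50),      # database load, percent
    "server_load_pct": ("<=", 70),  # server load, percent
    "latency_ms": ("<=", 70),       # request latency, milliseconds
    "error_rate_pct": ("<=", 5),    # failed requests, percent
}


def check_run(metrics):
    """Return the names of the requirements that a load-test run violates."""
    failures = []
    for name, (op, limit) in REQUIREMENTS.items():
        value = metrics[name]
        ok = value >= limit if op == ">=" else value <= limit
        if not ok:
            failures.append(name)
    return failures
```

A run that measures 6 k QPS with a 30% error rate would be reported as failing `qps` and `error_rate_pct`, regardless of how the other metrics look.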

Environment: 4‑core 8 GB CentOS 7 server, MySQL 5.7 (max connections 800), Redis 1 GB, Locust for distributed load testing.

Problem analysis: The initial load test showed about 6 k QPS with 30% of requests failing with 502 errors, CPU at 60‑70%, and DB connections saturated (~6 k). The bottleneck was twofold: every request read the popup configuration from the database, and every write went straight through to the database.

First optimization: Write operations were decoupled from the request path by using a Redis list as a FIFO message queue. QPS rose only to about 20 k, because the service was still limited by TCP connections and sockets stuck in TIME‑WAIT.
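
A Redis list behaves as a FIFO queue when producers push on one end and a worker pops from the other. The sketch below illustrates that pattern and is not the author's code: `QUEUE_KEY`, `enqueue_write`, `drain_writes`, and the `persist` callback are hypothetical names, and `client` is assumed to be any object exposing `lpush` and `rpop` the way a redis-py `redis.Redis` client does.

```python
import json

QUEUE_KEY = "popup:write_queue"  # hypothetical key name


def enqueue_write(client, record):
    """Request path: push the write onto the list instead of writing to MySQL."""
    client.lpush(QUEUE_KEY, json.dumps(record))


def drain_writes(client, persist, batch_size=100):
    """Worker path: pop up to batch_size records, oldest first, and persist them."""
    handled = 0
    while handled < batch_size:
        raw = client.rpop(QUEUE_KEY)
        if raw is None:  # the queue is empty
            break
        persist(json.loads(raw))  # persist is supplied by the caller, e.g. a DB insert
        handled += 1
    return handled
```

Pushing with `lpush` and popping with `rpop` keeps arrival order, and a single worker draining the list needs only one database connection no matter how many requests are in flight.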

Further investigation revealed that many TCP connections remained in TIME‑WAIT, exhausting the 800 DB connections. Adjusting three Linux kernel parameters (net.ipv4.tcp_max_tw_buckets, net.ipv4.tcp_tw_recycle, net.ipv4.tcp_tw_reuse) reduced the number of TIME‑WAIT sockets:

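# Maximum number of TIME-WAIT sockets the kernel keeps (the article cites a default of 180000)
net.ipv4.tcp_max_tw_buckets = 6000

# Enable fast recycling of TIME-WAIT sockets.
# Caution: this option can drop connections from clients behind NAT and was removed in Linux 4.12.
net.ipv4.tcp_tw_recycle = 1

# Allow TIME-WAIT sockets to be reused for new outgoing connections
net.ipv4.tcp_tw_reuse = 1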
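
To see whether settings like these make a difference, the number of TIME-WAIT sockets can be counted before and after a load test. The following is a minimal sketch that parses the text of `/proc/net/tcp`, where the fourth column holds the socket state as a hex code and `06` means TIME_WAIT. The function name `count_time_wait` is hypothetical, and the sketch covers IPv4 sockets only.

```python
TIME_WAIT = "06"  # hex state code the kernel uses for TIME_WAIT in /proc/net/tcp


def count_time_wait(proc_net_tcp_text):
    """Count sockets in TIME-WAIT given the contents of /proc/net/tcp."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # The fourth column (index 3) is the connection state as a hex code.
        if len(fields) > 3 and fields[3] == TIME_WAIT:
            count += 1
    return count
```

On a Linux host the function would be called with the file contents, for example `count_time_wait(open("/proc/net/tcp").read())`.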

After applying these settings and caching all popup configurations in Redis, subsequent load tests achieved QPS up to 50 k, CPU around 70%, average latency 60 ms, and zero error rate.
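
Serving popup configurations from Redis can be sketched as a cache lookup that falls back to the database only on a miss. This is an illustration under stated assumptions, not the author's implementation: the key format, `CACHE_TTL_SECONDS`, `get_popup_config`, and the `load_from_db` callback are hypothetical, and `client` is assumed to expose `get` and `set(..., ex=...)` as a redis-py `redis.Redis` client does. The article does not say whether the cache was filled lazily or preloaded.

```python
import json

CACHE_TTL_SECONDS = 300  # hypothetical expiry; the article does not state one


def get_popup_config(client, popup_id, load_from_db):
    """Return a popup configuration, reading from Redis before the database."""
    key = f"popup:config:{popup_id}"  # hypothetical key format
    cached = client.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no database access
    config = load_from_db(popup_id)  # cache miss: one database read
    client.set(key, json.dumps(config), ex=CACHE_TTL_SECONDS)
    return config
```

With this shape, repeated requests for the same popup reach MySQL at most once per expiry window, which is the kind of read reduction the bottleneck analysis calls for.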

Conclusion: The case demonstrates that effective performance tuning requires a holistic view of application code, database access patterns, caching strategies, and operating‑system network parameters.
