Nginx Performance Tuning: Resolving Timeout Issues with Keepalive, TCP Settings, and Upstream Timeouts

This article details a real‑world Nginx performance tuning case where long‑running client requests caused timeouts, describing how adjusting keepalive settings, TCP reuse parameters, and upstream proxy timeouts eliminated the delays and stabilized the service for millions of calls.

Qunar Tech Salon

Sep 2, 2015

Nginx is a widely used web server and reverse proxy, but the optimal configuration varies by business scenario. After traffic exceeded one million calls, the team saw intermittent request timeouts even though monitoring showed low latency.

Initial investigation showed that some clients were not using persistent connections, so connections were opened and closed constantly and many sockets ended up in TIME_WAIT state. On the server side, keepalive was configured to hold connections open for five minutes (keepalive_timeout 300; and keepalive_requests 1000;).

netstat confirmed fewer new connections after the keepalive change, yet the logs still recorded request times over 1 s and occasional spikes of up to 20 s. Zabbix monitoring showed that spikes in the connection writing and connection active metrics coincided with the timeouts.
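The netstat check described above can be approximated with a short script. This is a minimal sketch, not the team's actual tooling: it assumes Linux `netstat -ant` output, where the state is the last column, and the sample addresses are invented for illustration.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP connection states in `netstat -ant`-style text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows start with "tcp" or "tcp6"; the state is the last column.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[-1]] += 1
    return states


# Hypothetical sample output; the addresses are made up for this example.
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:80             10.0.0.9:51234          TIME_WAIT
tcp        0      0 10.0.0.5:80             10.0.0.9:51235          TIME_WAIT
tcp        0      0 10.0.0.5:80             10.0.0.7:40112          ESTABLISHED
"""

if __name__ == "__main__":
    for state, count in count_tcp_states(SAMPLE).most_common():
        print(f"{state:12s} {count}")
```

Running the same count before and after a configuration change gives a rough view of whether TIME_WAIT sockets are actually decreasing.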

Further analysis of the request path client → Nginx → Tomcat → App indicated the bottleneck lay between Nginx and Tomcat. The team observed that a single upstream machine had many long‑duration calls while its Java process had actually crashed, causing Nginx to keep routing traffic to a dead backend.
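A crashed backend process usually stops accepting TCP connections, so a simple connect probe can flag it. The following is a minimal sketch under that assumption; the function name `is_backend_reachable` and the example address are hypothetical, and a process that is hung but still listening would not be detected this way.

```python
import socket


def is_backend_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


if __name__ == "__main__":
    # Hypothetical backend address, used only for illustration.
    print(is_backend_reachable("127.0.0.1", 8080))
```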

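To mitigate the excess TIME_WAIT sockets, the sysctl parameters were tuned:

# enable reuse of TIME-WAIT sockets
net.ipv4.tcp_tw_reuse = 1
# enable fast recycling of TIME-WAIT sockets
net.ipv4.tcp_tw_recycle = 1

Note that net.ipv4.tcp_tw_recycle is known to drop connections from clients behind NAT and was removed in Linux 4.12, so on current kernels only tcp_tw_reuse is available.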

After applying these kernel settings, the number of TIME_WAIT connections dropped dramatically, and Zabbix showed a clear reduction in connection‑writing metrics, though occasional timeouts persisted.

To better isolate upstream latency, the Nginx log format was extended to record $upstream_response_time and $upstream_addr:

# $upstream_response_time – backend response time
# $upstream_addr – backend IP and port
log_format main '$remote_addr - [$time_local] "$request" '
                '$status $body_bytes_sent '
                '"$request_time" "$upstream_response_time" "$upstream_addr" "$request_body"';

Analysis of the enriched logs showed that the longest calls consistently originated from the same backend machine, which was later found to be down.
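The per-backend analysis can be sketched as a small log parser. This is an illustrative example, not the team's script: it assumes log lines written exactly in the log_format shown above, treats "-" as "no upstream time recorded", and uses invented sample lines and addresses.

```python
import re
from collections import defaultdict

# Matches the log_format "main" defined above.
LOG_RE = re.compile(
    r'^(?P<remote_addr>\S+) - \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<request_time>[^"]*)" "(?P<upstream_response_time>[^"]*)" '
    r'"(?P<upstream_addr>[^"]*)" "(?P<request_body>.*)"$'
)
# Nginx separates multiple upstream attempts with ", " (or " : " on redirects).
SEP_RE = re.compile(r"\s*,\s*|\s+:\s+")


def slowest_upstreams(lines, threshold=1.0):
    """Return {upstream_addr: (requests, slow_requests, max_seconds)}."""
    stats = defaultdict(lambda: [0, 0, 0.0])
    for line in lines:
        match = LOG_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not follow the expected format
        addrs = SEP_RE.split(match.group("upstream_addr"))
        times = SEP_RE.split(match.group("upstream_response_time"))
        for addr, raw in zip(addrs, times):
            try:
                seconds = float(raw)
            except ValueError:
                continue  # "-" means no response time was logged
            entry = stats[addr]
            entry[0] += 1
            if seconds >= threshold:
                entry[1] += 1
            entry[2] = max(entry[2], seconds)
    return {addr: tuple(values) for addr, values in stats.items()}


# Hypothetical log lines for illustration only.
SAMPLE_LOG = [
    '10.1.1.1 - [02/Sep/2015:10:00:00 +0800] "GET /api/a HTTP/1.1" 200 512 "0.012" "0.010" "10.0.0.11:8080" "-"',
    '10.1.1.2 - [02/Sep/2015:10:00:01 +0800] "GET /api/b HTTP/1.1" 504 0 "20.001" "20.000" "10.0.0.12:8080" "-"',
    '10.1.1.3 - [02/Sep/2015:10:00:02 +0800] "POST /api/c HTTP/1.1" 200 64 "1.530" "1.500, 0.020" "10.0.0.12:8080, 10.0.0.11:8080" "k=v"',
]

if __name__ == "__main__":
    for addr, (total, slow, worst) in slowest_upstreams(SAMPLE_LOG).items():
        print(f"{addr}: requests={total} slow={slow} max={worst:.3f}s")
```

A backend whose slow-request count dominates the others is the first candidate to inspect.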

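Since proxy_connect_timeout, proxy_send_timeout, and proxy_read_timeout all default to 60 s in Nginx, a dead backend could hold a request for a long time before failing. The team reduced all three to 1 s to match the application's SLA, adding the following directives to nginx.conf: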

# timeout settings
proxy_connect_timeout 1s;
proxy_send_timeout 1s;
proxy_read_timeout 1s;

After deployment, most timeout incidents disappeared, though occasional request times above 1 s persisted, likely due to slow client reception; the final verification will require the client to enable persistent connections.
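One way to test the slow-client hypothesis is to compare $request_time with $upstream_response_time for each request: if the total time is high but the upstream time is low, the delay lies outside the backend. The sketch below assumes the two values have already been extracted from the log as floats; the function name and the bucket labels are hypothetical.

```python
def classify_slow_requests(samples, threshold=1.0):
    """Bucket requests by where the time was spent.

    samples: iterable of (request_time, upstream_response_time) in seconds.
    upstream_response_time may be None when no upstream time was logged.
    """
    counts = {"fast": 0, "upstream_slow": 0, "client_or_nginx_slow": 0}
    for request_time, upstream_time in samples:
        if request_time < threshold:
            counts["fast"] += 1
        elif upstream_time is not None and upstream_time >= threshold:
            counts["upstream_slow"] += 1
        else:
            # Slow overall, but the backend answered quickly (or not at all).
            counts["client_or_nginx_slow"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical measurements for illustration only.
    demo = [(0.05, 0.04), (1.8, 0.03), (1.2, 1.0), (2.5, None)]
    print(classify_slow_requests(demo))
```

If most of the remaining slow requests fall into the last bucket, that supports the explanation that client reception, not the backend, accounts for them.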

Overall, the combination of keepalive tuning, TCP reuse/recycle, detailed upstream logging, and stricter proxy timeouts resolved the majority of the Nginx timeout problems.
