
How to Diagnose and Fix Nginx 504/499 Timeouts in Cross‑Continental Deployments

This article walks through a real‑world case of a private‑cloud service experiencing 504 and 499 errors due to cross‑continent latency, detailing step‑by‑step troubleshooting, log analysis, Nginx timeout parameter tuning, SLB configuration changes, and key concepts behind Nginx error codes and proxy settings.


1. Background

The customer runs a private deployment in Australia that accesses a SaaS service hosted in Europe, resulting in high network latency and frequent 504 timeout errors. The investigation aimed to identify the root cause and apply temporary configuration fixes.

2. Investigation Steps and Thought Process

2.1 Fault Communication

Collect essential information from the customer: the page being accessed, the failing API endpoint, approximate error time, request ID (x‑request‑id), and the full URL.

Typical checklist:

Which page was opened?

Which API returned the error?

When did the error occur?

Is there an x‑request‑id? (See the propagation sketch after this list.)

What is the exact URL?
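
The x‑request‑id is only useful for tracing if every hop forwards it. Below is a minimal Nginx sketch for generating and propagating one, assuming an illustrative location /api and upstream service_backends (the built‑in $request_id variable requires Nginx 1.11.0 or later):

location /api {
    # Reuse the client's x-request-id if present; otherwise generate one
    set $req_id $http_x_request_id;
    if ($req_id = "") {
        set $req_id $request_id;
    }
    proxy_set_header x-request-id $req_id;
    proxy_pass http://service_backends;
}

Logging $req_id (or $http_x_request_id) in each hop's access log is what makes the per-request correlation in the following sections possible.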

2.2 Mapping the Request Chain

The full request path is:

Client browser → Private CDN (1) → Private SLB (2) → Private Nginx (3) → SaaS CDN (4) → SaaS SLB (5) → SaaS Nginx (6) → SaaS backend service

Each environment may differ, so the chain must be adapted to the specific architecture.

2.3 Log Inspection

First round of investigation

Check the private Nginx (3) logs using the x‑request‑id. The logs showed a 504 status with a request time of 30 s, matching the timeout the client observed.

The proxy_read_timeout was set to 30 s, causing Nginx to return 504 when the backend did not respond within that window. The timeout was increased to 300 s and Nginx reloaded.
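
A minimal sketch of that change, assuming the proxied location is /api and the upstream group is named service_backends (both illustrative; the actual configuration was not shown in the incident):

location /api {
    # Was 30s; raised so slow cross-continent backends can finish
    proxy_read_timeout 300s;
    proxy_pass http://service_backends;
}

After editing, nginx -t validates the configuration and nginx -s reload applies it without dropping in-flight connections.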

Second round of investigation

After the change, the error code switched to 499 with a request time of 60 s. A 499 means the client side closed the connection before Nginx finished responding, so a component in front of this Nginx (later identified as the private SLB) was aborting requests after 60 s while the backend was still working, producing many 499 entries.

To keep Nginx waiting for the backend even if the client aborts, the proxy_ignore_client_abort directive was set to on and Nginx reloaded.

Later logs showed a 504 with a request time of 180 s, suggesting the issue had moved to the SaaS-side SLB (5), whose default maximum connection timeout is 180 s.

Since the SLB's HTTP/HTTPS listener timeout could not be raised beyond 180 s, a new TCP listener (which supports timeouts up to 900 s) was created with a 350 s timeout, and Nginx's upstream address was switched to this listener. After the change, the client reported normal access.
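
A hedged sketch of that switch, with a placeholder listener address and port (the real SLB endpoint was not disclosed):

# Hypothetical address/port of the new TCP listener on the SaaS SLB
upstream saas_tcp_listener {
    server slb.example.com:8443;
}

location /api {
    proxy_read_timeout 300s;
    # Traffic now flows through the TCP listener, whose timeout was set to 350s
    proxy_pass https://saas_tcp_listener;
}

Because the listener is plain TCP, the SLB forwards raw bytes and applies its own (now 350 s) timeout instead of the HTTP listener's 180 s cap, while Nginx still speaks HTTP(S) end to end.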

Third round of investigation

Even after the TCP listener adjustment, occasional 504s persisted. Private Nginx logs showed only 200 responses with long processing times (>60 s). SaaS Nginx logs were also normal. Manual requests from the server and packet captures (tcpdump) showed successful TCP handshakes and HTTP responses.

The final clue was the Server header in the error response: it did not match the private Nginx's customized server name ("Sws"), indicating that the timeout response originated from the private SLB (2) in front of it. The SLB listener's connection timeout was originally 60 s; after raising it to 180 s, no further timeouts were observed for two days.

3. Knowledge Points Gained During Troubleshooting

3.1 Nginx 499 Status Code

Source code reference:

#define NGX_HTTP_CLIENT_CLOSED_REQUEST 499

When a client closes the connection prematurely, Nginx records a 499 status because the standard HTTP status codes lack a representation for this situation.

Resolution strategies:

Investigate why the backend is slow and consider optimizing it or extending client‑side timeouts.

Enable proxy_ignore_client_abort on so Nginx continues to wait for the backend response even if the client aborts.

location /api {
    # Keep waiting for the upstream response even if the client disconnects
    proxy_ignore_client_abort on;
    proxy_pass http://service_backends;
}

3.2 Nginx Timing Variables

request_time: total time from when the first bytes are read from the client until the last byte is sent to the client.

upstream_response_time (often abbreviated up_resp_time in custom log formats, as in the sketch after this list): time spent communicating with the upstream server, from connection start to close.

upstream_addr: address of the upstream server that handled the request.

Typically, request_time ≥ upstream_response_time.
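
These variables only appear in access logs if the log format includes them. A sketch of such a log_format (the format name and log path are illustrative):

log_format timing '$remote_addr [$time_local] "$request" $status '
                  'request_time=$request_time '
                  'up_resp_time=$upstream_response_time '
                  'upstream=$upstream_addr x_request_id=$http_x_request_id';

access_log /var/log/nginx/access.log timing;

With this in place, comparing request_time against up_resp_time on a failing line quickly shows whether the delay was in the upstream or in the client-facing leg.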

3.3 Important Nginx Proxy Parameters

proxy_connect_timeout: timeout for establishing a connection to the upstream server; it usually cannot exceed 75 s (default 60 s).

proxy_read_timeout: how long Nginx waits between two successive reads of the response from the upstream (default 60 s).

proxy_send_timeout: how long Nginx waits between two successive writes of the request to the upstream (default 60 s).

A combined example follows this list.
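
Put together, a hedged example tuning all three for a high-latency upstream, using values from this incident (location and upstream names are illustrative):

location /api {
    # TCP connect to the upstream; Nginx caps this at roughly 75s
    proxy_connect_timeout 60s;
    # Max gap between two successive writes of the request to the upstream
    proxy_send_timeout 300s;
    # Max gap between two successive reads of the response from the upstream
    proxy_read_timeout 300s;
    proxy_pass http://service_backends;
}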

4. Summary

The configuration changes described are non‑standard; each incident requires a tailored approach based on its specific context.

The root cause was cross‑continent latency without a dedicated line, leading to timeout issues that were mitigated by adjusting Nginx and SLB timeout parameters.

Do not attribute every failure solely to network problems; thorough analysis and targeted configuration tuning are essential for reliable operations.

Tags: proxy, operations, troubleshooting, Nginx, timeout, 499, 504

Written by Ops Development Stories

Maintained by a like-minded team covering both operations and development. Topics span Linux ops, the DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python and Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
