Why Nginx Caches DNS for Weeks and How to Fix It
In production, Nginx cached DNS lookups for up to a month, causing requests to stale IPs after CDN changes; this article explains the root cause, demonstrates how to configure upstream health checks and the built‑in resolver to ensure timely DNS updates and avoid prolonged outages.
Problem Recap: Real Production Incident
We proxied requests to a third‑party domain (e.g., api.partner.com) whose DNS resolved to multiple IPs (IP1, IP2, IP3, …).
The partner silently removed one IP (IP3) from DNS.
Our Nginx continued sending traffic to the now‑dead IP3, causing massive transaction failures.
The issue persisted for two weeks until we manually reloaded Nginx.
Root cause: Nginx caches the DNS resolution result after the first lookup and never refreshes it unless the process is restarted.
By default, Nginx resolves a domain once at startup or reload and never updates the cached IPs thereafter.
Solution 1: Use upstream with Health Checks (Recommended)
If you control the backend address, define an upstream block with a failure detection mechanism:
upstream backends {
least_conn;
server api.example.com:8080 max_fails=3 fail_timeout=10s;
}
server {
location / {
proxy_pass http://backends;
}
}Effect:
10 seconds of failures → automatically mark the IP as "unavailable"
No more requests to the faulty IP
When the IP recovers, it is automatically added backNote: Adjust max_fails and fail_timeout according to your business needs.
Solution 2: Enable Nginx Built‑in DNS Resolver (Critical)
When the backend must be addressed by a domain whose IP changes (CDN, cloud services), configure a resolver and use a variable for the upstream host:
server {
listen 8080;
server_name localhost;
# Specify DNS servers (public DNS recommended)
resolver 114.114.114.114 223.5.5.5 valid=600s;
resolver_timeout 3s;
# Important: assign the domain to a variable
set $backend "api.example.com";
location / {
proxy_pass http://$backend; # Do NOT use the domain directly
}
}Key parameters:
resolver : list of DNS servers (space‑separated).
valid=600s : DNS cache TTL (recommended 10 minutes).
resolver_timeout : DNS query timeout (recommended 3 seconds).
set $backend "..." : the domain must be stored in a variable for proxy_pass.
Important reminder: If any DNS server in resolver fails, the lookup will wait until resolver_timeout before failing, potentially causing a 502 error. Ensure all DNS addresses are reachable.
Verification steps:
nginx -t && nginx -s reload
# Simulate backend IP change (e.g., modify /etc/hosts)
# Observe Nginx switches to the new IP after the <code>valid</code> periodBest‑practice recommendations:
If backend IPs are fixed, use upstream with max_fails.
If backend is a dynamic domain (CDN/cloud), configure resolver + variable.
For high availability, configure multiple reliable DNS servers (e.g., 114.114.114.114 and Alibaba DNS).
For high security, avoid open DNS (0.0.0.0/0) and restrict DNS sources.
Summary: Nginx caches DNS results permanently by default; when proxying dynamic domains, you must configure a resolver with an appropriate valid time and use a variable for proxy_pass to ensure the IP is refreshed automatically.
Xiao Liu Lab
An operations lab passionate about server tinkering 🔬 Sharing automation scripts, high-availability architecture, alert optimization, and incident reviews. Using technology to reduce overtime and experience to avoid major pitfalls. Follow me for easier, more reliable operations!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
