Common Nginx Misconfigurations That Cause Production Outages and How to Fix Them
This article systematically reviews ten typical Nginx configuration pitfalls that frequently trigger production incidents: location‑matching errors, proxy_pass slash issues, misuse of try_files, insufficient keepalive settings, client_max_body_size limits, gzip misconfiguration, incomplete TLS setup, worker process limits, log‑rotation problems, and an exposed server version. Each pitfall follows a clear phenomenon → root cause → correct configuration → verification → risk reminder workflow, and the article closes with a comprehensive troubleshooting path, a checklist, and a rollback script for safe production changes.
Problem Background
Nginx is one of the most widely used reverse proxy and web server solutions in production environments. Mis‑configured Nginx is a common cause of incidents such as 502 Bad Gateway, 404 for static resources, 413 Request Entity Too Large, and TLS certificate errors.
Applicable Scenarios
Daily operations: onboarding new servers, pre‑change checks, incident triage.
Release changes: validation after each Nginx config modification.
Interview preparation: mastering core Nginx concepts.
Performance tuning: diagnosing slow responses or abnormal resource consumption.
General Troubleshooting Framework
Validate configuration syntax with nginx -t.
Inspect error_log (set to warn or error in production).
Check Nginx processes and listening ports (ps aux | grep nginx, ss -tlnp | grep nginx).
Prefer nginx -s reload over a full restart to avoid connection drops.
Implement a Git‑based configuration‑change workflow with testing, verification, backup, and rollback.
Pitfall 1: Location Matching Priority Chaos
Phenomenon
Requests under /api return 404 or are handled by the wrong block, delivering HTML instead of JSON, because a later regex or catch‑all location takes precedence over the more specific /api prefix block.
Root Cause
Nginx evaluates location directives by a strict priority order, not by declaration order. The hierarchy, from high to low:
location = /path – exact match, wins immediately.
location ^~ /path – prefix match that suppresses regex evaluation; the longest ^~ prefix wins.
location ~ /path or location ~* /path – regex (case‑sensitive or case‑insensitive), evaluated in file order; the first matching regex wins.
location /path – ordinary prefix; the longest match is remembered, but a matching regex still takes precedence over it.
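To make the order concrete, the following shell sketch (illustrative only, not Nginx's actual matcher; pick_location is a hypothetical name) hard‑codes the example config from this pitfall and shows which block each URI would land in:

```shell
#!/bin/sh
# Illustrative model of nginx location selection for a fixed example config:
#   location = /   location ^~ /static/   location ~* \.php$   location /api/   location /
pick_location() {
  uri=$1
  # 1. exact match wins immediately
  [ "$uri" = "/" ] && { printf '%s\n' '= /'; return; }
  # 2. a matching ^~ prefix suppresses regex evaluation entirely
  case $uri in /static/*) printf '%s\n' '^~ /static/'; return ;; esac
  # 3. regex locations (file order) beat plain prefixes
  case $uri in *.php) printf '%s\n' '~* \.php$'; return ;; esac
  # 4. longest plain prefix wins: /api/ beats /
  case $uri in /api/*) printf '%s\n' '/api/'; return ;; esac
  printf '%s\n' '/'
}
pick_location /api/users       # plain prefix /api/
pick_location /static/a.php    # ^~ wins, the .php regex never runs
pick_location /index.php       # regex wins over the / prefix
```

Running it line by line makes the counter‑intuitive case visible: /static/a.php never reaches the .php regex because the ^~ prefix matched first.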
Incorrect Example
server {
listen 80;
location / {
root /usr/share/nginx/html;
index index.html;
}
location /api {
proxy_pass http://127.0.0.1:8080;
}
# This regex takes precedence over the /api prefix for URIs like /api/index.php, because /api is a plain prefix without ^~
location ~* \.php$ {
proxy_pass http://127.0.0.1:9000;
}
}
Correct Configuration
server {
listen 80;
# Exact match for the homepage
location = / {
root /usr/share/nginx/html;
index index.html;
}
# Prefix match for static assets – prevent regex takeover
location ^~ /static/ {
root /data/www;
expires 30d;
add_header Cache-Control "public, immutable";
}
# API proxy – ordinary prefix
location /api/ {
proxy_pass http://127.0.0.1:8080;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
# Regex for static file types – placed after ordinary prefix
location ~* \.(js|css|png|jpg|jpeg|gif|ico|svg|woff|woff2)$ {
root /data/www;
expires 7d;
access_log off;
}
# Default catch‑all
location / {
root /usr/share/nginx/html;
index index.html;
}
}
Verification
Use curl -I to request /api/users and confirm a 200 response with JSON. Enable rewrite_log on with error_log at notice level to trace rewrite processing, or raise error_log to debug to see the full location‑matching trace.
Risk Reminder
Changing location priority can break existing API routes or static asset delivery. Perform a full regression test of all URL patterns and keep a backup of the previous configuration.
Pitfall 2: proxy_pass Trailing Slash Difference
Phenomenon
Two definitions, proxy_pass http://127.0.0.1:8080; and proxy_pass http://127.0.0.1:8080/;, look almost identical but produce completely different request URIs after proxying.
Root Cause
If proxy_pass has no URI part, Nginx forwards the original request URI unchanged. If it has a URI part (even a bare trailing slash), Nginx replaces the location‑matched portion of the request URI with that URI part.
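The replacement rule can be sketched as a small shell function. This is a simplified model of the behavior, not Nginx code, and upstream_uri is a hypothetical helper name:

```shell
#!/bin/sh
# Model of the URI nginx hands to the upstream for the two proxy_pass forms.
# $1 = location prefix, $2 = URI part of proxy_pass ("" if none), $3 = request URI.
upstream_uri() {
  if [ -z "$2" ]; then
    # no URI part in proxy_pass: the request URI is forwarded unchanged
    printf '%s\n' "$3"
  else
    # URI part present: the location-matched prefix is replaced by it
    rest=${3#"$1"}
    printf '%s%s\n' "$2" "$rest"
  fi
}
upstream_uri /api  ""  /api/users   # -> /api/users
upstream_uri /api/ "/" /api/users   # -> /users
upstream_uri /api  "/" /api/users   # -> //users (the classic double-slash gotcha)
```

The third call shows why the location prefix and the proxy_pass URI should end consistently: location /api with proxy_pass http://host/ produces a double slash.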
Examples
# No trailing slash – full path preserved
location /api {
proxy_pass http://127.0.0.1:8080;
# GET /api/users → upstream receives /api/users
}
# With trailing slash – matched prefix replaced
location /api/ {
proxy_pass http://127.0.0.1:8080/;
# GET /api/users → upstream receives /users
# (location /api without the slash would send //users here)
}
Correct Configuration
Choose the form that matches business requirements and, if needed, combine with rewrite or proxy_redirect for fine‑grained path adjustments.
# Scenario 1 – backend expects the full original path
location /api {
proxy_pass http://127.0.0.1:8080;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
# Scenario 2 – backend expects the path without the /api prefix
location /api/ {
proxy_pass http://127.0.0.1:8080/;
# /api/users → /users
}
# Complex mapping using rewrite
location /app/v1/ {
rewrite ^/app/v1/(.*) /$1 break;
proxy_pass http://127.0.0.1:8080;
}
Verification
Log the request line on the upstream service (e.g., console.log(req.method, req.url) in Node.js or print(request.method, request.path) in Python) and compare with curl -v against the Nginx endpoint.
Risk Reminder
Changing the URI handling affects all routes that share the same location. Ensure downstream services can handle the new path or add explicit rewrites to preserve compatibility.
Pitfall 3: try_files Misuse Leads to Infinite Redirect Loops
Phenomenon
Accessing certain URLs triggers "Too many redirects" or a 500 Internal Server Error because try_files and rewrite interact in an unexpected way.
Root Cause
try_files checks a list of candidates sequentially; the last parameter is a fallback URI, not a file check. When combined with a rewrite that redirects back to the original location, a loop is created.
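The candidate‑checking behavior can be modeled with a short shell sketch. try_files_sim is a hypothetical name and /tmp/www a scratch directory for the demo, not a real docroot:

```shell
#!/bin/sh
# Simplified model of try_files: every candidate except the last is tested
# against the filesystem; the last argument is an internal URI (or named
# location) that triggers a new request cycle, which is where loops start.
try_files_sim() {
  root=$1; shift
  while [ $# -gt 1 ]; do
    if [ -e "$root$1" ]; then printf 'serve:%s\n' "$1"; return; fi
    shift
  done
  printf 'fallback:%s\n' "$1"
}
mkdir -p /tmp/www
: > /tmp/www/index.html   # create an empty demo file
try_files_sim /tmp/www /index.html /index.html/ /fallback.html
try_files_sim /tmp/www /missing /missing/ /fallback.html
```

The first call serves the existing file; the second falls through to the fallback URI, which in real Nginx is re‑processed by location matching, so it must not rewrite back to the original path.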
Incorrect Example
location / {
root /data/www;
try_files $uri $uri/ /index.html;
}
location = /index.html {
root /data/www;
rewrite ^ / permanent; # redirects /index.html → / → loop
}
Correct Configuration
Understand that the final argument of try_files is a URI, not a file path, and avoid rewriting to a location that again triggers try_files.
# Basic usage – file → directory → fallback
location / {
root /data/www;
index index.html index.htm;
try_files $uri $uri/ /fallback.html;
}
# PHP FastCGI example
location / {
try_files $uri $uri/ /index.php?$query_string;
}
location ~ \.php$ {
fastcgi_pass 127.0.0.1:9000;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
include fastcgi_params;
}
# Named fallback for API back‑ends
location / {
try_files $uri @backend;
}
location @backend {
proxy_pass http://127.0.0.1:8080;
}
Verification
Use curl -I on existing files, non‑existent paths, and the fallback URI to ensure the correct status codes (200, 404, 302) are returned. Check error.log for any "rewrite" loop warnings.
Risk Reminder
Looping configurations may not surface until a specific missing file is requested. Perform a bulk test with a script that requests a list of known‑missing paths and verifies that no 301/302 loops occur.
Pitfall 4: Insufficient Upstream keepalive Settings
Phenomenon
During traffic spikes, API latency spikes and 502 errors appear even though backend CPU/memory usage is low. netstat shows thousands of TIME_WAIT sockets.
Root Cause
Without keepalive, Nginx opens a new TCP connection for every request, causing massive TIME_WAIT accumulation and exhausting backend connection pools.
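A back‑of‑the‑envelope calculation shows how quickly short‑lived connections exhaust ephemeral ports; the traffic figures below are illustrative assumptions, not measurements:

```shell
#!/bin/sh
# Ephemeral-port pressure without upstream keepalive (illustrative numbers).
rps=1000                         # proxied requests per second (assumption)
tw=60                            # seconds a closed socket sits in TIME_WAIT
ports=$((60999 - 32768 + 1))     # default Linux ip_local_port_range width
stuck=$((rps * tw))              # sockets in TIME_WAIT at steady state
echo "ports available: $ports, stuck in TIME_WAIT: $stuck"
if [ "$stuck" -gt "$ports" ]; then
  echo "port exhaustion likely: add upstream keepalive"
fi
```

At a modest 1000 req/s the steady‑state TIME_WAIT population already exceeds the entire default ephemeral port range, which is exactly the failure mode keepalive prevents.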
Incorrect Example
upstream backend {
server 127.0.0.1:8080;
# No keepalive – each request creates a new short‑lived connection
}
server {
location / {
proxy_pass http://backend;
# Missing HTTP/1.1 and Connection header adjustments
}
}
Correct Configuration
upstream backend {
server 127.0.0.1:8080 weight=5 max_fails=3 fail_timeout=30s;
keepalive 32; # 32 idle connections per worker
keepalive_requests 1000; # Max requests per keepalive connection
keepalive_timeout 60s;
}
server {
listen 80;
proxy_http_version 1.1; # Required for keepalive
location / {
proxy_pass http://backend;
proxy_set_header Connection ""; # Remove Connection header to enable reuse
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_connect_timeout 5s;
proxy_read_timeout 60s;
proxy_send_timeout 60s;
}
}
Verification
Check upstream connection counts with ss -tn | grep :8080 | awk '{print $4}' | sort | uniq -c. Compare TIME_WAIT numbers before and after the change, and run a load test with ab or wrk to observe QPS and latency improvements.
Risk Reminder
Setting keepalive too high can consume excessive memory (each idle socket ≈ 2 KB). Keep the value around 10‑20 % of worker_connections and ensure the backend supports HTTP/1.1.
Pitfall 5: client_max_body_size Not Set or Too Small
Phenomenon
File uploads are rejected with 413 Request Entity Too Large even though the backend imposes no size limit.
Root Cause
Nginx enforces a default 1 MB request body limit. If the directive is missing or placed in the wrong context, the default applies.
Correct Configuration
http {
client_max_body_size 10m; # Global conservative default
server {
listen 80;
server_name example.com;
# Large uploads – 100 MB limit
location /upload/ {
client_max_body_size 100m;
proxy_read_timeout 300s;
proxy_send_timeout 300s;
proxy_connect_timeout 75s;
proxy_pass http://upload-backend;
}
# Regular API – keep default 1 M
location /api/ {
client_max_body_size 1m;
proxy_pass http://api-backend;
}
error_page 413 /413.html; # keep the 413 status; "= /413.html" would return 200
location = /413.html {
root /data/www/errors;
internal;
}
}
}
Verification
Generate test files with dd if=/dev/zero of=/tmp/test_2mb.bin bs=1M count=2 and upload via curl -F "file=@/tmp/test_2mb.bin" http://localhost/upload/. A 2 MB upload to /upload/ should succeed, while a 15 MB body sent to a path covered by the global 10 m limit should return 413.
Risk Reminder
Allowing excessively large bodies can be abused for disk‑filling attacks. Pair size limits with backend validation, separate upload directories with quota controls, and optionally rate‑limit upload bandwidth.
Pitfall 6: Improper Gzip Compression Settings
Phenomenon
High bandwidth and CPU usage are observed, yet responses lack Content‑Encoding: gzip. Conversely, setting gzip_comp_level too high spikes CPU without noticeable size reduction.
Root Cause
Gzip is disabled by default; even when enabled, only text/html is compressed unless additional MIME types are added. Missing gzip_vary can also break CDN caching.
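A quick local experiment (assuming the gzip binary is available) shows why gzip_min_length and MIME filtering matter: repetitive text shrinks dramatically while high‑entropy data barely changes:

```shell
#!/bin/sh
# Compare gzip's effect on compressible text vs. already-high-entropy data.
text=$(mktemp); bin=$(mktemp)
yes 'GET /api/users HTTP/1.1' | head -c 100000 > "$text"   # repetitive text
head -c 100000 /dev/urandom > "$bin"                       # random bytes
text_gz=$(gzip -c "$text" | wc -c)
bin_gz=$(gzip -c "$bin" | wc -c)
echo "text:   100000 -> $text_gz bytes"
echo "random: 100000 -> $bin_gz bytes"
rm -f "$text" "$bin"
```

The random sample stands in for already‑compressed assets (PNG, JPEG, video): gzip cannot shrink them and only burns CPU, which is why they are excluded from gzip_types.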
Correct Configuration
http {
gzip on;
gzip_comp_level 5; # Balance CPU vs. compression ratio
gzip_buffers 16 8k;
gzip_http_version 1.1;
gzip_vary on;
gzip_min_length 1024;
gzip_types
text/plain
text/css
text/xml
text/javascript
application/json
application/javascript
application/xml
application/xml+rss
application/x-javascript
application/octet-stream
image/svg+xml;
server {
listen 80;
location /static/ {
expires 7d;
add_header Cache-Control "public, no-transform";
access_log off;
}
location /api/ {
proxy_pass http://backend;
}
}
}
Verification
Issue curl -I -H "Accept-Encoding: gzip" http://localhost/api/data and confirm the Content‑Encoding: gzip header. Compare raw vs. compressed sizes with curl -s piped to wc -c. Run a short ab test with and without the Accept-Encoding: gzip header to see QPS and latency differences.
Risk Reminder
Compression levels above 6 yield diminishing returns while heavily taxing CPU.
Never gzip already compressed assets (PNG, JPEG, WebP, video, audio).
Small files (<1 KB) are not worth compressing.
Pitfall 7: Incomplete SSL/TLS Configuration
Phenomenon
Browsers display “Your connection is not private” and curl -v https://example.com shows an incomplete certificate chain or outdated TLS versions.
Root Cause
Common mistakes include missing intermediate certificates, mismatched private key, using deprecated protocols (SSLv3, TLS 1.0/1.1), weak cipher suites, or pointing to non‑existent certificate files.
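One offline check catches the mismatched‑key case before reloading Nginx: a certificate and its private key must yield identical public‑key digests. The sketch below generates a throwaway self‑signed pair to demonstrate; in production, point the paths at your real fullchain and key files (requires openssl):

```shell
#!/bin/sh
# Generate a disposable self-signed pair purely for demonstration.
tmp=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/CN=example.com" \
  -keyout "$tmp/key.pem" -out "$tmp/cert.pem" 2>/dev/null
# A matching pair produces identical public-key digests.
cert_pub=$(openssl x509 -in "$tmp/cert.pem" -pubkey -noout | openssl sha256)
key_pub=$(openssl pkey -in "$tmp/key.pem" -pubout 2>/dev/null | openssl sha256)
if [ "$cert_pub" = "$key_pub" ]; then
  echo "certificate and key match"
else
  echo "MISMATCH: nginx would fail to load this pair"
fi
rm -rf "$tmp"
```

Running the same two digest commands against the real ssl_certificate and ssl_certificate_key paths before nginx -s reload catches a swapped or stale key immediately.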
Correct Configuration
server {
listen 443 ssl http2;
server_name example.com;
ssl_certificate /etc/nginx/ssl/example.com.fullchain.pem; # cert + intermediate
ssl_certificate_key /etc/nginx/ssl/example.com.key;
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384';
ssl_prefer_server_ciphers on;
ssl_stapling on;
ssl_stapling_verify on;
resolver 8.8.8.8 8.8.4.4 valid=300s;
resolver_timeout 5s;
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains; preload" always;
add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-XSS-Protection "1; mode=block" always;
location / {
root /data/www;
index index.html;
}
}
# HTTP → HTTPS redirect
server {
listen 80;
server_name example.com;
return 301 https://$server_name$request_uri;
}
Verification
Run openssl s_client -connect localhost:443 -servername example.com and verify the full certificate chain. Use curl -v https://localhost/ to ensure no TLS warnings. Test with testssl.sh for protocol and cipher coverage.
Risk Reminder
After certificate renewal, always re‑test the chain; an outdated intermediate will break trust.
Enabling HSTS with a long max-age before confirming a stable TLS setup can lock users into a broken configuration.
Older clients may not support TLS 1.3; keep TLS 1.2 as a fallback.
Pitfall 8: worker_processes & worker_connections Mismatch
Phenomenon
Even with worker_connections 65535, Nginx reports “too many connections” at modest traffic levels because the OS file‑descriptor limit is low.
Root Cause
Each worker can open up to worker_connections sockets, but the operating system’s ulimit -n (or fs.file-max) caps the total number of file descriptors. If the OS limit is 1024, Nginx cannot exceed that regardless of its internal settings.
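A quick calculation shows why the defaults collide: as a reverse proxy, each client connection costs roughly two descriptors (client side plus upstream side). The figures below are illustrative:

```shell
#!/bin/sh
# Per-worker fd demand vs. a typical untuned nofile limit (illustrative numbers).
worker_connections=65535
nofile=1024                          # common default ulimit -n
need=$((worker_connections * 2))     # client fd + upstream fd per proxied connection
echo "fds one worker may need: $need, current nofile limit: $nofile"
if [ "$need" -gt "$nofile" ]; then
  echo "raise worker_rlimit_nofile and the OS limit before raising worker_connections"
fi
```

The gap is two orders of magnitude, which is why the Nginx directive and the system limits below must be raised together.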
Correct Configuration (Nginx)
# /etc/nginx/nginx.conf
worker_processes auto; # One per CPU core
worker_rlimit_nofile 65535; # Allow each worker to open many fds
events {
worker_connections 65535; # Must be <= OS limit
use epoll;
multi_accept on;
}
http {
open_file_cache max=65535 inactive=60s;
keepalive_timeout 65;
keepalive_requests 1000;
}
System‑Level Adjustments
# Check current limit
ulimit -n
# Temporary increase for the current shell
ulimit -n 65535
# Permanent change – /etc/security/limits.conf
* soft nofile 65535
* hard nofile 65535
root soft nofile 65535
root hard nofile 65535
# sysctl for global file‑descriptor pool
echo "fs.file-max = 1000000" >> /etc/sysctl.conf
sysctl -p
# systemd service override (if using systemd)
# /lib/systemd/system/nginx.service – add:
# LimitNOFILE=65535
systemctl daemon-reload && systemctl restart nginx
Verification
# View socket statistics
ss -s
# Run a load test to confirm no "too many connections" errors
ab -n 10000 -c 5000 http://localhost/api/
Risk Reminder
Each open descriptor consumes kernel memory (~2 KB). 65 535 descriptors ≈ 130 MB.
Setting somaxconn too high can amplify SYN‑Flood attacks.
Always test after changing limits; a mis‑configured limit can prevent Nginx from starting.
Pitfall 9: Log Configuration Causing Disk Exhaustion
Phenomenon
Disk fills up unexpectedly; df -h shows the root partition at 100 %. Nginx access.log and error.log have grown to dozens of gigabytes because logrotate is not rotating or the log level is too verbose.
Root Cause
High‑traffic environments generate massive logs. Issues include missing or too‑infrequent logrotate, overly detailed log_format, debug‑level error_log, and logs written to the root filesystem instead of a dedicated log partition.
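Before tuning rotation, it helps to estimate the write volume; both inputs below are illustrative assumptions to replace with your own traffic and log_format line length:

```shell
#!/bin/sh
# Rough daily access-log volume estimate.
rps=500            # average requests per second (assumption)
line_bytes=250     # bytes per log line with the extended format (assumption)
daily_bytes=$((rps * line_bytes * 86400))
daily_mb=$((daily_bytes / 1024 / 1024))
echo "~${daily_mb} MB of access log per day"
```

Even this mid-range traffic produces roughly 10 GB per day, so a 14‑day retention needs well over 100 GB unless compression and a dedicated log partition are in place.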
Correct Nginx Log Settings
http {
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for" '
'rt=$request_time uct="$upstream_connect_time" '
'uht="$upstream_header_time" urt="$upstream_response_time"';
access_log /var/log/nginx/access.log main buffer=16k flush=2m;
error_log /var/log/nginx/error.log warn; # Avoid debug in prod
}
server {
server_name example.com;
access_log /var/log/nginx/example.com.access.log main;
error_log /var/log/nginx/example.com.error.log;
location /health { access_log off; return 200 "OK"; }
location /static/ { access_log off; expires 7d; }
}
logrotate Configuration
# /etc/logrotate.d/nginx
/var/log/nginx/*.log {
daily
missingok
rotate 14
compress
delaycompress
notifempty
create 0640 nginx nginx
sharedscripts
postrotate
if [ -f /var/run/nginx.pid ]; then
kill -USR1 $(cat /var/run/nginx.pid)
fi
endscript
}Emergency Disk‑Space Recovery
# Find largest log files
du -sh /var/log/nginx/* | sort -rh | head -10
# Truncate without deleting the inode
truncate -s 0 /var/log/nginx/access.log
kill -USR1 $(cat /var/run/nginx.pid) # Reopen log files
Verification
# Simulate rotation
logrotate -d /etc/logrotate.d/nginx
# Check log sizes
ls -lh /var/log/nginx/
# Monitor disk usage
watch -n 5 "df -h /var/log"
# Ensure error.log does not contain excessive debug entries
Risk Reminder
Never delete a log file that Nginx still holds open; use truncate or signal -USR1 instead.
Improper postrotate signals (e.g., -HUP) can cause full worker restarts.
Turning off access_log for static assets saves I/O but removes valuable traffic data; keep it for API endpoints.
Pitfall 10: server_tokens off Does Not Fully Hide the Version
Phenomenon
Requests return Server: nginx/1.18.0 despite server_tokens off; being set, exposing version information to attackers.
Root Cause
server_tokens off only hides the version number in the Server header and on error pages. If other server blocks override the setting, or third‑party modules inject their own headers, the version may still appear.
Correct Configuration
http {
server_tokens off; # Hide version globally
# Optional: completely replace the Server header (requires ngx_http_headers_more module)
# more_set_headers 'Server: MyServer';
server {
listen 80;
server_name example.com;
error_page 404 /404.html;
error_page 500 502 503 504 /50x.html;
location / { root /data/www; }
location = /50x.html { root /data/www/errors; }
}
}
Verification
# Verify Server header
curl -I http://localhost/ | grep -i server
# Test error pages
curl -I http://localhost/nonexistent_path | grep -i server
Risk Reminder
Completely removing the Server header provides limited security benefit; attackers can still fingerprint Nginx via TLS characteristics or response behavior.
Third‑party modules must be trusted; they can introduce new vulnerabilities.
Some security scanners rely on the Server header for asset inventory; document the change so it does not trigger false positives.
Comprehensive 502/504 Troubleshooting Flow
502/504
├── 1. Check upstream health (curl health endpoint, ps, ss)
├── 2. Verify network connectivity (telnet, ping, iptables/SELinux)
├── 3. Review upstream connection limits and timeouts (keepalive, proxy_*_timeout)
├── 4. Inspect upstream process/container status (OOM, Docker restart, Kubernetes pod state)
├── 5. Search error_log for specific messages (connect() failed, connection refused, timeout, no live upstreams)
└── 6. Validate upstream configuration (correct server IP/port, proper proxy_pass, all servers not down)
Common Diagnostic Commands
# Quick 502 diagnosis (last 100 lines)
tail -100 /var/log/nginx/error.log | grep -i "502\|upstream\|connect"
# Upstream process status
ps aux | grep -E "java|node|python|php" | grep -v grep
# Listening ports
ss -tlnp | grep -E "8080|3000|5000|9000|3306"
# Local health check
curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1:8080/health
# System resources
free -h && df -h / && uptime
# Recent OOM or kill messages
dmesg | tail -50 | grep -iE "oom|killed|nginx|java|node"Production Change Checklist
Pre‑Change
# Backup current config
cp -r /etc/nginx /etc/nginx.bak.$(date +%Y%m%d%H%M%S)
# Syntax check in test env
nginx -t -c /etc/nginx/nginx.conf
# Identify impacted servers/paths
grep -n "proxy_pass\|upstream\|listen" /etc/nginx/conf.d/*.conf
# Verify IPs, ports, and paths are correct
During Change
# Deploy new config
cp /path/to/new/nginx.conf /etc/nginx/nginx.conf
# Verify syntax again
nginx -t
# Reload gracefully
nginx -s reload
# Confirm master & workers are running
ps aux | grep nginx
# Tail error log for immediate issues
tail -f /var/log/nginx/error.log
Post‑Change
# Full health check (all critical endpoints)
curl -s -o /dev/null -w "%{http_code} %{url_effective}\n" \
http://localhost/ \
http://localhost/api/ \
http://localhost/static/
# Monitor 502/504 counts for 5 minutes (grep -c is cumulative, watch for growth)
for i in {1..5}; do
echo "=== Check $i ==="
grep -c "502\|504" /var/log/nginx/access.log
sleep 60
done
# Verify worker count matches configuration
ps aux | grep "nginx: worker" | wc -l
# If problems arise, rollback immediately
cp /tmp/nginx.conf.backup /etc/nginx/nginx.conf
nginx -t && nginx -s reload
Rollback Script
#!/bin/bash
BACKUP_FILE="/tmp/nginx.conf.backup"
NGINX_CONF="/etc/nginx/nginx.conf"
if [ ! -f "$BACKUP_FILE" ]; then
echo "ERROR: Backup file not found: $BACKUP_FILE"
exit 1
fi
echo "Rolling back Nginx configuration..."
cp "$BACKUP_FILE" "$NGINX_CONF"
if nginx -t >/dev/null 2>&1; then  # rely on the exit status, not localized output
nginx -s reload
echo "Rollback successful. Nginx reloaded."
else
echo "ERROR: Rollback config failed syntax check. Manual intervention required."
exit 1
fi
Summary
The ten Nginx pitfalls share a common theme: configuration items are interdependent. Subtle effects such as a trailing slash in proxy_pass, location‑matching rules, or the interaction between worker_connections and OS limits can cascade into 502/504 errors, performance degradation, or security exposure.
Key takeaways for operations engineers:
Always test before deployment.
Make incremental changes and keep reliable rollbacks.
Align Nginx limits with system resources.
Treat logs as the primary observability source.
Apply complete TLS hardening and hide version information where required.
By internalizing these principles and the detailed examples above, engineers can avoid common misconfigurations, maintain high‑performance, secure Nginx deployments, and reduce production incidents.
MaGe Linux Operations