
How to Fix Nginx 502 Bad Gateway Errors: A 90% Success Checklist

This article provides a comprehensive, step‑by‑step checklist for diagnosing and resolving Nginx 502 Bad Gateway errors, covering backend service verification, configuration checks, log analysis, resource monitoring, network troubleshooting, special scenarios, and long‑term preventive measures.


Introduction

"502 Bad Gateway" is one of the most dreaded error pages for web operators, users, and managers alike. In Nginx, a 502 indicates that it could not obtain a valid response from an upstream server. After handling hundreds of incidents, the author presents a systematic checklist that resolves over 90% of cases.

Technical background: Understanding the nature of Nginx 502

HTTP status code 502 meaning

A 502 means that a server, while acting as a gateway or proxy, received an invalid response from the upstream server it contacted while attempting to fulfill the request.

In a typical Nginx setup the architecture is:

client → Nginx (reverse proxy) → backend application (PHP‑FPM, Tomcat, Node.js, etc.)

When Nginx returns 502, it means:

Nginx itself is running normally.

Nginx receives the client request.

Nginx cannot communicate with the backend or receives an abnormal response.

Difference between 502 and other errors

502 Bad Gateway: Nginx reached the backend but got no valid response (connection refused, reset, or a malformed reply).

503 Service Unavailable: the service is temporarily unable to handle requests (overloaded, in maintenance, or all upstreams marked down).

504 Gateway Timeout: the backend accepted the request but did not respond within the configured timeout.

500 Internal Server Error: the error originates inside the backend application itself, not in Nginx's communication with it.

Nginx and backend communication mechanisms

Nginx talks to backends via different protocols:

FastCGI (PHP‑FPM):

location ~ \.php$ {
    fastcgi_pass 127.0.0.1:9000;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    include fastcgi_params;
}

HTTP (Java/Node.js):

location /api {
    proxy_pass http://127.0.0.1:8080;
    proxy_set_header Host $host;
}

uwsgi (Python):

location / {
    uwsgi_pass 127.0.0.1:8001;
    include uwsgi_params;
}

Understanding these mechanisms is the basis for troubleshooting 502.

Core content: Detailed 502 troubleshooting checklist

Step 1: Verify backend services are running

This is the most fundamental step: many 502s are caused simply by the backend service being down.

Check backend process status

# Check PHP‑FPM
ps aux | grep php-fpm
systemctl status php-fpm

# Check Tomcat/Java
ps aux | grep java
systemctl status tomcat

# Check Node.js
ps aux | grep node
pm2 status  # if using pm2

# Check Python uwsgi
ps aux | grep uwsgi
systemctl status uwsgi

Judgment criteria:

Process does not exist → service is down, needs to be started.

Process exists but is in an abnormal state (e.g., D state) → possible deadlock or resource wait.
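A quick way to inspect those states, using PHP‑FPM as the example service (substitute your own process name):

# List worker PIDs with their scheduler state; "D" (uninterruptible sleep)
# usually means the process is stuck waiting on disk or NFS I/O
ps -C php-fpm -o pid,stat,wchan,cmd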

Check backend service port listening

# Check if port is listening
netstat -tlnp | grep 9000   # PHP‑FPM default port
netstat -tlnp | grep 8080   # Common Java port

# Or use ss (faster)
ss -tlnp | grep 9000

# Test connection
telnet 127.0.0.1 9000
nc -zv 127.0.0.1 9000

Common issues:

Port not listening → service not started or misconfigured.

Listening on 127.0.0.1 while Nginx config uses an external IP → address mismatch.

Firewall blocks the connection.
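A sketch for catching the address-mismatch case: compare what Nginx is configured to connect to with what the backend actually binds (port numbers below are examples):

# What Nginx thinks it should connect to
nginx -T 2>/dev/null | grep -E "(fastcgi|proxy|uwsgi)_pass"

# What is actually listening (watch for 127.0.0.1 vs 0.0.0.0 vs an external IP)
ss -tlnp | grep -E ":(9000|8080)"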

Directly test backend service

# Test PHP‑FPM (using cgi‑fcgi)
cgi-fcgi -bind -connect 127.0.0.1:9000

# Test HTTP backend
curl -I http://127.0.0.1:8080/test

# Bypass Nginx: run the app under uwsgi's built-in HTTP server to confirm it responds
uwsgi --http :8080 --wsgi-file test.py

Real case: A 502 occurred while the PHP‑FPM process existed. telnet 127.0.0.1 9000 immediately closed the connection. Logs revealed PHP‑FPM had been killed by the OOM killer and failed to restart.

Step 2: Check Nginx‑backend connection configuration

Inspect upstream configuration

# View Nginx config
nginx -T | grep -A 10 "upstream"

# Typical upstream block
upstream backend {
    server 127.0.0.1:8080 max_fails=3 fail_timeout=30s;
    server 192.168.1.10:8080 backup;
}

Common configuration problems:

Address typo (e.g., fastcgi_pass 127.0.0.1:900 instead of 9000).

Wrong Unix socket path for FastCGI.

Permission issues on the socket file – ensure the nginx user can read/write.
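For the socket case, a quick sanity check might look like this (the socket path is an example; read the real one from your pool's listen directive):

# Confirm the socket exists and inspect owner/group/mode
ls -l /run/php-fpm/www.sock

# Verify the nginx user can actually read and write it
sudo -u nginx test -r /run/php-fpm/www.sock -a -w /run/php-fpm/www.sock \
    && echo OK || echo "permission problem"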

Check timeout settings

# HTTP proxy timeout
location /api {
    proxy_pass http://backend;
    proxy_connect_timeout 10s;
    proxy_read_timeout 60s;
    proxy_send_timeout 60s;
}

# FastCGI timeout
location ~ \.php$ {
    fastcgi_pass 127.0.0.1:9000;
    fastcgi_connect_timeout 10s;
    fastcgi_read_timeout 300s;
    fastcgi_send_timeout 60s;
}

Judgment criteria:

If the error log shows "upstream timed out", increase the timeout values.

If a script legitimately takes long, raise fastcgi_read_timeout or optimise the code.
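Before raising any timeout, it helps to measure what the backend actually needs; curl's timing variables give a quick read (the URL is a placeholder):

# time_connect ≈ TCP handshake duration; time_total ≈ full response time
curl -o /dev/null -s -w "connect: %{time_connect}s  total: %{time_total}s\n" http://127.0.0.1:8080/api/slow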

Step 3: Analyse Nginx error logs

Nginx error logs usually point directly to the cause.

Error log location

# Find error_log directive
nginx -T | grep error_log

# Common paths:
/var/log/nginx/error.log
/usr/local/nginx/logs/error.log

# Real‑time monitoring
tail -f /var/log/nginx/error.log
grep "502" /var/log/nginx/error.log | tail -50

Typical error messages and solutions

1. "connect() failed (111: Connection refused)"

Cause: backend not started or port not listening. Solution: start the backend and verify the port.

2. "connect() failed (113: No route to host)"

Cause: network unreachable, often due to firewall rules. Solution: check iptables/firewalld and temporarily disable for testing.

3. "recv() failed (104: Connection reset by peer)"

Cause: backend crashed, hit connection limit, or sent malformed data. Solution: inspect backend logs.

4. "upstream prematurely closed connection"

Cause: backend killed (OOM, max_requests) or script timeout. Solution: adjust pm.max_children, increase max_execution_time, and review OOM logs.

5. "no live upstreams"

Cause: all upstream servers marked unavailable. Solution: verify health status and adjust health‑check parameters.

6. "upstream sent too big header"

# Increase buffer sizes for proxy
location /api {
    proxy_buffer_size 16k;
    proxy_buffers 8 16k;
    proxy_busy_buffers_size 32k;
}
# For FastCGI
location ~ \.php$ {
    fastcgi_buffer_size 32k;
    fastcgi_buffers 8 32k;
}

Step 4: Check backend resource and load

Even if the service is up, resource exhaustion can cause 502.

Inspect process‑pool status (PHP‑FPM example)

# Enable the status page in /etc/php-fpm.d/www.conf:
# pm.status_path = /status
# (Nginx must also route this path to PHP-FPM for curl to reach it)

curl "http://127.0.0.1/status?full"   # shows active, idle, and total processes

Key configuration:

pm = dynamic
pm.max_children = 50
pm.start_servers = 5
pm.min_spare_servers = 5
pm.max_spare_servers = 35
pm.max_requests = 500

Diagnosis: if active processes constantly equals pm.max_children, increase the limit or optimise the application.
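If the status page is not enabled, a process count gives a rough proxy for the same diagnosis (the paths below are common defaults and may differ on your distro):

# Current worker count (excluding the master) vs the configured cap
ps -C php-fpm -o cmd= | grep -vc "master process"
grep -h "^pm.max_children" /etc/php-fpm.d/*.conf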

Check system resources

# CPU & memory
top -p <PID>

# File descriptors
lsof -p <PID> | wc -l
cat /proc/<PID>/limits | grep "open files"

# System limits
ulimit -n

# Connection count
netstat -antp | grep <PID> | wc -l

Typical problems: OOM kills, too many open files, connection limit reached.
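OOM kills in particular leave traces in the kernel log; a quick check:

# Recent OOM-killer activity (may require root)
dmesg -T | grep -i "killed process" | tail -5
journalctl -k --since "1 hour ago" | grep -i "out of memory"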

Step 5: Check network and connections

Inspect connection queues

# View recv/send queue
ss -tn | grep :8080

# Analyse TCP state distribution
ss -antp | grep :8080 | awk '{print $1}' | sort | uniq -c
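On the listening socket itself, Recv-Q and Send-Q are easy to misread: there, Recv-Q is the current accept-queue depth and Send-Q its configured limit.

# If Recv-Q sits near Send-Q, the backend is not accepting connections fast enough
ss -ltn | grep :8080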

Check firewall and SELinux

# iptables rules
iptables-save | grep 8080

# firewalld
firewall-cmd --list-all

# SELinux
getenforce
ausearch -m avc -ts recent | grep nginx
setenforce 0   # temporary test

Common SELinux fixes:

setsebool -P httpd_can_network_connect 1
setsebool -P httpd_can_network_connect_db 1
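And to confirm the booleans took effect:

getsebool httpd_can_network_connect httpd_can_network_connect_db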

Check Nginx‑backend connection count

# Current connections from Nginx to backend
netstat -antp | grep nginx | grep :8080 | wc -l

# If high, adjust keepalive
upstream backend {
    server 127.0.0.1:8080;
    keepalive 32;
}
location /api {
    proxy_pass http://backend;
    proxy_http_version 1.1;
    proxy_set_header Connection "";  # clear Connection header
}

Step 6: Special scenarios

Intermittent 502

Symptoms: 502 appears sporadically.

Possible causes: periodic backend restarts, traffic spikes, health‑check glitches.

# Continuous monitoring
watch -n 1 "ps aux | grep php-fpm | wc -l"

# Record 502 timestamps
while true; do
    code=$(curl -s -o /dev/null -w "%{http_code}" http://example.com/test)
    echo "$(date) $code" >> http_status.log
    sleep 1
done

grep "502" http_status.log

Specific request 502

Symptoms: only certain URLs or large requests return 502.

Possible causes: request body too large, code path crashes, insufficient timeout.

# Increase body size limit
client_max_body_size 20M;

# Increase timeout for heavy reports
location /api/report {
    proxy_pass http://backend;
    proxy_read_timeout 300s;
}

Post‑upgrade or restart 502

Possible causes: configuration errors, port/path changes, firewall reset.

# Compare configs
diff /etc/nginx/nginx.conf /etc/nginx/nginx.conf.bak

# Test syntax
nginx -t

# Find recent changes
find /etc/nginx -type f -mtime -1

Practical case: full incident investigation

Background

An e‑commerce site experienced 502 errors on product pages during a promotion, with traffic three times the normal level.

Investigation timeline

15:02 – Alert

# Check Nginx error log
tail -100 /var/log/nginx/error.log | grep 502

Found "upstream prematurely closed connection" errors from PHP‑FPM.

15:03 – Check PHP‑FPM status

ps aux | grep php-fpm | wc -l   # 51
grep "pm.max_children" /etc/php-fpm.d/www.conf   # 50
# Process count had reached the limit

15:04 – Temporary scaling

# Increase max_children to 100
vim /etc/php-fpm.d/www.conf   # set pm.max_children = 100
systemctl restart php-fpm

Process count grew, 502 stopped.

Root cause

Traffic surge hit pm.max_children limit.

New requests could not obtain a free process, causing Nginx timeout.

Insufficient capacity planning.

Long‑term optimisation

# Capacity formula
pm.max_children = (available_memory - reserve) / avg_php_process_memory
# Example: 16 GB RAM, reserve 4 GB, avg 50 MB → 245 children

# Enable status page for monitoring
pm.status_path = /fpm-status

# Alert when active / max > 0.8
# (Prometheus rule shown later)

# Optimise PHP code, use APCu/Redis caching, fix slow queries

# Consider ondemand mode for low traffic
pm = ondemand
pm.max_children = 100
pm.process_idle_timeout = 10s
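The one input the capacity formula needs measuring is the average worker footprint; a rough sketch (RSS overstates shared memory, so treat the result as an upper bound):

# Average resident memory per PHP-FPM process, in MB
ps -C php-fpm -o rss= | awk '{sum+=$1; n++} END {if (n) printf "avg %.0f MB over %d processes\n", sum/1024/n, n}'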

Best practices and prevention

Nginx configuration optimisation

upstream backend {
    server 127.0.0.1:9000 max_fails=3 fail_timeout=30s;
    server 127.0.0.1:9001 backup;
    keepalive 32;
}
server {
    listen 80;
    server_name example.com;

    error_log /var/log/nginx/example.com_error.log info;

    location ~ \.php$ {
        fastcgi_pass backend;
        fastcgi_index index.php;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        include fastcgi_params;

        fastcgi_connect_timeout 10s;
        fastcgi_send_timeout 60s;
        fastcgi_read_timeout 60s;

        fastcgi_buffer_size 32k;
        fastcgi_buffers 8 32k;
        fastcgi_busy_buffers_size 64k;

        fastcgi_intercept_errors on;
        error_page 502 503 504 /50x.html;
    }
    location = /50x.html {
        root /usr/share/nginx/html;
    }
}

Monitoring and alerting

# Prometheus alert example
groups:
  - name: nginx_alerts
    rules:
      - alert: Nginx502Rate
        expr: rate(nginx_http_requests_total{status="502"}[5m]) > 10
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Nginx 502 error rate too high"
      - alert: PHPFPMProcessNearLimit
        expr: phpfpm_active_processes / phpfpm_max_children > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "PHP‑FPM process count near limit"

Emergency response workflow

1. Immediate backend check (1 min)
   ├─ Process running?
   ├─ Port listening?
   └─ Resources exhausted?
2. Review Nginx error log (1 min)
   └─ Identify specific error
3. Quick mitigation (2‑5 min)
   ├─ Restart backend
   ├─ Temporary scaling
   └─ Switch to backup
4. Monitor recovery (continuous)
   └─ Verify 502 no longer appears
5. Post‑mortem (same day)
   ├─ Record timeline
   ├─ Analyse root cause
   └─ Define preventive actions

Common tools and scripts

#!/bin/bash
# nginx_502_diagnosis.sh – quick 502 diagnostic report

echo "===== Nginx 502 Diagnosis Report ====="
echo "Generated: $(date)"

echo "
1. Recent 502 errors:"
grep "502\|upstream" /var/log/nginx/error.log | tail -50

echo "
2. Backend process status:"
ps aux | grep -E "(php-fpm|java|node|uwsgi)" | grep -v grep

echo "
3. Backend listening ports:"
netstat -tlnp | grep -E "(9000|8080|8000|3000)"

echo "
4. PHP‑FPM process count (if applicable):"
ps aux | grep php-fpm | wc -l
echo "Configured max children:"
grep "pm.max_children" /etc/php-fpm.d/*.conf 2>/dev/null

echo "
5. System resources:"
free -h
uptime

echo "
6. Recent restarts:"
journalctl -u nginx -u php-fpm -u tomcat --since "1 hour ago" | grep -i "start\|stop\|restart"

Conclusion

Nginx 502 errors are common, but a systematic checklist can resolve over 90% of cases. Key takeaways:

Understand the communication mechanism.

Layered troubleshooting from backend to network.

Leverage logs for direct clues.

Plan capacity based on expected load.

Implement proactive monitoring.

Remember, 502 is not scary; lacking a methodical approach is.
