
Mastering Nginx 502/504 Errors: A Complete Troubleshooting Guide with Scripts

This comprehensive guide explains the differences between Nginx 502 and 504 errors, provides step‑by‑step troubleshooting procedures, detailed configuration examples, one‑click diagnostic scripts, real‑world case studies, best‑practice optimizations, monitoring setups, and advanced learning paths to help you quickly resolve gateway issues and improve server reliability.

Raymond Ops

Overview

502 and 504 are the two most common Nginx gateway errors. 502 Bad Gateway means Nginx could not get a valid response from the backend: the service is unavailable, has crashed, or returned invalid data. 504 Gateway Timeout means Nginx reached the backend, but the backend did not respond within Nginx's timeout limits.

Applicable Scenarios

PHP‑FPM (LNMP stack)

Java / Go / Python services

Load‑balanced upstreams

WebSocket long‑connection scenarios

Environment Requirements

Nginx ≥ 1.14 (mainstream versions)

OS: CentOS 7+ / Ubuntu 18.04+

Backend: PHP‑FPM, Tomcat, custom services (examples focus on PHP‑FPM)

502 Diagnosis

Step 1 – Check Backend Service

# PHP‑FPM status
systemctl status php-fpm
ps aux | grep php-fpm | grep -v grep
# Verify the TCP listener (PHP‑FPM commonly listens on port 9000)
ss -tlnp | grep 9000
# If using a Unix socket
ls -la /run/php-fpm/www.sock

Step 2 – Inspect Nginx error.log

# Real‑time view
tail -f /var/log/nginx/error.log
# Filter 502‑related messages
grep -E "502|upstream|connect|failed" /var/log/nginx/error.log

Common error messages and meanings:

connect() failed (111: Connection refused) – backend not started or wrong port.

connect() failed (113: No route to host) – network unreachable.

upstream prematurely closed connection – backend closed the connection (OOM, crash).

no live upstreams – all upstream nodes are down.
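The mapping above can be turned into a quick tally. The following is a minimal sketch, not part of the original toolkit: the function name classify_upstream_errors is hypothetical, and the extra "upstream timed out" pattern (the message Nginx typically logs for a 504) is an addition beyond the four messages listed.

```shell
# Tally upstream-related error.log lines by probable cause.
# Reads log lines on stdin; prints "count cause" pairs, highest count first.
# classify_upstream_errors is a hypothetical helper name.
classify_upstream_errors() {
    awk '
        /\(111: Connection refused\)/  { c["backend down or wrong port"]++; next }
        /\(113: No route to host\)/    { c["network unreachable"]++; next }
        /upstream prematurely closed/  { c["backend closed connection"]++; next }
        /no live upstreams/            { c["all upstream nodes down"]++; next }
        /upstream timed out/           { c["backend timeout"]++; next }
        END { for (k in c) printf "%d %s\n", c[k], k }
    ' | sort -rn
}
```

Typical usage: tail -n 5000 /var/log/nginx/error.log | classify_upstream_errors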

Step 3 – PHP‑FPM Specific Checks (LNMP)

# View PHP‑FPM status page (must be enabled)
curl http://127.0.0.1/php-fpm-status
# Count active processes
ps aux | grep "php-fpm: pool" | grep -v grep | wc -l
# Check PHP‑FPM logs
tail -100 /var/log/php-fpm/www-error.log

Enable the status page in /etc/php-fpm.d/www.conf:

pm.status_path = /php-fpm-status
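Once the status page responds, two of its counters are the most useful for 502 triage: "listen queue" (requests waiting for a free worker) and "max children reached" (how often the pool hit pm.max_children). A minimal sketch that flags both, assuming the plain-text status format; check_fpm_status is a hypothetical helper name.

```shell
# Read PHP-FPM status-page text on stdin and flag pool saturation.
# check_fpm_status is a hypothetical helper name.
check_fpm_status() {
    awk -F': *' '
        $1 == "listen queue"         { queue = $2 + 0; seen = 1 }
        $1 == "max children reached" { reached = $2 + 0; seen = 1 }
        END {
            if (!seen) { print "no status data"; exit }
            if (queue > 0)   print "WARN listen queue=" queue " (requests waiting for a free worker)"
            if (reached > 0) print "WARN max children reached=" reached " (pm.max_children hit since start)"
            if (queue == 0 && reached == 0) print "pool looks healthy"
        }
    '
}
```

Typical usage: curl -s http://127.0.0.1/php-fpm-status | check_fpm_status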

Step 4 – Connection & Resource Limits

# System-wide file handles: allocated, unused, maximum
cat /proc/sys/fs/file-nr
# Socket summary (all processes, not only Nginx)
ss -s
# Connections on a specific port (e.g., 9000)
ss -ant | grep :9000 | wc -l
# Limits for the current shell (the Nginx workers may differ)
ulimit -n
grep -v "^#" /etc/security/limits.conf
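ulimit -n reports the limit of the shell that runs it, which is not necessarily the limit of the running Nginx master. A small sketch for reading the effective value from /proc instead; soft_nofile is a hypothetical helper name, and the PID file path in the usage line is an assumption to adjust for your system.

```shell
# Print the soft "Max open files" limit from a /proc/<pid>/limits file.
# soft_nofile is a hypothetical helper name.
soft_nofile() {
    awk '/^Max open files/ { print $4 }' "$1"
}
```

Typical usage: soft_nofile "/proc/$(cat /run/nginx.pid)/limits"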

504 Diagnosis

Step 1 – Verify Timeout Settings

# Show all timeout directives in Nginx config
grep -r "timeout" /etc/nginx/ | grep -v "#"
# Typical timeout directives as they appear in the Nginx config (each defaults to 60s)
proxy_connect_timeout 60s;
proxy_read_timeout    60s;
proxy_send_timeout    60s;
fastcgi_connect_timeout 60s;
fastcgi_read_timeout    60s;
fastcgi_send_timeout    60s;
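A recursive grep can miss included files outside /etc/nginx and drops any line that carries a trailing comment. As an alternative sketch, the filter below reads a full configuration dump on stdin (for example from nginx -T) and prints each timeout directive with its value. list_timeouts is a hypothetical helper name, and the comment stripping is deliberately naive.

```shell
# Print "directive value" for every *_timeout directive found on stdin.
# Comments are stripped first, so commented-out directives are ignored.
# list_timeouts is a hypothetical helper name.
list_timeouts() {
    sed 's/#.*//' | awk '$1 ~ /_timeout$/ { sub(/;$/, "", $2); print $1, $2 }'
}
```

Typical usage: nginx -T 2>/dev/null | list_timeouts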

Step 2 – Analyse Backend Response Time

# Enable detailed logging in nginx.conf
log_format detailed '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" rt=$request_time uct="$upstream_connect_time" uht="$upstream_header_time" urt="$upstream_response_time"';
access_log /var/log/nginx/access.log detailed;

Key fields:

$request_time – total time Nginx spent on the request.

$upstream_connect_time – time to connect to the backend.

$upstream_header_time – time to receive response headers.

$upstream_response_time – time to receive the full response.
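Subtracting these fields from one another shows where a slow request spent its time. The sketch below assumes the detailed log format defined above and assumes all three upstream timers start when Nginx begins connecting, so their differences are meaningful. upstream_breakdown is a hypothetical helper name; lines with "-" or comma-separated (multi-upstream) timings are skipped.

```shell
# For each "detailed"-format access-log line on stdin, print status, URI,
# and a per-phase time breakdown in seconds.
# upstream_breakdown is a hypothetical helper name.
upstream_breakdown() {
    awk '
        # Return the value of the first field starting with prefix, quotes removed.
        function val(prefix,   i, v) {
            for (i = 1; i <= NF; i++)
                if (index($i, prefix) == 1) {
                    v = substr($i, length(prefix) + 1)
                    gsub(/"/, "", v)
                    return v
                }
            return "-"
        }
        {
            rt = val("rt="); uct = val("uct="); uht = val("uht="); urt = val("urt=")
            if (uct !~ /^[0-9.]+$/ || uht !~ /^[0-9.]+$/ || urt !~ /^[0-9.]+$/) next
            printf "%s %s connect=%.3f wait=%.3f body=%.3f other=%.3f\n", \
                $9, $7, uct, uht - uct, urt - uht, rt - urt
        }
    '
}
```

A large wait value points at backend processing time (slow queries, external calls), while a large connect value points at the network or a saturated listen queue.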

Step 3 – Identify Slow Backend Causes

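# Find requests taking >5 seconds (reads the rt= field of the detailed log format)
awk '{for(i=1;i<=NF;i++) if($i ~ /^rt=/ && substr($i,4)+0 > 5) print $0}' /var/log/nginx/access.log | tail -20
# List the 20 slowest requests (request time, then URL)
awk '{for(i=1;i<=NF;i++) if($i ~ /^rt=/ && substr($i,4)+0 > 5) print substr($i,4), $7}' /var/log/nginx/access.log | sort -rn | head -20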

Typical reasons for 504:

Slow SQL queries (enable MySQL slow‑query log).

External API latency.

Infinite loops or blocking operations in application code.

Resource lock contention.

Timeout values too short.

Sample Configuration

Nginx Optimisation

# /etc/nginx/nginx.conf
user nginx;
worker_processes auto;  # match CPU cores
worker_rlimit_nofile 65535;
error_log /var/log/nginx/error.log warn;
pid /run/nginx.pid;

events {
    worker_connections 65535;
    use epoll;
    multi_accept on;
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    log_format main '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" rt=$request_time uct="$upstream_connect_time" uht="$upstream_header_time" urt="$upstream_response_time"';
    access_log /var/log/nginx/access.log main;

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    keepalive_requests 1000;
    gzip on;
    gzip_min_length 1k;
    gzip_comp_level 4;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml;

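    upstream php_backend {
        server unix:/run/php-fpm/www.sock;
        keepalive 16;  # only effective if the location sets fastcgi_keep_conn on
    }

    upstream api_backend {
        least_conn;
        server 192.168.1.10:8080 weight=5 max_fails=3 fail_timeout=30s;
        server 192.168.1.11:8080 weight=5 max_fails=3 fail_timeout=30s;
        keepalive 32;  # needs proxy_http_version 1.1 and an empty Connection header in the proxying location
    }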

    include /etc/nginx/conf.d/*.conf;
}

PHP‑FPM Optimisation

# /etc/php-fpm.d/www.conf
[www]
user = nginx
group = nginx
listen = /run/php-fpm/www.sock
listen.owner = nginx
listen.group = nginx
listen.mode = 0660

pm = dynamic
pm.max_children = 100
pm.start_servers = 20
pm.min_spare_servers = 10
pm.max_spare_servers = 30
pm.max_requests = 500
; pm.process_idle_timeout only applies when pm = ondemand; it is ignored with pm = dynamic
pm.process_idle_timeout = 10s

pm.status_path = /php-fpm-status
ping.path = /php-fpm-ping
ping.response = pong

slowlog = /var/log/php-fpm/www-slow.log
request_slowlog_timeout = 3s
request_terminate_timeout = 120s

php_admin_value[error_log] = /var/log/php-fpm/www-error.log
php_admin_flag[log_errors] = on
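With pm = dynamic, PHP‑FPM expects the four pool sizes to be ordered consistently, and a pool that violates the ordering is rejected at startup. The check below is a minimal sketch of that relationship as I understand it; check_pm_dynamic is a hypothetical helper name, and php-fpm -t remains the authoritative test.

```shell
# Check the ordering expected between dynamic-pool settings.
# Args: max_children start_servers min_spare_servers max_spare_servers
# check_pm_dynamic is a hypothetical helper name.
check_pm_dynamic() {
    if [ "$3" -le "$2" ] && [ "$2" -le "$4" ] && [ "$4" -le "$1" ]; then
        echo "ok"
    else
        echo "inconsistent: need min_spare <= start_servers <= max_spare <= max_children"
        return 1
    fi
}
```

For the values in the sample configuration: check_pm_dynamic 100 20 10 30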

One‑Click Diagnosis Script

#!/bin/bash
# nginx_diagnose.sh – quick 502/504 check
# Usage: bash nginx_diagnose.sh

RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

echo -e "${YELLOW}=== Nginx status ===${NC}"
if systemctl is-active --quiet nginx; then
    echo -e "${GREEN}[OK] Nginx running${NC}"
else
    echo -e "${RED}[ERROR] Nginx not running${NC}"
fi
nginx -t 2>&1 | head -5

echo -e "${YELLOW}=== Backend status ===${NC}"
if command -v php-fpm >/dev/null; then
    if systemctl is-active --quiet php-fpm; then
        echo -e "${GREEN}[OK] PHP‑FPM running${NC}"
        fpm_count=$(ps aux | grep "php-fpm: pool" | grep -v grep | wc -l)
        echo "    PHP‑FPM processes: $fpm_count"
    else
        echo -e "${RED}[ERROR] PHP‑FPM not running${NC}"
    fi
fi
for port in 9000 8080 3000 5000; do
    ss -tlnp | grep -q ":$port " && echo -e "${GREEN}[OK] Port $port listening${NC}"
done

echo -e "${YELLOW}=== Recent 502/504 errors ===${NC}"
if [ -f /var/log/nginx/error.log ]; then
    # grep -c already prints 0 when nothing matches, so no "|| echo 0" fallback is needed
    error_count=$(grep -cE "502|504|upstream" /var/log/nginx/error.log 2>/dev/null)
    echo "Matching error lines: ${error_count:-0}"
    echo "Last 10 errors:"
    grep -E "502|504|upstream|connect" /var/log/nginx/error.log | tail -10
fi

echo -e "${YELLOW}=== Connection statistics ===${NC}"
ss -s
ss -ant | awk 'NR>1 {print $1}' | sort | uniq -c | sort -rn

echo -e "${YELLOW}=== System resources ===${NC}"
echo "CPU usage:"; top -bn1 | head -5

echo "Memory usage:"; free -h

echo "Disk usage:"; df -h | grep -v tmpfs

echo -e "${YELLOW}=== Nginx timeout settings ===${NC}"
grep -r "timeout" /etc/nginx/ | grep -v "#" | head -20

echo -e "${YELLOW}=== Upstream snippets ===${NC}"
grep -r "upstream" /etc/nginx/ | grep -v "#" | head -20

echo -e "${YELLOW}=== Slow requests (>3s) ===${NC}"
if [ -f /var/log/nginx/access.log ]; then
    awk -F'rt=' '{if(NF>1){split($2,a," ");if(a[1]>3)print $0}}' /var/log/nginx/access.log | tail -10
fi

echo "========================================"
echo -e "${GREEN}Diagnosis complete. Review highlighted items above.${NC}"
echo "========================================"

Best Practices & Caveats

Do not set timeouts excessively long. A 5‑minute proxy_read_timeout hides performance problems; adjust to realistic limits.

More processes is not always better. Each PHP‑FPM process typically consumes 20‑50 MiB; setting pm.max_children higher than memory allows can trigger the OOM killer.

Use nginx -s reload for configuration changes. Reload preserves existing connections; restart drops them.

Always test configuration with nginx -t before reloading.
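The process-count point above comes down to simple arithmetic: the memory you are willing to give PHP‑FPM divided by the average size of one worker. A minimal sketch with hypothetical helper names (avg_rss_mib, estimate_max_children); the 4096 MiB figure in the usage line is an illustrative assumption, not a recommendation.

```shell
# Average resident set size in MiB from RSS values in KiB (one per line on stdin).
avg_rss_mib() {
    awk '{ sum += $1 } END { if (NR) printf "%d\n", sum / NR / 1024 }'
}

# Upper bound for pm.max_children.
# Args: memory reserved for PHP-FPM (MiB), average worker size (MiB).
estimate_max_children() {
    echo $(( $1 / $2 ))
}
```

Typical usage: ps -o rss= -C php-fpm | avg_rss_mib to measure the worker size (the process name may differ by distribution), then estimate_max_children 4096 40, which gives 102 and is in line with the pm.max_children = 100 used in the sample configuration.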

Monitoring & Alerting

Enable stub_status and query http://127.0.0.1/nginx_status to monitor active, reading, writing, and waiting connections.
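The stub_status page returns a short plain-text report. For ad hoc checks or a cron-based alert, it can be reduced to key=value pairs. This sketch assumes the standard four-line stub_status output; parse_stub_status is a hypothetical helper name.

```shell
# Convert stub_status text on stdin into key=value lines.
# parse_stub_status is a hypothetical helper name.
parse_stub_status() {
    awk '
        /^Active connections:/ { print "active=" $3 }
        /^Reading:/            { print "reading=" $2; print "writing=" $4; print "waiting=" $6 }
    '
}
```

Typical usage: curl -s http://127.0.0.1/nginx_status | parse_stub_status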

Prometheus alert examples for a high 5xx rate and slow requests (P95 > 3 s). The metric names depend on the exporter in use, so adjust them to match yours:

# High 5xx error rate
- alert: NginxHighErrorRate
  expr: sum(rate(nginx_http_requests_total{status=~"5.."}[5m])) / sum(rate(nginx_http_requests_total[5m])) > 0.05
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Nginx 5xx error rate too high"
    description: "5xx error rate exceeds 5% (current: {{ $value | humanizePercentage }})"

# Slow requests (P95 > 3s)
- alert: NginxSlowRequests
  expr: histogram_quantile(0.95, sum by (le) (rate(nginx_http_request_duration_seconds_bucket[5m]))) > 3
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Nginx request latency high"
    description: "P95 request latency > 3 seconds"
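When no exporter is available, or to cross-check an alert, the same 5xx ratio can be computed directly from the access log. A minimal sketch, assuming the status code is the ninth whitespace-separated field (true for the log formats used in this guide); five_xx_percent is a hypothetical helper name.

```shell
# Print the percentage of 5xx responses among access-log lines on stdin.
# five_xx_percent is a hypothetical helper name.
five_xx_percent() {
    awk '
        $9 ~ /^[0-9][0-9][0-9]$/ { total++; if ($9 ~ /^5/) bad++ }
        END { if (total) printf "%.2f\n", 100 * bad / total; else print "0.00" }
    '
}
```

Typical usage: tail -n 10000 /var/log/nginx/access.log | five_xx_percent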

Quick Reference Commands

nginx -t – test configuration.

nginx -s reload – reload without dropping connections.

systemctl status php-fpm – check PHP‑FPM service.

tail -f /var/log/nginx/error.log – live error log.

grep -E "502|504|upstream|connect" /var/log/nginx/error.log – filter gateway errors.

awk '$9==502 {print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10 – top 502 URLs.

ss -s – connection summary.

ulimit -n – file‑descriptor limit.

Summary

502 = backend unavailable or exhausted. Typical causes: service crash, PHP‑FPM process pool full, socket permission errors, connection‑limit exhaustion.

504 = backend response exceeds Nginx timeout. Typical causes: slow SQL queries, external API latency, infinite loops in application code, lock contention, timeout values too short.

Root‑cause analysis using error.log, access.log, backend status, and timeout settings resolves the majority of incidents without blind restarts or arbitrary timeout increases.
