
Mastering 502, 503, and 504 Errors: Deep Dive and Practical Troubleshooting Guide

This comprehensive guide explains the HTTP 5xx status code hierarchy, details the specific triggers and root causes of 502 Bad Gateway, 503 Service Unavailable, and 504 Gateway Timeout, and provides step‑by‑step diagnostic flowcharts, real‑world case studies, and ready‑to‑run scripts for rapid resolution and proactive monitoring.

1. HTTP Status Code System

The HTTP status code space is divided into five classes: 1xx informational, 2xx success, 3xx redirection, 4xx client error, and 5xx server error. This guide focuses on three of the 5xx codes most often seen behind a reverse proxy: 502, 503, and 504.

1.1 HTTP Status Code Classification

HTTP status code structure:
    1xx - Informational
    2xx - Success
    3xx - Redirection
    4xx - Client error
    5xx - Server error

Key 5xx codes:
    502 Bad Gateway
    503 Service Unavailable
    504 Gateway Timeout
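
A quick way to see which class and code a URL actually returns is to ask curl for just the status line; the localhost URL below is only a placeholder.

# Print only the HTTP status code for a request (replace the URL with your own)
curl -s -o /dev/null -w "%{http_code}\n" http://localhost/
# Show the full status line of the response headers
curl -sI http://localhost/ | head -1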

1.2 Common Traits of 5xx Errors

All three errors reach the client through the proxy layer (Nginx), so the groundwork is the same in every case: log detailed upstream timing information and serve a friendly error page while you investigate.

# Nginx logging for detailed 5xx information
log_format detailed '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" rt=$request_time uct="$upstream_connect_time" uht="$upstream_header_time" urt="$upstream_response_time"';
access_log /var/log/nginx/detailed.log detailed;

# Custom error page
error_page 502 503 504 /50x.html;
location = /50x.html {
    root /usr/share/nginx/html;
    internal;
}

1.3 Relationship Between Error Codes and Protocol Layers

 ┌─────────────────────────────────────────────┐
 │              HTTP Layer (App)               │
 │ Handles request, status, cache control, etc.│
 └─────────────────────────────────────────────┘
                       ▲
                       │ Protocol parsing
 ┌─────────────────────┴───────────────────────┐
 │            Proxy / Gateway Layer            │
 │ Nginx receives the client request, forwards │
 │ it upstream, and returns 502/503/504        │
 └─────────────────────────────────────────────┘
                       ▲
                       │ Forward request
 ┌─────────────────────┴───────────────────────┐
 │          Upstream (Backend) Layer           │
 │ PHP‑FPM, Node.js, Python uWSGI, Java Tomcat │
 │ May return 500 or other codes               │
 └─────────────────────────────────────────────┘

2. 502 Bad Gateway Deep Dive

2.1 Definition

502 Bad Gateway : The gateway or proxy server received an invalid response from the upstream server.

Client             Nginx              PHP‑FPM
   │                 │                   │
   │ ──── GET / ───► │                   │
   │                 │ ──── forward ───► │
   │                 │ ◄── no response (connection refused)
   │ ◄── 502 Bad Gateway                 │
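
If you want to see this failure mode for yourself, a minimal sketch on a test box running the Nginx + PHP‑FPM pair from this guide (never a production host) is:

# Stop the upstream, request the site through Nginx, then restore the service
sudo systemctl stop php-fpm
curl -s -o /dev/null -w "HTTP %{http_code}\n" http://localhost/   # expect: HTTP 502
sudo systemctl start php-fpm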

2.2 Typical Trigger Scenarios

Scenario 1: Backend service not started

# Check PHP‑FPM status
systemctl status php-fpm
ps aux | grep php-fpm

# If not running
sudo systemctl start php-fpm
sudo systemctl enable php-fpm

Scenario 2: Wrong backend port

# Nginx upstream configuration
upstream backend {
    server 127.0.0.1:9000;   # correct port
    # server 127.0.0.1:9001; # wrong port (service not listening)
}

server {
    listen 80;
    server_name example.com;
    location / {
        proxy_pass http://backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
    location ~ \.php$ {
        fastcgi_pass 127.0.0.1:9000; # verify correct port
        fastcgi_index index.php;
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    }
}
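
To catch the wrong-port case quickly, compare what Nginx is configured to forward to against what is actually listening; `nginx -T` dumps the full effective configuration.

# Ports Nginx forwards to vs. ports with a live listener
sudo nginx -T 2>/dev/null | grep -E "proxy_pass|fastcgi_pass|server 127\."
ss -tlnp | grep -E ":9000|:9001"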

Scenario 3: Backend service crash

# View PHP‑FPM error log
tail -100 /var/log/php-fpm/error.log
# Check process status
ps aux | grep php-fpm
# If workers exist but no response, workers may be exhausted

Scenario 4: Connection limit exhausted

# PHP‑FPM pool configuration
[www]
pm = dynamic
pm.max_children = 50      # max child processes
pm.start_servers = 5
pm.min_spare_servers = 5
pm.max_spare_servers = 35
pm.max_requests = 500    # recycle after 500 requests
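
When the pool is exhausted, PHP‑FPM itself logs a warning; grepping for it confirms the diagnosis (the log path varies by distribution).

# Look for pool-exhaustion warnings, e.g.
# "[pool www] server reached pm.max_children setting (50), consider raising it"
grep -i "max_children" /var/log/php-fpm/error.log | tail -5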

2.3 Diagnostic Flowchart

502 error occurs
    │
    ├── Step 1: Verify Nginx can reach backend
    │   ├── telnet 127.0.0.1 9000
    │   ├── nc -zv 127.0.0.1 9000
    │   └── ss -tlnp | grep 9000
    │
    ├── Step 2: Check backend service status
    │   ├── systemctl status php-fpm
    │   ├── ps aux | grep php-fpm
    │   └── ss -tlnp | grep :9000
    │
    ├── Step 3: Inspect backend resources
    │   ├── /var/log/php-fpm/error.log
    │   ├── dmesg | tail
    │   └── free -h
    │
    └── Step 4: Review Nginx logs
        ├── /var/log/nginx/error.log
        └── /var/log/nginx/access.log (rt field)

2.4 Practical 502 Troubleshooting Script

#!/bin/bash
# check_502.sh – quick 502 diagnosis

echo "=========================================="
echo "          502 Error Diagnosis"
echo "=========================================="

# 1. Nginx status
echo ""
echo "[1] Nginx service status"
systemctl is-active nginx && echo "✓ Nginx running" || echo "✗ Nginx not running"
ss -tlnp | grep :80 | head -5

# 2. Backend service status
echo ""
echo "[2] PHP‑FPM service status"
systemctl is-active php-fpm && echo "✓ PHP‑FPM running" || echo "✗ PHP‑FPM not running"
ps aux | grep -E "php-fpm|php-cgi" | grep -v grep | head -5

# 3. Port listening
echo ""
echo "[3] Port listening status"
ss -tlnp | grep -E ":80|:9000|:9001|:8080" | head -10

# 4. Connection test
echo ""
echo "[4] Backend connection test"
timeout 3 bash -c "echo > /dev/tcp/127.0.0.1/9000" 2>/dev/null && echo "✓ 127.0.0.1:9000 reachable" || echo "✗ 127.0.0.1:9000 unreachable"
timeout 3 bash -c "echo > /dev/tcp/127.0.0.1/9001" 2>/dev/null && echo "✓ 127.0.0.1:9001 reachable" || echo "✗ 127.0.0.1:9001 unreachable"

# 5. Resource usage
echo ""
echo "[5] Resource usage"
free -h | grep Mem
df -h / | tail -1

# 6. Recent 502 entries in the Nginx error log
echo ""
echo "[6] Recent Nginx 502 errors"
grep -A2 "502" /var/log/nginx/error.log 2>/dev/null | tail -20

# 7. PHP‑FPM error log (last 10 lines)
echo ""
echo "[7] Recent PHP‑FPM errors"
tail -10 /var/log/php-fpm/error.log 2>/dev/null || tail -10 /var/log/php-fpm/www-error.log 2>/dev/null

echo "=========================================="
echo "            Diagnosis Complete"
echo "=========================================="

3. 503 Service Unavailable Deep Dive

3.1 Definition

503 Service Unavailable : The server is temporarily unable to handle the request, often due to overload or maintenance.

Client             Nginx              Backend
   │                 │                   │
   │ ──── GET / ───► │                   │
   │                 │ ──── forward ───► │
   │                 │ ◄── 503 (service unavailable)
   │ ◄── 503 Service Unavailable         │

3.2 Typical Trigger Scenarios

Scenario 1: Backend deliberately returns 503

# Nginx rate‑limit configuration that yields 503
limit_req_zone $binary_remote_addr zone=one:10m rate=10r/s;

server {
    listen 80;
    server_name example.com;
    location / {
        limit_req zone=one burst=20 nodelay;
        proxy_pass http://backend;
    }
    error_page 503 /503.html;
    location = /503.html {
        root /usr/share/nginx/html;
        internal;
    }
}

Scenario 2: Maintenance mode

# Maintenance switch (flag-file based)
server {
    listen 80;
    server_name example.com;

    # set/if are rewrite-module directives and belong inside the server block
    set $maintenance false;
    if (-f /var/www/maintenance.html) {
        set $maintenance true;
    }
    if ($maintenance = true) {
        return 503;
    }

    location / { proxy_pass http://backend; }

    error_page 503 @maintenance;
    location @maintenance {
        root /var/www;
        rewrite ^(.*)$ /maintenance.html break;
    }
}
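
With the configuration above, maintenance mode is toggled simply by creating or removing the flag file (which here doubles as the page that gets served):

# Enable maintenance mode
echo "<h1>Down for maintenance</h1>" | sudo tee /var/www/maintenance.html
# Disable it again
sudo rm -f /var/www/maintenance.html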

Scenario 3: Connection limit reached

# Nginx connection limiting
limit_conn_zone $binary_remote_addr zone=addr:10m;

server {
    listen 80;
    location / {
        limit_conn addr 10;  # max 10 connections per IP
        proxy_pass http://backend;
    }
}

Scenario 4: Backend overload

# Check backend load
ss -ant | grep :8080 | wc -l
# Check that the PHP-FPM status page is enabled (pm.status_path)
grep status /etc/php-fpm.d/www.conf
# Nginx location that exposes the PHP-FPM status page
location ~ ^/(status|ping)$ {
    access_log off;
    allow 127.0.0.1;
    deny all;
    fastcgi_pass 127.0.0.1:9000;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    include fastcgi_params;
}
# Query status
curl http://127.0.0.1/status
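
If the status page is enabled, a few of its fields are usually enough to confirm overload; the field names below are the ones printed by the stock PHP‑FPM status page.

# "listen queue"         – requests waiting for a free worker (should stay near 0)
# "active processes"     – workers currently busy
# "max children reached" – how many times the pool hit pm.max_children
curl -s http://127.0.0.1/status | grep -E "listen queue|active processes|max children reached"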

3.3 503 and Rate‑Limiting Interaction

#!/bin/bash
# test_nginx_limit.sh – test Nginx rate limiting

echo "Testing Nginx rate‑limit configuration..."

echo ""
# Install ab if missing
which ab || sudo dnf install httpd-tools -y

echo "=== Normal request test ==="
curl -I http://localhost/ 2>/dev/null | head -1

echo ""
echo "=== Rate‑limit test (20 concurrency, 50 requests) ==="
ab -n 50 -c 20 http://localhost/

echo ""
echo "=== Limit log inspection ==="
tail -20 /var/log/nginx/error.log | grep -i limit || echo "No limit logs"

echo ""
echo "=== Response code statistics ==="
awk '{print $9}' /var/log/nginx/access.log | sort | uniq -c | sort -rn

3.4 503 Diagnostic Flowchart

503 error occurs
    │
    ├── Step 1: Determine if Nginx or backend returned 503
    │   └── Check response headers (curl -I)
    │
    ├── Step 2: If Nginx returned 503
    │   ├── Review limit_req configuration
    │   ├── Review limit_conn configuration
    │   └── Check maintenance flag
    │
    └── Step 3: If backend returned 503
        ├── Verify backend overload
        ├── Inspect backend logs
        └── Check backend resource health
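
A practical shortcut for Step 1, assuming the "detailed" log format from section 1.2 is in use: if the upstream fields are "-", the request never reached the backend, so the 503 came from Nginx itself (rate limiting, maintenance mode, and so on).

# Current behaviour as seen by a client
curl -sI http://localhost/ | head -1
# 503s where urt="-" were generated by Nginx, not the upstream
awk '$9 == 503' /var/log/nginx/detailed.log | grep -o 'urt="[^"]*"' | sort | uniq -c
# limit_req rejections show up in the error log as "limiting requests"
grep -c "limiting requests" /var/log/nginx/error.log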

4. 504 Gateway Timeout Deep Dive

4.1 Definition

504 Gateway Timeout : The gateway or proxy did not receive a timely response from the upstream server.

Client             Nginx              Backend
   │                 │                   │
   │ ──── GET / ───► │                   │
   │                 │ ──── forward ───► │
   │                 │     (waiting…)    │
   │                 │  ⏱ timeout, still no response
   │ ◄── 504 Gateway Timeout             │

4.2 Typical Trigger Scenarios

Scenario 1: Backend processing takes too long

# Nginx timeout settings
server {
    listen 80;
    server_name example.com;

    # FastCGI timeouts
    fastcgi_connect_timeout 60s;
    fastcgi_send_timeout    60s;
    fastcgi_read_timeout   60s;

    # Proxy timeouts
    proxy_connect_timeout 60s;
    proxy_send_timeout    60s;
    proxy_read_timeout    60s;

    location / {
        fastcgi_pass 127.0.0.1:9000;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        include fastcgi_params;
    }
}

Scenario 2: Slow query causing PHP‑FPM timeout

# PHP‑FPM timeout configuration (/etc/php-fpm.d/www.conf)
request_terminate_timeout = 30s   # per‑request timeout
request_slowlog_timeout   = 10s   # slow‑log threshold

# View slow‑log
tail -50 /var/log/php-fpm/www-slow.log

Scenario 3: Database connection timeout

<?php
// mysqli options must be set before connecting, so use mysqli_init() + real_connect()
$conn = mysqli_init();
$conn->options(MYSQLI_OPT_CONNECT_TIMEOUT, 5);   // fail fast if MySQL is unreachable
$conn->options(MYSQLI_OPT_READ_TIMEOUT, 30);     // cap time spent waiting for results
$conn->real_connect("localhost", "user", "pass", "db");
$result = $conn->query("SELECT * FROM large_table");
?>

Scenario 4: Nginx waiting for backend response

# Upstream definition with extended timeouts for large uploads
upstream backend {
    server 127.0.0.1:8080;
    keepalive 32;
}

server {
    listen 80;
    server_name api.example.com;

    # Long timeouts for slow API calls and large uploads
    proxy_connect_timeout 300s;
    proxy_send_timeout    300s;
    proxy_read_timeout    300s;    # raise further (e.g. 600s) only for very large transfers
    client_max_body_size  100m;

    location /api/ {
        proxy_pass http://backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header Host $host;
    }
}

4.3 504 Diagnostic Flowchart

504 error occurs
    │
    ├── Step 1: Identify which timeout triggered
    │   ├── Nginx → backend: proxy_read_timeout
    │   ├── FastCGI: fastcgi_read_timeout
    │   └── Backend PHP: max_execution_time
    │
    ├── Step 2: Review backend logs
    │   ├── PHP‑FPM slow log
    │   ├── Application logs
    │   └── Database slow‑query log
    │
    ├── Step 3: Check backend performance
    │   ├── CPU usage
    │   ├── Memory usage
    │   └── DB connection pool
    │
    └── Step 4: Optimisation suggestions
        ├── Increase timeout values
        ├── Optimise backend code
        └── Use asynchronous processing
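
For Step 1, the Nginx error log usually names the phase that timed out ("upstream timed out … while reading response header from upstream"), and `nginx -T` shows every timeout value currently in effect:

# Which requests timed out, and in which phase
grep "upstream timed out" /var/log/nginx/error.log | tail -5
# Every proxy/fastcgi timeout currently configured
sudo nginx -T 2>/dev/null | grep -E "(proxy|fastcgi)_(connect|send|read)_timeout"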

4.4 Consolidated Timeout Configuration Example

# /etc/nginx/nginx.conf

# Global timeout settings
proxy_connect_timeout 60s;
proxy_send_timeout    60s;
proxy_read_timeout    60s;

# FastCGI global settings
fastcgi_connect_timeout 60s;
fastcgi_send_timeout    60s;
fastcgi_read_timeout    60s;

# uWSGI timeout settings
uwsgi_connect_timeout 60s;
uwsgi_send_timeout    60s;
uwsgi_read_timeout    60s;

server {
    listen 80;
    server_name example.com;

    # Default location (short timeout)
    location / {
        proxy_pass http://backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }

    # Static assets (very short timeout)
    location /static/ {
        proxy_pass http://static_backend;
        proxy_connect_timeout 10s;
        proxy_read_timeout 30s;
        expires 1d;
    }

    # API endpoints (long timeout)
    location /api/ {
        proxy_pass http://api_backend;
        proxy_connect_timeout 300s;
        proxy_read_timeout 300s;
    }

    # Upload endpoints (extra long timeout)
    location /upload/ {
        proxy_pass http://upload_backend;
        proxy_connect_timeout 600s;
        proxy_read_timeout 600s;
        client_max_body_size 500m;
    }

    # Custom error page for 5xx
    error_page 502 503 504 /50x.html;
    location = /50x.html {
        root /usr/share/nginx/html;
        internal;
    }
}

5. Comparison of the Three Errors

5.1 Core Differences Summary

502 Bad Gateway : Backend connection failure or crash.

503 Service Unavailable : Backend refuses service (rate‑limit, overload, maintenance).

504 Gateway Timeout : Backend response takes too long (slow queries, heavy processing).
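
A one-liner makes the comparison concrete for your own traffic, showing which of the three codes actually dominates in the access log:

# Count 502/503/504 occurrences (status is field 9 in the combined/detailed formats)
awk '$9 ~ /^50[234]$/ {c[$9]++} END {for (s in c) print s, c[s]}' /var/log/nginx/access.log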

6. Real‑World Troubleshooting Cases

Case 1 – 502 Caused by Backend Crash

Symptom : Intermittent 502 errors.

Investigation :

# Check Nginx error log for connection refusals
tail -100 /var/log/nginx/error.log | grep 502

# Verify PHP‑FPM status
systemctl status php-fpm

# Look for OOM events
dmesg | grep -i "out of memory"
free -h

# Review PHP‑FPM pool settings
grep -E "^pm|^request_terminate" /etc/php-fpm.d/www.conf

Root Cause : PHP‑FPM workers exhausted memory and were killed by the OOM killer.

Resolution :

# Immediate mitigation: bring PHP‑FPM back up
sudo systemctl start php-fpm

# Adjust PHP‑FPM pool
[www]
pm = dynamic
pm.max_children = 20
pm.start_servers = 3
pm.min_spare_servers = 2
pm.max_spare_servers = 5
pm.max_requests = 200

# Reduce memory limit
php_admin_value[memory_limit] = 128M

# Restart services
sudo systemctl restart php-fpm
sudo systemctl restart nginx

Case 2 – 504 Due to Slow Query

Symptom : API calls time out, returning 504.

Investigation :

# Search Nginx error log for timeout messages
grep 504 /var/log/nginx/error.log | tail -20

# Examine PHP‑FPM slow log
cat /var/log/php-fpm/www-slow.log

# Identify long‑running MySQL queries
mysql -u root -p -e "SHOW PROCESSLIST;"
mysql -u root -p -e "SHOW VARIABLES LIKE 'slow_query%';"
tail -20 /var/log/mysql/slow.log

Root Cause : A full‑table scan on a large table without an index caused the query to exceed the timeout.

Resolution :

# Add pagination and proper indexes
<?php
// Paginate so each request scans a bounded slice of the table
$date    = '2026-01-01';   // upper bound for the query window (example value)
$page    = isset($_GET['page']) ? (int)$_GET['page'] : 1;
$perPage = 100;
$offset  = ($page - 1) * $perPage;
$stmt = $conn->prepare("SELECT * FROM huge_table WHERE created_at < ? ORDER BY id LIMIT ? OFFSET ?");
$stmt->bind_param("sii", $date, $perPage, $offset);
$stmt->execute();
?>

# Create indexes
ALTER TABLE huge_table ADD INDEX idx_created_at (created_at);
ALTER TABLE huge_table ADD INDEX idx_created_at_id (created_at, id);

# Verify with EXPLAIN
EXPLAIN SELECT * FROM huge_table WHERE created_at < '2026-01-01' ORDER BY id LIMIT 100;

Case 3 – 503 Triggered by Aggressive Rate Limiting

Symptom : During a promotion, many users receive 503 errors.

Investigation :

# Check Nginx limit_req configuration
grep -r "limit_req" /etc/nginx/

# Look for limit‑exceeded logs
tail -100 /var/log/nginx/error.log | grep "limiting"

# Verify connection count
ss -ant | grep :80 | wc -l

# Review PHP‑FPM status page
curl http://127.0.0.1/status

Root Cause : The configured rate‑limit (10 r/s) was too low for the traffic spike.

Resolution :

# Increase rate limits
limit_req_zone $binary_remote_addr zone=one:100m rate=100r/s;
limit_req_zone $binary_remote_addr zone=api:50m rate=50r/s;

server {
    listen 80;
    server_name example.com;

    # General pages – higher limit
    location / {
        limit_req zone=one burst=200 nodelay;
        proxy_pass http://backend;
    }

    # API – stricter limit
    location /api/ {
        limit_req zone=api burst=50 nodelay;
        proxy_pass http://api_backend;
    }

    # Static assets – virtually no limit
    location /static/ {
        limit_req zone=one burst=500;
        proxy_pass http://static_backend;
        expires 7d;
        add_header Cache-Control "public";
    }
}

7. Monitoring and Alerting Configuration

7.1 Monitor 5xx Error Rate (Bash)

#!/bin/bash
# monitor_5xx.sh – alerts when 5xx rate exceeds threshold

LOG_FILE="/var/log/nginx/access.log"
ALERT_THRESHOLD=5   # percent

current_minute=$(date +"%d/%b/%Y:%H:%M")
total_requests=$(grep "$current_minute" "$LOG_FILE" | wc -l)
error_5xx=$(grep "$current_minute" "$LOG_FILE" | awk '$9 ~ /^5[0-9][0-9]$/' | wc -l)

if [ $total_requests -gt 0 ]; then
    error_rate=$(echo "scale=2; $error_5xx * 100 / $total_requests" | bc)
    echo "Total requests: $total_requests"
    echo "5xx errors: $error_5xx"
    echo "Error rate: ${error_rate}%"
    if (( $(echo "$error_rate > $ALERT_THRESHOLD" | bc -l) )); then
        echo "⚠️ Alert: 5xx error rate exceeds $ALERT_THRESHOLD%"
        # Integrate with Prometheus/Zabbix here
    fi
else
    echo "No requests in the current minute"
fi
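
To run the monitor continuously, a crontab entry like the one below works; the install path is only an example.

# m h dom mon dow  command
* * * * * /usr/local/bin/monitor_5xx.sh >> /var/log/monitor_5xx.log 2>&1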

7.2 Prometheus Alert Rules

# prometheus_5xx_alerts.yml

groups:
  - name: nginx_5xx_alerts
    rules:
      - alert: NginxHigh502ErrorRate
        expr: |
          sum(rate(nginx_http_requests_total{status=~"502"}[5m]))
          /
          sum(rate(nginx_http_requests_total[5m])) * 100 > 5
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Nginx 502 error rate too high"
          description: "502 error rate > 5%, current value: {{ $value }}%"

      - alert: NginxHigh503ErrorRate
        expr: |
          sum(rate(nginx_http_requests_total{status=~"503"}[5m]))
          /
          sum(rate(nginx_http_requests_total[5m])) * 100 > 5
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Nginx 503 error rate too high"
          description: "503 error rate > 5%, current value: {{ $value }}%"

      - alert: NginxHigh504ErrorRate
        expr: |
          sum(rate(nginx_http_requests_total{status=~"504"}[5m]))
          /
          sum(rate(nginx_http_requests_total[5m])) * 100 > 5
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Nginx 504 error rate too high"
          description: "504 error rate > 5%, current value: {{ $value }}%"

7.3 Zabbix Monitoring Template (Agent Config)

# /etc/zabbix/zabbix_agentd.d/nginx_status.conf
# Assumes the Nginx stub_status module is exposed at http://localhost/status
# (a separate endpoint from the PHP-FPM status page used in section 3.2)
UserParameter=nginx.active_connections,curl -s http://localhost/status | grep 'Active connections:' | awk '{print $3}'
UserParameter=nginx.accepts,curl -s http://localhost/status | awk '/^[[:space:]]+[0-9]+ [0-9]+ [0-9]+/ {print $1}'
UserParameter=nginx.handled,curl -s http://localhost/status | awk '/^[[:space:]]+[0-9]+ [0-9]+ [0-9]+/ {print $2}'
UserParameter=nginx.requests,curl -s http://localhost/status | awk '/^[[:space:]]+[0-9]+ [0-9]+ [0-9]+/ {print $3}'
UserParameter=nginx.5xx_count,awk '$9 ~ /^5[0-9][0-9]$/' /var/log/nginx/access.log | wc -l

8. Summary Checklist

Key Points for Each Error

502 Bad Gateway:
    • Problem: Backend cannot be reached
    • Causes: Service down, wrong port, backend crash
    • Checks: Verify backend status, ports, logs

503 Service Unavailable:
    • Problem: Backend refuses service
    • Causes: Rate limiting, overload, maintenance mode, worker exhaustion
    • Checks: Review limit_req/limit_conn, backend load, maintenance flag

504 Gateway Timeout:
    • Problem: Backend response too slow
    • Causes: Long processing, slow DB queries, large uploads
    • Checks: Backend logs, slow‑query logs, optimise code/config

Rapid Response Flow

Receive 5xx alert
    │
    ├── Immediate checks
    │   ├── Is Nginx running?
    │   ├── Are backend services up?
    │   └── Are required ports listening?
    │
    ├── Review logs
    │   ├── Nginx error.log
    │   ├── Backend service logs
    │   └── PHP‑FPM slow log (if applicable)
    │
    ├── Temporary actions
    │   ├── Restart backend service
    │   ├── Adjust timeout values
    │   └── Disable rate limiting temporarily
    │
    └── Root‑cause analysis
        ├── Analyse error logs
        ├── Inspect slow queries
        ├── Optimise configuration or code
        └── Enhance monitoring/alerting

Common Command Cheat Sheet

# Find 502 errors
grep 502 /var/log/nginx/error.log

# Find 503/504 errors
grep -E "503|504" /var/log/nginx/error.log

# Count total 5xx errors
awk '$9 ~ /^5[0-9][0-9]$/' /var/log/nginx/access.log | wc -l

# Check backend service status
systemctl status php-fpm
ps aux | grep php-fpm

# Verify port listening
ss -tlnp | grep :9000

# Test connectivity
nc -zv 127.0.0.1 9000
telnet 127.0.0.1 9000

# View PHP‑FPM slow log
tail -50 /var/log/php-fpm/www-slow.log
Tags: operations, HTTP, troubleshooting, 502, 503, 504
Written by MaGe Linux Operations

Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
