Mastering 502, 503, and 504 Errors: Deep Dive and Practical Troubleshooting Guide
This comprehensive guide explains the HTTP 5xx status code hierarchy, details the specific triggers and root causes of 502 Bad Gateway, 503 Service Unavailable, and 504 Gateway Timeout, and provides step‑by‑step diagnostic flowcharts, real‑world case studies, and ready‑to‑run scripts for rapid resolution and proactive monitoring.
1. HTTP Status Code System
The HTTP status code space is divided into five classes: 1xx informational, 2xx success, 3xx redirection, 4xx client error, and 5xx server error. This guide focuses on the three most common 5xx codes: 502, 503, and 504.
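The class split is mechanical: the hundreds digit alone identifies it, which is handy when scripting log triage. A minimal sketch (`status_class` is a hypothetical helper name, not a standard tool):

```shell
#!/bin/bash
# status_class: map an HTTP status code to its class via the hundreds digit
status_class() {
  case $(( $1 / 100 )) in
    1) echo "informational" ;;
    2) echo "success" ;;
    3) echo "redirection" ;;
    4) echo "client error" ;;
    5) echo "server error" ;;
    *) echo "unknown" ;;
  esac
}
status_class 502   # -> server error
status_class 404   # -> client error
```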
1.1 HTTP Status Code Classification
HTTP status code structure:
1xx - Informational
2xx - Success
3xx - Redirection
4xx - Client error
5xx - Server error
Key 5xx codes:
502 Bad Gateway
503 Service Unavailable
504 Gateway Timeout
1.2 Common Traits of 5xx Errors
# Nginx logging for detailed 5xx information
log_format detailed '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" rt=$request_time uct="$upstream_connect_time" uht="$upstream_header_time" urt="$upstream_response_time"';
access_log /var/log/nginx/detailed.log detailed;
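With this format in place, the status and urt fields make 5xx triage scriptable. A sketch that pulls both out of matching lines (field positions assume the exact log_format above; the two sample lines are fabricated for illustration):

```shell
#!/bin/bash
# Print status and upstream response time (urt) for 5xx lines in the
# detailed format; $9 is the status, the urt="..." field is found by prefix
result=$(awk '{
  for (i = 10; i <= NF; i++)
    if ($i ~ /^urt=/) { urt = $i; gsub(/urt=|"/, "", urt) }
  if ($9 ~ /^5/) print $9, urt
}' <<'EOF'
192.0.2.1 - - [10/Oct/2025:12:00:00 +0000] "GET / HTTP/1.1" 502 157 "-" "curl/8.0" rt=0.003 uct="0.001" uht="-" urt="0.002"
192.0.2.2 - - [10/Oct/2025:12:00:01 +0000] "GET /ok HTTP/1.1" 200 512 "-" "curl/8.0" rt=0.010 uct="0.001" uht="0.004" urt="0.004"
EOF
)
echo "$result"   # -> 502 0.002
```

On a live server, feed `/var/log/nginx/detailed.log` to the awk program instead of the here-document.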
# Custom error page
error_page 502 503 504 /50x.html;
location = /50x.html {
root /usr/share/nginx/html;
internal;
}
1.3 Relationship Between Error Codes and Protocol Layers
┌─────────────────────────────┐
│ HTTP Layer (App) │
│ Handles request, status, │
│ cache control, etc. │
└─────────────────────────────┘
▲
│ Protocol parsing
┌─────────────────────┴───────────────────────┐
│ Proxy / Gateway Layer │
│ Nginx receives client request, forwards │
│ to upstream, returns 502/503/504 as needed│
└─────────────────────────────────────────────┘
▲
│ Forward request
┌─────────────────────┴───────────────────────┐
│ Upstream (Backend) Layer │
│ PHP‑FPM, Node.js, Python uWSGI, Java Tomcat│
│ May return 500 or other codes │
└─────────────────────────────────────────────┘
2. 502 Bad Gateway Deep Dive
2.1 Definition
502 Bad Gateway : The gateway or proxy server received an invalid response from the upstream server.
Client → Nginx → PHP‑FPM
| | |
| ──────── GET / ──────► |
| | |
| | ◄────── No response (connection refused) ◄─ |
| | |
| ◄──── 502 Bad Gateway ────── |
2.2 Typical Trigger Scenarios
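Every scenario below reduces to the same first question: can Nginx open a TCP connection to the upstream at all? A quick probe using bash's /dev/tcp (`nc -zv` or `telnet` work equally well; `probe` is just a convenience wrapper):

```shell
#!/bin/bash
# probe: report whether host:port accepts TCP connections; an instant
# "closed" on the upstream port usually explains a 502 outright
probe() {
  if timeout 2 bash -c "echo > /dev/tcp/$1/$2" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}
probe 127.0.0.1 9000
```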
Scenario 1: Backend service not started
# Check PHP‑FPM status
systemctl status php-fpm
ps aux | grep php-fpm
# If not running
sudo systemctl start php-fpm
sudo systemctl enable php-fpm
Scenario 2: Wrong backend port
# Nginx upstream configuration
upstream backend {
server 127.0.0.1:9000; # correct port
# server 127.0.0.1:9001; # wrong port (service not listening)
}
server {
listen 80;
server_name example.com;
location / {
proxy_pass http://backend;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
location ~ \.php$ {
fastcgi_pass 127.0.0.1:9000; # verify correct port
fastcgi_index index.php;
include fastcgi_params;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
}
}
Scenario 3: Backend service crash
# View PHP‑FPM error log
tail -100 /var/log/php-fpm/error.log
# Check process status
ps aux | grep php-fpm
# If workers exist but no response, workers may be exhausted
Scenario 4: Connection limit exhausted
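Exhaustion is often a sizing problem: pm.max_children set without reference to memory. A rough ceiling is available RAM divided by average worker RSS. A sketch with assumed numbers (the 60 MB per-worker figure is illustrative; measure your own workers):

```shell
#!/bin/bash
# Rough pm.max_children ceiling; on a live box replace the constants with:
#   avail_mb=$(free -m | awk '/^Mem/ {print $7}')
#   worker_mb=$(ps -C php-fpm -o rss= | awk '{s+=$1; n++} END {print int(s/n/1024)}')
avail_mb=4096   # MemAvailable in MB (assumed)
worker_mb=60    # average PHP-FPM worker RSS in MB (assumed)
echo "pm.max_children <= $(( avail_mb / worker_mb ))"
```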
# PHP‑FPM pool configuration
[www]
pm = dynamic
pm.max_children = 50 # max child processes
pm.start_servers = 5
pm.min_spare_servers = 5
pm.max_spare_servers = 35
pm.max_requests = 500 # recycle after 500 requests
2.3 Diagnostic Flowchart
502 error occurs
│
├── Step 1: Verify Nginx can reach backend
│ ├── telnet 127.0.0.1 9000
│ ├── nc -zv 127.0.0.1 9000
│ └── ss -tlnp | grep 9000
│
├── Step 2: Check backend service status
│ ├── systemctl status php-fpm
│ ├── ps aux | grep php-fpm
│ └── ss -tlnp | grep :9000
│
├── Step 3: Inspect backend resources
│ ├── /var/log/php-fpm/error.log
│ ├── dmesg | tail
│ └── free -h
│
└── Step 4: Review Nginx logs
├── /var/log/nginx/error.log
└── /var/log/nginx/access.log (rt field)
2.4 Practical 502 Troubleshooting Script
#!/bin/bash
# check_502.sh – quick 502 diagnosis
echo "=========================================="
echo " 502 Error Diagnosis"
echo "=========================================="
# 1. Nginx status
echo ""
echo "[1] Nginx service status"
systemctl is-active nginx && echo "✓ Nginx running" || echo "✗ Nginx not running"
ss -tlnp | grep :80 | head -5
# 2. Backend service status
echo ""
echo "[2] PHP‑FPM service status"
systemctl is-active php-fpm && echo "✓ PHP‑FPM running" || echo "✗ PHP‑FPM not running"
ps aux | grep -E "php-fpm|php-cgi" | grep -v grep | head -5
# 3. Port listening
echo ""
echo "[3] Port listening status"
ss -tlnp | grep -E ":80|:9000|:9001|:8080" | head -10
# 4. Connection test
echo ""
echo "[4] Backend connection test"
timeout 3 bash -c "echo > /dev/tcp/127.0.0.1/9000" 2>/dev/null && echo "✓ 127.0.0.1:9000 reachable" || echo "✗ 127.0.0.1:9000 unreachable"
timeout 3 bash -c "echo > /dev/tcp/127.0.0.1/9001" 2>/dev/null && echo "✓ 127.0.0.1:9001 reachable" || echo "✗ 127.0.0.1:9001 unreachable"
# 5. Resource usage
echo ""
echo "[5] Resource usage"
free -h | grep Mem
df -h / | tail -1
# 6. Nginx error log (recent 502 entries)
echo ""
echo "[6] Recent Nginx 502 errors"
grep -A2 "502" /var/log/nginx/error.log 2>/dev/null | tail -20
# 7. PHP‑FPM error log (last 10 lines)
echo ""
echo "[7] Recent PHP‑FPM errors"
tail -10 /var/log/php-fpm/error.log 2>/dev/null || tail -10 /var/log/php-fpm/www-error.log 2>/dev/null
echo "=========================================="
echo " Diagnosis Complete"
echo "=========================================="
3. 503 Service Unavailable Deep Dive
3.1 Definition
503 Service Unavailable : The server is temporarily unable to handle the request, often due to overload or maintenance.
Client → Nginx → Backend Service
| | |
| ──────── GET / ──────► |
| | |
| | ◄────── 503 (service unavailable)
| | |
| ◄──── 503 Service Unavailable ◄─ |
3.2 Typical Trigger Scenarios
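One distinguishing trait: a deliberate 503 may carry a Retry-After header telling clients when to come back (Nginx does not add one to its own limit_req rejections, so its presence hints that the backend chose to refuse). Extracting it from captured headers (the sample response is fabricated; on a live host use `curl -sI`):

```shell
#!/bin/bash
# Pull Retry-After out of a response-header dump (case-insensitive match)
retry_after=$(awk -F': ' 'tolower($1) == "retry-after" { print $2 }' <<'EOF'
HTTP/1.1 503 Service Unavailable
Server: nginx/1.24.0
Retry-After: 120
Content-Type: text/html
EOF
)
echo "retry after ${retry_after}s"   # -> retry after 120s
```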
Scenario 1: Backend deliberately returns 503
# Nginx rate‑limit configuration that yields 503
limit_req_zone $binary_remote_addr zone=one:10m rate=10r/s;
server {
listen 80;
server_name example.com;
location / {
limit_req zone=one burst=20 nodelay;
proxy_pass http://backend;
}
error_page 503 /503.html;
location = /503.html {
root /usr/share/nginx/html;
internal;
}
}
Scenario 2: Maintenance mode
# Maintenance switch (set/if are valid in server/location context,
# not at http level, so the check lives inside the server block)
server {
listen 80;
server_name example.com;
set $maintenance false;
if (-f /var/www/maintenance.html) {
set $maintenance true;
}
if ($maintenance = true) {
return 503;
}
location / { proxy_pass http://backend; }
error_page 503 @maintenance;
location @maintenance {
root /var/www;
rewrite ^(.*)$ /maintenance.html break;
}
}
Scenario 3: Connection limit reached
# Nginx connection limiting
limit_conn_zone $binary_remote_addr zone=addr:10m;
server {
listen 80;
location / {
limit_conn addr 10; # max 10 connections per IP
proxy_pass http://backend;
}
}
Scenario 4: Backend overload
# Check backend load
ss -ant | grep :8080 | wc -l
# View PHP‑FPM status page (requires status enabled)
cat /etc/php-fpm.d/www.conf | grep status
# PHP-FPM status page location (served through Nginx)
location ~ ^/(status|ping)$ {
access_log off;
allow 127.0.0.1;
deny all;
fastcgi_pass 127.0.0.1:9000;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
include fastcgi_params;
}
# Query status
curl http://127.0.0.1/status
3.3 503 and Rate‑Limiting Interaction
#!/bin/bash
# test_nginx_limit.sh – test Nginx rate limiting
echo "Testing Nginx rate‑limit configuration..."
echo ""
# Install ab if missing
which ab || sudo dnf install httpd-tools -y
echo "=== Normal request test ==="
curl -I http://localhost/ 2>/dev/null | head -1
echo ""
echo "=== Rate‑limit test (20 concurrency, 50 requests) ==="
ab -n 50 -c 20 http://localhost/
echo ""
echo "=== Limit log inspection ==="
tail -20 /var/log/nginx/error.log | grep -i limit || echo "No limit logs"
echo ""
echo "=== Response code statistics ==="
awk '{print $9}' /var/log/nginx/access.log | sort | uniq -c | sort -rn
3.4 503 Diagnostic Flowchart
503 error occurs
│
├── Step 1: Determine if Nginx or backend returned 503
│ └── Check response headers (curl -I)
│
├── Step 2: If Nginx returned 503
│ ├── Review limit_req configuration
│ ├── Review limit_conn configuration
│ └── Check maintenance flag
│
└── Step 3: If backend returned 503
├── Verify backend overload
├── Inspect backend logs
└── Check backend resource health
4. 504 Gateway Timeout Deep Dive
4.1 Definition
504 Gateway Timeout : The gateway or proxy did not receive a timely response from the upstream server.
Client → Nginx → Backend Service
| | |
| ──────── GET / ──────► |
| | |
| | (waiting…)
| | |
| | ⏱ timeout!
| | ◄────── No response ◄─ |
| ◄──── 504 Gateway Timeout ◄─ |
4.2 Typical Trigger Scenarios
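In the detailed log format from section 1.2, the 504 signature is an upstream response time at or just above the configured read timeout. A sketch of that comparison with sample values (awk is used instead of bc for portability):

```shell
#!/bin/bash
# Flag an upstream response time that exceeded the read timeout
urt=61.003      # sample $upstream_response_time from an access-log line
timeout_s=60    # the proxy_read_timeout / fastcgi_read_timeout in force
over=$(awk -v u="$urt" -v t="$timeout_s" 'BEGIN { print (u > t) ? 1 : 0 }')
if [ "$over" -eq 1 ]; then
  echo "upstream exceeded ${timeout_s}s -> 504 expected"
fi
```

From the client side, `curl -w '%{time_starttransfer}'` measures the same phase from the outside.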
Scenario 1: Backend processing takes too long
# Nginx timeout settings
server {
listen 80;
server_name example.com;
# FastCGI timeouts
fastcgi_connect_timeout 60s;
fastcgi_send_timeout 60s;
fastcgi_read_timeout 60s;
# Proxy timeouts
proxy_connect_timeout 60s;
proxy_send_timeout 60s;
proxy_read_timeout 60s;
location / {
fastcgi_pass 127.0.0.1:9000;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
include fastcgi_params;
}
}
Scenario 2: Slow query causing PHP‑FPM timeout
# PHP‑FPM timeout configuration (/etc/php-fpm.d/www.conf)
request_terminate_timeout = 30s # per‑request timeout
request_slowlog_timeout = 10s # slow‑log threshold
# View slow‑log
tail -50 /var/log/php-fpm/www-slow.log
Scenario 3: Database connection timeout
<?php
// Timeouts must be set before connecting: new mysqli() connects immediately,
// so use mysqli_init() + options() + real_connect() instead
$conn = mysqli_init();
$conn->options(MYSQLI_OPT_CONNECT_TIMEOUT, 5);
$conn->options(MYSQLI_OPT_READ_TIMEOUT, 30); // PHP 7.2+ with mysqlnd
$conn->real_connect("localhost", "user", "pass", "db");
$result = $conn->query("SELECT * FROM large_table");
?>
Scenario 4: Nginx waiting for backend response
# Upstream definition with extended timeouts for large uploads
upstream backend {
server 127.0.0.1:8080;
keepalive 32;
}
server {
listen 80;
server_name api.example.com;
# API timeout (long); a directive may appear only once per context --
# duplicates make nginx -t fail. Raise the value in a dedicated location
# for very large transfers instead.
proxy_connect_timeout 300s;
proxy_send_timeout 300s;
proxy_read_timeout 300s;
client_max_body_size 100m;
location /api/ {
proxy_pass http://backend;
proxy_http_version 1.1;
proxy_set_header Connection "";
proxy_set_header Host $host;
}
}
4.3 504 Diagnostic Flowchart
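A shortcut for the chart's first step: when a wait expires, Nginx's error log states in which phase it happened ("while connecting to / sending request to / reading response header from upstream"), and each phrase maps to one timeout directive. A sketch over a sample log line (the message text matches common Nginx builds; verify against your own error.log):

```shell
#!/bin/bash
# Classify which Nginx upstream timeout fired from an error-log message
msg='2025/10/10 12:00:00 [error] 1234#0: *5 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.0.2.1, server: example.com, request: "GET /api/slow HTTP/1.1"'
case "$msg" in
  *"while connecting to upstream"*)                phase="connect (proxy_connect_timeout)" ;;
  *"while sending request to upstream"*)           phase="send (proxy_send_timeout)" ;;
  *"while reading response header from upstream"*) phase="read (proxy_read_timeout)" ;;
  *)                                               phase="unknown" ;;
esac
echo "timeout phase: $phase"   # -> timeout phase: read (proxy_read_timeout)
```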
504 error occurs
│
├── Step 1: Identify which timeout triggered
│ ├── Nginx → backend: proxy_read_timeout
│ ├── FastCGI: fastcgi_read_timeout
│ └── Backend PHP: max_execution_time
│
├── Step 2: Review backend logs
│ ├── PHP‑FPM slow log
│ ├── Application logs
│ └── Database slow‑query log
│
├── Step 3: Check backend performance
│ ├── CPU usage
│ ├── Memory usage
│ └── DB connection pool
│
└── Step 4: Optimisation suggestions
├── Increase timeout values
├── Optimise backend code
└── Use asynchronous processing
4.4 Consolidated Timeout Configuration Example
# /etc/nginx/nginx.conf
# Global timeout settings
proxy_connect_timeout 60s;
proxy_send_timeout 60s;
proxy_read_timeout 60s;
# FastCGI global settings
fastcgi_connect_timeout 60s;
fastcgi_send_timeout 60s;
fastcgi_read_timeout 60s;
# uWSGI timeout settings
uwsgi_connect_timeout 60s;
uwsgi_send_timeout 60s;
uwsgi_read_timeout 60s;
server {
listen 80;
server_name example.com;
# Default location (short timeout)
location / {
proxy_pass http://backend;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
# Static assets (very short timeout)
location /static/ {
proxy_pass http://static_backend;
proxy_connect_timeout 10s;
proxy_read_timeout 30s;
expires 1d;
}
# API endpoints (long timeout)
location /api/ {
proxy_pass http://api_backend;
proxy_connect_timeout 300s;
proxy_read_timeout 300s;
}
# Upload endpoints (extra long timeout)
location /upload/ {
proxy_pass http://upload_backend;
proxy_connect_timeout 600s;
proxy_read_timeout 600s;
client_max_body_size 500m;
}
# Custom error page for 5xx
error_page 502 503 504 /50x.html;
location = /50x.html {
root /usr/share/nginx/html;
internal;
}
}
5. Comparison of the Three Errors
5.1 Core Differences Summary
502 Bad Gateway : Backend connection failure or crash.
503 Service Unavailable : Backend refuses service (rate‑limit, overload, maintenance).
504 Gateway Timeout : Backend response takes too long (slow queries, heavy processing).
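The summary above can be folded into a first-question helper for on-call triage (a hypothetical convenience, not a standard tool):

```shell
#!/bin/bash
# triage: the first question to ask for each gateway error
triage() {
  case "$1" in
    502) echo "is anything listening on the upstream port?" ;;
    503) echo "did a rate limit, maintenance flag, or overload refuse it?" ;;
    504) echo "what is the upstream doing for longer than the read timeout?" ;;
    *)   echo "outside the scope of this guide" ;;
  esac
}
triage 502   # -> is anything listening on the upstream port?
```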
6. Real‑World Troubleshooting Cases
Case 1 – 502 Caused by Backend Crash
Symptom : Intermittent 502 errors.
Investigation :
# Check Nginx error log for connection refusals
tail -100 /var/log/nginx/error.log | grep 502
# Verify PHP‑FPM status
systemctl status php-fpm
# Look for OOM events
dmesg | grep -i "out of memory"
free -h
# Review PHP‑FPM pool settings
cat /etc/php-fpm.d/www.conf | grep -E "^pm|^request_terminate"
Root Cause : PHP‑FPM workers exhausted memory and were killed by the OOM killer.
Resolution :
# Bring PHP‑FPM back up immediately
sudo systemctl restart php-fpm
# Adjust PHP‑FPM pool
[www]
pm = dynamic
pm.max_children = 20
pm.start_servers = 3
pm.min_spare_servers = 2
pm.max_spare_servers = 5
pm.max_requests = 200
# Reduce memory limit
php_admin_value[memory_limit] = 128M
# Restart services
sudo systemctl restart php-fpm
sudo systemctl restart nginx
Case 2 – 504 Due to Slow Query
Symptom : API calls time out, returning 504.
Investigation :
# Search Nginx error log for timeout messages
grep 504 /var/log/nginx/error.log | tail -20
# Examine PHP‑FPM slow log
cat /var/log/php-fpm/www-slow.log
# Identify long‑running MySQL queries
mysql -u root -p -e "SHOW PROCESSLIST;"
mysql -u root -p -e "SHOW VARIABLES LIKE 'slow_query%';"
tail -20 /var/log/mysql/slow.log
Root Cause : A full‑table scan on a large table without an index caused the query to exceed the timeout.
Resolution :
# Add pagination and proper indexes
<?php
$page = isset($_GET['page']) ? (int)$_GET['page'] : 1;
$perPage = 100;
$offset = ($page - 1) * $perPage;
$date = '2026-01-01'; // upper bound of the window being paged
$stmt = $conn->prepare("SELECT * FROM huge_table WHERE created_at < ? ORDER BY id LIMIT ? OFFSET ?");
$stmt->bind_param("sii", $date, $perPage, $offset);
$stmt->execute();
?>
# Create indexes
ALTER TABLE huge_table ADD INDEX idx_created_at (created_at);
ALTER TABLE huge_table ADD INDEX idx_created_at_id (created_at, id);
# Verify with EXPLAIN
EXPLAIN SELECT * FROM huge_table WHERE created_at < '2026-01-01' ORDER BY id LIMIT 100;
Case 3 – 503 Triggered by Aggressive Rate Limiting
Symptom : During a promotion, many users receive 503 errors.
Investigation :
# Check Nginx limit_req configuration
grep -r "limit_req" /etc/nginx/
# Look for limit‑exceeded logs
tail -100 /var/log/nginx/error.log | grep "limiting"
# Verify connection count
ss -ant | grep :80 | wc -l
# Review PHP‑FPM status page
curl http://127.0.0.1/status
Root Cause : The configured rate‑limit (10 r/s) was too low for the traffic spike.
Resolution :
# Increase rate limits
limit_req_zone $binary_remote_addr zone=one:100m rate=100r/s;
limit_req_zone $binary_remote_addr zone=api:50m rate=50r/s;
server {
listen 80;
server_name example.com;
# General pages – higher limit
location / {
limit_req zone=one burst=200 nodelay;
proxy_pass http://backend;
}
# API – stricter limit
location /api/ {
limit_req zone=api burst=50 nodelay;
proxy_pass http://api_backend;
}
# Static assets – virtually no limit
location /static/ {
limit_req zone=one burst=500;
proxy_pass http://static_backend;
expires 7d;
add_header Cache-Control "public";
}
}
7. Monitoring and Alerting Configuration
7.1 Monitor 5xx Error Rate (Bash)
#!/bin/bash
# monitor_5xx.sh – alerts when 5xx rate exceeds threshold
LOG_FILE="/var/log/nginx/access.log"
ALERT_THRESHOLD=5 # percent
current_minute=$(date +"%d/%b/%Y:%H:%M")
total_requests=$(grep "$current_minute" "$LOG_FILE" | wc -l)
error_5xx=$(grep "$current_minute" "$LOG_FILE" | awk '$9 ~ /^5[0-9][0-9]$/' | wc -l)
if [ $total_requests -gt 0 ]; then
error_rate=$(echo "scale=2; $error_5xx * 100 / $total_requests" | bc)
echo "Total requests: $total_requests"
echo "5xx errors: $error_5xx"
echo "Error rate: ${error_rate}%"
if (( $(echo "$error_rate > $ALERT_THRESHOLD" | bc -l) )); then
echo "⚠️ Alert: 5xx error rate exceeds $ALERT_THRESHOLD%"
# Integrate with Prometheus/Zabbix here
fi
else
echo "No requests in the current minute"
fi
7.2 Prometheus Alert Rules
# prometheus_5xx_alerts.yml
groups:
- name: nginx_5xx_alerts
rules:
- alert: NginxHigh502ErrorRate
expr: |
sum(rate(nginx_http_requests_total{status=~"502"}[5m]))
/
sum(rate(nginx_http_requests_total[5m])) * 100 > 5
for: 2m
labels:
severity: critical
annotations:
summary: "Nginx 502 error rate too high"
description: "502 error rate > 5%, current value: {{ $value }}%"
- alert: NginxHigh503ErrorRate
expr: |
sum(rate(nginx_http_requests_total{status=~"503"}[5m]))
/
sum(rate(nginx_http_requests_total[5m])) * 100 > 5
for: 2m
labels:
severity: warning
annotations:
summary: "Nginx 503 error rate too high"
description: "503 error rate > 5%, current value: {{ $value }}%"
- alert: NginxHigh504ErrorRate
expr: |
sum(rate(nginx_http_requests_total{status=~"504"}[5m]))
/
sum(rate(nginx_http_requests_total[5m])) * 100 > 5
for: 2m
labels:
severity: warning
annotations:
summary: "Nginx 504 error rate too high"
description: "504 error rate > 5%, current value: {{ $value }}%"
7.3 Zabbix Monitoring Template (Agent Config)
# /etc/zabbix/zabbix_agentd.d/nginx_status.conf
UserParameter=nginx.active_connections,curl -s http://localhost/status | grep 'Active connections:' | awk '{print $3}'
UserParameter=nginx.accepts,curl -s http://localhost/status | awk '/^[[:space:]]+[0-9]+ [0-9]+ [0-9]+/ {print $1}'
UserParameter=nginx.handled,curl -s http://localhost/status | awk '/^[[:space:]]+[0-9]+ [0-9]+ [0-9]+/ {print $2}'
UserParameter=nginx.requests,curl -s http://localhost/status | awk '/^[[:space:]]+[0-9]+ [0-9]+ [0-9]+/ {print $3}'
UserParameter=nginx.5xx_rate,grep -c ' 502 \| 503 \| 504 ' /var/log/nginx/access.log
8. Summary Checklist
Key Points for Each Error
502 Bad Gateway:
• Problem: Backend cannot be reached
• Causes: Service down, wrong port, backend crash
• Checks: Verify backend status, ports, logs
503 Service Unavailable:
• Problem: Backend refuses service
• Causes: Rate limiting, overload, maintenance mode, worker exhaustion
• Checks: Review limit_req/limit_conn, backend load, maintenance flag
504 Gateway Timeout:
• Problem: Backend response too slow
• Causes: Long processing, slow DB queries, large uploads
• Checks: Backend logs, slow‑query logs, timeout settings; then optimise code/config
Rapid Response Flow
Receive 5xx alert
│
├── Immediate checks
│ ├── Is Nginx running?
│ ├── Are backend services up?
│ └── Are required ports listening?
│
├── Review logs
│ ├── Nginx error.log
│ ├── Backend service logs
│ └── PHP‑FPM slow log (if applicable)
│
├── Temporary actions
│ ├── Restart backend service
│ ├── Adjust timeout values
│ └── Disable rate limiting temporarily
│
└── Root‑cause analysis
├── Analyse error logs
├── Inspect slow queries
├── Optimise configuration or code
└── Enhance monitoring/alerting
Common Command Cheat Sheet
# Find 502 errors
grep 502 /var/log/nginx/error.log
# Find 503/504 errors
grep -E "503|504" /var/log/nginx/error.log
# Count total 5xx errors
awk '$9 ~ /^5[0-9][0-9]$/' /var/log/nginx/access.log | wc -l
# Check backend service status
systemctl status php-fpm
ps aux | grep php-fpm
# Verify port listening
ss -tlnp | grep :9000
# Test connectivity
nc -zv 127.0.0.1 9000
telnet 127.0.0.1 9000
# View PHP‑FPM slow log
tail -50 /var/log/php-fpm/www-slow.log
MaGe Linux Operations
Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.