How to Build a Zero‑Downtime High‑Availability Nginx Reverse Proxy Cluster with 3 Nodes
This step‑by‑step guide shows how to set up a three‑node Nginx reverse‑proxy cluster with Keepalived for VIP failover, covering prerequisites, installation, configuration, health‑check scripts, monitoring, performance tuning, security hardening, troubleshooting, and rollback procedures.
From Zero to High‑Availability Nginx Reverse Proxy Cluster
Applicable Scenarios and Prerequisites
Use case: traffic distribution for medium‑to‑large web services, multi‑datacenter HA, zero‑downtime updates.
Prerequisites: 3 Linux servers (RHEL 7.x+ or Ubuntu 18.04+), network connectivity, root or sudo access, a virtual IP (VIP).
Network requirements: isolated VLAN, IP multicast to 224.0.0.18 permitted (VRRP is IP protocol 112, not UDP), NTP time sync.
Performance expectation: a well-tuned single Nginx instance can handle on the order of 100k concurrent connections; the cluster can scale horizontally.
Quick Checklist
Step 1 : Prepare 3 servers and VIP, verify multicast.
Step 2 : Install Nginx and Keepalived packages.
Step 3 : Configure Master Keepalived (priority 100).
Step 4 : Configure Backup Keepalived nodes (priority 90, 80).
Step 5 : Configure Nginx upstream and virtual host.
Step 6 : Start services and verify VIP binding.
Step 7 : Test failover and recovery.
Step 8 : Set up monitoring, alerts, and log collection.
Step 9 : Define change and rollback strategy.
Implementation Steps
Step 1: Environment Preparation and Basic Checks
Check multicast support (required). Look for the MULTICAST flag on the VRRP interface; the ip command is part of iproute2 and preinstalled on both families, while net-tools supplies the netstat command used later for troubleshooting:
# RHEL/CentOS
sudo yum install -y net-tools
# Ubuntu/Debian
sudo apt-get install -y net-tools
# Verify the interface advertises MULTICAST
ip link show eth0
Verify NTP synchronization:
# Check time sync status
timedatectl status
# Manual sync
sudo ntpdate -u ntp.aliyun.com
Step 2: Install Nginx and Keepalived
RHEL/CentOS installation:
# Enable EPEL
sudo yum install -y epel-release
# Install packages
sudo yum install -y nginx keepalived
# Enable on boot
sudo systemctl enable nginx keepalived
Ubuntu/Debian installation:
# Update repo
sudo apt-get update
# Install packages
sudo apt-get install -y nginx keepalived
# Enable on boot
sudo systemctl enable nginx keepalived
Verify installation:
nginx -v
keepalived --version
Step 3: Configure Master Keepalived
Edit the configuration (priority 100):
sudo vi /etc/keepalived/keepalived.conf
Sample Master configuration:
global_defs {
router_id NGINX_MASTER
script_user root
enable_script_security
}
vrrp_script check_nginx {
script "/etc/keepalived/check_nginx.sh"
interval 3
weight -20
fall 3
rise 2
}
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 51
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass nginx_ha
}
virtual_ipaddress {
10.0.0.100/24
}
track_script { check_nginx }
notify_master "/etc/keepalived/notify.sh MASTER"
notify_backup "/etc/keepalived/notify.sh BACKUP"
}
Key parameters:
virtual_router_id 51 : VRRP group ID (must be identical across all three nodes).
priority 100 : Master priority; backups use 90 and 80.
advert_int 1 : VRRP advertisement interval in seconds.
authentication : shared password that keeps unauthorized nodes out of the VRRP group.
virtual_ipaddress : the floating VIP.
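The weight value ties the health check to failover: while check_nginx is failing, Keepalived adds the (negative) weight to the node's base priority before comparing VRRP advertisements. A minimal sketch of that arithmetic, using the priorities from this guide (effective_priority is a hypothetical helper name):

```shell
#!/bin/bash
# Effective VRRP priority = base priority + track-script weight
# (the weight is applied only while the tracked script is failing).
effective_priority() {
  local base=$1 weight=$2
  echo $(( base + weight ))
}

# Master (100) with check_nginx failing (weight -20):
effective_priority 100 -20   # prints 80 – below Backup1's 90, so Backup1 takes the VIP
```

Note that 80 happens to tie with Backup2's static priority; VRRP breaks such ties in favor of the higher primary IP address, and Backup1's 90 wins regardless.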
Create Nginx health‑check script:
sudo tee /etc/keepalived/check_nginx.sh > /dev/null <<'EOF'
#!/bin/bash
if systemctl is-active --quiet nginx; then
exit 0
else
systemctl start nginx
sleep 2
if systemctl is-active --quiet nginx; then
exit 0
else
exit 1
fi
fi
EOF
sudo chmod +x /etc/keepalived/check_nginx.sh
Optional notification script:
sudo tee /etc/keepalived/notify.sh > /dev/null <<'EOF'
#!/bin/bash
TYPE=$1
DATE=$(date '+%Y-%m-%d %H:%M:%S')
echo "[$DATE] Keepalived state change to $TYPE" >> /var/log/keepalived-notify.log
EOF
sudo chmod +x /etc/keepalived/notify.sh
Validate configuration syntax:
sudo keepalived -t
Step 4: Configure Backup Keepalived Nodes
Backup node example (priority 90):
global_defs {
router_id NGINX_BACKUP1
script_user root
enable_script_security
}
vrrp_script check_nginx {
script "/etc/keepalived/check_nginx.sh"
interval 3
weight -20
fall 3
rise 2
}
vrrp_instance VI_1 {
state BACKUP
interface eth0
virtual_router_id 51
priority 90
advert_int 1
authentication {
auth_type PASS
auth_pass nginx_ha
}
virtual_ipaddress { 10.0.0.100/24 }
track_script { check_nginx }
}
For the second backup (priority 80), change priority 90 to priority 80 and router_id to NGINX_BACKUP2.
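Since the backup configs differ from the master's only in three tokens, they can be derived mechanically instead of hand-edited. A sketch (derive_backup is a hypothetical helper; it assumes the master file uses the exact tokens from Step 3 — state MASTER, priority 100, router_id NGINX_MASTER):

```shell
#!/bin/bash
# Derive a backup node's keepalived.conf from the master's by rewriting
# the three node-specific tokens. Everything else must stay identical.
derive_backup() {
  local src=$1 router_id=$2 priority=$3
  sed -e 's/state MASTER/state BACKUP/' \
      -e "s/priority 100/priority $priority/" \
      -e "s/router_id NGINX_MASTER/router_id $router_id/" \
      "$src"
}

# On Backup1 (priority 90):
# derive_backup master-keepalived.conf NGINX_BACKUP1 90 | sudo tee /etc/keepalived/keepalived.conf
```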
Step 5: Configure Nginx Upstream and Virtual Host
On all three nodes, edit /etc/nginx/nginx.conf so that it contains:
user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;
events { worker_connections 10000; use epoll; }
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for" '
'rt=$request_time uct=$upstream_connect_time '
'uht=$upstream_header_time urt=$upstream_response_time';
access_log /var/log/nginx/access.log main;
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 65;
types_hash_max_size 2048;
client_max_body_size 100m;
upstream backend {
least_conn;
server 192.168.1.10:8080 max_fails=3 fail_timeout=30s;
server 192.168.1.11:8080 max_fails=3 fail_timeout=30s;
server 192.168.1.12:8080 max_fails=3 fail_timeout=30s;
keepalive 32;
}
server {
listen 80 default_server;
server_name _;
location / {
proxy_pass http://backend;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header Connection "";
proxy_connect_timeout 10s;
proxy_send_timeout 30s;
proxy_read_timeout 30s;
}
location /health {
access_log off;
return 200 "ok\n";
add_header Content-Type text/plain;
}
}
}
Key Nginx parameters:
least_conn : least-connections load balancing.
max_fails=3 fail_timeout=30s : after 3 failures within 30 s, take the server out of rotation for 30 s.
keepalive 32 : idle upstream connection pool per worker.
proxy_set_header Connection "" : clears the Connection header so pooled upstream connections stay reusable.
Validate Nginx configuration:
sudo nginx -t
Step 6: Start Services and Verify VIP
sudo systemctl start nginx
sudo systemctl start keepalived
sudo systemctl status nginx
sudo systemctl status keepalived
ip addr show | grep 10.0.0.100
Step 7: Test Failover
# From an external client
curl -v http://10.0.0.100/health
# Simulate Master failure
sudo systemctl stop nginx # on Master
# Verify VIP moves to Backup
ip addr show | grep 10.0.0.100 # on Backup
# Restore Master
sudo systemctl start nginx
Monitoring and Alerting
Prometheus Exporter
Install nginx-prometheus-exporter and expose a stub_status endpoint at /nginx_status, reachable from localhost, for it to scrape.
wget https://github.com/nginxinc/nginx-prometheus-exporter/releases/download/v0.11.0/nginx-prometheus-exporter_0.11.0_linux_amd64.tar.gz
tar xzf nginx-prometheus-exporter_0.11.0_linux_amd64.tar.gz
sudo mv nginx-prometheus-exporter /usr/local/bin/
sudo chmod +x /usr/local/bin/nginx-prometheus-exporter
nginx-prometheus-exporter -nginx.scrape-uri http://localhost/nginx_status
Add the exporter to the Prometheus scrape config and define alert rules for Nginx down and high error rate.
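The exporter reads Nginx's stub_status page; the same endpoint is handy for quick manual checks. A sketch that pulls the active-connection count out of the stub_status text (active_connections is a hypothetical helper; it assumes a stub_status location is enabled at /nginx_status, as above):

```shell
#!/bin/bash
# Extract the active connection count from Nginx stub_status output,
# whose first line looks like: "Active connections: 291".
active_connections() {
  awk '/Active connections/ {print $3}'
}

# Typical use:
# curl -s http://localhost/nginx_status | active_connections
```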
Native Linux Monitoring
watch -n 1 'ss -an | grep :80 | wc -l' # active connections
tail -f /var/log/messages | grep -i keepalived # Keepalived events (/var/log/syslog on Ubuntu/Debian)
watch -n 1 'ip addr show | grep 10.0.0.100' # VIP status
Performance and Capacity
Benchmark with wrk:
wrk -t4 -c100 -d30s --latency http://10.0.0.100/health
Sample output on this setup showed roughly 2 ms average latency at over 40k requests/sec. Note that request throughput and concurrent-connection capacity are different metrics; the 100k+ concurrent-connection figure cited earlier depends mainly on the tuning below.
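Throughput aside, the connection capacity of one node can be estimated from its worker settings: as a reverse proxy, each client connection consumes roughly two file descriptors (one client-side, one upstream). A back-of-the-envelope sketch (max_clients is a hypothetical helper):

```shell
#!/bin/bash
# Rough concurrent-client capacity for a reverse proxy:
# workers * worker_connections, halved because each proxied client
# uses ~2 descriptors (client socket + upstream socket).
max_clients() {
  local workers=$1 conns=$2
  echo $(( workers * conns / 2 ))
}

# 4 workers with worker_connections 10000, as configured above:
max_clients 4 10000   # prints 20000
```

Reaching the 100k region therefore needs more workers (worker_processes auto on a many-core host) plus the file-descriptor and sysctl limits raised below.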
Tuning Parameters
System‑level (all nodes):
# /etc/sysctl.conf additions
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 30
sudo sysctl -p
Nginx-level:
worker_connections 10000;
use epoll;
Security and Compliance
Strengthen Keepalived authentication. Two caveats: keepalived truncates auth_pass to 8 characters, and shell substitution such as $(openssl ...) is not expanded inside keepalived.conf; generate a secret once (for example with openssl rand -hex 4) and paste the same literal value on every node. Current keepalived documentation also discourages AH authentication, so PASS with a strong secret is the practical option:
authentication {
auth_type PASS
auth_pass 7f3a9c2e   # example only – substitute your own generated value
}
Firewall rules (RHEL/CentOS):
sudo firewall-cmd --permanent --add-port=80/tcp
sudo firewall-cmd --permanent --add-port=443/tcp
sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" protocol="vrrp" accept'
sudo firewall-cmd --reload
Ubuntu/Debian UFW:
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
# ufw cannot match the VRRP protocol with "ufw allow"; permit IP protocol 112
# by adding "-A ufw-before-input -p vrrp -j ACCEPT" to /etc/ufw/before.rules
# (before the COMMIT line), then: sudo ufw reload
Common Issues and Troubleshooting
Each entry lists the symptom, a diagnostic command, the root cause, a quick fix, and a permanent fix.

Symptom: VIP cannot bind
Diagnose: ip addr show
Root cause: multicast not enabled
Quick fix: check switch IGMP configuration
Permanent fix: enable multicast or switch Keepalived to unicast mode

Symptom: Keepalived does not fail over
Diagnose: journalctl -u keepalived
Root cause: health-check script fails
Quick fix: run the script manually, fix permissions
Permanent fix: correct the script path and permissions

Symptom: VIP flapping
Diagnose: tail -f /var/log/keepalived.log
Root cause: network packet loss
Quick fix: increase advert_int to 2-3 s
Permanent fix: improve network reliability or adjust intervals

Symptom: Nginx connection refused
Diagnose: netstat -an | grep LISTEN
Root cause: worker connection limit reached
Quick fix: increase worker_connections
Permanent fix: raise system ulimit and sysctl limits

Symptom: 502 errors from backend
Diagnose: curl -v http://backend:8080
Root cause: backend service unreachable
Quick fix: check backend health
Permanent fix: fix the backend or update the upstream config
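The diagnostics above can be wrapped into small helpers for a first pass at triage. A sketch (has_listener and holds_vip are hypothetical names; the listener check parses ss -ltn output, and the VIP is the one used throughout this guide):

```shell
#!/bin/bash
# First-pass triage helpers for the issues listed above.

has_listener() {   # is anything listening on the given TCP port?
  local port=$1 ss_out=$2
  grep -qE "LISTEN.*[:.]$port\b" <<<"$ss_out"
}

holds_vip() {      # does this node currently hold the VIP?
  local vip=$1 addr_out=$2
  grep -q "inet $vip" <<<"$addr_out"
}

# Typical use:
# has_listener 80 "$(ss -ltn)" || echo "nginx not listening on :80"
# holds_vip 10.0.0.100 "$(ip addr show)" && echo "this node holds the VIP"
```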
Change and Rollback Playbook
Gray‑Release Strategy
Mark a backend server as down, reload Nginx without downtime, update the server, then bring it back online.
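This drain/update/restore cycle can also be scripted with a pair of sed rewrites. A sketch — drain and undrain are hypothetical helpers; the pattern assumes each backend sits on its own "server <ip>:8080 ...;" line as in the upstream block above, and a production run needs sudo plus nginx -t before every reload:

```shell
#!/bin/bash
# Toggle a backend in/out of rotation by rewriting its upstream line.
# Not idempotent: draining twice yields "down down", which nginx -t rejects.
drain()   { sed -E -i "s|(server $1:8080[^;]*);|\1 down;|" "$2"; }
undrain() { sed -E -i "s|(server $1:8080[^;]*) down;|\1;|" "$2"; }

# Typical cycle:
# drain 192.168.1.11 /etc/nginx/nginx.conf && sudo nginx -t && sudo nginx -s reload
# ...update 192.168.1.11...
# undrain 192.168.1.11 /etc/nginx/nginx.conf && sudo nginx -t && sudo nginx -s reload
```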
# Mark server down
server 192.168.1.11:8080 down;
# Reload Nginx
sudo nginx -s reload
# After update, remove "down" and reload again
sudo nginx -t && sudo nginx -s reload
Health-Check Verification
for host in 192.168.1.10 192.168.1.11 192.168.1.12; do
curl -s http://$host:8080/health || echo "$host DOWN"
done
Quick Rollback
# Backup current config
sudo cp /etc/nginx/nginx.conf /etc/nginx/nginx.conf.backup.$(date +%s)
# Restore previous version
sudo cp /etc/nginx/nginx.conf.backup.xxx /etc/nginx/nginx.conf
# Test and reload
sudo nginx -t && sudo nginx -s reload
Best Practices
VIP exclusivity : VIP should bind only on Master.
Precise health checks : Scripts must be idempotent and complete within 3 s.
Priority ladder : Master 100, Backup1 90, Backup2 80.
Consistent authentication : Same auth_pass on all nodes.
Network isolation : Keepalived traffic on a dedicated management network.
Three‑layer monitoring : VIP, Nginx process, backend health.
Regular failover drills : Quarterly tests to verify automatic switchover.
Configuration version control : Store Nginx and Keepalived configs in Git.
Log aggregation : Centralize Nginx access logs and Keepalived events.
Capacity planning : a tuned single Nginx node can handle on the order of 100k concurrent connections; plan to scale out at around 50k sustained.
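Several of these practices — versioned configs, test-before-reload, instant rollback — combine naturally into one wrapper. A sketch: safe_apply is a hypothetical helper, and the test/reload commands are passed in as parameters so the flow can be rehearsed without touching a live Nginx:

```shell
#!/bin/bash
# Apply a candidate config: keep a timestamped backup, stage the new file,
# syntax-test it, and roll back automatically if the test fails.
safe_apply() {
  local conf=$1 candidate=$2 test_cmd=$3 reload_cmd=$4
  local backup="${conf}.backup.$(date +%s)"
  cp "$conf" "$backup"      # last known-good copy
  cp "$candidate" "$conf"   # stage the candidate
  if $test_cmd; then
    $reload_cmd
  else
    cp "$backup" "$conf"    # candidate failed the test: restore
    return 1
  fi
}

# Production use (as root):
# safe_apply /etc/nginx/nginx.conf /tmp/nginx.conf.new "nginx -t" "nginx -s reload"
```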
Appendix: Full Configuration Samples
keepalived.conf (Master)
global_defs {
router_id NGINX_MASTER
script_user root
enable_script_security
vrrp_iptables NGINX_HA
}
vrrp_script check_nginx {
script "/etc/keepalived/check_nginx.sh"
interval 3
weight -20
fall 3
rise 2
}
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 51
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass nginx_ha
}
virtual_ipaddress { 10.0.0.100/24 }
virtual_routes { 0.0.0.0/0 via 10.0.0.1 }
track_script { check_nginx }
notify_master "/etc/keepalived/notify.sh MASTER"
notify_backup "/etc/keepalived/notify.sh BACKUP"
notify_fault "/etc/keepalived/notify.sh FAULT"
}
nginx.conf (Key Sections)
upstream backend {
least_conn;
server 192.168.1.10:8080 max_fails=3 fail_timeout=30s weight=1;
server 192.168.1.11:8080 max_fails=3 fail_timeout=30s weight=1;
server 192.168.1.12:8080 max_fails=3 fail_timeout=30s weight=1;
keepalive 32;
}
server {
listen 80 default_server;
server_name _;
location / {
proxy_pass http://backend;
proxy_http_version 1.1;
proxy_set_header Connection "";
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_connect_timeout 10s;
proxy_send_timeout 30s;
proxy_read_timeout 30s;
}
location /health {
access_log off;
return 200 "ok\n";
add_header Content-Type text/plain;
}
}
Conclusion : This guide provides a complete, production-ready three-node high-availability Nginx reverse-proxy solution with Keepalived, covering installation, configuration, health checks, monitoring, performance tuning, security hardening, troubleshooting, and rollback procedures.