
How to Build a Zero‑Downtime High‑Availability Nginx Reverse Proxy Cluster with 3 Nodes

This step‑by‑step guide shows how to set up a three‑node Nginx reverse‑proxy cluster with Keepalived for VIP failover, covering prerequisites, installation, configuration, health‑check scripts, monitoring, performance tuning, security hardening, troubleshooting, and rollback procedures.


From Zero to High‑Availability Nginx Reverse Proxy Cluster

Applicable Scenarios and Prerequisites

Use case: traffic distribution for medium‑to‑large web services, multi‑datacenter HA, zero‑downtime updates.

Prerequisites: 3 Linux servers (RHEL 7.x+ or Ubuntu 18.04+), network connectivity, root or sudo access, a virtual IP (VIP).

Network requirements: isolated VLAN, multicast support for Keepalived VRRP (IP protocol 112, group 224.0.0.18), NTP time sync.

Performance expectation: a well‑tuned single Nginx instance can sustain 100k+ concurrent connections; the cluster scales horizontally.

Quick Checklist

Step 1: Prepare 3 servers and the VIP; verify multicast.

Step 2: Install the Nginx and Keepalived packages.

Step 3: Configure the Master Keepalived node (priority 100).

Step 4: Configure the Backup Keepalived nodes (priorities 90 and 80).

Step 5: Configure the Nginx upstream and virtual host.

Step 6: Start services and verify VIP binding.

Step 7: Test failover and recovery.

Step 8: Set up monitoring, alerting, and log collection.

Step 9: Define the change and rollback strategy.

Implementation Steps

Step 1: Environment Preparation and Basic Checks

Check multicast support (required). The `ip` command ships with iproute2; net-tools supplies `netstat`, which is used later in troubleshooting:

# RHEL/CentOS
sudo yum install -y net-tools
# Ubuntu/Debian
sudo apt-get install -y net-tools
# Verify the cluster interface carries the MULTICAST flag (interface name may differ)
ip link show eth0 | grep MULTICAST

Verify NTP synchronization:

# Check time sync status
timedatectl status
# Manual sync (on systems using chrony, run: sudo chronyc makestep)
sudo ntpdate -u ntp.aliyun.com

Step 2: Install Nginx and Keepalived

RHEL/CentOS installation:

# Enable EPEL
sudo yum install -y epel-release
# Install packages
sudo yum install -y nginx keepalived
# Enable on boot
sudo systemctl enable nginx keepalived

Ubuntu/Debian installation:

# Update repo
sudo apt-get update
# Install packages
sudo apt-get install -y nginx keepalived
# Enable on boot
sudo systemctl enable nginx keepalived

Verify installation:

nginx -v
keepalived --version

Step 3: Configure Master Keepalived

Edit the configuration (priority 100):

sudo vi /etc/keepalived/keepalived.conf

Sample Master configuration:

global_defs {
    router_id NGINX_MASTER
    script_user root
    enable_script_security
}

vrrp_script check_nginx {
    script "/etc/keepalived/check_nginx.sh"
    interval 3
    weight -20
    fall 3
    rise 2
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass nginx_ha
    }
    virtual_ipaddress {
        10.0.0.100/24
    }
    track_script { check_nginx }
    notify_master "/etc/keepalived/notify.sh MASTER"
    notify_backup "/etc/keepalived/notify.sh BACKUP"
}

Key parameters:

virtual_router_id 51: VRRP group ID (must be identical across all three nodes).

priority 100: Master priority; backups use 90 and 80.

advert_int 1: VRRP advertisement interval in seconds.

authentication: shared password to reject unauthorized nodes.

virtual_ipaddress: the floating VIP.
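The failover arithmetic implied by these values can be sketched as a few lines of shell (an illustration of the semantics, not Keepalived's actual code): when a tracked script fails, its negative weight is added to the node's priority, and the node advertising the highest effective priority holds the VIP.

```shell
# Sketch of Keepalived's effective-priority arithmetic with this guide's values.
# healthy=1 means the check script passes; the weight (-20) applies on failure.
effective_priority() {
    local base=$1 weight=$2 healthy=$3
    if [ "$healthy" -eq 1 ]; then
        echo "$base"
    else
        echo $(( base + weight ))
    fi
}

master=$(effective_priority 100 -20 0)    # Master, health check failing
backup1=$(effective_priority 90 -20 1)    # Backup1, healthy
echo "master=$master backup1=$backup1"    # master=80 backup1=90: VIP moves to Backup1
```

Because 80 < 90, Backup1 takes over the VIP; once the Master's check passes again (after `rise 2` consecutive successes), its priority returns to 100 and it reclaims the VIP.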

Create Nginx health‑check script:

sudo tee /etc/keepalived/check_nginx.sh > /dev/null <<'EOF'
#!/bin/bash
if systemctl is-active --quiet nginx; then
    exit 0
else
    systemctl start nginx
    sleep 2
    if systemctl is-active --quiet nginx; then
        exit 0
    else
        exit 1
    fi
fi
EOF
sudo chmod +x /etc/keepalived/check_nginx.sh
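The script's check-then-restart-once logic can be exercised without systemd by parameterizing the check and start commands; in this dry-run sketch, `true`/`false` stand in for the `systemctl` calls:

```shell
# Same shape as check_nginx.sh: if the check passes, exit 0; otherwise
# attempt one restart, wait, and re-check. check_cmd/start_cmd are
# stand-ins for the systemctl invocations in the real script.
check_with_restart() {
    local check_cmd=$1 start_cmd=$2
    if $check_cmd; then
        return 0
    fi
    $start_cmd
    sleep 2
    $check_cmd
}

check_with_restart true  true && echo "healthy"          # passes immediately
check_with_restart false true || echo "restart failed"   # still down after restart
```

Keepalived treats a nonzero exit as a failed check; after `fall 3` consecutive failures the weight is applied and the VIP can move.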

Optional notification script:

sudo tee /etc/keepalived/notify.sh > /dev/null <<'EOF'
#!/bin/bash
TYPE=$1
DATE=$(date '+%Y-%m-%d %H:%M:%S')
echo "[$DATE] Keepalived state change to $TYPE" >> /var/log/keepalived-notify.log
EOF
sudo chmod +x /etc/keepalived/notify.sh

Validate configuration syntax:

sudo keepalived -t

Step 4: Configure Backup Keepalived Nodes

Backup node example (priority 90):

global_defs {
    router_id NGINX_BACKUP1
    script_user root
    enable_script_security
}

vrrp_script check_nginx {
    script "/etc/keepalived/check_nginx.sh"
    interval 3
    weight -20
    fall 3
    rise 2
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass nginx_ha
    }
    virtual_ipaddress { 10.0.0.100/24 }
    track_script { check_nginx }
}

For the second backup node (priority 80), change priority 90 to priority 80 and router_id to NGINX_BACKUP2.

Step 5: Configure Nginx Upstream and Virtual Host

On all three nodes, replace the contents of /etc/nginx/nginx.conf with:

user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;

events { worker_connections 10000; use epoll; }

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for" '
                      'rt=$request_time uct=$upstream_connect_time '
                      'uht=$upstream_header_time urt=$upstream_response_time';
    access_log /var/log/nginx/access.log main;
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    client_max_body_size 100m;

    upstream backend {
        least_conn;
        server 192.168.1.10:8080 max_fails=3 fail_timeout=30s;
        server 192.168.1.11:8080 max_fails=3 fail_timeout=30s;
        server 192.168.1.12:8080 max_fails=3 fail_timeout=30s;
        keepalive 32;
    }

    server {
        listen 80 default_server;
        server_name _;
        location / {
            proxy_pass http://backend;
            proxy_http_version 1.1;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_set_header Connection "";
            proxy_connect_timeout 10s;
            proxy_send_timeout 30s;
            proxy_read_timeout 30s;
        }
        location /health {
            access_log off;
            return 200 "ok\n";
            add_header Content-Type text/plain;
        }
    }
}

Key Nginx parameters:

least_conn: least‑connections load balancing.

max_fails=3 fail_timeout=30s: mark a server down after 3 failures within 30 s.

keepalive 32: upstream connection pool size per worker.

proxy_set_header Connection "": keeps upstream connections reusable.

Validate Nginx configuration:

sudo nginx -t

Step 6: Start Services and Verify VIP

sudo systemctl start nginx
sudo systemctl start keepalived
sudo systemctl status nginx
sudo systemctl status keepalived
ip addr show | grep 10.0.0.100

Step 7: Test Failover

# From an external client
curl -v http://10.0.0.100/health
# Simulate Master failure. Note: check_nginx.sh auto-restarts a stopped
# Nginx, so stopping Nginx alone will not move the VIP; stop Keepalived
# itself to force the failover.
sudo systemctl stop keepalived   # on Master
# Verify VIP moves to a Backup
ip addr show | grep 10.0.0.100   # on Backup
# Restore Master
sudo systemctl start keepalived
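To quantify the failover window, probe the VIP in a tight loop from a client while stopping the Master. A sketch with the probe command parameterized so the loop itself can be dry-run (`true` below stands in for the real curl probe; the VIP and interval follow this guide's examples):

```shell
# probe_loop COUNT CMD...: run the probe COUNT times, report per-attempt
# status and a failure summary. In a real drill, the FAIL lines bracket
# the failover window.
probe_loop() {
    local count=$1; shift
    local fails=0 i
    for i in $(seq 1 "$count"); do
        if "$@"; then
            echo "probe $i ok"
        else
            echo "probe $i FAIL"
            fails=$((fails + 1))
        fi
    done
    echo "failed probes: $fails"
}

# Real drill (run while stopping Keepalived on the Master):
#   probe_loop 60 curl -s -m 1 -o /dev/null http://10.0.0.100/health
probe_loop 3 true
```

With `advert_int 1` and the default VRRP timers, only a few probes should fail before a Backup claims the VIP.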

Monitoring and Alerting

Prometheus Exporter

Install nginx‑prometheus‑exporter and expose Nginx's stub_status page at /nginx_status on 127.0.0.1:8080.

wget https://github.com/nginxinc/nginx-prometheus-exporter/releases/download/v0.11.0/nginx-prometheus-exporter_0.11.0_linux_amd64.tar.gz
tar xzf nginx-prometheus-exporter_0.11.0_linux_amd64.tar.gz
sudo mv nginx-prometheus-exporter /usr/local/bin/
sudo chmod +x /usr/local/bin/nginx-prometheus-exporter
nginx-prometheus-exporter -nginx.scrape-uri http://127.0.0.1:8080/nginx_status
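The exporter's scrape URI assumes Nginx serves the stub_status module at /nginx_status, which the nginx.conf above does not yet define. A minimal server block to add inside the http context (restricting it to loopback is an assumption, but a sensible one):

```nginx
# Loopback-only stub_status endpoint for the Prometheus exporter
server {
    listen 127.0.0.1:8080;
    location /nginx_status {
        stub_status;
        access_log off;
        allow 127.0.0.1;
        deny all;
    }
}
```

Reload with `sudo nginx -t && sudo nginx -s reload`, then confirm with `curl http://127.0.0.1:8080/nginx_status`.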

Add the exporter to Prometheus scrape config and define alert rules for Nginx down and high error rate.

Native Linux Monitoring

watch -n 1 'ss -an | grep :80 | wc -l'   # active connections
tail -f /var/log/messages | grep -i keepalived   # Keepalived events (RHEL; Ubuntu logs to /var/log/syslog)
watch -n 1 'ip addr show | grep 10.0.0.100'   # VIP status

Performance and Capacity

Benchmark with wrk:

wrk -t4 -c100 -d30s --latency http://10.0.0.100/health

Sample output shows roughly 2 ms average latency and more than 40k requests/sec per node, consistent with the 100k+ concurrent‑connection expectation.

Tuning Parameters

System‑level (all nodes):

# /etc/sysctl.conf additions
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 30

sysctl -p

Nginx‑level:

worker_connections 10000;
use epoll;
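worker_connections is also capped by the per‑process file‑descriptor limit; a commonly paired directive for the main (top‑level) context of nginx.conf, with a value mirroring the sysctl figures above (the exact number is an assumption, not a benchmark result):

```nginx
# Raise the per-worker FD ceiling: each proxied request can consume two
# descriptors (client + upstream), so 10k worker_connections would
# otherwise exhaust the default ulimit.
worker_rlimit_nofile 65535;
```

For systemd‑managed Nginx, a matching `LimitNOFILE=65535` in a service drop‑in keeps the service‑level limit consistent.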

Security and Compliance

Strengthen Keepalived authentication. Note that keepalived.conf is not processed by a shell, so a substitution like $(openssl rand -base64 12) would be taken literally, and auth_pass only honors the first 8 characters. Generate a short secret once (e.g. with openssl rand -base64 6) and paste the same literal value into every node's configuration:

authentication {
    auth_type AH
    auth_pass <generated-secret>
}

Firewall rules (RHEL/CentOS):

sudo firewall-cmd --permanent --add-port=80/tcp
sudo firewall-cmd --permanent --add-port=443/tcp
sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" protocol="vrrp" accept'
sudo firewall-cmd --reload

Ubuntu/Debian UFW (UFW's proto keyword covers only tcp, udp, ah, esp, gre, ipv6, and igmp, so VRRP must be allowed via iptables, e.g. in /etc/ufw/before.rules):

sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
# Accept VRRP (IP protocol 112) directly; UFW cannot match it by name
sudo iptables -A INPUT -p 112 -j ACCEPT

Common Issues and Troubleshooting

| Symptom | Diagnostic Command | Root Cause | Quick Fix | Permanent Fix |
|---|---|---|---|---|
| VIP cannot bind | ip addr show | Multicast not enabled | Check switch IGMP configuration | Enable multicast or switch to unicast mode |
| Keepalived does not switch | journalctl -u keepalived | Health‑check script fails | Run script manually, fix permissions | Correct script path and permissions |
| VIP flapping | tail -f /var/log/keepalived.log | Network packet loss | Increase advert_int to 2-3 s | Improve network reliability or adjust intervals |
| Nginx connection refused | netstat -an \| grep LISTEN | Worker connection limit reached | Increase worker_connections | Adjust system ulimit and sysctl limits |
| 502 errors from backend | curl -v http://backend:8080 | Backend service unreachable | Check backend health | Fix backend or update upstream config |

Change and Rollback Playbook

Gray‑Release Strategy

Mark a backend server as down, reload Nginx without downtime, update the server, then bring it back online.

# Edit the upstream block and mark the server down:
#     server 192.168.1.11:8080 max_fails=3 fail_timeout=30s down;
# Reload Nginx (no downtime)
sudo nginx -t && sudo nginx -s reload
# After the update, remove "down" and reload again
sudo nginx -t && sudo nginx -s reload
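Toggling the down flag by hand is error‑prone; the edit can be scripted with sed. A sketch that operates on a scratch copy of the upstream block (paths and the target IP follow this guide's examples; on a real node the target would be the Nginx config, followed by nginx -t && nginx -s reload):

```shell
# Work on a temporary copy of the upstream block
conf=$(mktemp)
cat > "$conf" <<'EOF'
upstream backend {
    least_conn;
    server 192.168.1.10:8080 max_fails=3 fail_timeout=30s;
    server 192.168.1.11:8080 max_fails=3 fail_timeout=30s;
}
EOF

# Mark 192.168.1.11 down for the gray release
sed -i 's|\(server 192\.168\.1\.11:8080[^;]*\);|\1 down;|' "$conf"
grep '192.168.1.11' "$conf"    # line now ends in "down;"

# After the update, bring it back online
sed -i 's| down;|;|' "$conf"
grep '192.168.1.11' "$conf"    # "down" removed again
```

Scripting the toggle makes the gray release repeatable and keeps the two reloads symmetrical.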

Health‑Check Verification

for host in 192.168.1.10 192.168.1.11 192.168.1.12; do
  curl -s http://$host:8080/health || echo "$host DOWN"
done

Quick Rollback

# Backup current config
sudo cp /etc/nginx/nginx.conf /etc/nginx/nginx.conf.backup.$(date +%s)
# Restore previous version
sudo cp /etc/nginx/nginx.conf.backup.xxx /etc/nginx/nginx.conf
# Test and reload
sudo nginx -t && sudo nginx -s reload
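Backups named with date +%s sort chronologically, so the newest one can be selected automatically. A dry‑run sketch in a scratch directory (the timestamps are made up; the naming follows the backup command above):

```shell
# Scratch directory stands in for /etc/nginx; filenames follow the
# nginx.conf.backup.<epoch-seconds> convention used above.
dir=$(mktemp -d)
touch "$dir/nginx.conf.backup.1700000000" "$dir/nginx.conf.backup.1700000100"

# Epoch timestamps of equal length sort correctly as plain strings
latest=$(ls -1 "$dir"/nginx.conf.backup.* | sort | tail -n 1)
echo "would restore: $latest"
# Real rollback:
#   sudo cp "$latest" /etc/nginx/nginx.conf && sudo nginx -t && sudo nginx -s reload
```

Guarding the copy with nginx -t before the reload ensures a broken backup can never be pushed live.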

Best Practices

VIP exclusivity: the VIP should bind only on the Master.

Precise health checks: scripts must be idempotent and complete within 3 s.

Priority ladder: Master 100, Backup1 90, Backup2 80.

Consistent authentication: same auth_pass on all nodes.

Network isolation: Keepalived traffic on a dedicated management network.

Three‑layer monitoring: VIP, Nginx process, backend health.

Regular failover drills: quarterly tests to verify automatic switchover.

Configuration version control: store Nginx and Keepalived configs in Git.

Log aggregation: centralize Nginx access logs and Keepalived events.

Capacity planning: a single Nginx instance handles ~100k concurrent connections; plan scaling from ~50k.

Appendix: Full Configuration Samples

keepalived.conf (Master)

global_defs {
    router_id NGINX_MASTER
    script_user root
    enable_script_security
    vrrp_iptables NGINX_HA
}

vrrp_script check_nginx {
    script "/etc/keepalived/check_nginx.sh"
    interval 3
    weight -20
    fall 3
    rise 2
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass nginx_ha
    }
    virtual_ipaddress { 10.0.0.100/24 }
    virtual_routes { 0.0.0.0/0 via 10.0.0.1 }
    track_script { check_nginx }
    notify_master "/etc/keepalived/notify.sh MASTER"
    notify_backup "/etc/keepalived/notify.sh BACKUP"
    notify_fault "/etc/keepalived/notify.sh FAULT"
}

nginx.conf (Key Sections)

upstream backend {
    least_conn;
    server 192.168.1.10:8080 max_fails=3 fail_timeout=30s weight=1;
    server 192.168.1.11:8080 max_fails=3 fail_timeout=30s weight=1;
    server 192.168.1.12:8080 max_fails=3 fail_timeout=30s weight=1;
    keepalive 32;
}

server {
    listen 80 default_server;
    server_name _;
    location / {
        proxy_pass http://backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_connect_timeout 10s;
        proxy_send_timeout 30s;
        proxy_read_timeout 30s;
    }
    location /health {
        access_log off;
        return 200 "ok\n";
        add_header Content-Type text/plain;
    }
}

Conclusion : This guide provides a complete, production‑ready three‑node high‑availability Nginx reverse‑proxy solution with Keepalived, covering installation, configuration, health checks, monitoring, performance tuning, security hardening, troubleshooting, and rollback procedures.

Tags: Monitoring, High Availability, Load Balancing, Nginx, Keepalived
Written by Ops Community, a leading IT operations community where professionals share and grow together.