
Dual‑Master Nginx + Keepalived Architecture: Eliminate Single Points of Failure

This guide walks through building a dual‑master Nginx + Keepalived high‑availability setup that doubles resource utilization, removes the idle‑backup drawback of traditional active‑passive designs, and provides step‑by‑step configuration, health‑check scripts, failover testing, best‑practice tips, and troubleshooting procedures.

Raymond Ops

Overview

Running a single Nginx instance is a reliability risk: a hardware failure, network glitch, or kernel panic takes the service down. The classic active‑passive Nginx + Keepalived model avoids this but leaves the backup server idle, wasting resources. A dual‑master architecture has both servers serve traffic while still providing failover: each node is MASTER for one VIP and BACKUP for the other, and clients are spread across the two VIPs (typically by publishing both as DNS A records).

Technical Advantages

Resource utilization doubles – both nodes handle requests.

Fast failover – Keepalived switches VIPs within seconds.

Scalable – can be extended to larger clusters.

Cost‑effective – uses open‑source software instead of proprietary load balancers.

Applicable Scenarios

Small‑to‑medium websites (fewer than 5 million page views, PV)

Cost‑sensitive services that still need high availability

Internal gateways and API layers

Not recommended for ultra‑high traffic (use LVS + Keepalived instead) or for complex traffic scheduling (use a dedicated application delivery controller, ADC).

Environment Requirements

Two identical servers (recommended 8 CPU + 16 GB RAM)

OS: Rocky Linux 9.3 or Ubuntu 24.04 LTS

Nginx 1.26.2 (stable) or 1.27.3 (mainline)

Keepalived 2.3.1

Dual NICs (gigabit or 10 GbE)

Two VIPs in the same subnet

Preparation

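Disable SELinux (Rocky Linux; Ubuntu does not use SELinux by default): run setenforce 0 to switch to permissive mode immediately, and set SELINUX=disabled in /etc/selinux/config so it stays off after a reboot.

Open ports 80 and 443 and allow VRRP (IP protocol 112). The commands below are for firewalld on Rocky Linux: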

firewall-cmd --permanent --add-port=80/tcp
firewall-cmd --permanent --add-port=443/tcp
firewall-cmd --permanent --add-rich-rule='rule protocol value="vrrp" accept'
firewall-cmd --reload

Apply kernel parameters (e.g., enable non‑local bind, increase socket backlog, raise file‑descriptor limits) and reload:

cat > /etc/sysctl.d/99-nginx-keepalived.conf <<'EOF'
net.ipv4.ip_nonlocal_bind = 1
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_tw_buckets = 1440000
fs.file-max = 2097152
EOF
sysctl -p /etc/sysctl.d/99-nginx-keepalived.conf
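Because another file in /etc/sysctl.d or a container runtime can override these values, it can be worth confirming that they actually took effect. The sketch below compares each key = value line of the drop‑in file with the live kernel value, read from /proc/sys (where sysctl itself reads them). The function names are hypothetical, and SYSCTL_ROOT is an assumed override that only exists to make the check testable.

```bash
# Collapse runs of whitespace to one space and trim both ends
_sysctl_norm() { tr -s '[:space:]' ' ' | sed 's/^ //; s/ $//'; }

# verify_sysctl FILE: print every key whose live value differs from FILE,
# return 1 if any key is missing or mismatched, 0 otherwise.
verify_sysctl() {
    local file="$1" root="${SYSCTL_ROOT:-/proc/sys}" rc=0 key want have path
    while IFS='=' read -r key want || [ -n "$key" ]; do
        key=$(printf '%s' "$key" | _sysctl_norm)
        want=$(printf '%s' "$want" | _sysctl_norm)
        # Skip blank lines and comments
        case "$key" in ''|\#*|\;*) continue ;; esac
        # net.core.somaxconn -> /proc/sys/net/core/somaxconn
        path="$root/$(printf '%s' "$key" | tr . /)"
        if [ ! -f "$path" ] || [ ! -r "$path" ]; then
            echo "MISSING $key"
            rc=1
            continue
        fi
        # Normalizing both sides lets tab-separated multi-value keys compare
        have=$(_sysctl_norm < "$path")
        if [ "$have" != "$want" ]; then
            echo "MISMATCH $key: expected '$want', got '$have'"
            rc=1
        fi
    done < "$file"
    return "$rc"
}
```

Usage: verify_sysctl /etc/sysctl.d/99-nginx-keepalived.conf && echo "all values applied"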

Install Nginx

Method 1 – official repository (recommended)

# Add Nginx stable repo
cat > /etc/yum.repos.d/nginx.repo <<'EOF'
[nginx-stable]
name=nginx stable repo
baseurl=http://nginx.org/packages/centos/$releasever/$basearch/
gpgcheck=1
enabled=1
gpgkey=https://nginx.org/keys/nginx_signing.key
module_hotfixes=true
EOF

dnf install -y nginx

Method 2 – compile from source (required for custom modules)

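# Install build dependencies for the configure flags below
dnf install -y gcc make pcre2-devel openssl-devel zlib-devel
# Only needed if you add the xslt, image_filter or geoip modules:
# dnf install -y libxml2-devel libxslt-devel gd-devel GeoIP-devel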

# Download and extract source
cd /usr/local/src
wget https://nginx.org/download/nginx-1.26.2.tar.gz
tar xzf nginx-1.26.2.tar.gz
cd nginx-1.26.2

# Configure and build
./configure \
    --prefix=/etc/nginx \
    --sbin-path=/usr/sbin/nginx \
    --conf-path=/etc/nginx/nginx.conf \
    --error-log-path=/var/log/nginx/error.log \
    --http-log-path=/var/log/nginx/access.log \
    --pid-path=/run/nginx.pid \
    --lock-path=/run/nginx.lock \
    --with-threads \
    --with-file-aio \
    --with-http_ssl_module \
    --with-http_v2_module \
    --with-http_realip_module \
    --with-http_gzip_static_module \
    --with-http_stub_status_module \
    --with-stream \
    --with-stream_ssl_module \
    --with-stream_realip_module
make -j$(nproc)
make install
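A source build installs the binary and default files only: it creates neither the nginx user that user nginx; refers to nor a systemd unit, so the later systemctl enable nginx would fail. The sketch below covers both. The function names are hypothetical, and the unit paths simply mirror the ./configure flags above.

```bash
# Create the unprivileged account used by "user nginx;" (skipped if present)
create_nginx_user() {
    id -u nginx >/dev/null 2>&1 ||
        useradd --system --no-create-home --shell /sbin/nologin nginx
}

# write_nginx_unit [DEST]: write a systemd unit matching the configure paths
write_nginx_unit() {
    local dest="${1:-/etc/systemd/system/nginx.service}"
    cat > "$dest" <<'EOF'
[Unit]
Description=nginx - high performance web server
After=network-online.target remote-fs.target nss-lookup.target
Wants=network-online.target

[Service]
Type=forking
PIDFile=/run/nginx.pid
ExecStartPre=/usr/sbin/nginx -t -c /etc/nginx/nginx.conf
ExecStart=/usr/sbin/nginx -c /etc/nginx/nginx.conf
ExecReload=/usr/sbin/nginx -s reload
ExecStop=/usr/sbin/nginx -s quit

[Install]
WantedBy=multi-user.target
EOF
}
```

Usage (as root): create_nginx_user; write_nginx_unit && systemctl daemon-reload. The ExecStartPre config test means a broken nginx.conf stops the service from starting instead of failing halfway.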

Install Keepalived

# Rocky Linux 9
dnf install -y keepalived
keepalived --version   # distribution packages may be older than v2.3.1

# If repository version is older, compile from source
cd /usr/local/src
wget https://www.keepalived.org/software/keepalived-2.3.1.tar.gz
tar xzf keepalived-2.3.1.tar.gz
cd keepalived-2.3.1
./configure --prefix=/usr/local/keepalived
make -j$(nproc)
make install

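# Create symlinks for convenience (remove the dnf package first: if /etc/keepalived
# already exists as a directory, ln creates the link inside it instead)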
ln -sf /usr/local/keepalived/sbin/keepalived /usr/sbin/keepalived
ln -sf /usr/local/keepalived/etc/keepalived /etc/keepalived

Core Configuration

Nginx (identical on both nodes)

# /etc/nginx/nginx.conf (excerpt)
user nginx;
worker_processes auto;
worker_cpu_affinity auto;
error_log /var/log/nginx/error.log warn;
pid /run/nginx.pid;

worker_rlimit_nofile 65535;

events {
    use epoll;
    worker_connections 65535;
    multi_accept on;
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    log_format main '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for" rt=$request_time uct="$upstream_connect_time" uht="$upstream_header_time" urt="$upstream_response_time"';
    access_log /var/log/nginx/access.log main buffer=16k flush=5s;
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    gzip on;
    gzip_min_length 1k;
    gzip_comp_level 4;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;
    gzip_vary on;

    upstream backend_servers {
        least_conn;
        keepalive 100;
        server 192.168.1.21:8080 weight=100 max_fails=3 fail_timeout=10s;
        server 192.168.1.22:8080 weight=100 max_fails=3 fail_timeout=10s;
        server 192.168.1.23:8080 weight=100 max_fails=3 fail_timeout=10s;
        server 192.168.1.24:8080 weight=100 max_fails=3 fail_timeout=10s;
    }

    server {
        listen 80 default_server;
        listen 443 ssl default_server;
        server_name _;
        ssl_certificate /etc/nginx/certs/default.crt;
        ssl_certificate_key /etc/nginx/certs/default.key;
        return 444;
    }

    server {
        listen 127.0.0.1:10080;
        location /nginx_status { stub_status on; allow 127.0.0.1; deny all; }
        location /health { default_type text/plain; return 200 "OK\n"; }
    }

    server {
        listen 80;
        listen 443 ssl;
        server_name www.example.com example.com;
        ssl_certificate /etc/nginx/certs/example.com.crt;
        ssl_certificate_key /etc/nginx/certs/example.com.key;
        add_header Strict-Transport-Security "max-age=31536000" always;
        # ... static, API, WebSocket, and default locations omitted for brevity ...
    }
}
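Since the configuration must stay identical on both nodes, one way to check that is to compute a single digest over the whole config tree on each node and compare the two values. This is a sketch assuming GNU coreutils and findutils; config_fingerprint is a hypothetical helper. Relative paths are part of the hash, so a renamed file also changes the digest.

```bash
# config_fingerprint [DIR]: print one SHA-256 digest covering every file
# (relative path + content) under DIR. Equal output on both nodes means
# the trees are identical.
config_fingerprint() {
    local dir="${1:-/etc/nginx}"
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    # NUL-separated, byte-sorted file list keeps the order stable across hosts
    (cd "$dir" && find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha256sum) \
        | sha256sum | awk '{print $1}'
}
```

Usage: run config_fingerprint /etc/nginx (and /etc/keepalived/scripts) on Node A and Node B; any difference means a sync step was missed. Run nginx -t before reloading either node.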

Keepalived (Node A)

# /etc/keepalived/keepalived.conf (NodeA)
global_defs {
    router_id NGINX_HA_NODE_A
    script_user root
    enable_script_security
    notification_email { [email protected] }
    notification_email_from keepalived@nginx-node
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
}

vrrp_script check_nginx {
    script "/etc/keepalived/scripts/check_nginx.sh"
    interval 2
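    # |weight| must exceed the priority gap (150 - 100 = 50); with -20 a failed master stays at 130 and keeps the VIP
    weight -60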
    fall 3
    rise 2
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 150
    advert_int 1
    authentication {
        auth_type PASS
        # Keepalived uses only the first 8 characters of auth_pass
        auth_pass K33pAl1v3d_VIP1
    }
    virtual_ipaddress {
        192.168.1.100/24 dev eth0 label eth0:vip1
    }
    track_script {
        check_nginx
    }
    notify_master "/etc/keepalived/scripts/notify.sh master VI_1"
    notify_backup "/etc/keepalived/scripts/notify.sh backup VI_1"
    notify_fault  "/etc/keepalived/scripts/notify.sh fault VI_1"
}

vrrp_instance VI_2 {
    state BACKUP
    interface eth0
    virtual_router_id 52
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass K33pAl1v3d_VIP2
    }
    virtual_ipaddress {
        192.168.1.101/24 dev eth0 label eth0:vip2
    }
    track_script {
        check_nginx
    }
    notify_master "/etc/keepalived/scripts/notify.sh master VI_2"
    notify_backup "/etc/keepalived/scripts/notify.sh backup VI_2"
    notify_fault  "/etc/keepalived/scripts/notify.sh fault VI_2"
}

Keepalived (Node B)

# /etc/keepalived/keepalived.conf (NodeB)
global_defs {
    router_id NGINX_HA_NODE_B
    script_user root
    enable_script_security
    notification_email { [email protected] }
    notification_email_from keepalived@nginx-node-b
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
}

vrrp_script check_nginx {
    script "/etc/keepalived/scripts/check_nginx.sh"
    interval 2
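    # |weight| must exceed the priority gap (150 - 100 = 50); with -20 a failed master stays at 130 and keeps the VIP
    weight -60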
    fall 3
    rise 2
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass K33pAl1v3d_VIP1
    }
    virtual_ipaddress {
        192.168.1.100/24 dev eth0 label eth0:vip1
    }
    track_script {
        check_nginx
    }
    notify_master "/etc/keepalived/scripts/notify.sh master VI_1"
    notify_backup "/etc/keepalived/scripts/notify.sh backup VI_1"
    notify_fault  "/etc/keepalived/scripts/notify.sh fault VI_1"
}

vrrp_instance VI_2 {
    state MASTER
    interface eth0
    virtual_router_id 52
    priority 150
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass K33pAl1v3d_VIP2
    }
    virtual_ipaddress {
        192.168.1.101/24 dev eth0 label eth0:vip2
    }
    track_script {
        check_nginx
    }
    notify_master "/etc/keepalived/scripts/notify.sh master VI_2"
    notify_backup "/etc/keepalived/scripts/notify.sh backup VI_2"
    notify_fault  "/etc/keepalived/scripts/notify.sh fault VI_2"
}

Health‑Check Script (check_nginx.sh)

#!/bin/bash
# Check Nginx process
if ! pidof nginx > /dev/null; then
    echo "Nginx process not found"
    exit 1
fi
# Check port 80 listening
if ! ss -tlnp | grep -q ':80 '; then
    echo "Port 80 not listening"
    exit 1
fi
# HTTP health endpoint
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" --connect-timeout 2 --max-time 5 http://127.0.0.1:10080/health 2>/dev/null)
if [ "$HTTP_CODE" != "200" ]; then
    echo "Health check returned $HTTP_CODE"
    exit 1
fi
exit 0

State‑Change Notification Script (notify.sh)

#!/bin/bash
STATE=$1   # master/backup/fault
VRRP_INSTANCE=$2
HOSTNAME=$(hostname)
DATETIME=$(date '+%Y-%m-%d %H:%M:%S')
LOG_FILE="/var/log/keepalived-notify.log"
log_message() { echo "[$DATETIME] $1" >> $LOG_FILE; }
case "$STATE" in
    master)  log_message "[$VRRP_INSTANCE] Transition to MASTER on $HOSTNAME";;
    backup)  log_message "[$VRRP_INSTANCE] Transition to BACKUP on $HOSTNAME";;
    fault)   log_message "[$VRRP_INSTANCE] Transition to FAULT on $HOSTNAME";;
    *)       log_message "[$VRRP_INSTANCE] Unknown state: $STATE";;
esac
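Both scripts run as root, and with enable_script_security Keepalived refuses to run a root script if the file or any directory in its path is writable by a non‑root user. A minimal install sketch, assuming the two scripts sit in the current directory; install_keepalived_scripts is a hypothetical helper name.

```bash
# install_keepalived_scripts [SRC_DIR] [DEST_DIR]: copy check_nginx.sh and
# notify.sh to where keepalived.conf expects them, executable and not
# writable by group/other, owned by root when run as root.
install_keepalived_scripts() {
    local src="${1:-.}" dest="${2:-/etc/keepalived/scripts}"
    install -d -m 0755 "$dest" || return 1
    install -m 0755 "$src/check_nginx.sh" "$src/notify.sh" "$dest/" || return 1
    if [ "$(id -u)" -eq 0 ]; then
        chown root:root "$dest" "$dest/check_nginx.sh" "$dest/notify.sh"
    fi
}
```

Usage: install_keepalived_scripts . /etc/keepalived/scripts on both nodes, then systemctl restart keepalived. If a script is rejected, Keepalived logs the reason at startup.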

Start and Verify

Enable and start Nginx, then Keepalived, on both nodes:

systemctl enable nginx && systemctl start nginx
systemctl enable keepalived && systemctl start keepalived

Verify VIP binding:

# NodeA
ip addr show eth0 | grep -E "inet.*vip"
# Should show 192.168.1.100

# NodeB
ip addr show eth0 | grep -E "inet.*vip"
# Should show 192.168.1.101

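Test service availability through each VIP. Send the site's Host header: /health is only served on 127.0.0.1:10080, and a bare‑IP request to port 80 lands on the default_server, which returns 444 (closes the connection without responding).

curl -I -H 'Host: www.example.com' http://192.168.1.100/
curl -I -H 'Host: www.example.com' http://192.168.1.101/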

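Simulate a failure (e.g., systemctl stop nginx on Node A) and confirm that VIP 192.168.1.100 moves to Node B once the health check has failed fall times in a row (about 6 seconds with interval 2 and fall 3). This only happens if the absolute health‑check weight exceeds the priority gap between the nodes. Keepalived preempts by default, so after Nginx on Node A is healthy again the VIP moves back; add nopreempt (which requires state BACKUP on both nodes) if you prefer it to stay on Node B.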
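To see how long clients are actually affected, you can poll a VIP from a third machine while stopping Nginx. The sketch below logs one line per request, so the failover gap shows up as a run of FAIL lines. probe_vip is a hypothetical helper; the Host header defaults to www.example.com from the sample config, and any 2xx or 3xx response counts as reachable.

```bash
# probe_vip URL [COUNT] [INTERVAL] [HOST]: request URL COUNT times and print
# "time status code" per attempt. curl reports 000 when no HTTP response
# arrives (connection refused, timeout).
probe_vip() {
    local url="$1" count="${2:-60}" interval="${3:-0.5}" host="${4:-www.example.com}"
    local i=1 code status
    while [ "$i" -le "$count" ]; do
        code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 1 -H "Host: $host" "$url")
        case "$code" in
            2??|3??) status=ok ;;
            *) status=FAIL ;;
        esac
        printf '%s %s %s\n' "$(date '+%H:%M:%S.%3N')" "$status" "${code:-000}"
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Usage: probe_vip http://192.168.1.100/ 120 0.5 | tee failover.log, then count the consecutive FAIL lines.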

Best Practices & Pitfalls

Key Recommendations

Health checks must validate the business layer, not just the Nginx process.

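Make the absolute value of the health‑check weight larger than the priority difference, otherwise a node whose Nginx has failed keeps its VIP. With priorities 150 and 100 the gap is 50: a weight of -20 only lowers a failed master to 130, which still beats 100, whereas -60 lowers it to 90 and the peer takes over.

Use unicast VRRP (unicast_src_ip plus a unicast_peer block listing the other node) instead of multicast if the network blocks VRRP multicast traffic, as many cloud networks do.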

Separate Keepalived logs (e.g., via rsyslog) to simplify troubleshooting.
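The weight rule above is plain arithmetic, so it can be checked before a config goes live. A minimal sketch, assuming a negative vrrp_script weight; failover_happens is a hypothetical helper. Equal priorities are treated as "no guaranteed failover", because VRRP then breaks the tie by the higher IP address.

```bash
# failover_happens MASTER_PRIO BACKUP_PRIO WEIGHT (WEIGHT negative):
# print the master's priority after a failed check and succeed only if the
# peer would then have the strictly higher priority.
failover_happens() {
    local master="$1" backup="$2" weight="$3"
    local degraded=$((master + weight))
    echo "priority after failed check: $degraded, peer: $backup"
    [ "$degraded" -lt "$backup" ]
}
```

Usage: failover_happens 150 100 -20 fails (130 is still above 100), while failover_happens 150 100 -60 succeeds (90). Check both VRRP instances, since each node is master for one of them.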

Common Issues

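Split‑brain: both nodes hold the same VIP, usually because VRRP advertisements between them are lost (network partition or a firewall dropping VRRP). Fix by checking connectivity, using unicast, or adding extra heartbeat links.

VIP does not migrate: the firewall blocks VRRP, virtual_router_id differs between the nodes, or the absolute health‑check weight is smaller than the priority gap. Ensure VRRP is allowed, the IDs match, and |weight| exceeds the gap.

Frequent VIP flapping: usually a health check that alternates between pass and fail. Raise the fall / rise thresholds so a single slow check does not move the VIP.

Service unavailable after failover: ARP cache or conntrack entries are stale; clear the conntrack entries or send gratuitous ARP from the notify_master script.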

Health‑check false positives : script timeout too short; increase timeout or add backend checks.
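Keepalived already sends gratuitous ARPs when an instance becomes master, so an extra refresh from notify_master is a fallback for networks where those are missed. A sketch, assuming iputils arping (other arping implementations use different flags) and conntrack from conntrack-tools; refresh_vip is a hypothetical helper, and DRY_RUN=1 prints the commands instead of running them.

```bash
# refresh_vip IFACE VIP: announce the VIP's new MAC and drop tracked
# connections whose original destination is the VIP.
refresh_vip() {
    local iface="$1" vip="$2" run=""
    [ "${DRY_RUN:-0}" = "1" ] && run="echo"
    # -U: unsolicited (gratuitous) ARP, -c 3: send three, -I: outgoing interface
    $run arping -U -c 3 -I "$iface" "$vip"
    # Delete only entries for this VIP instead of flushing the whole table
    $run conntrack -D -d "$vip"
}
```

Usage: in notify.sh, call refresh_vip eth0 192.168.1.100 in the master branch when VRRP_INSTANCE is VI_1 (and 192.168.1.101 for VI_2).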

Monitoring & Alerting

Nginx Metrics

Expose stub_status at /nginx_status and collect active connections, reading, writing, waiting, and total requests.

Keepalived Metrics

Export custom metrics (e.g., VIP master status) via a Prometheus exporter script that reads ip addr and Keepalived logs.

Sample Prometheus Exporter (nginx‑keepalived‑monitor.sh)

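#!/bin/bash
# Emits Prometheus text-format metrics on stdout, e.g. redirect it from cron
# to a .prom file in node_exporter's textfile collector directory.
NGINX_STATUS=$(curl -s --max-time 2 http://127.0.0.1:10080/nginx_status)
# stub_status layout: "Active connections: N", a header line,
# " accepts handled requests", then "Reading: R Writing: W Waiting: X"
ACTIVE=$(echo "$NGINX_STATUS" | awk '/^Active/ {print $3}')
REQUESTS=$(echo "$NGINX_STATUS" | awk 'NR==3 {print $3}')
READING=$(echo "$NGINX_STATUS" | awk '/^Reading/ {print $2}')
WRITING=$(echo "$NGINX_STATUS" | awk '/^Reading/ {print $4}')
WAITING=$(echo "$NGINX_STATUS" | awk '/^Reading/ {print $6}')
NGINX_UP=0; [ -n "$ACTIVE" ] && NGINX_UP=1

# Prints 1 if this node currently holds the given VIP on eth0, else 0
vip_is_master() { ip -4 addr show dev eth0 | grep -q "inet $1/" && echo 1 || echo 0; }

metric() {  # name, type, help text, value
    echo "# HELP $1 $3"
    echo "# TYPE $1 $2"
    echo "$1 ${4:-0}"
}

metric nginx_up gauge "stub_status endpoint reachable" "$NGINX_UP"
metric nginx_connections_active gauge "Active connections" "$ACTIVE"
metric nginx_connections_reading gauge "Connections reading the request header" "$READING"
metric nginx_connections_writing gauge "Connections writing the response" "$WRITING"
metric nginx_connections_waiting gauge "Idle keep-alive connections" "$WAITING"
metric nginx_http_requests_total counter "Total client requests" "$REQUESTS"
metric keepalived_vip1_is_master gauge "VIP1 (192.168.1.100) held by this node" "$(vip_is_master 192.168.1.100)"
metric keepalived_vip2_is_master gauge "VIP2 (192.168.1.101) held by this node" "$(vip_is_master 192.168.1.101)"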

Backup & Recovery

Use a simple tar‑based script to archive /etc/nginx, /etc/keepalived, custom sysctl files, and certificates. Retain backups for 30 days and restore by extracting the archives, re‑applying kernel parameters, and restarting services.
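A minimal sketch of such a script, assuming the backups go to /var/backups/nginx-ha (a hypothetical path) and that certificates live under /etc/nginx/certs as in the config above, so archiving /etc/nginx covers them. Paths are stored relative to the root directory so an archive restores cleanly with tar -C /.

```bash
# backup_ha_config [ROOT] [DEST] [KEEP_DAYS]: archive the Nginx, Keepalived and
# sysctl configuration, print the archive path, and prune old archives.
backup_ha_config() {
    local root="${1:-/}" dest="${2:-/var/backups/nginx-ha}" keep_days="${3:-30}"
    local archive
    archive="$dest/nginx-ha-$(uname -n)-$(date '+%Y%m%d-%H%M%S').tar.gz"
    mkdir -p "$dest" || return 1
    tar -czf "$archive" -C "$root" \
        etc/nginx etc/keepalived etc/sysctl.d/99-nginx-keepalived.conf || return 1
    # Retention: delete archives last modified more than KEEP_DAYS days ago
    find "$dest" -maxdepth 1 -name 'nginx-ha-*.tar.gz' -mtime +"$keep_days" -delete
    echo "$archive"
}
```

To restore: tar -xzf ARCHIVE -C /, then sysctl -p /etc/sysctl.d/99-nginx-keepalived.conf, nginx -t, and systemctl restart nginx keepalived.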

Conclusion

The dual‑master Nginx + Keepalived design provides true active‑active load balancing, eliminates idle backup resources, and ensures rapid failover with minimal service disruption. Proper health‑check scripting, priority tuning, and network configuration are essential to avoid split‑brain and VIP flapping. The pattern scales from single data‑center deployments to cloud‑native environments when combined with unicast VRRP or external SLB integration.

References

Nginx official documentation: https://nginx.org/en/docs/

Keepalived official documentation: https://www.keepalived.org/manpage.html

Keepalived source repository: https://github.com/acassen/keepalived

Linux Virtual Server project: http://www.linuxvirtualserver.org/

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
