
Achieve Seamless Nginx High Availability with Keepalived: A Practical Guide

This article walks through building a simple, cost‑effective high‑availability solution for Nginx using Keepalived’s VRRP‑based VIP failover, covering environment setup, configuration of master and backup nodes, health‑check scripts, testing procedures, troubleshooting tips, and rollback steps.

Ops Community

Background and Use Cases

In small‑to‑medium deployments a single Nginx instance often serves static assets, reverse‑proxies traffic, and acts as the entry point to backend services. When the host crashes or the Nginx process exits, all dependent services are interrupted. Pairing Keepalived with Nginx provides a low‑cost HA solution: Keepalived implements VRRP to move a Virtual IP (VIP) between nodes. The master holds the VIP and sends periodic VRRP advertisements (multicast to 224.0.0.18 by default); the backup listens for them and takes over the VIP when they stop arriving. Failover is transparent to new client connections (connections that were open on the failed node are lost) and typically completes within 3‑10 seconds.
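Where does the lower end of that range come from? A rough sketch, assuming the VRRPv2 timers defined in RFC 3768 (Keepalived's default VRRP version): a backup declares the master dead after Master_Down_Interval = 3 × Advertisement_Interval + Skew_Time, with Skew_Time = (256 − Priority) / 256 seconds. The helper name master_down_interval is illustrative only. With advert_int 1 and the backup priority of 80 used later in this guide, it comes to roughly 3.7 seconds, not counting the time neighbors need to process the gratuitous ARP.

```bash
#!/usr/bin/env bash
# Sketch of the VRRPv2 (RFC 3768) master-down timer:
#   Master_Down_Interval = 3 * Advertisement_Interval + (256 - Priority) / 256
# Arguments: advert_int in seconds, and the BACKUP node's own priority.
master_down_interval() {
  local advert_int=$1 priority=$2
  awk -v a="$advert_int" -v p="$priority" \
    'BEGIN { printf "%.4f\n", 3 * a + (256 - p) / 256 }'
}

# advert_int 1 and backup priority 80, as in the configurations in this guide
master_down_interval 1 80
```

This only covers a host that disappears entirely. When Nginx stops but the host stays up, the health check's interval × fall is added before the priority changes.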

Architecture Design

            ┌───────────────────────────────────┐
            │         Upstream Clients          │
            │     (or other load balancer)      │
            └─────────────────┬─────────────────┘
                              │  Request to VIP: 192.168.1.100
            ┌─────────────────┴─────────────────┐
            │          Switch / Router          │
            └─────────────────┬─────────────────┘
                              │
            ┌─────────────────┴─────────────────┐
            │                VIP                │
            │           192.168.1.100           │
            │ (bound dynamically by Keepalived) │
            └─────────────────┬─────────────────┘
                              │
          ┌───────────────────┼───────────────────┐
          │                   │                   │
  ┌───────┴───────┐   ┌───────┴───────┐   ┌───────┴───────┐
  │   Nginx‑01    │   │   Nginx‑02    │   │       …       │
  │   (Master)    │   │   (Backup)    │   │               │
  │ 192.168.1.11  │   │ 192.168.1.12  │   │               │
  └───────┬───────┘   └───────┬───────┘   └───────────────┘
          │                   │
          └─────────┬─────────┘
                    │
       ┌────────────┴────────────┐
       │  Backend Real Servers   │
       │ (Web / App / Database)  │
       └─────────────────────────┘

Key points:

Each server has its own physical IP (192.168.1.11 and 192.168.1.12).

Both run a Keepalived instance; the master declares ownership of VIP 192.168.1.100.

Clients only see the VIP and do not know which Nginx handles the request.

Nginx listens on all interfaces, but only the node holding the VIP receives traffic.

Backend services can be any set of web servers, upstream definitions, or other services.

Environment Preparation

Hardware and OS

nginx‑master – hostname nginx-master, IP 192.168.1.11, OS CentOS 7.9 or Ubuntu 20.04

nginx‑backup – hostname nginx-backup, IP 192.168.1.12, OS CentOS 7.9 or Ubuntu 20.04

VIP – 192.168.1.100 (must be in the same /24 subnet as the physical IPs)

Backend test host – hostname web-test, IP 192.168.1.21, any OS

Keep the Nginx configuration identical on both servers, and confirm that the VIP is not already in use and lies in the same subnet as the physical IPs.
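Because the VIP must share a subnet with both physical IPs, it can be worth checking the addressing plan before installing anything. A minimal sketch in plain Bash (ip_to_int and same_subnet are illustrative names, not part of any tool) that masks two IPv4 addresses with a prefix length and compares the network parts:

```bash
#!/usr/bin/env bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# same_subnet IP1 IP2 PREFIXLEN: exit status 0 when both addresses fall in
# the same network for the given prefix length (e.g. 24 for a /24).
same_subnet() {
  local ip1 ip2 mask
  ip1=$(ip_to_int "$1")
  ip2=$(ip_to_int "$2")
  # Bash arithmetic is 64-bit, so trim the shifted mask back to 32 bits.
  mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
  (( (ip1 & mask) == (ip2 & mask) ))
}

for node in 192.168.1.11 192.168.1.12; do
  if same_subnet "$node" 192.168.1.100 24; then
    echo "$node and VIP 192.168.1.100 share a /24"
  else
    echo "$node and VIP 192.168.1.100 are in different /24 networks" >&2
  fi
done
```

The check is purely arithmetic; it does not tell you whether the VIP is free, which still needs an ARP probe on the wire.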

Install Nginx

CentOS 7:

# Install EPEL repository
yum install -y epel-release
# Install Nginx
yum install -y nginx
# Enable and start
systemctl start nginx
systemctl enable nginx
# Verify version
nginx -v

Ubuntu 20.04:

apt update
apt install -y nginx
systemctl start nginx
systemctl enable nginx
nginx -v

Install Keepalived

CentOS 7:

yum install -y keepalived
keepalived --version   # CentOS 7 ships 1.3.5

Ubuntu 20.04:

apt update
apt install -y keepalived
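keepalived --version   # Ubuntu 20.04 ships 2.0.19
Version note: The configuration in this guide uses syntax that both Keepalived 1.3.x and 2.x accept; vrrp_script, vrrp_instance, and track_script work the same way in both. Keepalived 2.x adds options and may log warnings about script security (script_user, enable_script_security), but those warnings do not stop the checks from running.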

Keepalived Configuration Details

Master Node (nginx‑master)

Configuration file:

/etc/keepalived/keepalived.conf
# /etc/keepalived/keepalived.conf
global_defs {
    router_id nginx_keepalived
    vrrp_garp_interval 1
    vrrp_gna_interval 1
}

vrrp_script check_nginx {
    script "/etc/keepalived/check_nginx.sh"
    interval 2
    fall 2
    rise 1
    weight -30   # must exceed the 100/80 priority gap, or a failed check never moves the VIP
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1234abcd
    }
    virtual_ipaddress {
        192.168.1.100/24 dev eth0
    }
    track_script {
        check_nginx
    }
    notify_master "/etc/keepalived/notify.sh master"
    notify_backup "/etc/keepalived/notify.sh backup"
    notify_fault "/etc/keepalived/notify.sh fault"
}
Interface name tip: Use ip a to confirm the exact interface name (e.g., eth0, eno1, ens33, enp0s3). A wrong name prevents VIP binding.
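The weight value is easy to get wrong. Keepalived adds a positive weight to the priority while the script succeeds and applies a negative weight while it fails; the VIP moves only if the failing master's reduced priority drops below the backup's. The sketch below models that rule for the priorities 100 and 80 used in this guide (effective_priority is an illustrative helper, and ties are simplified to "the current master keeps the VIP"): a weight of -10 leaves the master at 90, still above the backup, while -30 brings it down to 70.

```bash
#!/usr/bin/env bash
# Model of keepalived's track_script weight rule (simplified):
#   weight > 0 is added to the priority while the check succeeds,
#   weight < 0 is added (i.e. subtracted) while the check fails.
# Usage: effective_priority BASE_PRIORITY WEIGHT CHECK_OK(1|0)
effective_priority() {
  local base=$1 weight=$2 check_ok=$3
  if (( weight > 0 && check_ok == 1 )) || (( weight < 0 && check_ok == 0 )); then
    echo $(( base + weight ))
  else
    echo "$base"
  fi
}

# Nginx has died on the master (check fails); the backup is healthy.
for w in -10 -30; do
  m=$(effective_priority 100 "$w" 0)
  b=$(effective_priority 80 "$w" 1)
  # Simplification: on a tie the current master keeps the VIP.
  if (( b > m )); then holder=backup; else holder=master; fi
  echo "weight $w: master=$m backup=$b -> VIP held by $holder"
done
```

As a rule of thumb, make the absolute value of a negative weight larger than the priority gap between the two nodes.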

Backup Node (nginx‑backup)

The backup configuration mirrors the master with two differences: state BACKUP and a lower priority (e.g., 80).

# /etc/keepalived/keepalived.conf (backup)
global_defs {
    router_id nginx_keepalived
    vrrp_garp_interval 1
    vrrp_gna_interval 1
}

vrrp_script check_nginx {
    script "/etc/keepalived/check_nginx.sh"
    interval 2
    fall 2
    rise 1
    weight -30   # keep identical on both nodes
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 80   # lower than master
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1234abcd
    }
    virtual_ipaddress {
        192.168.1.100/24 dev eth0
    }
    track_script {
        check_nginx
    }
    notify_master "/etc/keepalived/notify.sh master"
    notify_backup "/etc/keepalived/notify.sh backup"
    notify_fault "/etc/keepalived/notify.sh fault"
}
State parameter: state MASTER or BACKUP only sets the initial state. The actual holder of the VIP is decided by priority, adjusted by the health‑check script's weight. To stop a recovered node from taking the VIP back automatically, use nopreempt as described below.

Non‑Preempt Mode (optional)

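Adding nopreempt prevents the original master from reclaiming the VIP after it recovers, which avoids a second, unnecessary failover. Keepalived honors nopreempt only when the initial state is BACKUP, so set state BACKUP on both nodes and keep their priorities different: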

vrrp_instance VI_1 {
    state BACKUP
    priority 100
    nopreempt
    ...
}
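The effect of nopreempt follows from how a node in BACKUP state treats incoming advertisements. A sketch of the rule in RFC 3768 section 6.4.2, which Keepalived implements (backup_reaction is an illustrative name): an advertisement with priority 0 means the master is resigning; otherwise the backup keeps waiting if preemption is off or the advertised priority is at least its own, and it ignores a weaker master's advertisement (so its own master-down timer fires) when preemption is on.

```bash
#!/usr/bin/env bash
# Sketch of RFC 3768 (VRRPv2) section 6.4.2: how a node in BACKUP state reacts
# to an advertisement from the current master.
# Usage: backup_reaction LOCAL_PRIORITY ADVERT_PRIORITY PREEMPT(1|0)
backup_reaction() {
  local local_prio=$1 adv_prio=$2 preempt=$3
  if (( adv_prio == 0 )); then
    echo take-over      # master is shutting down; take over after Skew_Time
  elif (( preempt == 0 || adv_prio >= local_prio )); then
    echo stay-backup    # reset Master_Down_Timer and keep listening
  else
    echo take-over      # discard the weaker master's advert; timer expires
  fi
}

# The original master (priority 100) has recovered while the old backup
# (priority 80) holds the VIP:
backup_reaction 100 80 1   # preempt (default)
backup_reaction 100 80 0   # nopreempt
```

The first call prints take-over and the second stay-backup, which is exactly the difference nopreempt makes.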

Nginx Health‑Check Scripts

Basic Script ( check_nginx.sh )

#!/bin/bash
# /etc/keepalived/check_nginx.sh
# Exit non-zero if Nginx is not running or does not answer HTTP on 127.0.0.1.
# pgrep -x matches the process name exactly; "ps -ef | grep nginx" would also
# match this script's own name (check_nginx.sh) and never report a failure.
if ! pgrep -x nginx > /dev/null; then
    /usr/bin/logger "Keepalived check: Nginx process not found"
    exit 1
fi
response=$(curl -s -o /dev/null -w "%{http_code}" --connect-timeout 2 --max-time 3 http://127.0.0.1/ 2>/dev/null)
# Any of these codes proves Nginx itself is answering requests.
case "$response" in
    200|301|302|403|404) exit 0 ;;
esac
/usr/bin/logger "Keepalived check: Nginx not responding, HTTP code: $response"
exit 1

Make it executable:

chmod +x /etc/keepalived/check_nginx.sh
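Keepalived does not act on a single failed run. A simplified model of fall and rise (assumptions: counts are consecutive and the script starts in the OK state; track_state is an illustrative name) shows how the exit codes of check_nginx.sh turn into the state that drives the weight:

```bash
#!/usr/bin/env bash
# Simplified model of vrrp_script fall/rise handling.
# Usage: track_state FALL RISE EXIT_CODE...   (one exit code per interval)
# Prints the tracked state after each run.
track_state() {
  local fall=$1 rise=$2; shift 2
  local state=OK fails=0 oks=0 rc
  local out=()
  for rc in "$@"; do
    if (( rc == 0 )); then
      fails=0
      oks=$(( oks + 1 ))
      if [ "$state" = KO ] && (( oks >= rise )); then state=OK; fi
    else
      oks=0
      fails=$(( fails + 1 ))
      if [ "$state" = OK ] && (( fails >= fall )); then state=KO; fi
    fi
    out+=("$state")
  done
  echo "${out[*]}"
}

# fall 2, rise 1 as configured above: a single failed run is tolerated, the
# second consecutive failure flips to KO (about 2 x interval = 4 s after
# Nginx dies), and one success restores OK.
track_state 2 1 0 1 0 1 1 0
```

The example prints OK OK OK OK KO OK: the isolated failures are absorbed, and only the back-to-back pair applies the weight.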

Advanced Script ( check_nginx_proxy.sh )

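When Nginx proxies to upstream services, you can also verify upstream health by pointing the script directive of vrrp_script at this file instead. Keep in mind that both nodes probe the same upstream, so a backend outage lowers both priorities by the same weight and the VIP stays where it is; this check only triggers a failover when the upstream is unreachable from one node but not the other.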

#!/bin/bash
# /etc/keepalived/check_nginx_proxy.sh
# Fail if Nginx is not running or the upstream health endpoint is not 200.
# pgrep -x avoids matching this script's own name, which contains "nginx".
if ! pgrep -x nginx > /dev/null; then
    /usr/bin/logger "Keepalived: Nginx process not found"
    exit 1
fi
upstream_check=$(curl -s -o /dev/null -w "%{http_code}" --connect-timeout 2 --max-time 3 http://192.168.1.21:8080/health 2>/dev/null)
if [ "$upstream_check" != "200" ]; then
    /usr/bin/logger "Keepalived: Upstream is unhealthy, code: $upstream_check"
    exit 1
fi
exit 0

Notify Script ( notify.sh )

#!/bin/bash
# /etc/keepalived/notify.sh
# Called with one argument: master|backup|fault
LOGFILE="/var/log/keepalived-notify.log"
case "$1" in
    master)
        echo "$(date '+%Y-%m-%d %H:%M:%S') [MASTER] Keepalived transitioned to MASTER state" >> $LOGFILE
        ;;
    backup)
        echo "$(date '+%Y-%m-%d %H:%M:%S') [BACKUP] Keepalived transitioned to BACKUP state" >> $LOGFILE
        ;;
    fault)
        echo "$(date '+%Y-%m-%d %H:%M:%S') [FAULT] Keepalived entered FAULT state" >> $LOGFILE
        ;;
    *)
        echo "Unknown state: $1" >> $LOGFILE
        ;;
esac

Make it executable:

chmod +x /etc/keepalived/notify.sh

Nginx Configuration

Basic Reverse‑Proxy Setup

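Assume a backend web service at 192.168.1.21:8080. Nginx selects a server block by the Host header, so server_name must match the name clients actually use; requests sent to the bare VIP (Host: 192.168.1.100) match no name here and are handled by the default server instead, which on a stock install is the distribution's own server block (in /etc/nginx/nginx.conf on CentOS 7, /etc/nginx/sites-enabled/default on Ubuntu).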

# /etc/nginx/conf.d/upstream.conf
upstream backend_servers {
    server 192.168.1.21:8080 max_fails=3 fail_timeout=30s;
    # Add more servers as needed
}
# /etc/nginx/conf.d/proxy.conf
server {
    listen 80;
    # listen 443 ssl;   # enable together with ssl_certificate and ssl_certificate_key;
    #                   # "nginx -t" rejects a "listen ... ssl" without a certificate
    server_name localhost;
    location / {
        proxy_pass http://backend_servers;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_connect_timeout 5s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
        proxy_buffering on;
        proxy_buffer_size 4k;
        proxy_buffers 8 4k;
    }
    location /nginx_health {
        access_log off;
        # default_type sets Content-Type; add_header would send a second one
        default_type text/plain;
        return 200 "nginx is healthy\n";
    }
}

Test syntax and reload:

nginx -t
systemctl reload nginx
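Before bringing Keepalived into the picture, it helps to confirm that both nodes answer on the health location. A small sketch (probe_health is a hypothetical helper) that requests /nginx_health from each physical IP; it sends Host: localhost so the request matches the server_name localhost block rather than the distribution's default server, and it expects the body produced by the return 200 directive:

```bash
#!/usr/bin/env bash
# probe_health IP...: GET /nginx_health on each node and compare the body.
# Returns non-zero if any node fails.
probe_health() {
  local ip body rc=0
  for ip in "$@"; do
    # Host: localhost selects the "server_name localhost" block.
    body=$(curl -s --connect-timeout 2 --max-time 3 -H 'Host: localhost' "http://$ip/nginx_health")
    if [ "$body" = "nginx is healthy" ]; then
      echo "$ip OK"
    else
      echo "$ip FAIL"
      rc=1
    fi
  done
  return "$rc"
}
```

Run it as probe_health 192.168.1.11 192.168.1.12 from any host that can reach both nodes; command substitution drops the trailing newline of the response body, so the plain string comparison works.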

Static‑File Service (optional)

# /etc/nginx/conf.d/static.conf
server {
    listen 80;
    # listen 443 ssl;   # uncomment only together with ssl_certificate and ssl_certificate_key
    server_name static.example.com;
    root /usr/share/nginx/html;
    index index.html;
    location ~* \.(jpg|jpeg|png|gif|ico|css|js|woff|woff2)$ {
        expires 7d;
        add_header Cache-Control "public, immutable";
        access_log off;
    }
    access_log /var/log/nginx/static_access.log combined;   # "main" is defined only in CentOS's default nginx.conf
    error_log /var/log/nginx/static_error.log warn;
}

Startup and Verification

Startup Order

1. Start backend services first.
2. Start Nginx on both servers.
3. Start Keepalived (master first, then backup).

Verification Steps

On the master, confirm the VIP is bound:

ip addr show eth0 | grep 192.168.1.100
# Expected: inet 192.168.1.100/24 scope global secondary eth0

On the backup, ensure the VIP is absent.
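The same check can be scripted so it runs identically on both nodes. A sketch (holds_vip and vip_role are illustrative names) built on ip -o -4 addr show, which prints one line per IPv4 address; including the trailing slash in the fixed-string match keeps a search for 192.168.1.10 from matching 192.168.1.100:

```bash
#!/usr/bin/env bash
# holds_vip IFACE VIP: exit status 0 if VIP is currently assigned to IFACE.
holds_vip() {
  # -o prints one line per address; -F treats the dots literally and the
  # trailing "/" stops 192.168.1.10 from matching 192.168.1.100.
  ip -o -4 addr show dev "$1" | grep -qF " inet $2/"
}

# vip_role IFACE VIP: print which role this node is playing right now.
vip_role() {
  if holds_vip "$1" "$2"; then
    echo "MASTER (holds $2)"
  else
    echo "BACKUP (does not hold $2)"
  fi
}
```

Running vip_role eth0 192.168.1.100 on each node should print MASTER on exactly one of them; two MASTER lines indicate split‑brain.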

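From a client, request the VIP, then confirm on the master that the request was served there:

curl -I http://192.168.1.100/
# The request should appear in /var/log/nginx/access.log on nginx-master,
# while the access log on nginx-backup stays quiet.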

Check Keepalived logs for state transitions:

# CentOS
tail -f /var/log/messages | grep keepalived
# Ubuntu
tail -f /var/log/syslog | grep keepalived

Failover Tests

Simulate Nginx crash (keep Keepalived running)

# On master
ip addr show eth0 | grep 192.168.1.100   # verify VIP present
pkill nginx   # or systemctl stop nginx
# Observe:
#   - VIP disappears from master
#   - Backup acquires VIP
#   - Client request recovers after 3‑10 s
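To put a number on the recovery time, run a probe loop from a client while triggering the failure. A sketch, assuming one probe per second is precise enough (probe_loop and longest_outage are illustrative names): probe_loop prints one HTTP status per probe (curl reports 000 when it cannot connect), and longest_outage turns the longest run of non‑2xx/3xx codes into an approximate outage in seconds.

```bash
#!/usr/bin/env bash
# probe_loop URL COUNT: roughly one request per second, one status code per line.
probe_loop() {
  local url=$1 count=$2 i
  for (( i = 0; i < count; i++ )); do
    curl -s -o /dev/null -w '%{http_code}\n' --max-time 1 "$url"
    sleep 1
  done
}

# longest_outage INTERVAL: read status codes on stdin and print the longest
# run of failures (anything that is not 2xx/3xx) multiplied by INTERVAL.
longest_outage() {
  local interval=$1 code run=0 max=0
  while read -r code; do
    if [[ $code =~ ^[23][0-9][0-9]$ ]]; then
      run=0
    else
      run=$(( run + 1 ))
      if (( run > max )); then max=$run; fi
    fi
  done
  echo $(( max * interval ))
}
```

For example, start probe_loop http://192.168.1.100/ 30 > probes.log on the client, stop Nginx on the master, then run longest_outage 1 < probes.log. A failed probe can take up to a second on top of the sleep, so treat the result as approximate and likely on the low side.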

Simulate master host shutdown

# On master
shutdown -h now
# After ~3 s the backup should take the VIP.

Simulate Keepalived crash (Nginx stays up)

# On master
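pkill keepalived   # SIGTERM: keepalived removes the VIP and sends a priority-0 advert
# The backup takes the VIP almost immediately. After pkill -9 (no clean shutdown)
# it waits about 3 x advert_int, and the VIP stays on the master until removed.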

These scenarios demonstrate that Keepalived only moves the VIP; Nginx health is managed by the custom scripts.

Troubleshooting

Problem 1 – VIP cannot bind ("bind() cannot assign requested address")

Typical causes:

Duplicate virtual_router_id in the same subnet.

Incorrect interface name.

IP conflict with another host.

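OS restrictions on ARP (rare).

Nginx told to listen on the VIP itself (listen 192.168.1.100:80) on a node that does not currently hold it, which is what produces this bind() error. Listen on all addresses instead, or allow non-local binds with sysctl net.ipv4.ip_nonlocal_bind=1.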

Diagnostic commands:

# Verify interface name
ip a
# Check for IP conflict
arping -I eth0 -c 3 192.168.1.100   # run while the VIP is unbound; any reply means another host uses it
# Inspect other Keepalived instances
tcpdump -i eth0 vrrp -n
# Manual test bind
ip addr add 192.168.1.100/24 dev eth0
ip addr del 192.168.1.100/24 dev eth0
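Raw tcpdump output scrolls quickly. A sketch that condenses it into one line per advertiser (vrrp_summary is an illustrative name; the sed pattern assumes tcpdump's usual one‑line VRRPv2 format, such as IP 192.168.1.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 100, ...). In steady state only the master should advertise for vrid 51; a second source for the same vrid points to split‑brain, and an unknown source means another cluster reuses the ID.

```bash
#!/usr/bin/env bash
# vrrp_summary: read "tcpdump -n vrrp" lines on stdin and print unique
# "vrid source priority" triples; lines in other formats are dropped.
vrrp_summary() {
  sed -nE 's/.* IP ([0-9.]+) > .*vrid ([0-9]+), prio ([0-9]+).*/\2 \1 \3/p' | sort -u
}
```

A typical invocation is timeout 10 tcpdump -i eth0 -n -l vrrp 2>/dev/null | vrrp_summary, where -l line-buffers tcpdump's output so the pipe sees each packet as it arrives.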

Problem 2 – Backup stays in BACKUP and never takes VIP

Common reasons:

Mismatched virtual_router_id between nodes.

Different advert_int values.

Network/firewall blocks VRRP (IP protocol 112).

Backup priority set to 0 or syntax error.
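The first two causes are quick to rule out by comparing the settings that must match on both nodes. A sketch (vrrp_params and compare_vrrp are illustrative names; it assumes one directive per line, as in the configurations above) to run after copying the peer's keepalived.conf to the local host, for example to a hypothetical /tmp/peer.conf:

```bash
#!/usr/bin/env bash
# vrrp_params FILE: print the VRRP settings that must be identical on both
# nodes (state and priority are expected to differ, so they are skipped).
vrrp_params() {
  awk '$1 == "virtual_router_id" || $1 == "advert_int" ||
       $1 == "auth_type" || $1 == "auth_pass" { print $1, $2 }' "$1"
}

# compare_vrrp LOCAL_CONF PEER_CONF: show differences; exit 0 if none.
compare_vrrp() {
  diff <(vrrp_params "$1") <(vrrp_params "$2")
}
```

compare_vrrp /etc/keepalived/keepalived.conf /tmp/peer.conf prints nothing when the settings agree and a diff of the mismatched lines otherwise.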

Problem 3 – Health‑check script always reports failure

Run the script manually and check the exit code. Verify curl works locally and that the script has execute permission.
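Keepalived only looks at the script's exit status, so reproduce exactly that. A sketch (run_check is an illustrative name) that first checks the execute bit, then runs the script with its output discarded and reports the status:

```bash
#!/usr/bin/env bash
# run_check SCRIPT: run a keepalived check script and report its exit status
# (0 = healthy, anything else = failure), as keepalived would see it.
run_check() {
  local script=$1 rc
  if [ ! -x "$script" ]; then
    echo "not executable: $script (fix with chmod +x)"
    return 126
  fi
  "$script" > /dev/null 2>&1
  rc=$?
  echo "exit=$rc"
  return "$rc"
}
```

Run run_check /etc/keepalived/check_nginx.sh as root; if it reports a non-zero status, bash -x /etc/keepalived/check_nginx.sh traces each command and shows which test failed.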

Problem 4 – After failover, Nginx logs show many retries

This is expected: client requests that timed out during the 3‑10 s window are retried against the new master. Ensure the backend API is idempotent if writes are involved.

Problem 5 – SSL certificate errors after VIP moves

Make sure the SSL certificate and related configuration are identical on both Nginx nodes, or use a wildcard certificate that covers the domain.

Risk Considerations

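Split‑brain: If the nodes cannot see each other's advertisements (a network partition, or a switch or firewall dropping VRRP multicast), both may claim the VIP. Unicast VRRP (unicast_src_ip + unicast_peer) removes the dependence on multicast, and firewall rules should explicitly allow VRRP (IP protocol 112) between the two nodes; a genuine partition can still produce two masters.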

Service interruption: Default failover takes 3‑10 seconds. Clients should implement retry logic for higher availability.

Configuration drift: Keep Nginx configs synchronized (e.g., rsync, Ansible, Git‑based deployment) to avoid mismatched upstreams or SSL settings.

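VIP subnet limitation: The VIP must be in the same L2 network as the physical IPs, because takeover relies on gratuitous ARP. Failover across subnets needs a different mechanism, such as announcing the address with a routing protocol (e.g. BGP) or a cloud provider's floating‑IP API.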

Backend single point: HA only protects Nginx; if the upstream service is a single instance, the overall system still has a single point of failure.
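For the unicast VRRP setup mentioned under split‑brain, each node names itself as the source and the other node as the peer, so the stanza is mirrored between the two configuration files. A sketch that prints the lines to add inside vrrp_instance VI_1 on each node (unicast_block is an illustrative helper; unicast_src_ip and unicast_peer are the Keepalived keywords):

```bash
#!/usr/bin/env bash
# unicast_block SRC_IP PEER_IP: print the unicast stanza for one node's
# vrrp_instance block. Advertisements then go directly to the peer instead
# of the 224.0.0.18 multicast group.
unicast_block() {
  cat <<EOF
    unicast_src_ip $1
    unicast_peer {
        $2
    }
EOF
}

echo "# nginx-master"; unicast_block 192.168.1.11 192.168.1.12
echo "# nginx-backup"; unicast_block 192.168.1.12 192.168.1.11
```

Firewall rules on both hosts still need to accept IP protocol 112 from the peer address.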

Rollback Procedure

Stop Keepalived on the problematic node:

systemctl stop keepalived
ip addr show eth0 | grep 192.168.1.100   # VIP should disappear

Run Nginx in standalone mode on the remaining node:

systemctl start nginx
systemctl enable nginx

Restore any altered network or firewall settings (iptables, firewalld).

Validate service availability via both physical IPs and the VIP:

curl -I http://192.168.1.11/
curl -I http://192.168.1.12/
curl -I http://192.168.1.100/

Conclusion

Keepalived + Nginx provides a clear, low‑cost HA pattern: VRRP moves a VIP between two Nginx instances, while custom health‑check scripts ensure that only a healthy Nginx holds the address. The deployment steps are straightforward, but careful attention to interface names, priority values, and script exit codes is essential. Mastering this pattern builds a solid foundation for more advanced HA solutions such as LVS, hardware load balancers, or Kubernetes Service load‑balancing.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
