How to Build Keepalived High‑Availability Nginx with VRRP and Zabbix Monitoring

This guide explains the fundamentals of Keepalived, its key features, VRRP‑based high‑availability architecture, step‑by‑step installation and configuration of Keepalived and Nginx on master and backup servers, scripts for health checks, split‑brain causes and solutions, and Zabbix monitoring to detect and alert on HA failures.

MaGe Linux Operations

Nov 16, 2024

Keepalived Overview

Keepalived was originally designed for LVS load‑balancing management and later added VRRP support to provide high availability. It can manage LVS as well as other services such as Nginx, HAProxy, and MySQL.

Key Functions

Keepalived offers three main capabilities: managing LVS load‑balancing software, performing health checks on LVS cluster nodes, and providing failover for system network services.

High‑Availability Architecture

VRRP Working Principle

Keepalived uses VRRP (Virtual Router Redundancy Protocol) to achieve high availability. VRRP removes the single point of failure of a statically configured gateway: the routers in a group elect a master by priority, and the master periodically sends multicast advertisements. Backup routers listen for these advertisements; if they stop arriving, the highest-priority backup takes over the virtual IP (VIP). With the 1-second advertisement interval used in this guide, takeover happens within a few seconds.
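
As a rough illustration of the takeover delay: in VRRP version 2 (RFC 3768), which Keepalived uses by default, a backup declares the master dead after three advertisement intervals plus a priority-dependent skew time. The sketch below computes that interval for the values used later in this guide (advert_int 1, backup priority 90). The function name master_down_interval is only illustrative.

```shell
#!/bin/bash
# Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time   (RFC 3768)
# Skew_Time            = (256 - Priority) / 256 seconds
master_down_interval() {
    local advert_int=$1 priority=$2
    awk -v a="$advert_int" -v p="$priority" \
        'BEGIN { printf "%.3f\n", 3 * a + (256 - p) / 256 }'
}

master_down_interval 1 90    # prints 3.648
```

A lower-priority backup waits slightly longer, which is how VRRP ensures that the best remaining candidate claims the VIP first.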

Implementation with Nginx

The following steps set up a two‑node HA cluster where Keepalived monitors Nginx and ensures continuous service.

Installation Environment

Both nodes run CentOS 8.5. The master node is named master (IP 192.168.222.138) and the backup node is named backup (IP 192.168.222.139). A virtual IP (VIP) 192.168.222.133 will be shared between them.

Master Node Setup

# Disable firewall and SELinux
systemctl disable --now firewalld.service                                 # stop now and at boot
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config   # applies after the next reboot
setenforce 0                                                              # permissive for the current boot

# Configure yum repositories
dnf -y install wget
cd /etc/yum.repos.d/
wget -O CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-vault-8.5.2111.repo
sed -i -e '/mirrors.cloud.aliyuncs.com/d' -e '/mirrors.aliyuncs.com/d' CentOS-Base.repo

dnf -y install epel-release
sed -i 's|^#baseurl=https://download.example/pub|baseurl=https://mirrors.aliyun.com|' /etc/yum.repos.d/epel*
sed -i 's|^metalink|#metalink|' /etc/yum.repos.d/epel*

# Install Keepalived
dnf -y install keepalived

# Install Nginx
dnf -y install nginx
cd /usr/share/nginx/html/
echo 'master' > index.html
systemctl start nginx
systemctl enable nginx

Backup Node Setup

# Repeat the same steps on the backup node (replace hostnames and IPs accordingly)
# Disable firewall and SELinux, configure repos, install Keepalived and Nginx
# Set a simple index page
cd /usr/share/nginx/html/
echo 'backup' > index.html
systemctl start nginx
systemctl enable nginx

Keepalived Configuration (Master)

! Configuration File for keepalived

global_defs {
    router_id lb01
}

vrrp_script nginx_check {
    script "/scripts/check_nginx.sh"
    interval 5
    weight -20
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass tushanbu
    }
    virtual_ipaddress {
        192.168.222.133
    }
    track_script {
        nginx_check
    }
    notify_master "/scripts/notify.sh master 192.168.222.133"
    notify_backup "/scripts/notify.sh backup 192.168.222.133"
}

virtual_server 192.168.222.133 80 {
    delay_loop 6
    lb_algo rr
    lb_kind DR
    persistence_timeout 50
    protocol TCP
    real_server 192.168.222.138 80 {
        weight 1
        TCP_CHECK {
            connect_port 80
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }
    real_server 192.168.222.139 80 {
        weight 1
        TCP_CHECK {
            connect_port 80
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }
}

Keepalived Configuration (Backup)

! Configuration File for keepalived

global_defs {
    router_id lb02
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    virtual_router_id 51
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass tushanbu
    }
    virtual_ipaddress {
        192.168.222.133
    }
    notify_master "/scripts/notify.sh master 192.168.222.133"
    notify_backup "/scripts/notify.sh backup 192.168.222.133"
}

virtual_server 192.168.222.133 80 {
    delay_loop 6
    lb_algo rr
    lb_kind DR
    persistence_timeout 50
    protocol TCP
    real_server 192.168.222.138 80 {
        weight 1
        TCP_CHECK {
            connect_port 80
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }
    real_server 192.168.222.139 80 {
        weight 1
        TCP_CHECK {
            connect_port 80
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }
}

VRRP Communication and Failover

When Keepalived stops on the master node, the backup node no longer receives VRRP advertisements, takes over the VIP, and starts Nginx through its notify_master script. When Keepalived is restarted on the master, its higher priority (100 versus 90) lets it preempt the backup and reclaim the VIP.

Monitoring Nginx with Keepalived Scripts

Two scripts are used on the master node:

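#!/bin/bash
# /scripts/check_nginx.sh
# Count running nginx processes, ignoring grep itself and this script.
nginx_status=$(ps -ef | grep -Ev "grep|$0" | grep -c '\bnginx\b')
# No nginx process left: stop Keepalived so the backup node takes over the VIP.
if [ "$nginx_status" -lt 1 ]; then
    systemctl stop keepalived
fi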

#!/bin/bash
# /scripts/notify.sh
# Called by Keepalived on a state change: $1 is the new role, $2 is the VIP.
nginx_status=$(ps -ef | grep -Ev "grep|$0" | grep -c '\bnginx\b')
case "$1" in
    master)
        # Became master: make sure Nginx is running.
        if [ "$nginx_status" -lt 1 ]; then
            systemctl start nginx
        fi
        ;;
    backup)
        # Became backup: stop Nginx if it is running.
        if [ "$nginx_status" -gt 0 ]; then
            systemctl stop nginx
        fi
        ;;
    *)
        echo "Usage: $0 master|backup VIP"
        exit 1
        ;;
esac

The backup node only needs the notify script, which starts Nginx when the node becomes master and stops it when the node returns to backup.
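
The check script above forces failover by stopping Keepalived outright, so the weight -20 setting in the vrrp_script block never comes into play. An alternative, sketched here as an assumption rather than the article's method, is to let the script report health only through its exit status. Keepalived then subtracts 20 from the master's priority (100 - 20 = 80), which falls below the backup's 90 and moves the VIP without stopping the daemon. The function names are hypothetical.

```shell
#!/bin/bash
# Health check by exit status: 0 = healthy, non-zero = failed.
# pgrep -x matches processes whose name is exactly "nginx".
nginx_running() {
    pgrep -x nginx >/dev/null 2>&1
}

# Illustration of how Keepalived applies a track_script weight:
#   negative weight: added to the priority while the script fails
#   positive weight: added to the priority while the script succeeds
effective_priority() {
    local base=$1 weight=$2 script_rc=$3
    if [ "$weight" -lt 0 ] && [ "$script_rc" -ne 0 ]; then
        echo $(( base + weight ))
    elif [ "$weight" -gt 0 ] && [ "$script_rc" -eq 0 ]; then
        echo $(( base + weight ))
    else
        echo "$base"
    fi
}

effective_priority 100 -20 1    # Nginx down on the master: prints 80 (below 90)
effective_priority 100 -20 0    # Nginx up on the master:   prints 100
```

Used as a real check script, the file would end with a call to nginx_running so that its exit status is what Keepalived sees.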

Split‑Brain in HA Systems

When the heartbeat link between two HA nodes fails, each node believes the other is down and both may acquire the shared resources, leading to data corruption or service disruption. This situation is called a split‑brain.

Causes of Split‑Brain

Failure of the heartbeat network (cable break, NIC failure, switch failure, etc.).

iptables or another firewall blocking VRRP packets.

Incorrect VRRP configuration (e.g., mismatched virtual_router_id).

Software bugs or mismatched heartbeat methods.
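
A mismatched virtual_router_id is easy to check for before it causes trouble. The sketch below extracts the value from a Keepalived configuration file so that the two nodes' files can be compared. It assumes a single vrrp_instance per file, and the helper names vrid_of and same_vrid are hypothetical.

```shell
#!/bin/bash
# Print the first virtual_router_id found in a Keepalived config file.
vrid_of() {
    awk '$1 == "virtual_router_id" { print $2; exit }' "$1"
}

# Succeed only if both files define the same, non-empty virtual_router_id.
same_vrid() {
    [ -n "$(vrid_of "$1")" ] && [ "$(vrid_of "$1")" = "$(vrid_of "$2")" ]
}
```

For example, after copying the peer's configuration to /tmp/peer.conf, running `same_vrid /etc/keepalived/keepalived.conf /tmp/peer.conf || echo "virtual_router_id mismatch"` would flag the problem. To see whether a firewall is dropping advertisements, `tcpdump -i ens33 -nn vrrp` on each node shows whether the peer's packets arrive.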

Common Solutions

Use redundant heartbeat links (dual cables) to reduce the chance of failure.

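Use a lock on the shared disk: the active node holds the lock so the peer cannot seize the shared storage during a split‑brain. To keep a stale lock from blocking a legitimate failover, the active node can engage the lock only when it detects that every heartbeat link is down.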

Configure a quorum or arbitration IP; nodes that cannot ping the quorum IP relinquish the VIP.

Monitor split‑brain conditions and alert operators for manual intervention.
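
The arbitration-IP approach can be sketched as a small script run periodically, for example from cron or as a vrrp_script. It assumes 192.168.222.2 as the arbitration address, a hypothetical gateway that does not appear in the original setup, and the function names are illustrative.

```shell
#!/bin/bash
ARBITER_IP="${ARBITER_IP:-192.168.222.2}"   # assumption: an always-on address such as the gateway

# Succeeds if at least one of three pings is answered (1 s timeout each).
arbiter_reachable() {
    ping -c 3 -W 1 "$1" >/dev/null 2>&1
}

# Run the given reachability test and print the resulting decision.
vip_decision() {
    if "$@"; then
        echo "keep"       # this node still reaches the network: keep the VIP
    else
        echo "release"    # this node is the isolated one: give up the VIP
    fi
}

# Example wiring (commented out so the sketch has no side effects):
# [ "$(vip_decision arbiter_reachable "$ARBITER_IP")" = "release" ] && systemctl stop keepalived
```

Passing the test as a command keeps the decision logic separate from the probe, so a different probe (an HTTP request, several addresses) can be swapped in without touching the rest.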

Monitoring Split‑Brain with Zabbix

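A custom Zabbix agent script checks whether the backup node holds the VIP. If the VIP is present on the backup, the script prints 1 (error); otherwise it prints 0 (OK). Because the backup also holds the VIP after a legitimate failover, a value of 1 means the master has either failed or lost contact with the backup, and both cases warrant attention.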

#!/bin/bash
# /scripts/check_keepalived.sh
# Print 1 if the VIP is configured on ens33, 0 otherwise.
if [ "$(ip a show ens33 | grep -c 192.168.222.133)" -ne 0 ]; then
    echo "1"
else
    echo "0"
fi

The Zabbix agent configuration on the backup node includes:

UserParameter=check_keepalived,/bin/bash /scripts/check_keepalived.sh
UnsafeUserParameters=1

After restarting the Zabbix agent, the Zabbix server can query the item check_keepalived and trigger an alarm when the value is 1.
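
A reading taken on the backup alone cannot tell a clean failover from a split‑brain. One way to separate the two is to combine readings from both nodes, assuming the same check_keepalived item is also deployed on the master (the original installs it only on the backup). The function name classify_vip_state is hypothetical.

```shell
#!/bin/bash
# Arguments: 1 or 0 for "master holds the VIP" and "backup holds the VIP".
classify_vip_state() {
    local master_has_vip=$1 backup_has_vip=$2
    case "${master_has_vip}${backup_has_vip}" in
        10) echo "normal" ;;        # only the master holds the VIP
        01) echo "failover" ;;      # only the backup holds the VIP
        11) echo "split-brain" ;;   # both nodes hold the VIP
        00) echo "no-vip" ;;        # nobody holds the VIP
        *)  echo "unknown"; return 1 ;;
    esac
}

classify_vip_state 1 1    # prints split-brain
```

The two inputs could be collected with zabbix_get, for example `zabbix_get -s 192.168.222.138 -k check_keepalived` for the master and the same command against 192.168.222.139 for the backup.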

Testing Procedure

Stop Nginx on the master and restart Keepalived. The backup node should acquire the VIP and start Nginx.

Restart Nginx and Keepalived on the master; the master should regain the VIP and Nginx.

Introduce a split‑brain by changing virtual_router_id on the master (e.g., to 55) and restarting Keepalived. Both nodes will now hold the VIP, and Zabbix will raise an alarm.

Fix the configuration, restart Keepalived, and verify that the alarm clears.
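
While running these tests it helps to see which node is answering on the VIP. Because the two index pages contain master and backup respectively, polling the VIP and logging each change gives a simple failover trace. This is only a sketch, and report_transitions is a hypothetical helper.

```shell
#!/bin/bash
# Read one response per line on stdin; print the first value and every change.
report_transitions() {
    awk 'NR == 1    { prev = $0; print "start: " $0; next }
         $0 != prev { print prev " -> " $0; prev = $0 }'
}

# Example with canned responses instead of live requests:
printf 'master\nmaster\nbackup\nbackup\nmaster\n' | report_transitions
# start: master
# master -> backup
# backup -> master
```

Against the live cluster, the input could come from a loop such as `while sleep 1; do curl -s --max-time 2 http://192.168.222.133/ || echo unreachable; done | report_transitions`.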

Conclusion

This tutorial demonstrates how to deploy Keepalived for Nginx high availability, implement health‑check scripts, understand split‑brain scenarios, and use Zabbix to monitor and alert on HA failures, ensuring reliable service continuity.
