How to Build Keepalived High‑Availability Nginx with VRRP and Zabbix Monitoring
This guide explains the fundamentals of Keepalived, its key features, VRRP‑based high‑availability architecture, step‑by‑step installation and configuration of Keepalived and Nginx on master and backup servers, scripts for health checks, split‑brain causes and solutions, and Zabbix monitoring to detect and alert on HA failures.
Keepalived Overview
Keepalived was originally designed for LVS load‑balancing management and later added VRRP support to provide high availability. It can manage LVS as well as other services such as Nginx, HAProxy, and MySQL.
Key Functions
Keepalived offers three main capabilities: managing LVS load‑balancing software, performing health checks on LVS cluster nodes, and providing failover for system network services.
High‑Availability Architecture
VRRP Working Principle
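Keepalived uses VRRP (Virtual Router Redundancy Protocol) to achieve high availability. VRRP removes the single point of failure of a statically configured gateway. The nodes in a VRRP group elect a master, which holds the virtual IP (VIP) and periodically sends multicast advertisements (IP protocol 112, destination 224.0.0.18). Backups listen for these advertisements. If none arrive within the master‑down interval, the highest‑priority backup takes over the VIP. That interval is three advertisement intervals plus a small priority‑dependent skew, so it is roughly 3.6 seconds with advert_int 1.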
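The following sketch computes the VRRPv2 master‑down interval defined in RFC 3768, to make the timing concrete. It assumes Keepalived's default VRRP version 2, and the function name master_down_interval is hypothetical.

```bash
#!/bin/bash
# Sketch: VRRPv2 master-down interval (RFC 3768), assuming Keepalived's default vrrp_version 2.
#   Skew_Time            = (256 - Priority) / 256            seconds
#   Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time
# Usage: master_down_interval <advert_int_seconds> <backup_priority>
master_down_interval() {
    awk -v adv="$1" -v prio="$2" 'BEGIN {
        skew = (256 - prio) / 256          # higher priority -> shorter skew
        printf "%.3f\n", 3 * adv + skew    # seconds, rounded to milliseconds
    }'
}
```

This guide later uses advert_int 1 and a backup priority of 90. With those values, master_down_interval 1 90 prints 3.648, so the backup waits about 3.6 seconds after the last advertisement before it claims the VIP. A master that shuts down cleanly sends a final priority‑0 advertisement, which lets the backup take over after only the skew time.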
Implementation with Nginx
The following steps set up a two‑node HA cluster where Keepalived monitors Nginx and ensures continuous service.
Installation Environment
Both nodes run CentOS 8.5. The master node is named master (IP 192.168.222.138) and the backup node is named backup (IP 192.168.222.139). A virtual IP (VIP) 192.168.222.133 will be shared between them.
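The setup steps below disable firewalld and SELinux for simplicity. If you prefer to keep firewalld running, a minimal alternative is to allow only VRRP and HTTP through it. This is a sketch under that assumption and is not part of the original procedure. The function name open_ha_firewall is hypothetical.

```bash
#!/bin/bash
# Sketch (assumption, not the original procedure): keep firewalld enabled and
# open only what this setup needs. Run as root on both nodes.
open_ha_firewall() {
    # VRRP advertisements between the nodes (IP protocol 112, named "vrrp" in /etc/protocols)
    firewall-cmd --permanent --add-rich-rule='rule protocol value="vrrp" accept'
    # HTTP traffic to Nginx on the VIP
    firewall-cmd --permanent --add-service=http
    # Load the permanent rules into the running firewall
    firewall-cmd --reload
}
```

Call open_ha_firewall once on each node in place of the firewalld lines in the setup.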
Master Node Setup
# Disable firewall and SELinux
systemctl disable --now firewalld.service    # stop now and keep it off after reboot
setenforce 0                                 # switch SELinux to permissive immediately
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config   # persist across reboots
# Configure yum repositories
dnf -y install wget
cd /etc/yum.repos.d/
wget -O CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-vault-8.5.2111.repo
sed -i -e '/mirrors.cloud.aliyuncs.com/d' -e '/mirrors.aliyuncs.com/d' CentOS-Base.repo
dnf -y install epel-release
# Point EPEL at the Aliyun mirror while keeping the repository path after /pub
sed -i 's|^#baseurl=https://[^/]*/pub|baseurl=https://mirrors.aliyun.com|' /etc/yum.repos.d/epel*
sed -i 's|^metalink|#metalink|' /etc/yum.repos.d/epel*
# Install Keepalived
dnf -y install keepalived
# Install Nginx
dnf -y install nginx
cd /usr/share/nginx/html/
echo 'master' > index.html
systemctl start nginx
systemctl enable nginx
Backup Node Setup
# Repeat the same steps on the backup node (replace hostnames and IPs accordingly)
# Disable firewall and SELinux, configure repos, install Keepalived and Nginx
# Set a simple index page
cd /usr/share/nginx/html/
echo 'backup' > index.html
systemctl start nginx
systemctl enable nginx
Keepalived Configuration (Master)
! Configuration File for keepalived (/etc/keepalived/keepalived.conf on master)
global_defs {
    router_id lb01
}

vrrp_script nginx_check {
    script "/scripts/check_nginx.sh"
    interval 5
    ! if the script exits non-zero, priority drops by 20 (100 -> 80, below the backup's 90)
    weight -20
}

vrrp_instance VI_1 {
    ! both nodes start as BACKUP; the higher priority (100) wins the election
    state BACKUP
    interface ens33
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass tushanbu
    }
    virtual_ipaddress {
        192.168.222.133
    }
    track_script {
        nginx_check
    }
    notify_master "/scripts/notify.sh master 192.168.222.133"
    notify_backup "/scripts/notify.sh backup 192.168.222.133"
}

virtual_server 192.168.222.133 80 {
    delay_loop 6
    lb_algo rr
    lb_kind DR
    persistence_timeout 50
    protocol TCP
    real_server 192.168.222.138 80 {
        weight 1
        TCP_CHECK {
            connect_port 80
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }
    real_server 192.168.222.139 80 {
        weight 1
        TCP_CHECK {
            connect_port 80
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }
}
Keepalived Configuration (Backup)
! Configuration File for keepalived (/etc/keepalived/keepalived.conf on backup)
global_defs {
    router_id lb02
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    ! virtual_router_id, advert_int and the authentication block must match the master
    virtual_router_id 51
    ! lower than the master's 100, so this node stays BACKUP while the master is healthy
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass tushanbu
    }
    virtual_ipaddress {
        192.168.222.133
    }
    notify_master "/scripts/notify.sh master 192.168.222.133"
    notify_backup "/scripts/notify.sh backup 192.168.222.133"
}

virtual_server 192.168.222.133 80 {
    delay_loop 6
    lb_algo rr
    lb_kind DR
    persistence_timeout 50
    protocol TCP
    real_server 192.168.222.138 80 {
        weight 1
        TCP_CHECK {
            connect_port 80
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }
    real_server 192.168.222.139 80 {
        weight 1
        TCP_CHECK {
            connect_port 80
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }
}
VRRP Communication and Failover
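When Keepalived on the master is stopped, the backup node stops receiving VRRP advertisements and takes over the VIP. Its notify_master script then starts Nginx. A clean shutdown sends a final priority‑0 advertisement, so the takeover is nearly immediate. If the master simply disappears, the backup waits for the master‑down interval instead. When Keepalived on the master is started again, its higher priority (100 versus 90) lets it preempt the backup and reclaim the VIP, because preemption is enabled by default.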
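You can watch a failover from a client machine by polling the VIP while you stop services on the master. This sketch assumes curl is installed on the client. It relies on each node's index.html containing its own name, as set up earlier. probe_vip and watch_vip are hypothetical helper names.

```bash
#!/bin/bash
# Sketch: poll the VIP and report which node answers.
# probe_vip URL -> prints the response body ("master" or "backup" here), or "unreachable".
probe_vip() {
    curl -s --max-time 2 "$1" || echo "unreachable"
}

# watch_vip URL [COUNT] -> COUNT timestamped probes, one second apart (default 30).
watch_vip() {
    url="$1"
    count="${2:-30}"
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s %s\n' "$(date +%T)" "$(probe_vip "$url")"
        sleep 1
        i=$((i + 1))
    done
}
```

For example, run watch_vip http://192.168.222.133/ 60 on a client and then stop Nginx on the master. The responses should switch from master to backup within a few seconds.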
Monitoring Nginx with Keepalived Scripts
Two scripts are used on the master node:
#!/bin/bash
# /scripts/check_nginx.sh
# Run by Keepalived every 5 seconds (vrrp_script nginx_check).
# If no nginx process is running, stop Keepalived so the VIP moves to the backup.
if ! pgrep -x nginx >/dev/null; then
    systemctl stop keepalived
fi

#!/bin/bash
# /scripts/notify.sh master|backup VIP
# Run by Keepalived on VRRP state changes (notify_master / notify_backup).
case "$1" in
    master)
        # This node now owns the VIP: make sure Nginx is running.
        if ! pgrep -x nginx >/dev/null; then
            systemctl start nginx
        fi
        ;;
    backup)
        # This node no longer owns the VIP: stop Nginx.
        if pgrep -x nginx >/dev/null; then
            systemctl stop nginx
        fi
        ;;
    *)
        echo "Usage: $0 master|backup VIP" >&2
        exit 1
        ;;
esac
Make both scripts executable (chmod +x /scripts/*.sh). The backup node needs only notify.sh, which starts Nginx when the node becomes master and stops it when the node becomes backup.
Split‑Brain in HA Systems
When the heartbeat link between two HA nodes fails, each node believes the other is down and both may acquire the shared resources, leading to data corruption or service disruption. This situation is called a split‑brain.
Causes of Split‑Brain
Failure of the heartbeat network (cable break, NIC failure, switch failure, etc.).
iptables or firewalld rules blocking VRRP packets (IP protocol 112, multicast address 224.0.0.18).
Incorrect VRRP configuration (e.g., mismatched virtual_router_id).
Software bugs or mismatched heartbeat methods.
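A mismatched virtual_router_id or authentication setting is one of the easiest causes to rule out, so comparing the two configuration files is a quick first check. This sketch assumes you have copied the peer's keepalived.conf locally (for example with scp). vrrp_setting and compare_vrrp are hypothetical names. The awk match reads only the first occurrence of each key, so it suits single‑instance files like the ones in this guide.

```bash
#!/bin/bash
# Sketch: compare VRRP settings that must be identical on both nodes.
# A mismatch puts the nodes in different VRRP groups (or makes them discard
# each other's advertisements), so both can end up as MASTER.

# vrrp_setting FILE KEY -> prints the value of the first "KEY value" line in FILE.
vrrp_setting() {
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# compare_vrrp LOCAL_CONF PEER_CONF -> prints each mismatch; returns 1 if any.
compare_vrrp() {
    rc=0
    for key in virtual_router_id auth_pass advert_int; do
        a=$(vrrp_setting "$1" "$key")
        b=$(vrrp_setting "$2" "$key")
        if [ "$a" != "$b" ]; then
            echo "mismatch: $key ($a vs $b)"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, compare_vrrp /etc/keepalived/keepalived.conf /tmp/peer.conf prints nothing and returns 0 when the keys match. A key missing from both files also counts as a match.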
Common Solutions
Use redundant heartbeat links (dual cables) to reduce the chance of failure.
Employ disk locking mechanisms that only activate when the heartbeat is completely lost.
Configure a quorum or arbitration IP; nodes that cannot ping the quorum IP relinquish the VIP.
Monitor split‑brain conditions and alert operators for manual intervention.
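The quorum (arbitration) IP approach can be sketched as a small check script. It assumes that 192.168.222.2 is a stable gateway address that both nodes should always reach. That address, the script path /scripts/check_quorum.sh and the function quorum_ok are all hypothetical.

```bash
#!/bin/bash
# Hypothetical /scripts/check_quorum.sh (sketch).
# 192.168.222.2 is an ASSUMED gateway/arbitration address; override it with QUORUM_IP.
QUORUM_IP="${QUORUM_IP:-192.168.222.2}"

# quorum_ok IP -> exit 0 if IP answers one ping within 1 second, non-zero otherwise.
quorum_ok() {
    [ -n "$1" ] || return 2      # no address given
    ping -c 1 -W 1 "$1" >/dev/null 2>&1
}
```

To wire this in, the script would end with quorum_ok "$QUORUM_IP" and be tracked by a second vrrp_script that leaves weight at its default of 0. With weight 0, a failing script moves the VRRP instance to FAULT state. A node that can no longer reach the quorum IP therefore releases the VIP instead of claiming it.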
Monitoring Split‑Brain with Zabbix
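A custom Zabbix agent script on the backup node checks whether that node holds the VIP. It prints 1 (alarm) if the VIP is present and 0 (OK) otherwise. A VIP on the backup means either that a failover has happened or that both nodes hold the VIP (split‑brain). In either case, an operator should investigate.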
#!/bin/bash
# /scripts/check_keepalived.sh (backup node)
# Prints 1 if this node currently holds the VIP, 0 otherwise.
if ip -4 addr show dev ens33 | grep -qF 'inet 192.168.222.133/'; then
    echo 1
else
    echo 0
fi
The Zabbix agent configuration on the backup node (zabbix_agentd.conf) includes:
UserParameter=check_keepalived,/bin/bash /scripts/check_keepalived.sh
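UnsafeUserParameters=1
UnsafeUserParameters=1 is not strictly required here, because the check_keepalived key takes no parameters. After you restart the Zabbix agent, the Zabbix server on the master node can poll the check_keepalived item from the backup node's agent. It can then trigger an alarm when the value is 1.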
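You can exercise the VIP test from check_keepalived.sh without a live interface by feeding it sample ip output. In this sketch, vip_present and check_keepalived_value are hypothetical helpers. Matching the fixed string "inet VIP/" stops grep from treating the dots as wildcards. It also avoids matching a longer address that merely starts with the VIP.

```bash
#!/bin/bash
# Sketch: the VIP test from check_keepalived.sh, reading `ip -4 addr show` output on stdin.
# vip_present VIP -> exit 0 if a line containing "inet VIP/" appears on stdin.
vip_present() {
    grep -qF "inet $1/"
}

# check_keepalived_value VIP -> prints 1 if the VIP is present on stdin, 0 otherwise.
check_keepalived_value() {
    if vip_present "$1"; then echo 1; else echo 0; fi
}
```

On the live node, the equivalent is ip -4 addr show dev ens33 | check_keepalived_value 192.168.222.133. From the Zabbix server, zabbix_get -s 192.168.222.139 -k check_keepalived should return 0 while the master holds the VIP.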
Testing Procedure
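Stop Nginx on the master. Within one check interval (5 seconds), check_nginx.sh stops Keepalived, the backup node takes over the VIP, and notify.sh starts Nginx on it.
Start Nginx and Keepalived on the master again. With its higher priority, the master preempts the backup and regains the VIP, and notify.sh stops Nginx on the backup.
Introduce a split‑brain by changing virtual_router_id on the master (for example, to 55) and restarting Keepalived. The two nodes now belong to different VRRP groups and ignore each other's advertisements, so both hold the VIP and Zabbix raises an alarm.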
Fix the configuration, restart Keepalived, and verify that the alarm clears.
Conclusion
This tutorial demonstrates how to deploy Keepalived for Nginx high availability, implement health‑check scripts, understand split‑brain scenarios, and use Zabbix to monitor and alert on HA failures, ensuring reliable service continuity.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
MaGe Linux Operations