Zero-Downtime Failover Explained: The Core Secrets of Keepalived
This comprehensive guide explains Keepalived's VRRP‑based high‑availability architecture, core functions, deployment scenarios, configuration details, troubleshooting steps, performance tuning, and best‑practice recommendations for building reliable Linux load‑balancing and failover solutions.
1. Keepalived software introduction
1.1 What is Keepalived
Keepalived is a high‑availability solution based on VRRP (Virtual Router Redundancy Protocol), used mainly for load balancing and HA clusters on Linux. It was originally developed for the LVS (Linux Virtual Server) project and has since evolved into an independent HA solution.
1.2 Core functions of Keepalived
High Availability :
Master‑backup switching via VRRP
Virtual IP (VIP) failover
Multiple health‑check mechanisms
Automatic fault detection and recovery
Load Balancing :
Integration with LVS
Supports various scheduling algorithms
Backend server health checks
Dynamic load‑balancing configuration adjustments
Service Monitoring :
Supports TCP, HTTP, SSL and custom script checks
Dynamic weight adjustment
Automatic fault isolation
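For a tracked VRRP check script, dynamic weight adjustment means the node's priority is raised or lowered according to the script result. The sketch below illustrates that arithmetic under the usual keepalived semantics (a negative weight is subtracted while the check fails, a positive weight is added while it succeeds). The function name, the "ok"/"fail" labels, and the clamp to 1‑254 are assumptions made for this illustration.

```shell
#!/bin/bash
# Sketch: effective VRRP priority after a tracked script reports its result.
# effective_priority is a hypothetical helper, not a keepalived command.
effective_priority() {
    base=$1      # priority from the vrrp_instance block
    weight=$2    # weight from the vrrp_script block
    status=$3    # "ok" or "fail" (labels invented for this sketch)
    prio=$base
    if [ "$weight" -lt 0 ] && [ "$status" = "fail" ]; then
        prio=$((base + weight))      # negative weight lowers the priority
    elif [ "$weight" -gt 0 ] && [ "$status" = "ok" ]; then
        prio=$((base + weight))      # positive weight raises the priority
    fi
    # Assumption: keep the result inside the 1-254 range of non-owner routers
    if [ "$prio" -lt 1 ]; then prio=1; fi
    if [ "$prio" -gt 254 ]; then prio=254; fi
    echo "$prio"
}

effective_priority 100 -10 fail   # prints 90
effective_priority 100 -10 ok     # prints 100
```

A failover only happens when the lowered value drops below the peer's priority. With a weight of -2 and a priority of 100, for example, a failing check yields 98, so the peer must be configured with 99 or higher to take over.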
1.3 Application scenarios
Web service HA :
Nginx/Apache load balancer HA
Web application server failover
Database connection HA
Network device HA :
Router failover
Firewall HA deployment
Gateway redundancy
Service cluster :
MySQL master‑slave switch
Redis cluster HA
Application service cluster management
2. Keepalived architecture and principles
2.1 Components
Keepalived mainly consists of:
Keepalived Daemon : main process for coordination and management
VRRP Stack : implements the VRRP protocol
Checkers : health‑check modules monitoring backend services
IPVS Wrapper : LVS integration providing load‑balancing
Configuration file structure :
/etc/keepalived/
├── keepalived.conf # main config file
├── notify_scripts/ # notification scripts
└── check_scripts/ # health‑check scripts
2.2 Working principle
VRRP mechanism :
Virtual router group: multiple physical routers form a virtual router group
Priority mechanism: each device has a priority; the highest becomes Master
Heartbeat detection: periodic VRRP advertisement packets
Failover: Backup automatically takes over when Master fails
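The priority rule above can be sketched as a small election over a list of nodes. This is only an illustration of the rule, not how keepalived implements it: the node names and the 192.168.1.101 address are hypothetical, and the tie-break on the higher primary IP address follows the VRRP specification.

```shell
#!/bin/bash
# Sketch: pick the Master from "name priority ipv4" lines read on stdin.
# elect_master is a hypothetical helper written for this example.
elect_master() {
    awk '{
        prio = $2 + 0
        split($3, o, ".")
        # Convert the dotted address to a number so ties can be compared
        ip = ((o[1] * 256 + o[2]) * 256 + o[3]) * 256 + o[4]
        if (NR == 1 || prio > best_prio || (prio == best_prio && ip > best_ip)) {
            best = $1; best_prio = prio; best_ip = ip
        }
    } END { print best }'
}

# Highest priority wins
printf '%s\n' 'node-a 100 192.168.1.101' 'node-b 90 192.168.1.102' | elect_master
# Equal priorities: the higher IP address wins
printf '%s\n' 'node-a 100 192.168.1.101' 'node-b 100 192.168.1.102' | elect_master
```

The first call prints node-a and the second prints node-b.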
State transition process :
Initialize → Backup → Master
    ↑                   ↓
    ←─────── Fault ←────┘
IP address management :
Real IP : physical NIC IP of each device
Virtual IP : IP shared by the VRRP group
IP binding : Master binds VIP, Backup stands by
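Because only the Master binds the VIP, checking the interface's address list is a quick way to tell which role a node currently has. A minimal sketch follows, assuming the output format of `ip -o -4 addr show`; the helper name and the sample lines are illustrative, and the function reads stdin so it can be tried without a live interface.

```shell
#!/bin/bash
# Sketch: decide whether a node currently holds the VIP.
# holds_vip is a hypothetical helper; the sample output is illustrative.
holds_vip() {
    # -F matches the dots literally; the trailing "/" keeps 192.168.1.10
    # from matching 192.168.1.100
    grep -qF "inet $1/"
}

sample='2: eth0    inet 192.168.1.101/24 brd 192.168.1.255 scope global eth0
2: eth0    inet 192.168.1.100/24 scope global secondary eth0'

if printf '%s\n' "$sample" | holds_vip 192.168.1.100; then
    echo "VIP present: node is acting as Master"
else
    echo "VIP absent: node is standing by"
fi
```

On a real node the same helper could be fed with `ip -o -4 addr show dev eth0 | holds_vip 192.168.1.100`.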
2.3 Process architecture
Parent process :
Manages child processes
Parses configuration files
Handles signals
Allocates resources
VRRP child process :
Handles VRRP protocol
Manages virtual IP
Performs state transitions
Sends heartbeat packets
Checker child process :
Executes health checks
Monitors backend services
Updates LVS configuration
Adjusts weights
3. VRRP protocol details
3.1 Overview
VRRP (Virtual Router Redundancy Protocol) is an election protocol that dynamically assigns responsibility for a virtual router to one of the VRRP routers on a LAN.
Protocol features :
Standardized in RFC 3768 (VRRP version 2)
Priority‑based master selection
Fast fault detection and switch
Preemptive mode support
3.2 Packet structure
VRRP Header Format:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| Type  | Virtual RtrID |   Priority    | Count IP Addrs|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Auth Type   |   Adver Int   |          Checksum             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         IP Address (1)                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                              ...                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         IP Address (n)                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Authentication Data (1)                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Authentication Data (2)                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Field description :
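Version : protocol version (2 for the VRRPv2 layout shown here)
Type : packet type (1 = advertisement, the only type defined)
Virtual RtrID : virtual router ID (1‑255)
Priority : 1‑254 for ordinary routers (default 100); 255 is reserved for the router that owns the virtual IP, and 0 announces that the current Master is giving up the role
Count IP Addrs : number of IP addresses carried in the advertisement
Auth Type : authentication type
Adver Int : advertisement interval in seconds
Checksum : checksum over the whole VRRP message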
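To make the field layout concrete, the sketch below decodes the first six header bytes of an advertisement, such as those shown in a packet capture. The helper name and the sample bytes are hypothetical; the bytes were chosen to match a router with VRID 51 and priority 100.

```shell
#!/bin/bash
# Sketch: decode the fixed part of a VRRPv2 header.
# Arguments: the first six header bytes as two-digit hex strings.
# decode_vrrp_header is a hypothetical helper written for this example.
decode_vrrp_header() {
    ver_type=$((0x$1))
    # The first byte packs Version into the high 4 bits and Type into the low 4
    echo "version=$((ver_type >> 4)) type=$((ver_type & 15))"
    echo "vrid=$((0x$2)) priority=$((0x$3)) count_ip=$((0x$4))"
    echo "auth_type=$((0x$5)) adver_int=$((0x$6))"
}

# Illustrative bytes: 0x21 -> version 2, type 1; 0x33 -> VRID 51; 0x64 -> priority 100
decode_vrrp_header 21 33 64 01 01 01
```

The Checksum, the IP address list, and the authentication data follow these six bytes and are left out of the sketch.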
3.3 State machine
Three states :
Initialize : initial state at system start, then moves to Backup
Backup : listens for Master advertisements; becomes Master if none received
Master : answers VIP requests and sends VRRP advertisements; reverts to Backup if a higher‑priority advert is received
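The three states described above can be modeled as a small transition function. This is a sketch of the state machine as described here, not keepalived's own code; the event names are invented for the example.

```shell
#!/bin/bash
# Sketch: VRRP state transitions as a lookup on "state:event".
# next_state and the event names are hypothetical.
next_state() {
    cur=$1
    ev=$2
    case "$cur:$ev" in
        INIT:startup)                  echo BACKUP ;;
        BACKUP:master_down_timeout)    echo MASTER ;;
        MASTER:higher_priority_advert) echo BACKUP ;;
        *)                             echo "$cur" ;;   # any other event: stay put
    esac
}

# Walk one node through startup, takeover, and being preempted
state=INIT
for event in startup master_down_timeout higher_priority_advert; do
    state=$(next_state "$state" "$event")
    echo "$event -> $state"
done
```

A Backup that keeps receiving adverts from a healthy Master never sees the timeout event, so it stays in Backup indefinitely.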
State transition conditions :
Initialize → Backup: startup event
Backup → Master: timeout without Master advert
Master → Backup: higher‑priority advert received
3.4 Timers
Advertisement Timer : interval at which Master sends adverts (default 1 s, configurable 1‑255 s)
Master Down Timer : Backup's timeout waiting for Master adverts (computed as (3 × Adver_Int) + Skew_Time, where Skew_Time = (256‑Priority)/256)
Preemption Timer : delay before a higher‑priority Backup preempts the current Master, preventing frequent switches
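As a worked example of the Master Down Timer formula, the sketch below computes the timeout for a given priority and advertisement interval. The helper name is hypothetical; the arithmetic is exactly (3 × Adver_Int) + (256 - Priority)/256, in seconds.

```shell
#!/bin/bash
# Sketch: Master Down Timer = (3 x Adver_Int) + Skew_Time
# master_down_interval is a hypothetical helper written for this example.
master_down_interval() {
    awk -v prio="$1" -v adv="$2" 'BEGIN {
        skew = (256 - prio) / 256          # Skew_Time in seconds
        printf "%.3f\n", 3 * adv + skew
    }'
}

master_down_interval 100 1   # prints 3.609
master_down_interval 254 1   # prints 3.008
```

The skew term is why a higher-priority Backup times out slightly sooner than a lower-priority one: when the Master disappears, the best candidate is the first to start advertising.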
4. Installing and configuring Keepalived
4.1 Prepare system environment
System requirements :
Linux kernel ≥ 2.6
VRRP support
Network interface with multicast support
Dependency installation :
# CentOS/RHEL
yum install -y gcc gcc-c++ openssl-devel libnl3-devel
# Ubuntu/Debian
apt-get install -y gcc g++ libssl-dev libnl-3-dev libnl-genl-3-dev
4.2 Compile and install
Download source :
wget https://www.keepalived.org/software/keepalived-2.2.8.tar.gz
tar -zxf keepalived-2.2.8.tar.gz
cd keepalived-2.2.8
Configure build :
./configure --prefix=/usr/local/keepalived \
--sysconfdir=/etc \
--enable-vrrp \
--enable-lvs \
--enable-snmp \
--enable-sha1 \
--with-kernel-dir=/usr/src/kernels/$(uname -r)
Compile and install :
make && make install
4.3 Package installation
# CentOS/RHEL
yum install -y keepalived
# Ubuntu/Debian
apt-get install -y keepalived
4.4 Basic configuration example
Master node configuration ( /etc/keepalived/keepalived.conf)
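global_defs {
    notification_email {
        [email protected]
    }
    notification_email_from [email protected]
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id LVS_MASTER
    vrrp_skip_check_adv_addr
    vrrp_garp_interval 0
    vrrp_gna_interval 0
}
vrrp_script chk_nginx {
    script "/etc/keepalived/check_nginx.sh"
    interval 3      # run the check every 3 seconds
    weight -2       # subtract 2 from the priority while the check is failing
    fall 2          # two consecutive failures mark the check as failed
    rise 1          # one success marks it as healthy again
    timeout 2       # treat a run longer than 2 seconds as a failure
}
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.1.100/24
    }
    # Attach the health check; a vrrp_script has no effect until it is tracked
    track_script {
        chk_nginx
    }
    # Run the notification script on every state change
    notify "/etc/keepalived/notify.sh"
}
Notification script example ( /etc/keepalived/notify.sh)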
#!/bin/bash
# keepalived calls this script as: notify.sh <TYPE> <NAME> <STATE>
TYPE=$1     # "INSTANCE" or "GROUP"
NAME=$2     # instance or group name, e.g. VI_1
STATE=$3    # MASTER, BACKUP or FAULT
case "$STATE" in
    "MASTER")
        echo "$(date) - $NAME switched to MASTER" >> /var/log/keepalived-notify.log
        systemctl start nginx
        curl -X POST -H 'Content-type: application/json' \
            --data '{"text":"Keepalived: node switched to MASTER"}' \
            https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK
        ;;
    "BACKUP")
        echo "$(date) - $NAME switched to BACKUP" >> /var/log/keepalived-notify.log
        systemctl stop nginx
        ;;
    "FAULT")
        echo "$(date) - $NAME entered FAULT" >> /var/log/keepalived-notify.log
        echo "Keepalived fault alarm" | mail -s "Keepalived Fault" [email protected]
        ;;
esac
5. Troubleshooting and optimization
5.1 Common issues
VIP cannot bind :
# Check network interface
ip addr show eth0
# Check firewall
iptables -L | grep -i vrrp
firewall-cmd --list-all
# Check kernel modules
lsmod | grep ip_vs
modprobe ip_vs
# Check logs
journalctl -u keepalived -f
tail -f /var/log/messages | grep -i keepalived
VRRP communication problems :
# Packet capture
tcpdump -i eth0 -nn proto 112
tcpdump -i eth0 -nn host 224.0.0.18
# Check multicast configuration
cat /proc/net/igmp
# Test unicast connectivity
ping 192.168.1.102
telnet 192.168.1.102 22
5.2 Performance tuning
System‑level optimizations :
# Enable IP forwarding
echo 'net.ipv4.ip_forward = 1' >> /etc/sysctl.conf
# Disable redirects
echo 'net.ipv4.conf.all.send_redirects = 0' >> /etc/sysctl.conf
echo 'net.ipv4.conf.default.send_redirects = 0' >> /etc/sysctl.conf
# Network buffer tuning
echo 'net.core.rmem_max = 16777216' >> /etc/sysctl.conf
echo 'net.core.wmem_max = 16777216' >> /etc/sysctl.conf
# Apply all of the settings above (run last so none are missed)
sysctl -p
# Increase file descriptor limit (affects the current shell session only)
ulimit -n 65535
Keepalived configuration tuning :
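vrrp_script chk_service {
    script "/path/to/check_script.sh"
    interval 2
    weight -10
    fall 2
    rise 1
    timeout 1
}
vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    preempt_delay 30        # wait 30 seconds before preempting a lower-priority Master
    dont_track_primary      # ignore faults on the VRRP interface itself
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.1.100/24
    }
    # Attach the health check; a vrrp_script has no effect until it is tracked
    track_script {
        chk_service
    }
}
5.3 Monitoring and operations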
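The vrrp_script blocks in this article point at check scripts without showing one. The contract is simple: keepalived looks only at the script's exit status, where 0 means healthy and anything else counts as a failure. A minimal sketch follows, assuming nginx is the monitored service; the helper name and the probe command are illustrative choices, not part of keepalived.

```shell
#!/bin/bash
# Sketch of a health-check script for a vrrp_script block.
# run_check is a hypothetical helper: it runs the probe command given as
# arguments, discards its output, and returns 0 (healthy) or 1 (failed).
run_check() {
    if "$@" >/dev/null 2>&1; then
        return 0
    fi
    return 1
}

# Demonstration with stand-in probes so the sketch runs anywhere.
# A real script might instead end with:
#   run_check systemctl is-active --quiet nginx; exit $?
run_check true  && echo "probe passed: exit status 0"
run_check false || echo "probe failed: exit status 1"
```

Keeping the script fast matters, since a run that exceeds the configured timeout is treated as a failure.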
Monitoring script example :
#!/bin/bash
# /usr/local/bin/keepalived_monitor.sh
STATUS=$(systemctl is-active keepalived)
# keepalived runs a parent plus child processes, so join all PIDs on one line
PID=$(pgrep -x keepalived | tr '\n' ' ')
# 1 if the VIP is bound on eth0, 0 otherwise
VIP_STATUS=$(ip addr show eth0 | grep -cF "inet 192.168.1.100/")
# Last state mentioned in keepalived's own log lines
VRRP_STATE=$(grep -i "keepalived.*state" /var/log/messages | tail -1 | grep -o "MASTER\|BACKUP\|FAULT")
echo "Keepalived service status: $STATUS"
echo "Keepalived PID: $PID"
echo "VIP bind status: $VIP_STATUS"
echo "VRRP current state: $VRRP_STATE"
# Push the collected values to the monitoring endpoint
curl -X POST -H 'Content-Type: application/json' \
    -d "{\"service\":\"keepalived\",\"status\":\"$STATUS\",\"vip\":\"$VIP_STATUS\",\"state\":\"$VRRP_STATE\"}" \
    http://monitoring.example.com/api/metrics
6. Summary
6.1 Core value of Keepalived
Keepalived, as a mature HA solution, offers:
Standard VRRP compatibility
Lightweight design with low resource consumption
Rich feature set for diverse scenarios
Highly customizable configuration
Application advantages include easy deployment, active community, proven production use, and tight integration with Linux.
6.2 Best‑practice summary
Configuration recommendations :
Choose the appropriate mode based on business needs
Set priorities and check intervals reasonably
Implement comprehensive health checks and notification mechanisms
Perform thorough testing and validation
Operations suggestions :
Establish complete monitoring
Conduct regular failover drills
Maintain configuration consistency
Keep software up‑to‑date
Security advice :
Use authentication to protect VRRP traffic
Restrict VRRP packet access
Regularly audit and update configurations
Enable log auditing
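Restricting VRRP packet access usually comes down to accepting IP protocol 112 only from known peers. The sketch below prints candidate iptables rules for review instead of applying them. The helper name is hypothetical, the peer address is the one used in the troubleshooting examples, and the rules assume a plain iptables setup with no other rule already matching VRRP traffic.

```shell
#!/bin/bash
# Sketch: print (not apply) iptables rules that limit VRRP to one peer.
# vrrp_firewall_rules is a hypothetical helper written for this example.
vrrp_firewall_rules() {
    peer=$1
    # VRRP is IP protocol 112; accept it from the peer, drop it from anyone else
    echo "iptables -A INPUT -p 112 -s $peer -j ACCEPT"
    echo "iptables -A INPUT -p 112 -j DROP"
}

vrrp_firewall_rules 192.168.1.102
```

Printing the commands first makes it easy to check them against the existing ruleset before running them as root on each node.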
6.3 Future trends
Technical developments include cloud‑native adaptation, containerized deployment, micro‑service integration, and automation‑ops integration. Application extensions cover multi‑cloud, hybrid cloud, edge computing, and IoT high‑availability scenarios.
Keepalived will continue to play a vital role in modern IT infrastructure; mastering its principles and configuration enables building stable, reliable high‑availability service architectures.
