Mastering LVS+Keepalived: Build a Production-Ready High-Availability Load Balancer
This comprehensive guide walks you through the principles, architecture, deployment steps, performance tuning, monitoring, and advanced techniques for building a robust, production‑grade high‑availability load‑balancing solution using LVS and Keepalived, suitable for both beginners and seasoned engineers.
LVS+Keepalived High Availability Architecture Practice: From Intro to Production Deployment
Introduction: Why should every ops engineer master LVS+Keepalived?
During a midnight production incident, our primary load balancer crashed and the e‑commerce platform went down instantly, highlighting that a system without high‑availability architecture is like a tightrope walker without a safety net. This experience led to deep research and the discovery that the LVS+Keepalived combination offers excellent performance, simple deployment, and low maintenance cost.
1. LVS+Keepalived Architecture Principles Deep Dive
1.1 Why choose LVS?
LVS (Linux Virtual Server) is a kernel‑level load balancer developed by Dr. Zhang Wensong. Compared with application‑layer balancers like Nginx or HAProxy, LVS runs in kernel space and provides:
Extremely high performance : can handle millions of concurrent connections per node.
Very low latency : kernel‑level forwarding with almost no overhead.
Stable and reliable : shipped in the mainline Linux kernel as the IPVS module.
Low resource consumption : minimal CPU and memory usage.
1.2 Core value of Keepalived
Keepalived is more than a simple HA tool; its three core functions simplify operations:
VRRP protocol implementation : automatic failover, with recovery typically within a few seconds.
Health check mechanism : multi‑level, multi‑dimensional health detection.
LVS rule management : dynamic IPVS rule management without manual intervention.
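To make the failover timing concrete, here is a minimal sketch of how long a backup waits before taking over, assuming the VRRPv2 formulas from RFC 3768. The function names are illustrative and are not part of Keepalived; the priorities are example values.

```python
# Sketch of VRRPv2 failover timing (RFC 3768 formulas); values are illustrative.

def skew_time(priority: int) -> float:
    """Skew time in seconds: higher-priority backups time out slightly sooner."""
    return (256 - priority) / 256


def master_down_interval(priority: int, advert_int: float = 1.0) -> float:
    """Seconds a backup waits without advertisements before declaring the master down."""
    return 3 * advert_int + skew_time(priority)


if __name__ == "__main__":
    for prio in (90, 100):
        print(f"priority={prio}: takeover after ~{master_down_interval(prio):.3f}s")
```

With a one-second advertisement interval, a backup declares the master down after roughly 3.6 seconds, which is where the "within seconds" recovery figure comes from.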
1.3 Architecture design patterns
Three common deployment modes in production:
Mode 1: Dual-node active-standby. Suitable for small-to-medium services; low cost and simple maintenance. One master handles all traffic while the backup stays idle.
Mode 2: Dual-node active-active. For high-traffic scenarios; both servers carry traffic simultaneously and back each other up, achieving higher resource utilization.
Mode 3: Multi-level cascade. For ultra-large clusters; traffic is distributed through multiple LVS layers.
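The active-active mode is commonly built from two VRRP instances with mirrored priorities, so each node is master for one VIP and backup for the other. The sketch below models that ownership rule in plain Python. The node names, the second VIP (192.168.1.101), and the priorities are assumptions made for illustration only.

```python
# Hypothetical two-VIP active-active layout: each node is MASTER for one VIP
# and BACKUP for the other. The second VIP (192.168.1.101) is an assumption.
PRIORITIES = {
    "192.168.1.100": {"lvs-a": 100, "lvs-b": 90},
    "192.168.1.101": {"lvs-a": 90, "lvs-b": 100},
}


def vip_owners(priorities, alive):
    """Return {vip: node}, choosing the highest-priority node that is alive."""
    owners = {}
    for vip, nodes in priorities.items():
        candidates = {n: p for n, p in nodes.items() if n in alive}
        owners[vip] = max(candidates, key=candidates.get) if candidates else None
    return owners


if __name__ == "__main__":
    # Normal operation: the VIPs are split across both nodes.
    print(vip_owners(PRIORITIES, alive={"lvs-a", "lvs-b"}))
    # lvs-a fails: lvs-b takes over both VIPs.
    print(vip_owners(PRIORITIES, alive={"lvs-b"}))
```

In this layout each node must be sized to carry the full load on its own, since it inherits both VIPs when its peer fails.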
2. Production Deployment Practice
2.1 Environment preparation and planning
Example for an e-commerce platform with tens of millions of daily PVs:
# Architecture planning
Load balancing layer:
- LVS-Master: 192.168.1.10
- LVS-Backup: 192.168.1.11
- VIP: 192.168.1.100
Web service layer:
- Web-01: 192.168.1.20
- Web-02: 192.168.1.21
- Web-03: 192.168.1.22
2.2 LVS installation and basic configuration
Deploy LVS on CentOS 7/8 or Ubuntu 20.04 (the script below uses yum; on Ubuntu, install the equivalent packages with apt-get instead):
#!/bin/bash
# LVS quick deployment script
yum install -y ipvsadm keepalived conntrack-tools
modprobe ip_vs
modprobe ip_vs_rr
modprobe ip_vs_wrr
modprobe ip_vs_sh
modprobe ip_vs_lc
cat >> /etc/modules-load.d/ipvs.conf <<EOF
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
ip_vs_lc
nf_conntrack
EOF
cat >> /etc/sysctl.conf <<EOF
net.ipv4.ip_forward = 1
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.tcp_retries2 = 5
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 15
EOF
sysctl -p
echo "LVS basic environment configuration completed!"
2.3 Keepalived master node configuration
Template includes health checks and email alerts:
# /etc/keepalived/keepalived.conf - Master node configuration
global_defs {
    router_id LVS_MASTER
    script_user root
    enable_script_security
    notification_email {
        [email protected]
    }
    notification_email_from [email protected]
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
}
vrrp_script chk_nginx {
    script "/usr/local/bin/check_service.sh"
    interval 2
    # On failure, priority drops by 20 (100 -> 80), below the backup's 90
    weight -20
    fall 3
    rise 2
}
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        # Keepalived uses only the first 8 characters of auth_pass
        auth_pass StrongP@ss2024
    }
    virtual_ipaddress {
        192.168.1.100/24 dev eth0 label eth0:1
    }
    track_script {
        chk_nginx
    }
    notify_master "/usr/local/bin/notify.sh master"
    notify_backup "/usr/local/bin/notify.sh backup"
    notify_fault "/usr/local/bin/notify.sh fault"
}
virtual_server 192.168.1.100 80 {
    delay_loop 6
    lb_algo wrr
    lb_kind DR
    persistence_timeout 50
    protocol TCP
    real_server 192.168.1.20 80 {
        weight 3
        TCP_CHECK {
            connect_timeout 3
            retry 3
            delay_before_retry 3
            connect_port 80
        }
    }
    real_server 192.168.1.21 80 {
        weight 2
        TCP_CHECK {
            connect_timeout 3
            retry 3
            delay_before_retry 3
            connect_port 80
        }
    }
    real_server 192.168.1.22 80 {
        weight 1
        HTTP_GET {
            url {
                path /health
                status_code 200
            }
            connect_timeout 3
            retry 3
            delay_before_retry 3
        }
    }
}
2.4 Backup node configuration points
The backup node uses the same configuration as the master; only change state to BACKUP and lower the priority (a distinct router_id is also conventional):
# Backup node differences
vrrp_instance VI_1 {
    state BACKUP
    priority 90
    # other settings remain the same
}
2.5 Real server configuration
Configure VIP on each web server and adjust ARP parameters:
#!/bin/bash
VIP="192.168.1.100"
ip addr add $VIP/32 dev lo
ip route add $VIP dev lo
echo "1" > /proc/sys/net/ipv4/conf/lo/arp_ignore
echo "2" > /proc/sys/net/ipv4/conf/lo/arp_announce
echo "1" > /proc/sys/net/ipv4/conf/all/arp_ignore
echo "2" > /proc/sys/net/ipv4/conf/all/arp_announce
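# Persist the same settings across reboots (including the "all" ARP parameters)
cat >> /etc/rc.local <<EOF
ip addr add $VIP/32 dev lo
ip route add $VIP dev lo
echo "1" > /proc/sys/net/ipv4/conf/lo/arp_ignore
echo "2" > /proc/sys/net/ipv4/conf/lo/arp_announce
echo "1" > /proc/sys/net/ipv4/conf/all/arp_ignore
echo "2" > /proc/sys/net/ipv4/conf/all/arp_announce
EOF
chmod +x /etc/rc.local
echo "Real Server configuration completed!"
3. Advanced Optimization Techniques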
3.1 Performance tuning practice
Key points:
1. IPVS connection table optimization
# Adjust IPVS connection table size
echo "options ip_vs conn_tab_bits=20" >> /etc/modprobe.d/ip_vs.conf
# Optimize timeout parameters
ipvsadm --set 900 120 300
# 900: TCP session timeout
# 120: TCP FIN timeout
# 300: UDP timeout
2. Network stack optimization
# High‑performance network parameters
cat >> /etc/sysctl.conf <<EOF
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.core.netdev_max_backlog = 5000
net.ipv4.tcp_max_syn_backlog = 8192
net.core.somaxconn = 8192
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_timestamps = 1
EOF
sysctl -p
3.2 Monitoring and alert system
Python script example:
#!/usr/bin/env python3
# LVS status monitoring script
import subprocess
from datetime import datetime

import requests


class LVSMonitor:
    def __init__(self):
        self.webhook_url = "https://your-webhook.com/alert"

    def get_ipvs_stats(self):
        # Return the raw counters printed by ipvsadm
        cmd = "ipvsadm -L -n --stats"
        result = subprocess.run(cmd.split(), capture_output=True, text=True)
        return result.stdout

    def check_real_servers(self):
        cmd = "ipvsadm -L -n"
        result = subprocess.run(cmd.split(), capture_output=True, text=True)
        servers = []
        for line in result.stdout.split('\n'):
            parts = line.split()
            # Real-server rows: -> IP:PORT Forward Weight ActiveConn InActConn
            # The column header also starts with '->', so require a numeric weight.
            if len(parts) < 6 or parts[0] != '->' or not parts[3].isdigit():
                continue
            ip, port = parts[1].rsplit(':', 1)
            servers.append({
                'ip': ip,
                'port': port,
                'weight': parts[3],
                'active_conn': parts[4],
                'inactive_conn': parts[5],
            })
        return servers

    def send_alert(self, message):
        payload = {
            'timestamp': datetime.now().isoformat(),
            'level': 'warning',
            'message': message,
            'source': 'LVS Monitor',
        }
        # A timeout keeps the monitor from hanging on an unreachable webhook
        requests.post(self.webhook_url, json=payload, timeout=5)

    def run(self):
        servers = self.check_real_servers()
        for server in servers:
            if int(server['weight']) == 0:
                self.send_alert(f"Server {server['ip']} weight is 0, possible issue")
        total_conn = sum(int(s['active_conn']) for s in servers)
        if total_conn > 10000:
            self.send_alert(f"Total connections too high: {total_conn}")


if __name__ == "__main__":
    LVSMonitor().run()
3.3 Fault self-healing mechanism
Bash script that checks service health and restores weight:
#!/bin/bash
VIP="192.168.1.100"
LOG_FILE="/var/log/lvs_auto_recovery.log"
MAX_RETRY=3
log_message() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" >> "$LOG_FILE"
}
check_and_recover() {
    local server_ip=$1
    local server_port=$2
    local retry_count=0
    while [ "$retry_count" -lt "$MAX_RETRY" ]; do
        # Port reachable: put the server back in rotation
        if nc -z -w2 "$server_ip" "$server_port"; then
            # Note: this restores a fixed weight of 3 for every server
            ipvsadm -e -t "$VIP:80" -r "$server_ip:$server_port" -w 3
            log_message "Server $server_ip recovered"
            return 0
        fi
        retry_count=$((retry_count + 1))
        sleep 5
    done
    # All retries failed: weight 0 stops new connections to this server
    ipvsadm -e -t "$VIP:80" -r "$server_ip:$server_port" -w 0
    log_message "Server $server_ip failed, taken offline"
    return 1
}
while true; do
    for server in "192.168.1.20" "192.168.1.21" "192.168.1.22"; do
        check_and_recover "$server" 80
    done
    sleep 30
done
4. Real-World Case Studies
4.1 Million‑concurrency optimization case
During last Double‑11, the platform needed to handle over 100k QPS. Optimizations included hardware (10 GbE, multi‑queue), system (CPU affinity, NUMA), LVS (source‑hash algorithm), and application (static‑dynamic separation, CDN). Key configuration:
# CPU affinity
echo 2 > /proc/irq/24/smp_affinity
echo 4 > /proc/irq/25/smp_affinity
# Enable NIC multi‑queue
ethtool -L eth0 combined 8
# LVS use source‑hash
ipvsadm -E -t 192.168.1.100:80 -s sh
4.2 Cross-data-center disaster recovery
Design uses two LVS clusters in Beijing (primary, 60 % weight) and Shanghai (secondary, 40 % weight) with geo‑DNS for intelligent routing.
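As a rough illustration of what a 60/40 weighting means per request, the sketch below distributes picks with smooth weighted round-robin. This is only a deterministic stand-in chosen for clarity: real geo-DNS routing also depends on client location and resolver caching, and the site names and the `smooth_wrr` helper are hypothetical.

```python
def smooth_wrr(weights, n):
    """Return n picks using smooth weighted round-robin over {name: weight}."""
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        # Every candidate gains its weight, then the leader is chosen
        # and pays back the total, which spreads picks evenly over time.
        for name, weight in weights.items():
            current[name] += weight
        chosen = max(current, key=current.get)
        current[chosen] -= total
        picks.append(chosen)
    return picks


if __name__ == "__main__":
    picks = smooth_wrr({"beijing": 6, "shanghai": 4}, 10)
    print(picks.count("beijing"), picks.count("shanghai"))
```

Over any full cycle of ten picks, six go to the primary site and four to the secondary, interleaved rather than sent in bursts.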
4.3 Gray‑release practice
Gradual traffic shift using LVS weight adjustments:
# 10% traffic
ipvsadm -a -t $VIP:80 -r $NEW_SERVER:80 -w 1
ipvsadm -e -t $VIP:80 -r $OLD_SERVER:80 -w 9
# 30% traffic
ipvsadm -e -t $VIP:80 -r $NEW_SERVER:80 -w 3
ipvsadm -e -t $VIP:80 -r $OLD_SERVER:80 -w 7
# 50% traffic
ipvsadm -e -t $VIP:80 -r $NEW_SERVER:80 -w 5
ipvsadm -e -t $VIP:80 -r $OLD_SERVER:80 -w 5
# Full switch
ipvsadm -e -t $VIP:80 -r $NEW_SERVER:80 -w 10
ipvsadm -d -t $VIP:80 -r $OLD_SERVER:80
5. Frequently Asked Questions
5.1 Split‑brain handling
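Bash script that detects a possible split-brain and releases the VIP when this node can reach neither its peer nor the gateway:
#!/bin/bash
PEER_IP="192.168.1.11"
GATEWAY_IP="192.168.1.1"
VIP="192.168.1.100"
check_split_brain() {
    ping -c 1 -W 1 "$PEER_IP" >/dev/null 2>&1
    peer_status=$?
    # Match the exact address rather than any string containing it
    ip addr show | grep -qF "inet $VIP/"
    vip_local=$?
    # Peer unreachable while this node holds the VIP: check the gateway
    if [ "$peer_status" -ne 0 ] && [ "$vip_local" -eq 0 ]; then
        # Gateway unreachable too: this node is the isolated one, so release the VIP
        if ! ping -c 1 -W 1 "$GATEWAY_IP" >/dev/null 2>&1; then
            systemctl stop keepalived
            logger "Split-brain detected, VIP released"
        fi
    fi
}
while true; do
    check_split_brain
    sleep 10
done
5.2 Session persistence issues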
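Two solutions: source-IP persistence, which LVS supports natively, and cookie-based persistence, which must be implemented by the application (or a layer-7 proxy) because LVS works at layer 4 and does not inspect HTTP headers.
# Source-IP persistence
ipvsadm -E -t 192.168.1.100:80 -s sh -p 1800
# Source-IP persistence in Keepalived, with clients grouped by /24 subnet
virtual_server 192.168.1.100 80 {
    persistence_timeout 1800
    persistence_granularity 255.255.255.0
}
5.3 Performance bottleneck investigation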
Typical toolchain:
# View connection stats
ipvsadm -L -n --stats --rate
# Monitor traffic
iftop -i eth0 -P
# Check conntrack table
conntrack -L | wc -l
# System bottleneck analysis
dstat -cdnmlp --top-cpu --top-mem
# Packet capture
tcpdump -i eth0 -w lvs.pcap 'host 192.168.1.100'
6. Best-Practice Summary
6.1 Design principles
Simplicity over complexity : use active‑standby when sufficient.
Monitoring before optimization : data‑driven improvements.
Automation : script repetitive tasks.
Capacity planning : keep 30 % headroom.
Documentation : record every change.
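The 30 % headroom rule translates into simple arithmetic. The sketch below sizes a cluster so that peak load uses at most 70 % of each node's capacity. The per-node capacity of 20,000 QPS is a made-up figure for illustration, and `nodes_needed` is a hypothetical helper.

```python
import math


def nodes_needed(peak_qps: float, per_node_qps: float, headroom: float = 0.30) -> int:
    """Nodes required so that peak load leaves `headroom` of each node unused."""
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable_per_node = per_node_qps * (1 - headroom)
    return math.ceil(peak_qps / usable_per_node)


if __name__ == "__main__":
    # 100k QPS peak, assumed 20k QPS per node, 30% headroom
    print(nodes_needed(100_000, 20_000))
```

Without headroom the same load would fit on five nodes; reserving 30 % raises the count to eight.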
6.2 Operational checklist
Daily: IPVS rule status, Keepalived process, VIP drift, backend health, system logs.
Weekly: performance trends, capacity assessment, config backup, disaster-recovery drills.
Monthly: architecture review, security hardening, version upgrade evaluation.
6.3 Failure‑drill procedure
Script to simulate master failure, network partition, and backend outage, then verify VIP failover and traffic redistribution.
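One way to verify a drill is to probe the VIP continuously while the failure is triggered by hand, then report the longest interruption. The following is a minimal sketch under that assumption. The function names are hypothetical, the VIP and port come from the example environment above, and the failure injection itself (stopping Keepalived, unplugging a link) is left to the operator.

```python
import socket
import time


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def longest_outage(samples):
    """Given (timestamp, ok) samples in time order, return the longest span in
    seconds from a first failed probe to the next successful one.
    An outage still in progress at the last sample is not counted."""
    longest = 0.0
    down_since = None
    for ts, ok in samples:
        if not ok and down_since is None:
            down_since = ts
        elif ok and down_since is not None:
            longest = max(longest, ts - down_since)
            down_since = None
    return longest


def run_drill(host="192.168.1.100", port=80, duration=60.0, interval=0.5):
    """Probe the VIP for `duration` seconds and return the longest outage seen."""
    samples = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        samples.append((time.monotonic(), tcp_probe(host, port)))
        time.sleep(interval)
    return longest_outage(samples)
```

Running `run_drill()` on a client machine while stopping Keepalived on the master gives a measured failover time that can be compared against the expected few seconds.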
7. Future Outlook and Technology Trends
7.1 LVS in the cloud‑native era
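kube-proxy IPVS mode : built on the same IPVS kernel module; it can significantly outperform iptables mode in clusters with many Services.
MetalLB : bare-metal load balancer for Kubernetes that announces service IPs via ARP/NDP or BGP; it is not built on LVS but can be used alongside kube-proxy's IPVS mode.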
Service‑mesh integration : works with Istio, Linkerd.
7.2 Emerging technologies
eBPF acceleration : further improve forwarding speed.
DPDK integration : user‑space high‑performance packet processing.
Intelligent scheduling : machine‑learning‑driven dynamic load‑balancing algorithms.
Conclusion
LVS+Keepalived remains a classic, powerful high‑availability load‑balancing solution in the cloud computing era. By mastering its principles, deployment, tuning, and automation, operators can build stable, efficient systems and continuously improve through monitoring, drills, and innovation.
