Mastering LVS+Keepalived: Build a Production-Ready High-Availability Load Balancer

This comprehensive guide walks you through the principles, architecture, deployment steps, performance tuning, monitoring, and advanced techniques for building a robust, production‑grade high‑availability load‑balancing solution using LVS and Keepalived, suitable for both beginners and seasoned engineers.

Ops Community

LVS+Keepalived High Availability Architecture Practice: From Intro to Production Deployment

Introduction: Why should every ops engineer master LVS+Keepalived?

During a midnight production incident, our primary load balancer crashed and the e‑commerce platform went down instantly. The outage made one thing clear: a system without a high‑availability architecture is like a tightrope walker without a safety net. That experience led us to research the options in depth, and we found that the LVS+Keepalived combination offers excellent performance, simple deployment, and low maintenance cost.

1. LVS+Keepalived Architecture Principles Deep Dive

1.1 Why choose LVS?

LVS (Linux Virtual Server) is a kernel‑level load balancer developed by Dr. Zhang Wensong. Compared with application‑layer balancers like Nginx or HAProxy, LVS runs in kernel space and provides:

Extremely high performance : can handle millions of concurrent connections per node.

Very low latency : kernel‑level forwarding with almost no overhead.

Stable and reliable : the IPVS code has been part of the mainline Linux kernel since the 2.4 series and has years of production use behind it.

Low resource consumption : minimal CPU and memory usage.

1.2 Core value of Keepalived

Keepalived is more than a simple HA tool; its three core functions simplify operations:

VRRP protocol implementation : automatic failover, with the backup taking over the VIP within seconds.

Health check mechanism : multi‑level, multi‑dimensional health detection.

LVS rule management : dynamic IPVS rule management without manual intervention.

1.3 Architecture design patterns

Three common deployment modes in production:

Mode 1: Dual‑node active‑standby Suitable for small‑to‑medium services, low cost, simple maintenance. One master handles all traffic, the backup stays idle.

Mode 2: Dual‑node active‑active For high‑traffic scenarios. Both servers carry traffic at the same time, each acting as master for its own VIP and as backup for the other's, which gives higher resource utilization.

Mode 3: Multi‑level cascade For ultra‑large clusters, traffic is distributed through multiple LVS layers.

2. Production Deployment Practice

2.1 Environment preparation and planning

Example for an e‑commerce platform with tens of millions of daily page views:

# Architecture planning
Load balancing layer:
- LVS-Master: 192.168.1.10
- LVS-Backup: 192.168.1.11
- VIP: 192.168.1.100

Web service layer:
- Web-01: 192.168.1.20
- Web-02: 192.168.1.21
- Web-03: 192.168.1.22

2.2 LVS installation and basic configuration

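The script below targets CentOS 7/8, where packages are installed with yum; on Ubuntu 20.04, install the equivalent packages with apt-get instead:

#!/bin/bash
# LVS quick deployment script (run as root)
set -e

# Install the IPVS admin tool, Keepalived, and the conntrack utilities
yum install -y ipvsadm keepalived conntrack-tools

# Load the IPVS core module and the schedulers used in this guide
for mod in ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh ip_vs_lc; do
    modprobe "$mod"
done

# Load the same modules at boot; ">" overwrites, so reruns do not add duplicates
cat > /etc/modules-load.d/ipvs.conf <<EOF
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
ip_vs_lc
nf_conntrack
EOF

# Keep the kernel parameters in a dedicated drop-in file, also safe to rerun
cat > /etc/sysctl.d/99-lvs.conf <<EOF
net.ipv4.ip_forward = 1
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.tcp_retries2 = 5
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 15
EOF
sysctl --system
echo "LVS basic environment configuration completed!"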

2.3 Keepalived master node configuration

Template includes health checks and email alerts:

# /etc/keepalived/keepalived.conf - Master node configuration
global_defs {
    router_id LVS_MASTER
    script_user root
    enable_script_security
    notification_email {
        ops@example.com    # placeholder recipient: the original address was obfuscated in the source
    }
    notification_email_from keepalived@example.com    # placeholder sender
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
}

vrrp_script chk_nginx {
    script "/usr/local/bin/check_service.sh"
    interval 2
    weight -20
    fall 3
    rise 2
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass StrongP@ss2024    # Keepalived uses only the first 8 characters
    }
    virtual_ipaddress {
        192.168.1.100/24 dev eth0 label eth0:1
    }
    track_script {
        chk_nginx
    }
    notify_master "/usr/local/bin/notify.sh master"
    notify_backup "/usr/local/bin/notify.sh backup"
    notify_fault "/usr/local/bin/notify.sh fault"
}

virtual_server 192.168.1.100 80 {
    delay_loop 6
    lb_algo wrr
    lb_kind DR
    persistence_timeout 50
    protocol TCP
    real_server 192.168.1.20 80 {
        weight 3
        TCP_CHECK {
            connect_timeout 3
            retry 3
            delay_before_retry 3
            connect_port 80
        }
    }
    real_server 192.168.1.21 80 {
        weight 2
        TCP_CHECK {
            connect_timeout 3
            retry 3
            delay_before_retry 3
            connect_port 80
        }
    }
    real_server 192.168.1.22 80 {
        weight 1
        HTTP_GET {
            url {
                path /health
                status_code 200
            }
            connect_timeout 3
            retry 3
            delay_before_retry 3
        }
    }
}
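
To make the effect of `lb_algo wrr` with weights 3, 2, and 1 concrete, here is a minimal Python sketch of the classic weighted round‑robin steps described in the LVS documentation. It is an illustration only: the kernel scheduler may differ in detail, the function name `wrr_schedule` is made up for this example, and `persistence_timeout 50` in the configuration above means real traffic is pinned per client rather than rotated per connection.

```python
from collections import Counter
from functools import reduce
from itertools import islice
from math import gcd


def wrr_schedule(weights):
    """Yield server names forever, following the classic LVS weighted round-robin steps."""
    names = list(weights)
    max_weight = max(weights.values())
    step = reduce(gcd, weights.values())
    if max_weight <= 0:
        return  # every server has weight 0, so nothing can be scheduled
    index, current_weight = -1, 0
    while True:
        index = (index + 1) % len(names)
        if index == 0:
            # Start of a new pass: lower the threshold, wrapping back to the maximum.
            current_weight -= step
            if current_weight <= 0:
                current_weight = max_weight
        if weights[names[index]] >= current_weight:
            yield names[index]


# Weights copied from the virtual_server block above.
weights = {"192.168.1.20": 3, "192.168.1.21": 2, "192.168.1.22": 1}
picks = list(islice(wrr_schedule(weights), 6))
print(picks)
print(Counter(picks))
```

Over one full cycle of six picks, the servers are chosen in a 3:2:1 ratio, and the heavier server's picks are spread across the cycle rather than handed out all at once.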

2.4 Backup node configuration points

The backup node uses the same configuration as the master. Give it its own router_id, change state to BACKUP, and lower the priority:

# Backup node differences
vrrp_instance VI_1 {
    state BACKUP
    priority 90
    # other settings remain the same
}
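
The priorities and timers above determine how quickly a failover happens. The following Python sketch works through the arithmetic under two assumptions: Keepalived runs VRRPv2 (its default, and the version that supports `auth_type PASS`), where the master‑down interval is defined in RFC 3768, and a failed track script with a negative weight lowers the priority by that amount. The function names are illustrative, and the 1..254 clamp reflects my understanding of how Keepalived bounds the adjusted priority.

```python
def effective_priority(base, failed_script_weights):
    """Priority after adding the negative weights of failed track scripts (assumed clamp: 1..254)."""
    return max(1, min(254, base + sum(failed_script_weights)))


def master_down_interval(advert_int, priority):
    """VRRPv2 (RFC 3768): 3 * advertisement interval + skew time, in seconds."""
    skew_time = (256 - priority) / 256
    return 3 * advert_int + skew_time


def script_detection_time(interval, fall):
    """Seconds of consecutive failures before a vrrp_script is considered down."""
    return interval * fall


# Values from the configuration above: master 100, backup 90, weight -20,
# interval 2, fall 3, advert_int 1.
master_after_failure = effective_priority(100, [-20])
print("master priority after check failure:", master_after_failure)
print("backup wins election:", 90 > master_after_failure)
print("script detection time (s):", script_detection_time(2, 3))
print("backup master-down interval (s):", master_down_interval(1, 90))
```

With these numbers, a failed service check drops the master to priority 80 after about 6 seconds, which lets the backup (priority 90) take over. If the master host dies outright, the backup declares it down after roughly 3.65 seconds without advertisements.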

2.5 Real server configuration

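Configure the VIP on the loopback interface of each web server and adjust the ARP parameters so that only the director answers ARP requests for the VIP:

#!/bin/bash
# Real server setup for LVS-DR (run as root on each web server)
VIP="192.168.1.100"

# Set the ARP parameters before binding the VIP, so the real server
# never answers ARP requests that are meant for the director
echo "1" > /proc/sys/net/ipv4/conf/lo/arp_ignore
echo "2" > /proc/sys/net/ipv4/conf/lo/arp_announce
echo "1" > /proc/sys/net/ipv4/conf/all/arp_ignore
echo "2" > /proc/sys/net/ipv4/conf/all/arp_announce

# Bind the VIP to loopback with a /32 mask
ip addr add "$VIP/32" dev lo
ip route add "$VIP" dev lo

# Persist across reboots; the unquoted EOF expands $VIP when the file is written
cat >> /etc/rc.local <<EOF
echo "1" > /proc/sys/net/ipv4/conf/lo/arp_ignore
echo "2" > /proc/sys/net/ipv4/conf/lo/arp_announce
echo "1" > /proc/sys/net/ipv4/conf/all/arp_ignore
echo "2" > /proc/sys/net/ipv4/conf/all/arp_announce
ip addr add $VIP/32 dev lo
ip route add $VIP dev lo
EOF
chmod +x /etc/rc.local
echo "Real Server configuration completed!"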

3. Advanced Optimization Techniques

3.1 Performance tuning practice

Key points:

1. IPVS connection table optimization

# Adjust IPVS connection table size to 2^20 buckets (applied the next time the ip_vs module is loaded)
echo "options ip_vs conn_tab_bits=20" >> /etc/modprobe.d/ip_vs.conf

# Optimize timeout parameters
ipvsadm --set 900 120 300
# 900: TCP session timeout
# 120: TCP FIN timeout
# 300: UDP timeout
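
To see why `conn_tab_bits` matters, it helps to look at the hash‑table load. The sketch below is a rough back‑of‑the‑envelope calculation in Python, not a measurement: it assumes connections spread evenly over the buckets, and the one‑million connection count is an illustrative figure. The kernel default for `conn_tab_bits` is 12.

```python
def conn_table_buckets(bits):
    """Number of hash buckets for a given conn_tab_bits value."""
    return 1 << bits


def average_chain_length(connections, bits):
    """Average entries per bucket, assuming an even hash distribution."""
    return connections / conn_table_buckets(bits)


# Illustrative load: one million tracked connections.
for bits in (12, 20):
    print(
        f"conn_tab_bits={bits}: {conn_table_buckets(bits)} buckets, "
        f"~{average_chain_length(1_000_000, bits):.2f} connections per bucket"
    )
```

With the default of 12 bits, each lookup has to walk a chain of a few hundred entries at this load; with 20 bits the average chain holds about one entry, at the cost of a larger bucket array.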

2. Network stack optimization

# High‑performance network parameters
cat >> /etc/sysctl.conf <<EOF
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.core.netdev_max_backlog = 5000
net.ipv4.tcp_max_syn_backlog = 8192
net.core.somaxconn = 8192
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_timestamps = 1
EOF
sysctl -p

3.2 Monitoring and alert system

Python script example:

#!/usr/bin/env python3
# LVS status monitoring script (run as root so ipvsadm can read the IPVS table)
import subprocess
from datetime import datetime

import requests  # third-party: pip install requests


class LVSMonitor:
    def __init__(self):
        self.webhook_url = "https://your-webhook.com/alert"

    def get_ipvs_stats(self):
        """Return the raw text of `ipvsadm -L -n --stats` for logging or later parsing."""
        cmd = "ipvsadm -L -n --stats"
        result = subprocess.run(cmd.split(), capture_output=True, text=True)
        return result.stdout

    def check_real_servers(self):
        """Parse `ipvsadm -L -n` into one dict per real server."""
        cmd = "ipvsadm -L -n"
        result = subprocess.run(cmd.split(), capture_output=True, text=True)
        servers = []
        for line in result.stdout.split('\n'):
            parts = line.split()
            # Real-server rows look like: -> IP:PORT Forward Weight ActiveConn InActConn
            # The column header also starts with "->", so it must be skipped.
            if len(parts) < 6 or parts[0] != '->' or parts[1].startswith('RemoteAddress'):
                continue
            ip, port = parts[1].rsplit(':', 1)
            servers.append({
                'ip': ip,
                'port': port,
                'weight': int(parts[3]),
                'active_conn': int(parts[4]),
                'inactive_conn': int(parts[5]),
            })
        return servers

    def send_alert(self, message):
        payload = {
            'timestamp': datetime.now().isoformat(),
            'level': 'warning',
            'message': message,
            'source': 'LVS Monitor'
        }
        # A timeout keeps the monitor from hanging if the webhook is unreachable
        requests.post(self.webhook_url, json=payload, timeout=5)

    def run(self):
        servers = self.check_real_servers()
        for server in servers:
            if server['weight'] == 0:
                self.send_alert(f"Server {server['ip']} weight is 0, possible issue")
        total_conn = sum(s['active_conn'] for s in servers)
        if total_conn > 10000:
            self.send_alert(f"Total connections too high: {total_conn}")


if __name__ == "__main__":
    LVSMonitor().run()

3.3 Fault self‑healing mechanism

Bash script that checks service health and restores weight:

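#!/bin/bash
# Note: Keepalived's own health checkers also add and remove real servers,
# so run this loop only where it will not fight Keepalived over the same rules.
VIP="192.168.1.100"
LOG_FILE="/var/log/lvs_auto_recovery.log"
MAX_RETRY=3

log_message() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" >> "$LOG_FILE"
}

# Usage: check_and_recover <server_ip> <server_port> <normal_weight>
check_and_recover() {
    local server_ip=$1
    local server_port=$2
    local weight=$3
    local retry_count=0
    while [ "$retry_count" -lt "$MAX_RETRY" ]; do
        # A successful TCP connect means the server is healthy: restore its weight
        if nc -z -w2 "$server_ip" "$server_port"; then
            ipvsadm -e -t "$VIP:80" -r "$server_ip:$server_port" -w "$weight"
            log_message "Server $server_ip recovered"
            return 0
        fi
        retry_count=$((retry_count + 1))
        sleep 5
    done
    # All retries failed: set weight 0 so no new connections are scheduled
    ipvsadm -e -t "$VIP:80" -r "$server_ip:$server_port" -w 0
    log_message "Server $server_ip failed, taken offline"
    return 1
}

while true; do
    # Each entry is "ip:weight", matching the weights in the Keepalived configuration
    for entry in "192.168.1.20:3" "192.168.1.21:2" "192.168.1.22:1"; do
        check_and_recover "${entry%%:*}" 80 "${entry##*:}"
    done
    sleep 30
done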
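
A recovery loop like this restores a server on a single successful probe, which can cause flapping when a backend is unstable. One common refinement is the fall/rise counting that the `vrrp_script` block in section 2.3 uses (`fall 3`, `rise 2`). The Python sketch below shows that hysteresis in isolation; the class name `HealthState` is hypothetical and this is not Keepalived's actual implementation.

```python
class HealthState:
    """Track a server's health with fall/rise hysteresis to avoid flapping."""

    def __init__(self, fall=3, rise=2, healthy=True):
        self.fall = fall          # consecutive failures needed to mark the server down
        self.rise = rise          # consecutive successes needed to mark it up again
        self.healthy = healthy
        self._streak = 0          # consecutive results that contradict the current state

    def observe(self, check_ok):
        """Record one probe result; return True if the state flipped."""
        if check_ok == self.healthy:
            self._streak = 0
            return False
        self._streak += 1
        threshold = self.rise if check_ok else self.fall
        if self._streak >= threshold:
            self.healthy = check_ok
            self._streak = 0
            return True
        return False


state = HealthState(fall=3, rise=2)
probes = [False, False, True, False, False, False, True, True]
flips = [state.observe(result) for result in probes]
print(flips)
print("healthy at end:", state.healthy)
```

Two isolated failures followed by a success do not change the state; only three failures in a row take the server down, and two successes in a row bring it back.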

4. Real‑World Case Studies

4.1 Million‑concurrency optimization case

During last Double‑11, the platform needed to handle over 100k QPS. Optimizations included hardware (10 GbE, multi‑queue), system (CPU affinity, NUMA), LVS (source‑hash algorithm), and application (static‑dynamic separation, CDN). Key configuration:

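# Pin NIC interrupts to CPUs; the value is a hex CPU bitmask (2 = CPU1, 4 = CPU2)
# IRQ numbers 24 and 25 are examples; look up the NIC's real IRQs in /proc/interrupts
echo 2 > /proc/irq/24/smp_affinity
echo 4 > /proc/irq/25/smp_affinity
# Enable NIC multi‑queue (8 combined channels)
ethtool -L eth0 combined 8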
# LVS use source‑hash
ipvsadm -E -t 192.168.1.100:80 -s sh
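
The values written to `smp_affinity` are hexadecimal CPU bitmasks, where bit n selects CPU n. A small Python helper, offered as a sketch, makes the mapping explicit. The name `affinity_mask` is made up for this example, and the helper targets machines with fewer than 32 CPUs, since larger systems write the mask as comma‑separated 32‑bit groups.

```python
def affinity_mask(cpus):
    """Return the hex bitmask string that selects the given CPU numbers."""
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < 32:
            raise ValueError("this sketch only handles CPU numbers 0..31")
        mask |= 1 << cpu
    return format(mask, "x")


print(affinity_mask([1]))           # the mask used for IRQ 24 above
print(affinity_mask([2]))           # the mask used for IRQ 25 above
print(affinity_mask([0, 1, 2, 3]))  # spread one IRQ across four CPUs
```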

4.2 Cross‑data‑center disaster recovery

Design uses two LVS clusters in Beijing (primary, 60 % weight) and Shanghai (secondary, 40 % weight) with geo‑DNS for intelligent routing.

4.3 Gray‑release practice

Gradual traffic shift using LVS weight adjustments:

# 10% traffic
ipvsadm -a -t $VIP:80 -r $NEW_SERVER:80 -w 1
ipvsadm -e -t $VIP:80 -r $OLD_SERVER:80 -w 9
# 30% traffic
ipvsadm -e -t $VIP:80 -r $NEW_SERVER:80 -w 3
ipvsadm -e -t $VIP:80 -r $OLD_SERVER:80 -w 7
# 50% traffic
ipvsadm -e -t $VIP:80 -r $NEW_SERVER:80 -w 5
ipvsadm -e -t $VIP:80 -r $OLD_SERVER:80 -w 5
# Full switch
ipvsadm -e -t $VIP:80 -r $NEW_SERVER:80 -w 10
ipvsadm -d -t $VIP:80 -r $OLD_SERVER:80
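
The percentages in the comments hold when there is exactly one old server. With several old servers, each one carries its own weight, so the new server's share is smaller. A short Python sketch of the arithmetic, assuming a weight‑proportional scheduler such as wrr and no persistence (the function name is illustrative):

```python
def traffic_share(new_weight, old_weights):
    """Fraction of new connections sent to the new server under weighted scheduling."""
    total = new_weight + sum(old_weights)
    return new_weight / total if total else 0.0


print(f"1 vs one old server at 9:    {traffic_share(1, [9]):.1%}")
print(f"3 vs one old server at 7:    {traffic_share(3, [7]):.1%}")
print(f"1 vs three old servers at 9: {traffic_share(1, [9, 9, 9]):.1%}")
```

With three old servers at weight 9, a new server at weight 1 receives about 3.6 % of new connections rather than 10 %, so the weights need to be scaled to the size of the pool.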

5. Frequently Asked Questions

5.1 Split‑brain handling

Bash script to detect split‑brain and release VIP if gateway unreachable:

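#!/bin/bash
PEER_IP="192.168.1.11"
VIP="192.168.1.100"
GATEWAY="192.168.1.1"

check_split_brain() {
    # Is the peer node reachable?
    ping -c 1 -W 1 "$PEER_IP" >/dev/null 2>&1
    peer_status=$?
    # Does this node currently hold the VIP?
    ip -4 addr show | grep -qF "inet $VIP/"
    vip_local=$?
    if [ "$peer_status" -ne 0 ] && [ "$vip_local" -eq 0 ]; then
        # The peer is unreachable while we hold the VIP. If the gateway is
        # unreachable too, this node is the isolated one, so release the VIP.
        if ! ping -c 1 -W 1 "$GATEWAY" >/dev/null 2>&1; then
            systemctl stop keepalived
            logger "Split-brain suspected: gateway unreachable, VIP released"
        fi
    fi
}
while true; do
    check_split_brain
    sleep 10
done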
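
The decision rule stated for this section (release the VIP only when this node holds it, cannot see its peer, and cannot reach the gateway either) is easy to get backwards, so it can help to write it as a pure function and check the full truth table. The Python sketch below is one way to do that; the function name and return strings are hypothetical.

```python
from itertools import product


def split_brain_action(holds_vip, peer_reachable, gateway_reachable):
    """Decide what a node should do, given three reachability observations."""
    if not holds_vip or peer_reachable:
        return "ok"            # nothing to do: either no VIP here, or the peer is visible
    if gateway_reachable:
        return "keep_vip"      # the peer looks dead but the network is fine: keep serving
    return "release_vip"       # this node is the isolated one: stop Keepalived


for holds_vip, peer, gateway in product([True, False], repeat=3):
    action = split_brain_action(holds_vip, peer, gateway)
    print(f"holds_vip={holds_vip!s:5} peer={peer!s:5} gateway={gateway!s:5} -> {action}")
```

Only one of the eight combinations leads to releasing the VIP, which keeps the check from taking down a healthy master when the backup merely reboots.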

5.2 Session persistence issues

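LVS works at layer 4 and cannot read HTTP cookies, so it offers source‑IP persistence only, configured either with ipvsadm or in Keepalived. Cookie‑based persistence has to be implemented at layer 7, in the application itself or in a proxy such as Nginx or HAProxy.

# Source‑IP persistence with ipvsadm (1800‑second timeout)
ipvsadm -E -t 192.168.1.100:80 -s sh -p 1800

# Source‑IP persistence in Keepalived, grouping clients by /24 network
virtual_server 192.168.1.100 80 {
    persistence_timeout 1800
    persistence_granularity 255.255.255.0
}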
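
Persistence granularity controls which clients count as "the same source": the client address is masked before it is used as the persistence key, so with a /24 mask every client in the same /24 network is sent to the same real server. The Python sketch below illustrates only the masking step; the function name `persistence_key` is invented for this example and the lookup itself happens inside the kernel.

```python
import ipaddress


def persistence_key(client_ip, netmask="255.255.255.255"):
    """Return the masked client address that persistent connections are keyed on."""
    network = ipaddress.ip_network(f"{client_ip}/{netmask}", strict=False)
    return str(network.network_address)


# With a /24 granularity, both clients share one persistence entry.
print(persistence_key("10.0.0.5", "255.255.255.0"))
print(persistence_key("10.0.0.200", "255.255.255.0"))
# With the default host mask, each client gets its own entry.
print(persistence_key("10.0.0.5"))
```

A coarser granularity helps when clients sit behind proxy pools that rotate addresses, but it also concentrates load, since a whole network lands on one real server.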

5.3 Performance bottleneck investigation

Typical toolchain:

# View cumulative connection and traffic counters
ipvsadm -L -n --stats
# View per-second rates
ipvsadm -L -n --rate

# Monitor traffic
iftop -i eth0 -P

# Check conntrack table
conntrack -L | wc -l

# System bottleneck analysis
dstat -cdnmlp --top-cpu --top-mem

# Packet capture
tcpdump -i eth0 -w lvs.pcap 'host 192.168.1.100'

6. Best‑Practice Summary

6.1 Design principles

Simplicity over complexity : use active‑standby when sufficient.

Monitoring before optimization : data‑driven improvements.

Automation : script repetitive tasks.

Capacity planning : keep 30 % headroom.

Documentation : record every change.
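
As a worked example of the capacity‑planning rule, the Python sketch below reads "30 % headroom" as "peak load should use at most 70 % of total capacity". That reading is an assumption, as is reusing the 100k QPS peak from the case study in section 4.1.

```python
def required_capacity(peak_load, headroom=0.30):
    """Capacity needed so that peak load leaves the given fraction unused."""
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in the range [0, 1)")
    return peak_load / (1 - headroom)


peak_qps = 100_000  # illustrative peak
print(f"Capacity needed for {peak_qps} QPS with 30% headroom: {required_capacity(peak_qps):.0f} QPS")
```

Under that reading, a 100k QPS peak calls for roughly 143k QPS of provisioned capacity.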

6.2 Operational checklist

Daily checks: IPVS rule status, Keepalived process, VIP drift, backend health, system logs. Weekly: performance trends, capacity assessment, config backup, disaster‑recovery drills. Monthly: architecture review, security hardening, version upgrade evaluation.

6.3 Failure‑drill procedure

A failure drill should simulate a master failure, a network partition, and a backend outage, then verify that the VIP fails over and that traffic is redistributed across the remaining servers.

7. Future Outlook and Technology Trends

7.1 LVS in the cloud‑native era

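kube‑proxy IPVS mode : uses IPVS hash tables instead of sequential iptables rule chains, which scales better as the number of Services grows.

MetalLB : a bare‑metal load balancer for Kubernetes that announces service IPs via ARP/NDP or BGP, filling a role similar to the VIP management Keepalived provides; it is not built on LVS.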

Service‑mesh integration : works with Istio, Linkerd.

7.2 Emerging technologies

eBPF acceleration : further improve forwarding speed.

DPDK integration : user‑space high‑performance packet processing.

Intelligent scheduling : machine‑learning‑driven dynamic load‑balancing algorithms.

Conclusion

LVS+Keepalived remains a classic, powerful high‑availability load‑balancing solution in the cloud computing era. By mastering its principles, deployment, tuning, and automation, operators can build stable, efficient systems and continuously improve through monitoring, drills, and innovation.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
