
LVS Load Balancing: Deep Dive into Four Modes and Step‑by‑Step Deployment

This guide explains the four operating modes of Linux Virtual Server (LVS)—NAT, DR, TUN, and FULLNAT—detailing packet flows, configuration steps, required kernel parameters, health checks, troubleshooting tips, and best‑practice deployment scripts for building a reliable, high‑performance load‑balancing cluster.

MaGe Linux Operations

Jun 12, 2026


Purpose and Audience

The article is written for junior and intermediate operations engineers and backend developers who need to understand and deploy LVS (Linux Virtual Server) in production. It aims to clarify the principles of the four LVS working modes, show how packets travel, and provide a complete, reproducible deployment workflow.

Four LVS Working Modes Overview

NAT – The Director rewrites the destination address of each request (DNAT) and the source address of each response back to the VIP. All traffic passes through the Director in both directions, which puts the highest CPU and bandwidth load on it. Real Servers must route their replies through the Director, typically by using it as their default gateway.

DR (Direct Routing) – Only the destination MAC address is rewritten. Real Servers send responses directly to the client, so the Director handles only the inbound direction. This mode offers the best performance and is the default choice for most high‑traffic scenarios.

TUN – Encapsulates packets in an IPIP tunnel, allowing the Director to forward traffic across different subnets or VLANs. Both the Director and Real Servers must support the IPIP module.

FULLNAT – Extends NAT by translating both source and destination IPs (double NAT). It enables cross‑VLAN or cross‑subnet deployments but requires kernel patches (lvs‑fullnat) or vendor‑specific patches and is not part of the mainline kernel.
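As a rough illustration of how the three mainline modes differ at the rule level, the sketch below prints (it does not execute) the ipvsadm commands for one virtual service. The function name ipvs_rules and the fixed weight of 100 are assumptions made for this example; the flags -m, -g and -i are the standard ipvsadm forwarding options for NAT, DR and TUN.

```shell
#!/bin/bash
# Sketch: print (not run) ipvsadm commands for one virtual service.
# Usage: ipvs_rules MODE VIP:PORT SCHEDULER RIP:PORT...
# ipvs_rules is a hypothetical helper, not part of ipvsadm.
ipvs_rules() {
    local mode=$1 vip=$2 sched=$3 flag rs
    shift 3
    case "$mode" in
        NAT) flag=-m ;;   # masquerading
        DR)  flag=-g ;;   # gatewaying (direct routing)
        TUN) flag=-i ;;   # IPIP encapsulation
        *)   echo "unsupported mode: $mode (FULLNAT is not in mainline)" >&2
             return 1 ;;
    esac
    echo "ipvsadm -A -t $vip -s $sched"
    for rs in "$@"; do
        echo "ipvsadm -a -t $vip -r $rs $flag -w 100"
    done
}

ipvs_rules DR 10.0.0.100:80 wlc 10.0.0.11:80 10.0.0.12:80
```

Only the forwarding flag changes between modes; the host-side preparation (gateway, VIP on lo or tunl0, ARP settings) is what really distinguishes them.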

Mode‑by‑Mode Details

1. NAT Mode

Network topology – The client sends traffic to the Director’s external interface; the Director DNATs to the Real Server’s IP, then SNATs the response back to the client.

client --> Director (eth0: public IP) --> VIP 10.0.0.100
   |                     |
   +---> Real Server 1 (10.0.0.11)
   +---> Real Server 2 (10.0.0.12)

Packet transformation

# Request
src=1.2.3.4:5000 dst=10.0.0.100:80   # Director DNAT
src=1.2.3.4:5000 dst=10.0.0.11:80

# Response
src=10.0.0.11:80 dst=1.2.3.4:5000   # Real Server replies via the Director (its default gateway)
src=10.0.0.100:80 dst=1.2.3.4:5000   # Director rewrites the source back to the VIP

Typical scenarios

Few Real Servers (5‑10)

Need to isolate internal and external networks

Temporary test environments where Real Server network changes are undesirable

2. DR Mode

Network topology – The Director holds the VIP on its service interface, and each Real Server also configures the same VIP on a loopback alias (lo:0). The Director rewrites only the destination MAC address; the Real Server replies directly to the client with the VIP as the source address.

client --> Director (eth0) --> VIP 10.0.0.100 (shared on lo:0)
Real Server 1 (eth0:10.0.0.11, lo:0:10.0.0.100)
Real Server 2 (eth0:10.0.0.12, lo:0:10.0.0.100)

Packet flow

# Request
src=client_ip dst=10.0.0.100:80   # Director changes MAC only

# Response
src=10.0.0.100:80 dst=client_ip   # Real Server replies directly

Key requirement – ARP suppression

# On each Real Server
echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce
echo 1 > /proc/sys/net/ipv4/conf/lo/arp_ignore
echo 2 > /proc/sys/net/ipv4/conf/lo/arp_announce
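With arp_ignore=1 a host answers ARP only for addresses configured on the interface that received the request, and with arp_announce=2 it always picks the best local address as the ARP source, so the VIP on lo is never advertised. A minimal sketch for checking these four values on a Real Server follows; the function name and the PROC_ROOT override (used only to point the check at a fake tree) are assumptions for this example.

```shell
#!/bin/bash
# Sketch: verify the ARP suppression values on a Real Server.
# PROC_ROOT is a hypothetical override so the check can run against a test tree.
check_arp_suppression() {
    local root=${PROC_ROOT:-/proc}/sys/net/ipv4/conf
    local iface key name want got rc=0
    for iface in all lo; do
        for key in arp_ignore:1 arp_announce:2; do
            name=${key%%:*}
            want=${key#*:}
            got=$(cat "$root/$iface/$name" 2>/dev/null)
            if [ "$got" != "$want" ]; then
                echo "MISMATCH $iface/$name: want $want, got ${got:-unreadable}"
                rc=1
            fi
        done
    done
    [ "$rc" -eq 0 ] && echo "ARP suppression OK"
    return "$rc"
}
```

Running check_arp_suppression after a reboot is a quick way to confirm the settings were persisted and not only applied to the running kernel.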

Typical scenarios

High‑throughput web, API, cache, or MySQL read‑load balancing

Dozens of Real Servers within the same VLAN

When the Director and Real Servers are on the same layer‑2 network

3. TUN Mode

Network topology – The Director creates an IPIP tunnel to each Real Server. The original client packet is encapsulated, sent through the tunnel, and the Real Server decapsulates it and replies directly.

client --> Director (VIP 1.1.1.1)
   | IPIP tunnel (IP protocol 4)
   +--> Real Server 1 (VIP 1.1.1.1 on tunl0)
   +--> Real Server 2 (VIP 1.1.1.1 on tunl0)

Packet transformation

# Outer IPIP header
src=Director_DIP dst=RS1_IP
# Inner original packet
src=client_ip dst=VIP:80

# Response after decapsulation
src=VIP:80 dst=client_ip

Typical scenarios

Real Servers located in different subnets, data centers, or regions

When you need to avoid making the Director a bottleneck but still require cross‑subnet traffic

Large numbers of Real Servers where NAT would overload the Director

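Performance note – IPIP adds a 20‑byte outer IPv4 header to every packet, so plan for the reduced effective MTU. The CPU overhead of encapsulation is quoted at about 5‑10 % on 1 Gbps links and under 1 % on 10 Gbps links; treat these as rough figures and measure on your own hardware.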
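Because IPIP wraps each packet in an extra 20-byte IPv4 header, the usable MTU inside the tunnel shrinks. The arithmetic can be sketched as follows, assuming IPv4 with no IP or TCP options; ipip_mss is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: effective MTU and TCP MSS inside an IPIP tunnel.
# Assumes IPv4 and 20-byte IP and TCP headers (no options).
ipip_mss() {
    local link_mtu=${1:-1500}
    local inner_mtu=$(( link_mtu - 20 ))   # minus the outer IPv4 header
    local mss=$(( inner_mtu - 40 ))        # minus inner IPv4 + TCP headers
    echo "inner_mtu=$inner_mtu mss=$mss"
}

ipip_mss 1500   # prints: inner_mtu=1480 mss=1440
```

If full-size packets stall in TUN mode while small requests work, a mismatch between these values and the configured MTU is a likely place to look.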

4. FULLNAT Mode

FULLNAT behaves like NAT but also rewrites the source IP, so Real Servers see the Director’s address instead of the client’s. It enables cross‑VLAN or cross‑subnet deployments without requiring the Real Server to have the VIP on a loopback interface. FULLNAT is not part of the upstream kernel; it requires the lvs‑fullnat patch or vendor‑specific binaries (e.g., Alibaba Cloud). It is rarely used for new projects.

Environment Preparation (CentOS 7 / RHEL 7 example)

# Install required packages
yum -y install ipvsadm keepalived
systemctl enable keepalived

# Verify IPVS kernel module
lsmod | grep ip_vs || modprobe ip_vs

# Disable NetworkManager interference (optional but recommended)
systemctl stop NetworkManager
systemctl disable NetworkManager
systemctl restart network

Important sysctl settings (saved in /etc/sysctl.d/99-lvs.conf)

# Core LVS settings
# DR mode: keep forwarding disabled
net.ipv4.ip_forward = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
# Required for DR and FULLNAT
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_syn_backlog = 65535
net.core.somaxconn = 65535
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.core.netdev_max_backlog = 300000
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_fastopen = 3

Apply the settings:

sysctl -p /etc/sysctl.d/99-lvs.conf
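sysctl configuration files treat a line as a comment only when it starts with # or ;, and a comment placed after a value is not reliably stripped, so it can end up as part of the value and make the setting fail. The sketch below flags such lines before the file is applied; lint_sysctl_file is a hypothetical helper, not a sysctl feature.

```shell
#!/bin/bash
# Sketch: flag lines in a sysctl.d file that carry a trailing comment.
# Prints offending lines as "LINENO: text" and returns 1 if any were found.
lint_sysctl_file() {
    awk '
        /^[ \t]*([#;]|$)/ { next }                 # whole-line comment or blank line
        /[#;]/ { printf "%d: %s\n", NR, $0; bad = 1 }
        END { exit bad }
    ' "$1"
}

# Usage: lint_sysctl_file /etc/sysctl.d/99-lvs.conf && sysctl -p /etc/sysctl.d/99-lvs.conf
```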

IP Planning

Director (master) – hostname lvs-master, IP 10.0.0.10 (DIP)

Director (backup) – hostname lvs-backup, IP 10.0.0.20

VIP – 10.0.0.100/24 (public address presented to clients)

Real Server 1 – hostname web-01, IP 10.0.0.11

Real Server 2 – hostname web-02, IP 10.0.0.12

DR Mode Full Deployment

Real Server Configuration

# Create lo:0 with /32 mask for the VIP
cat > /etc/sysconfig/network-scripts/ifcfg-lo:0 <<'EOF'
DEVICE=lo:0
IPADDR=10.0.0.100
NETMASK=255.255.255.255
ONBOOT=yes
NAME=loopback
EOF
ifup lo:0

# ARP suppression script (lvs‑rs)
cat > /etc/init.d/lvs-rs <<'EOF'
#!/bin/bash
# chkconfig: 2345 90 60
# description: LVS Real Server ARP suppression
VIP=10.0.0.100
case "$1" in
  start)
    echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
    echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce
    echo 1 > /proc/sys/net/ipv4/conf/lo/arp_ignore
    echo 2 > /proc/sys/net/ipv4/conf/lo/arp_announce
    ;;
  stop)
    echo 0 > /proc/sys/net/ipv4/conf/all/arp_ignore
    echo 0 > /proc/sys/net/ipv4/conf/all/arp_announce
    echo 0 > /proc/sys/net/ipv4/conf/lo/arp_ignore
    echo 0 > /proc/sys/net/ipv4/conf/lo/arp_announce
    ;;
  *)
    echo "Usage: $0 {start|stop}"
    exit 1
    ;;
esac
EOF
chmod +x /etc/init.d/lvs-rs
chkconfig --add lvs-rs
service lvs-rs start

Director keepalived Configuration (DR)

# /etc/keepalived/keepalived.conf (master example)
! Configuration File for keepalived

global_defs {
    router_id LVS_MASTER
    # Placeholder addresses: replace with your own
    notification_email { ops@example.com }
    notification_email_from keepalived@example.com
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
}

vrrp_script check_lvs {
    script "/usr/local/bin/check_ipvs.sh"
    interval 3
    weight -20
    fall 3
    rise 2
    timeout 5
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass YourStrongPass
    }
    unicast_src_ip 10.0.0.10
    unicast_peer { 10.0.0.20 }
    virtual_ipaddress {
        10.0.0.100/24 dev eth0 label eth0:1
    }
    notify_master "/usr/local/bin/notify.sh master"
    notify_backup "/usr/local/bin/notify.sh backup"
    notify_fault "/usr/local/bin/notify.sh fault"
    track_script { check_lvs }
}

virtual_server 10.0.0.100 80 {
    delay_loop 6
    lb_algo wlc
    lb_kind DR
    persistence_timeout 50
    persistence_granularity 255.255.255.0
    protocol TCP
    sorry_server 127.0.0.1 80

    real_server 10.0.0.11 80 {
        weight 100
        TCP_CHECK {
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 2
            connect_port 80
        }
    }
    real_server 10.0.0.12 80 {
        weight 100
        TCP_CHECK {
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 2
            connect_port 80
        }
    }
}

Validate the configuration:

keepalived -t -f /etc/keepalived/keepalived.conf
systemctl reload keepalived
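The configuration above is the master's. On the backup node the same file is normally used with the state, priority, router_id and unicast addresses swapped. A sketch of deriving it mechanically is shown below; the function name, the backup priority of 90 and the router_id LVS_BACKUP are assumptions for this example, chosen so that the script weight of -20 can push the master (100) below the backup.

```shell
#!/bin/bash
# Sketch: derive the backup node's keepalived.conf from the master's.
# Reads the master config on stdin, writes the backup config to stdout.
# Priority 90 and router_id LVS_BACKUP are assumed values.
make_backup_conf() {
    awk '
        $1 == "state"          { sub(/MASTER/, "BACKUP") }
        $1 == "priority"       { sub(/100/, "90") }
        $1 == "router_id"      { sub(/LVS_MASTER/, "LVS_BACKUP") }
        $1 == "unicast_src_ip" { sub(/10\.0\.0\.10/, "10.0.0.20") }
        $1 == "unicast_peer"   { sub(/10\.0\.0\.20/, "10.0.0.10") }
        { print }
    '
}

# Usage: make_backup_conf < keepalived.conf.master > keepalived.conf.backup
```

Generating the backup file from the master keeps the virtual_server sections identical on both nodes, which is what keepalived expects for a clean failover.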

Health‑Check Scripts

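#!/bin/bash
# /usr/local/bin/check_ipvs.sh
# Healthy when the IPVS table holds at least one Real Server entry (IPv4).
# Packet counters are not used: in DR mode replies bypass the Director,
# so OutPkts stays at 0 even when the service is healthy.
COUNT=$(ipvsadm -Ln 2>/dev/null | awk '$1 == "->" && $2 ~ /^[0-9]/ {n++} END {print n+0}')
if [ "${COUNT:-0}" -lt 1 ]; then
    exit 1
fi
exit 0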

#!/bin/bash
# /usr/local/bin/notify.sh
TYPE=$1
# Placeholder recipient: replace with your own address
MAILTO="ops@example.com"
SUBJECT="[LVS] $TYPE @ $(hostname) @ $(date +%FT%T)"
echo "$SUBJECT" | mail -s "$SUBJECT" "$MAILTO"
logger -t keepalived-notify "$SUBJECT"
exit 0

NAT Mode Deployment (Key Differences)

In NAT mode the Real Servers do not configure the VIP. The Director must enable IP forwarding because it rewrites the destination address of requests and the source address of replies.

# Enable forwarding on the Director
echo 1 > /proc/sys/net/ipv4/ip_forward
# Keep the same sysctl file, but set ip_forward=1 for NAT

Adjust the keepalived.conf block:

virtual_server 10.0.0.100 80 {
    ...
    lb_kind NAT
    ...
}

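On each Real Server set the default gateway to the Director’s internal IP so that replies pass back through the Director. The example below uses the master’s address 10.0.0.10; with a master/backup pair, a floating internal address that moves together with the VIP is the safer gateway, because a gateway pinned to one Director stops working after failover: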

# Example on Real Server 1
ip route replace default via 10.0.0.10 dev eth0
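Since a wrong gateway is the most common reason NAT mode silently fails, it helps to check it mechanically. The sketch below parses the output of ip route show default from stdin and compares it with the expected Director address; both function names are assumptions for this example.

```shell
#!/bin/bash
# Sketch: extract the default gateway from `ip route show default` output
# (read on stdin) and compare it with the expected Director address.
default_gw() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

check_nat_gateway() {
    local expected=$1 actual
    actual=$(default_gw)
    if [ "$actual" = "$expected" ]; then
        echo "gateway OK ($actual)"
    else
        echo "gateway MISMATCH: expected $expected, got ${actual:-none}"
        return 1
    fi
}

# On a Real Server: ip route show default | check_nat_gateway 10.0.0.10
```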

TUN Mode Deployment (Key Differences)

# Load IPIP module on Real Server (this also creates the tunl0 device)
modprobe ipip
echo ipip >> /etc/modules-load.d/tunnel.conf

# Bring up tunl0 and bind the VIP to it
# (no "ip tunnel add" needed: tunl0 already exists once ipip is loaded)
ip link set tunl0 up
ip addr add 10.0.0.100/32 dev tunl0

# Disable rp_filter; the kernel applies the higher of the "all" and per-interface values
echo 0 > /proc/sys/net/ipv4/conf/all/rp_filter
echo 0 > /proc/sys/net/ipv4/conf/tunl0/rp_filter

In the Director’s keepalived.conf, set lb_kind TUN and keep the same VIP definition.

Scheduling Algorithms (lb_algo) and Their Typical Use‑Cases

rr – Simple round‑robin, useful for identical servers and testing.

wrr – Weighted round‑robin, for servers with different capacities.

lc – Least connections, best for long‑lived connections.

wlc – Weighted least connections, the most common choice for high‑throughput web services.

sh – Source‑IP hash, provides session persistence without enabling persistence_timeout.

dh – Destination‑IP hash, useful for cache clusters.

nq – Never queue, suitable for burst traffic.

In practice, wlc is the most common choice in production and is a reasonable default.
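To make the wlc idea concrete, the sketch below picks the server with the lowest ratio of active connections to weight. This is a simplification: the kernel scheduler also factors inactive connections into its overhead figure, which is ignored here. The function name and the input format are assumptions for this example.

```shell
#!/bin/bash
# Sketch: pick a Real Server the way wlc is usually described:
# lowest (active connections / weight). Inactive connections are ignored.
# Input on stdin: one "server weight active_conns" triple per line.
# Servers with weight 0 are skipped (they receive no new connections).
pick_wlc() {
    awk '
        $2 > 0 {
            load = $3 / $2
            if (best == "" || load < best_load) { best = $1; best_load = load }
        }
        END { if (best != "") print best }
    '
}

printf '%s\n' "10.0.0.11 100 40" "10.0.0.12 200 60" | pick_wlc   # prints 10.0.0.12
```

Here 10.0.0.12 wins even though it holds more connections, because its ratio (60/200) is lower than that of 10.0.0.11 (40/100).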

Troubleshooting Cases

Case 1 – DR Mode SYN Backlog Saturation

Check ipvsadm -Lnc for a large number of SYN_RECV entries.

On Real Servers verify netstat -s | grep -i listen for “SYNs to LISTEN sockets dropped”.

Increase kernel parameters:

echo 65535 > /proc/sys/net/ipv4/tcp_max_syn_backlog
echo 65535 > /proc/sys/net/core/somaxconn

Case 2 – NAT Mode Real Server Unreachable

Confirm the Director’s ipvsadm -Ln --rate shows traffic going to a single Real Server.

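On the problematic Real Server capture packets on the service port: tcpdump -i eth0 -nn port 80. In NAT mode the Director has already rewritten the destination to the Real Server’s own IP, so a filter on the VIP would match nothing.

Check the default gateway; it must point to the Director’s internal IP.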

If the gateway is wrong, restore it or add a policy route for the VIP source.

Case 3 – keepalived VRRP Flapping

Capture VRRP packets: tcpdump -i eth0 vrrp.

Ensure unicast_peer and unicast_src_ip match on both nodes.

Increase vrrp_script interval to ≥3 s and reduce weight magnitude.

Consider disabling preemption with nopreempt if the business tolerates a fixed master.
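When tuning the script weight, keep in mind that a negative weight is added to the base priority while the script is failing, and the VIP moves only if the result drops below the peer's priority. The arithmetic can be sketched as follows, assuming preemption is left enabled; the backup priority of 90 and the function name are assumptions for this example.

```shell
#!/bin/bash
# Sketch: does a failing vrrp_script actually move the VIP?
# A negative weight is added to the master's base priority while the
# script fails; failover needs the result to fall below the peer's priority.
# Assumes preemption is enabled (the keepalived default).
will_failover() {
    local master_prio=$1 backup_prio=$2 weight=$3
    local degraded=$(( master_prio + weight ))
    if [ "$degraded" -lt "$backup_prio" ]; then
        echo "failover: $degraded < $backup_prio"
    else
        echo "no failover: $degraded >= $backup_prio"
        return 1
    fi
}

will_failover 100 90 -20   # prints: failover: 80 < 90
```

Reducing the weight magnitude below the gap between the two priorities would stop flapping, but it would also stop failover, so the gap and the weight need to be chosen together.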

Case 4 – TUN Mode Latency Spike

Verify encapsulation with tcpdump -i eth0 -nn -p ip proto 4.

Run traceroute to the remote Real Server to detect extra hops.

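IPIP encapsulation itself adds little delay; the extra latency comes mainly from the network path between the Director and the Real Server. Keep TUN deployments within roughly 5‑10 ms RTT, otherwise use Anycast BGP or a layer‑7 LB.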

Monitoring and Metrics

IPVS does not expose a native metrics endpoint, so a custom exporter is needed. Example node_exporter textfile collector:

#!/bin/bash
# /usr/local/bin/ipvs_metrics.sh
# Export per-virtual-service rates for the node_exporter textfile collector (IPv4 only).
OUT=/var/lib/node_exporter/textfile_collector/ipvs.prom
TMP=$(mktemp "${OUT}.XXXXXX") || exit 1
# --rate columns on each TCP/UDP line: CPS InPPS OutPPS InBPS OutBPS (BPS = bytes per second)
# --exact prints raw numbers instead of abbreviated ones
ipvsadm -Ln --rate --exact | awk '
    $1 == "TCP" || $1 == "UDP" {
        split($2, a, ":")
        lbl = sprintf("vip=\"%s\",port=\"%s\",proto=\"%s\"", a[1], a[2], tolower($1))
        printf "lvs_rate_cps{%s} %s\n", lbl, $3
        printf "lvs_rate_in_pps{%s} %s\n", lbl, $4
        printf "lvs_rate_out_pps{%s} %s\n", lbl, $5
        printf "lvs_rate_in_bps{%s} %s\n", lbl, $6
        printf "lvs_rate_out_bps{%s} %s\n", lbl, $7
    }
' > "$TMP"
# Publish atomically so the collector never reads a half-written file
chmod 644 "$TMP" && mv "$TMP" "$OUT"
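The rate script does not emit connection counts, while the LVSActiveConnHigh alert in this article relies on a metric named lvs_active_connections. One way to produce it is to parse the ActiveConn and InActConn columns of ipvsadm -Ln, as in the sketch below. The function name, the label names and the lvs_inactive_connections metric are assumptions for this example, and only IPv4 entries are handled.

```shell
#!/bin/bash
# Sketch: turn `ipvsadm -Ln` output (stdin) into per-Real-Server gauges.
# IPv4 only; metric and label names are assumptions for this example.
ipvs_conn_metrics() {
    awk '
        # Virtual service line: remember the VIP and protocol
        $1 == "TCP" || $1 == "UDP" { vip = $2; proto = tolower($1); next }
        # Real Server line: -> addr:port Forward Weight ActiveConn InActConn
        $1 == "->" && $2 ~ /^[0-9]/ {
            printf "lvs_active_connections{vip=\"%s\",rs=\"%s\",proto=\"%s\"} %s\n", vip, $2, proto, $5
            printf "lvs_inactive_connections{vip=\"%s\",rs=\"%s\",proto=\"%s\"} %s\n", vip, $2, proto, $6
        }
    '
}

# Usage on a Director: ipvsadm -Ln | ipvs_conn_metrics
```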

Prometheus scrape config (example):

scrape_configs:
  - job_name: "ipvs"
    static_configs:
      - targets: ["10.0.0.10:9100"]
        labels:
          role: lvs-director

Typical alerts (Prometheus rule snippets):

- alert: LVSVipDown
  expr: probe_success{job="blackbox_lvs"} == 0
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "LVS VIP {{ $labels.instance }} unreachable"

- alert: LVSActiveConnHigh
  expr: sum(lvs_active_connections) > 500000
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "LVS active connections exceed threshold"

- alert: LVSRealServerDown
  expr: count(up{job="realserver"} == 0) > 0
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "One or more Real Servers are down"

Comparison with Other Load‑Balancing Solutions

LVS (DR) – Kernel‑level layer 4, highest throughput, limited to layer‑4 features, moderate learning curve.

Nginx stream – User‑space layer 4, good performance, easy L7 extensions, lower learning curve.

HAProxy – User‑space layer 4/7, rich health checks and ACLs, suitable for large L7 workloads.

F5 / A10 hardware – Dedicated hardware, highest performance, expensive, steep learning curve.

Common industry practice: combine DR‑mode LVS as a fast layer‑4 entry point with Nginx/HAProxy for L7 routing.

Best‑Practice Checklist

Deploy Director pair with unicast VRRP (single‑hop heartbeat).

Prefer DR mode unless cross‑VLAN is required.

Configure Real Server loopback alias lo:0 with a /32 mask for the VIP.

Apply full ARP suppression on every Real Server (both the all and lo entries under net.ipv4.conf).

Disable rp_filter on lo but keep it enabled on the physical interface.

Use keepalived health checks that verify application logic (HTTP_GET, TCP_CHECK with proper timeouts).

Set persistence_timeout ≤ 60 s; avoid session persistence for pure load‑balancing.

Increase the ip_vs module parameter conn_tab_bits to ≥ 18 for > 100 k concurrent connections.

Raise nf_conntrack_max to ≥ 2 M and tune TCP timeouts according to workload.

Monitor VIP reachability, IPVS ActiveConn, Real Server health, Director CPU interrupt usage, and network throughput.

Never clear the rule set with ipvsadm -C in production; always use keepalived for versioned configuration.

Avoid FULLNAT for new projects unless cross‑VLAN is mandatory.

For latency‑sensitive services, enable NIC busy‑poll, GRO, and bind NIC interrupts to dedicated CPU cores.

When running in public clouds, assume multicast is unavailable and use unicast VRRP; verify that MAC address changes are allowed and that security groups permit VRRP (IP protocol 112).
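The connection-table item in the checklist is easier to reason about with numbers: the table has 2^bits buckets, and the average chain length is the connection count divided by that. A small sketch of the arithmetic follows; conn_tab_load is a hypothetical helper, and 12 bits is used only as a comparison value.

```shell
#!/bin/bash
# Sketch: hash-table size for a given number of connection-table bits and
# the average number of connections per bucket (rounded up) at a target load.
conn_tab_load() {
    local bits=$1 conns=$2
    local buckets=$(( 1 << bits ))
    local per_bucket=$(( (conns + buckets - 1) / buckets ))   # integer ceiling
    echo "buckets=$buckets avg_per_bucket<=$per_bucket"
}

conn_tab_load 12 100000   # prints: buckets=4096 avg_per_bucket<=25
conn_tab_load 18 100000   # prints: buckets=262144 avg_per_bucket<=1
```

With 18 bits, 100 k connections spread over enough buckets that lookups rarely walk a chain, which is the point of the recommendation.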

Conclusion

LVS has been a reliable, kernel‑level load‑balancing solution for over two decades. Its simplicity lies in mastering the four modes, configuring ARP suppression correctly, and using keepalived for HA. By following the step‑by‑step procedures, health‑check scripts, and the checklist above, operators can build a production‑grade LVS cluster that handles tens of millions of connections with minimal latency.
