How to Build a Pure Layer‑3 Server Access Network Without Overlay
This article examines the evolution of data‑center server access networks, explains why traditional large Layer‑2 designs are problematic at scale, and presents a pure Layer‑3 underlay solution that uses host routes, ECMP, and proxy ARP to achieve seamless KVM communication without overlay overhead.
Background
Traditional data‑center architectures placed the gateway on the switch side, but modern designs push the gateway down and adopt Layer‑3 (routed) interconnects between the access and core layers. This article traces the evolution of server access networks and proposes a pure Layer‑3 underlay solution.
Terminology
Common terms are illustrated in the diagram below.
Evolution of Server Access Network Solutions
Rise of the Big Layer‑2
The big Layer‑2 approach, exemplified by the Cisco Nexus 7000/5000/2000 ("752") architecture, simplified management, but its drawbacks surfaced as data centers grew: massive broadcast domains, a high density of IP addresses, vendor lock‑in, and increasing operational complexity (STP, broadcast storms, loops). These factors motivated a shift toward Layer‑3 designs.
Emergence of Layer‑3 Architecture
Overlay technologies (VXLAN, GRE) enabled VM migration independent of the physical network, allowing the underlay to focus on high‑bandwidth, large‑scale communication while the overlay handled virtual networks. The Layer‑3 model offers several advantages:
Reduces broadcast domains, limiting storms and loops to individual racks.
Lessens vendor dependence; switches from different vendors interconnect over standard routing, while the access layer can still use stacking, M‑LAG, or vPC.
Simplifies cut‑over operations by focusing on routing paths.
Eliminates expensive chassis core switches by using ECMP and a Clos topology built from commodity devices.
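As a rough illustration of how ECMP spreads flows across equal‑cost uplinks, the sketch below hashes a flow's 5‑tuple and uses the result to pick a next hop. The helper name, the use of SHA‑256, and the next‑hop addresses are assumptions made for illustration; switches and the Linux kernel use their own hash functions.

```python
import hashlib

def pick_next_hop(flow, next_hops):
    """Pick one equal-cost next hop for a flow by hashing its 5-tuple.

    Hypothetical helper: SHA-256 is used only to get a stable hash value;
    real forwarding planes use their own (usually much cheaper) hash.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]

# Assumed uplink next hops, one per access switch.
uplinks = ["10.47.175.9", "10.47.175.13"]

# (source IP, destination IP, protocol, source port, destination port)
flow = ("10.0.0.11", "10.0.1.22", 6, 40000, 443)

chosen = pick_next_hop(flow, uplinks)
print(chosen)
```

Because the choice depends only on the 5‑tuple, every packet of one flow takes the same path (no reordering), while different flows spread across both uplinks.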
Company‑Specific Requirements
The target environment (referred to as "Micro‑Store") prioritizes low internal latency, avoids overlay overhead, and needs IP address continuity for KVM migration across Layer‑3 links. Additional constraints include uniform switch models per host, seamless online replacement of access switches, complete isolation of broadcast domains, and a simple overall design.
Proposed Solution
Running a routing protocol on each host, dropping NIC bonding, and using ECMP across the uplinks satisfies the first three requirements. Advertising a /32 host route for each VM address and enabling proxy ARP lets a VM keep its IP address when it moves to another host. The core idea is that proxy ARP answers the VMs' broadcast ARP requests, while the /32 host routes forward the traffic precisely to the right destination.
Key Technical Points
ARP Proxy
Enabling proxy_arp on the gateway interface makes the gateway answer ARP requests for addresses that it can reach through another interface. VMs therefore send their traffic to the gateway, which then forwards it according to its routing table.
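The decision described above can be sketched as a small model. This is a simplification of the kernel's behavior, not its actual code: the function name, the route table, and the interface names are assumptions for illustration.

```python
import ipaddress

def should_proxy_reply(target_ip, in_iface, routes):
    """Return True if a proxy-ARP gateway would answer for target_ip.

    Simplified rule: reply when the most specific route to the target
    leaves through a different interface than the one the ARP request
    arrived on. No route, or a route back out the same interface, means
    no proxy reply.
    """
    target = ipaddress.ip_address(target_ip)
    matches = [(net, dev) for net, dev in routes if target in net]
    if not matches:
        return False
    _, out_iface = max(matches, key=lambda m: m[0].prefixlen)
    return out_iface != in_iface

# Assumed routing table on one host.
routes = [
    (ipaddress.ip_network("10.0.0.0/24"), "br-l2"),   # connected bridge subnet
    (ipaddress.ip_network("10.0.0.11/32"), "br-l2"),  # VM running on this host
    (ipaddress.ip_network("10.0.0.23/32"), "eth0"),   # VM on another host
]

remote = should_proxy_reply("10.0.0.23", "br-l2", routes)
local = should_proxy_reply("10.0.0.11", "br-l2", routes)
print(remote, local)
```

In this model the host answers on behalf of the remote VM, so the local VM sends the packet to the bridge, while ARP for a VM on the same bridge is left to that VM itself.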
<code>medium_id - INTEGER
Integer value used to differentiate the devices by the medium they
are attached to. ...
proxy_arp - BOOLEAN
Do proxy arp.
proxy_arp for the interface will be enabled if at least one of
conf/{all,interface}/proxy_arp is set to TRUE,
it will be disabled otherwise</code>
Host‑Specific /32 Routes
Injecting /32 host routes (or redistributing them) ensures that traffic destined for a VM on another host is routed directly, avoiding broadcast limitations. In a single MDU, a few thousand host routes are manageable; they can be aggregated at the access switch level.
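A minimal sketch of why the /32 wins: routers pick the most specific matching prefix, so a host route overrides the covering /24 without touching it. The table contents and next‑hop labels below are assumed for illustration.

```python
import ipaddress

def lookup(dst, table):
    """Longest-prefix match: return the next hop of the most specific route."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None

# Assumed routing table on host A before the VM moves.
table = {
    "10.0.0.0/24": "br-l2 (local bridge)",
    "0.0.0.0/0": "access switch",
}
before = lookup("10.0.0.23", table)

# The VM 10.0.0.23 now runs on host B, which advertises a /32 for it.
table["10.0.0.23/32"] = "host B"
after = lookup("10.0.0.23", table)

print(before, "->", after)
```

Only the one address is redirected; every other address in 10.0.0.0/24 still resolves to the local bridge.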
Design Overview
The solution consists of host network access, VM network access, and host parameter tuning.
Host Interface Planning
Host Network Access
Access switches provide /30 links to each server.
Hosts run FRR with OSPF and BFD for fast failover.
Loopback interfaces hold /32 management IPs.
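For reference, each /30 link holds exactly two usable addresses. The sketch below derives them with Python's ipaddress module; the link prefixes are inferred from the addresses in the configuration example, and giving the lower address to the switch is an assumed convention.

```python
import ipaddress

def p2p_endpoints(link_cidr):
    """Return (switch_ip, host_ip) for a /30 point-to-point link.

    Assumed convention: the switch takes the lower usable address and
    the host takes the higher one.
    """
    net = ipaddress.ip_network(link_cidr)
    if net.prefixlen != 30:
        raise ValueError("expected a /30 link, got /%d" % net.prefixlen)
    switch_ip, host_ip = net.hosts()
    return str(switch_ip), str(host_ip)

uplink_a = p2p_endpoints("10.47.175.8/30")
uplink_b = p2p_endpoints("10.47.175.12/30")
print(uplink_a, uplink_b)
```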
FRR example configuration:
<code>! 50 ms heartbeat; the session is declared down after 4 missed packets
bfd
 peer 10.47.175.9 local-address 10.47.175.10
  detect-multiplier 4
  receive-interval 50
  transmit-interval 50
  no shutdown
interface eth0
 ip ospf bfd
router ospf
 redistribute kernel
 ospf router-id 10.47.176.132
 network 10.47.175.10/32 area 0.0.0.0
 network 10.47.175.14/32 area 0.0.0.0
 network 10.47.176.132/32 area 0.0.0.0</code>
Access‑switch OSPF example:
<code>interface Ethernet1/4
  bfd interval 50 min_rx 50 multiplier 4
  no bfd echo
router ospf 1
  bfd
  router-id 172.20.225.254
  passive-interface default
interface Ethernet1/4
  no ip ospf passive-interface
  ip router ospf 1 area 0.0.0.0</code>
KVM Automatic Static Route Loading
The script /etc/qemu-ifup receives the virtual NIC name (e.g., vnet0), derives the VM's IP address from the last four octets of that NIC's MAC address, and then adds a /32 route for it on the bridge.
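The MAC‑to‑IP convention can be sketched as a pair of helper functions. The function names are hypothetical, and the 02:42 prefix is taken from the sample MAC in this article as an assumed default.

```python
def mac_to_ipv4(mac):
    """Decode the last four octets of a MAC address as a dotted IPv4 address."""
    octets = mac.split(":")
    if len(octets) != 6:
        raise ValueError("expected six colon-separated octets: %r" % mac)
    return ".".join(str(int(octet, 16)) for octet in octets[2:])

def ipv4_to_mac(ip, prefix="02:42"):
    """Encode an IPv4 address into the last four octets of a MAC address.

    The two-octet prefix is an assumption taken from the sample MAC.
    """
    parts = [int(part) for part in ip.split(".")]
    if len(parts) != 4 or any(not 0 <= part <= 255 for part in parts):
        raise ValueError("not a dotted IPv4 address: %r" % ip)
    return prefix + "".join(":%02x" % part for part in parts)

ip = mac_to_ipv4("02:42:c1:25:0f:ba")
mac = ipv4_to_mac("193.37.15.186")
print(ip, mac)
```

Encoding the address in the MAC means the hook script needs no external lookup: the interface itself carries the address it should be routed to.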
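<code>#!/bin/bash
# Called with the virtual NIC name (e.g. vnet0) as $1.
# Arrays and base-16 arithmetic are bash features, so bash is required.
bridge=br-l2
# Fields 3-6 of the NIC's MAC address hold the VM's IPv4 address in hex.
mac=($(awk -F: '{print $3, $4, $5, $6}' "/sys/class/net/$1/address"))
ip="$((16#${mac[0]})).$((16#${mac[1]})).$((16#${mac[2]})).$((16#${mac[3]}))"
# Point a /32 host route for the VM at the bridge.
ip route add "$ip/32" dev "$bridge"</code>
VM Network Access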
Each VM subnet is attached to a Linux bridge.
The bridge is assigned the VMs' gateway IP; its MAC address is not constrained.
VM XML references the bridge for network connectivity.
Bridge configuration example:
<code>brctl addbr br-l2
ip addr add 10.0.0.1/24 dev br-l2
ip link set dev br-l2 up</code>
VM interface example:
<code><interface type='bridge'>
  <mac address='02:42:c1:25:0f:ba'/>
  <source bridge='br-l2'/>
  <model type='virtio'/>
  <driver name='vhost' queues='2'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface></code>
Host Parameter Tuning
Enable IP forwarding globally and on bridges.
Activate proxy_arp on the bridge.
Reduce proxy_arp delay to improve first‑packet latency.
<code>net.ipv4.conf.all.forwarding = 1
net.ipv4.conf.br-l2.proxy_arp = 1
net.ipv4.neigh.br-l2.proxy_delay = 1</code>
Network Change Considerations
Device Restart, Re‑cabling, Upgrade
Shut down OSPF/BGP neighbors, verify dual uplinks, perform changes, then restore neighbors. Risk is minimal.
Device Replacement
Test configuration in a lab, then replace hardware after traffic is safely diverted.
Cross‑Cabinet Online Migration
Deploy access switches and hosts in the target cabinet, keep the same gateway, and use /32 host routes to preserve IP continuity during migration.
Full‑Data‑Center Migration
Aggregate VM subnets per MDU, propagate summarized routes across the backbone, and migrate cabinets sequentially while maintaining route consistency.
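As a small sketch of the aggregation step, Python's ipaddress.collapse_addresses merges contiguous subnets into the shortest covering set of prefixes. The subnets below are assumed examples, not the author's address plan.

```python
import ipaddress

def summarize(subnets):
    """Collapse a list of subnets into the smallest set of covering prefixes."""
    networks = [ipaddress.ip_network(subnet) for subnet in subnets]
    return [str(net) for net in ipaddress.collapse_addresses(networks)]

# Four contiguous /24 VM subnets (assumed) collapse into a single /22.
contiguous = summarize(
    ["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
)

# Non-contiguous subnets cannot be merged and are returned unchanged.
sparse = summarize(["10.0.0.0/24", "10.0.2.0/24"])

print(contiguous, sparse)
```

Planning VM subnets so that they are contiguous per MDU is what makes a single summary route across the backbone possible.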