
How to Build a Pure Three‑Tier Server Access Network Without Overlay

This article examines the evolution of data‑center server access networks, explains why traditional large Layer‑2 designs are problematic at scale, and presents a pure three‑tier underlay solution that uses host routing, ECMP, and ARP proxy to let KVM virtual machines communicate seamlessly without overlay overhead.


Background

Traditional data‑center architectures placed the gateway on the switch side, but modern designs push the gateway down and adopt a three‑tier interconnect between access and core layers. This article explores the evolution of server access networks and proposes a pure three‑tier underlay solution.

Terminology

Common terms are illustrated in the diagram below.

Evolution of Server Access Network Solutions

Rise of the Big Layer‑2

The big Layer‑2 approach, exemplified by Cisco Nexus 752, simplified management but revealed drawbacks as data‑center scale grew: massive broadcast domains, high density of IP addresses, vendor lock‑in, and increasing operational complexity (STP, broadcast storms, loop issues). These factors motivated a shift toward three‑tier designs.

Emergence of Three‑Tier Architecture

Overlay technologies (VXLAN, GRE) enabled VM migration independent of physical networks, allowing the underlay to focus on high‑bandwidth, large‑scale communication while the overlay handled virtual networks. The three‑tier model offers several advantages:

Reduces broadcast domains, limiting storms and loops to individual racks.

Lessens vendor dependence; mixed‑vendor switches can interconnect using stacking, M‑LAG, or VPC.

Simplifies cut‑over operations by focusing on routing paths.

Eliminates expensive chassis core switches by using ECMP and CLOS with commodity devices.
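The ECMP load spreading mentioned above works per flow: every packet of a flow hashes to the same equal-cost uplink, so traffic balances across the CLOS fabric without packet reordering. A minimal illustrative sketch, with hypothetical uplink names and a simple hash standing in for the vendor-specific 5‑tuple hash a switch ASIC would use:

```python
import hashlib

def ecmp_pick(src_ip, dst_ip, src_port, dst_port, proto, uplinks):
    # Hash the flow 5-tuple so every packet of the same flow takes the same
    # uplink; different flows spread across all equal-cost paths.
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    idx = int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % len(uplinks)
    return uplinks[idx]

uplinks = ["spine1", "spine2", "spine3", "spine4"]
chosen = ecmp_pick("10.0.0.5", "10.1.0.9", 40000, 443, "tcp", uplinks)
print(chosen)
```

Because the selection is deterministic per flow, a flow never oscillates between uplinks; rebalancing only happens when the set of equal-cost paths changes.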

Company‑Specific Requirements

The target environment (referred to as "Micro‑Store") prioritizes low internal latency, avoids overlay overhead, and needs IP address continuity for KVM migration across three‑tier links. Additional constraints include uniform switch models per host, seamless online replacement of access switches, complete isolation of broadcast domains, and a simple overall design.

Proposed Solution

Running a routing protocol on each host, dropping NIC bonding, and relying on ECMP across the uplinks satisfies the first three requirements. Advertising a dedicated /32 host route for each VM, combined with ARP proxy, removes the need to renumber VMs when they migrate. The core idea is simple: ARP proxy answers broadcast ARP requests on the gateway's behalf, and /32 host routes forward traffic precisely to whichever host currently runs the VM.
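The reason a /32 host route preserves reachability after migration is longest-prefix matching: a /32 always wins over the connected subnet route. A small sketch with made-up prefixes and next-hop labels:

```python
import ipaddress

def lookup(dst, routes):
    # Longest-prefix match: among all routes covering dst, pick the one
    # with the longest prefix. A /32 host route beats the connected /24.
    matches = [(net, nh) for net, nh in routes if dst in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

routes = [
    (ipaddress.ip_network("10.0.0.0/24"), "local bridge br-l2"),
    (ipaddress.ip_network("10.0.0.7/32"), "via 10.47.175.9 (remote host)"),
]

# 10.0.0.7 has migrated away: the /32 steers traffic to its new host.
print(lookup(ipaddress.ip_address("10.0.0.7"), routes))
# 10.0.0.8 is still local: only the connected /24 matches.
print(lookup(ipaddress.ip_address("10.0.0.8"), routes))
```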

Key Technical Points

ARP Proxy

Enabling proxy_arp on the gateway makes it respond to ARP requests, causing VMs to send traffic to the gateway, which then forwards based on routing tables.

<code>proxy_arp - BOOLEAN
    Do proxy arp.
    proxy_arp for the interface will be enabled if at least one of
    conf/{all,interface}/proxy_arp is set to TRUE,
    it will be disabled otherwise</code>

Host‑Specific /32 Routes

Injecting /32 host routes (or redistributing them into the IGP) ensures that traffic destined for a VM on another host is routed there directly, without relying on Layer‑2 broadcast. Within a single MDU, a few thousand host routes are manageable, and they can be aggregated at the access‑switch level.
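The aggregation claim can be checked with Python's `ipaddress` module: contiguous /32 host routes collapse into a single covering prefix that the access switch could advertise upstream. The addresses below are hypothetical:

```python
import ipaddress

# Hypothetical /32 host routes learned from the servers in one cabinet.
host_routes = [ipaddress.ip_network(f"10.0.0.{i}/32") for i in range(256)]

# collapse_addresses merges contiguous prefixes into the shortest covering
# set, mirroring the summarization an access switch would advertise.
summary = list(ipaddress.collapse_addresses(host_routes))
print(summary)  # -> [IPv4Network('10.0.0.0/24')]
```

Upstream devices then carry one /24 instead of 256 host routes, which keeps the core routing table small even with per-VM /32s at the edge.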

Design Overview

The solution consists of host network access, VM network access, and host parameter tuning.

Host Interface Planning

Host Network Access

Access switches provide /30 links to each server.

Hosts run FRR with OSPF and BFD for fast failover.

Loopback interfaces hold /32 management IPs.

FRR example configuration:

<code>bfd
 peer 10.47.175.9 local-address 10.47.175.10  # 50ms heartbeat, 4‑fail shutdown
  detect-multiplier 4
  receive-interval 50
  transmit-interval 50
  no shutdown

interface eth0
 ip ospf bfd

router ospf
 redistribute kernel
 ospf router-id 10.47.176.132
 network 10.47.175.10/32 area 0.0.0.0
 network 10.47.175.14/32 area 0.0.0.0
 network 10.47.176.132/32 area 0.0.0.0</code>

Access‑switch OSPF example:

<code>interface Ethernet1/4
  bfd interval 50 min_rx 50 multiplier 4
  no bfd echo

router ospf 1
  bfd
  router-id 172.20.225.254
  passive-interface default

interface Ethernet1/4
  no ip ospf passive-interface
  ip router ospf 1 area 0.0.0.0</code>

KVM Automatic Static Route Loading

The /etc/qemu-ifup script receives the virtual NIC name (e.g., vnet0) as its first argument, derives the VM's IP from the NIC's MAC address, and adds a /32 route on the bridge.

<code>#!/bin/sh

bridge=br-l2

mac=(`awk -F":" '{print $3,$4,$5,$6}' /sys/class/net/$1/address`)

for i in ${mac[*]}
do
    x=`expr $x + 1`
    ip_part[$x]=$(( 16#$i ))
 done

ip=${ip_part[1]}.${ip_part[2]}.${ip_part[3]}.${ip_part[4]}

ip route add $ip/32 dev $bridge</code>
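The MAC-to-IP derivation the script performs can be sketched in Python: the last four octets of the MAC, read as hex, are the VM's IPv4 address. The example MAC below is made up:

```python
def mac_to_ip(mac: str) -> str:
    # Octets 3..6 of the MAC encode the VM's IPv4 address in hex,
    # e.g. 0a:2f:af:05 -> 10.47.175.5.
    octets = mac.split(":")[2:6]
    return ".".join(str(int(o, 16)) for o in octets)

print(mac_to_ip("02:42:0a:2f:af:05"))  # -> 10.47.175.5
```

Encoding the IP in the MAC makes the route installation stateless: the host needs no lookup service to learn which /32 to add when a VM's tap device appears.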

VM Network Access

Each VM subnet is attached to a Linux bridge.

The bridge is assigned an IP address that serves as the VMs' gateway; its MAC address is unconstrained.

VM XML references the bridge for network connectivity.

Bridge configuration example:

<code>brctl addbr br-l2
ip addr add dev br-l2 10.0.0.1/24
ip link set dev br-l2 up</code>

VM interface example:

<code>&lt;interface type='bridge'&gt;
      &lt;mac address='02:42:c1:25:0f:ba'/&gt;
      &lt;source bridge='br-l2'/&gt;
      &lt;model type='virtio'/&gt;
      &lt;driver name='vhost' queues='2'/&gt;
      &lt;address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/&gt;
&lt;/interface&gt;</code>

Host Parameter Tuning

Enable IP forwarding globally and on bridges.

Activate proxy_arp on the bridge.

Reduce proxy_arp delay to improve first‑packet latency.

<code>net.ipv4.conf.all.forwarding = 1
net.ipv4.conf.br-l2.proxy_arp = 1
net.ipv4.neigh.br-l2.proxy_delay = 1</code>

Network Change Considerations

Device Restart, Re‑cabling, Upgrade

Shut down OSPF/BGP neighbors, verify dual uplinks, perform changes, then restore neighbors. Risk is minimal.

Device Replacement

Test configuration in a lab, then replace hardware after traffic is safely diverted.

Cross‑Cabinet Online Migration

Deploy access switches and hosts in the target cabinet, keep the same gateway, and use /32 host routes to preserve IP continuity during migration.

Full‑Data‑Center Migration

Aggregate VM subnets per MDU, propagate summarized routes across the backbone, and migrate cabinets sequentially while maintaining route consistency.

Tags: BGP, KVM, data center networking, ECMP, ARP proxy, three-tier architecture
Written by Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
