Mastering SR‑IOV: From Basics to Advanced KVM Integration and Migration
This guide explains SR‑IOV fundamentals, PF/VF roles, VEB and VEPA virtual switches, multi‑channel extensions, and provides step‑by‑step instructions for enabling VFs, attaching them to KVM guests, configuring NUMA affinity, bonding, and handling hot‑migration challenges.
SR‑IOV Overview
Traditional I/O virtualization relies on the VMM to trap and emulate VM I/O, which becomes a performance bottleneck when many VMs share a device. Intel VT‑d hardware‑assisted I/O virtualization allows PCIe passthrough without VMM involvement, but each passthrough device is dedicated to a single VM, and physical devices are limited. SR‑IOV (Single‑Root I/O Virtualization) was introduced to virtualize one PCIe device into multiple virtual functions (VFs) that can be assigned to VMs individually.
PF and VF
PF (Physical Function): Manages resources and the VF lifecycle, allocating MAC addresses and queues. The OS or VMM configures VFs through the PF.
VF (Virtual Function): A lightweight virtual channel containing only I/O functions, with its own PCIe configuration space, MAC address, and queues, sharing the PF's physical resources.
SR‑IOV requires two core features:
BAR address mapping : Maps VF PCIe BAR to PF PCIe BAR for resource access.
Virtual I/O queues : Maps VF I/O requests to shared or dedicated queues in the PF.
By default, VFs are disabled; enabling them creates virtual PCIe configuration spaces accessed via registers.
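The SR‑IOV capability that carries these settings (Total VFs, VF offset/stride, and the VF BARs) can be inspected from the host with lspci; a quick sketch, where the PCI address 81:00.0 is illustrative:

```shell
# Dump the SR-IOV extended capability of a PF (address is an example)
lspci -vvv -s 81:00.0 | grep -A 9 'Single Root I/O Virtualization'
# Typical fields include Initial VFs, Total VFs, VF offset/stride,
# and the VF BARs backing each virtual function's configuration space.
```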
SR‑IOV VEB
Virtual Ethernet Bridge (VEB) provides hardware‑implemented Layer‑2 switching. The PF manages VEB configuration, connecting PF and all VFs; forwarding is based on MAC and VLAN IDs.
Ingress: Frames matching a VF’s MAC/VLAN are delivered to that VF; otherwise they go to the PF or are broadcast.
Egress: Frames whose destination MAC does not match any internal port are sent out the physical uplink; broadcasts are forwarded within the VLAN.
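Since the VEB forwards on MAC and VLAN ID, those per‑VF entries are programmed through the PF with `ip link`; a hedged sketch, where the interface name, VF index, MAC, and VLAN are illustrative:

```shell
# Program the VEB's forwarding entry for VF 0 on PF enp129s0f0
ip link set enp129s0f0 vf 0 mac fa:16:3e:00:00:01  # frames to this MAC reach VF 0
ip link set enp129s0f0 vf 0 vlan 190               # tag/untag VLAN 190 for VF 0
ip link set enp129s0f0 vf 0 spoofchk on            # drop frames with a forged source MAC
```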
SR‑IOV VEPA
VEPA (Virtual Ethernet Port Aggregator) addresses limitations of VEB such as lack of traffic visibility and control. Two main issues arise: intra‑host traffic bypasses monitoring points, and outbound traffic lacks identifiable tags.
Solutions include forcing VM traffic through a collection point and adding identifiable tags. Two major approaches exist:
Cisco and VMware promote VN‑Tag (802.1Qbh, later standardized as 802.1BR Bridge Port Extension), which requires new hardware.
HP, Juniper, IBM, Qlogic, Brocade promote VEPA (802.1Qbg EVB) using existing equipment at lower cost.
VEPA forces VM traffic to the upstream TOR switch, enabling hairpin forwarding for intra‑host communication.
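Linux can reproduce this behavior in software: a macvlan device in `vepa` mode sends all traffic, including traffic between VMs on the same host, to the upstream switch, which must support hairpin (reflective‑relay) forwarding. A sketch with illustrative interface names:

```shell
# Create a macvlan port in VEPA mode on top of the uplink enp129s0f0
ip link add link enp129s0f0 name vepa0 type macvlan mode vepa
ip link set vepa0 up
# Intra-host frames now travel out to the TOR switch and hairpin back
```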
SR‑IOV Multi‑Channel
Based on QinQ (802.1ad) and the S‑TAG, Multi‑Channel extends VEPA to support multiple logical channels (VEB, VEPA, Director IO) on a single NIC, allowing flexible deployment according to security, performance, and manageability requirements.
Each logical channel is isolated and identified by an additional S‑TAG and VLAN‑ID.
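The S‑TAG used by Multi‑Channel is the outer 802.1ad tag of QinQ. The same double tagging can be reproduced on Linux for testing; the device names and channel IDs below are illustrative:

```shell
# Outer service tag (S-TAG, EtherType 0x88a8) identifying the logical channel
ip link add link eth0 name eth0.100 type vlan protocol 802.1ad id 100
# Inner customer tag (C-TAG) carried inside that channel
ip link add link eth0.100 name eth0.100.20 type vlan protocol 802.1Q id 20
```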
SR‑IOV OvS
SR‑IOV lacks a native SDN control plane, requiring extra components (e.g., Neutron sriov‑agent). With SmartNICs and DPUs, SR‑IOV can be combined with OVS Fastpath, using VFs as virtual channels for programmable, high‑performance networking.
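On SmartNICs that expose VF representors, this combination is typically wired up by switching the PF's embedded switch to switchdev mode and enabling OVS hardware offload; a hedged sketch (the PCI address is illustrative, and support varies by NIC and driver):

```shell
# Switch the embedded switch from legacy VEB mode to switchdev,
# which creates a representor netdev per VF for the control plane
devlink dev eswitch set pci/0000:81:00.0 mode switchdev
# Let OVS push matched flows down into the NIC's fastpath
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
# Restart the OVS daemon afterwards for the setting to take effect
```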
SR‑IOV Practical Usage
Enable SR‑IOV VFs
Step 1. Ensure SR‑IOV and VT‑d are enabled in the BIOS.
Step 2. Enable the IOMMU in Linux (e.g., add intel_iommu=on to the kernel parameters).
```shell
...
linux16 /boot/vmlinuz-3.10.0-862.11.6.rt56.819.el7.x86_64 root=LABEL=img-rootfs ro console=tty0 console=ttyS0,115200n8 crashkernel=auto rhgb quiet intel_iommu=on iommu=pt isolcpus=2-3,8-9 nohz=on nohz_full=2-3,8-9 rcu_nocbs=2-3,8-9 intel_pstate=disable nosoftlockup default_hugepagesz=1G hugepagesz=1G hugepages=16 LANG=en_US.UTF-8
...
```

Step 3. Create VFs via the PCI SYS interface.
```shell
cat /etc/sysconfig/network-scripts/ifcfg-enp129s0f0
DEVICE="enp129s0f0"
BOOTPROTO="dhcp"
ONBOOT="yes"
TYPE="Ethernet"

cat /etc/sysconfig/network-scripts/ifcfg-enp129s0f1
DEVICE="enp129s0f1"
BOOTPROTO="dhcp"
ONBOOT="yes"
TYPE="Ethernet"

echo 16 > /sys/class/net/enp129s0f0/device/sriov_numvfs
echo 16 > /sys/class/net/enp129s0f1/device/sriov_numvfs
```

Step 4. Verify VFs are created and up.
```shell
lspci | grep Ethernet
03:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
...
81:10.7 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
```

Step 5. Persist VF creation on reboot.
```shell
echo "echo '7' > /sys/class/net/eth3/device/sriov_numvfs" >> /etc/rc.local
```

Attach VF to a KVM VM
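Before QEMU can claim a VF, it must be bound to vfio-pci rather than the NIC's VF driver; a minimal sketch using the sysfs driver_override interface, where the VF address 0000:81:10.0 is illustrative:

```shell
modprobe vfio-pci
# Detach the VF from its current driver (e.g. ixgbevf), if one is bound
echo 0000:81:10.0 > /sys/bus/pci/devices/0000:81:10.0/driver/unbind
# Hand the VF to vfio-pci
echo vfio-pci > /sys/bus/pci/devices/0000:81:10.0/driver_override
echo 0000:81:10.0 > /sys/bus/pci/drivers/vfio-pci/bind
```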
Use -device vfio-pci,host=<vf pci bus addr> in the QEMU command line.
```shell
qemu-system-x86_64 -enable-kvm -drive file=<vm img>,if=virtio -cpu host -smp 16 -m 16G \
  -name <vm name> -device vfio-pci,host=<vf1> -device vfio-pci,host=<vf2> -vnc :1 -net none
```

Alternatively, attach via libvirt XML:
```xml
<interface type='hostdev' managed='yes'>
  <source>
    <address type='pci' domain='0x0000' bus='0x81' slot='0x10' function='0x2'/>
  </source>
</interface>
```

Attach live:
```shell
virsh attach-device VM1 /tmp/new-device.xml --live --config
```

NUMA Affinity
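The goal is to keep the VM's vCPUs (and ideally its memory) on the same NUMA node as the NIC; the checks below locate that node, and pinning is then applied with virsh. A sketch where the VM name, CPU IDs, and node number are illustrative:

```shell
# Pin vCPU 0 of VM1 to host CPUs 8-9 (assumed to sit on the NIC's node 1)
virsh vcpupin VM1 0 8-9 --live --config
# Keep guest memory on the same node
virsh numatune VM1 --nodeset 1 --live --config
```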
Check NUMA node of the NIC:
```shell
cat /sys/class/net/enp129s0f0/device/numa_node
1
```

Check VM vCPU pinning:
```shell
virsh vcpupin VM1_uuid
```

VF Network Configuration
Configure MAC, VLAN, and promiscuous mode per VF:
```shell
ip l | grep 5e:9c
    vf 14 MAC fa:16:3e:90:5e:9c, vlan 19, spoof checking on, link-state auto, trust on, query_rss off
```

VLAN ID appears in the VM’s XML:
```xml
<interface type='hostdev' managed='yes'>
  <mac address='fa:aa:aa:aa:aa:aa'/>
  <driver name='kvm'/>
  <source>
    <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x7'/>
  </source>
  <vlan>
    <tag id='190'/>
  </vlan>
</interface>
```

VF Bonding
Bond two VFs (from separate PFs) inside a VM by configuring identical MAC addresses and using standard Linux bonding scripts.
```shell
BONDING_MASTER=yes
BOOTPROTO=none
DEVICE=bond0
ONBOOT=yes
TYPE=Bond

DEVICE=ens4
MASTER=bond0
ONBOOT=yes
SLAVE=yes
TYPE=Ethernet

DEVICE=ens5
MASTER=bond0
ONBOOT=yes
SLAVE=yes
TYPE=Ethernet
```

SR‑IOV VM Hot‑Migration Issues
Passing a VF through to a guest limits live migration because the IOMMU address translation tables (GPA↔HPA) and device state are lost during migration. After migration, the guest must bring the interface up again manually, causing a network interruption.
Workaround: add a normal (OvS) or indirect (macvtap) port, bond it with the SR‑IOV port inside the guest, migrate, then bring the SR‑IOV port up and remove the extra port.
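The workaround can be sketched as a sequence of virsh operations; the VM name, device XML paths, and destination host are illustrative:

```shell
# 1. Hot-plug a migratable port (e.g. an OvS or macvtap interface)
virsh attach-device VM1 /tmp/ovs-port.xml --live
# 2. Inside the guest: enslave it and the VF to an active-backup bond,
#    then fail over to the migratable port
# 3. Detach the VF so no passthrough device blocks migration
virsh detach-device VM1 /tmp/vf-port.xml --live
# 4. Migrate
virsh migrate --live VM1 qemu+ssh://dest-host/system
# 5. On the destination: re-attach a VF and remove the extra port
virsh attach-device VM1 /tmp/vf-port.xml --live
```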