Unlocking High‑Performance VM Networking: A Deep Dive into vhost/virtio and QEMU Backend Design
This article explains why virtual machine communication becomes a bottleneck in cloud and big‑data environments, introduces the vhost/virtio acceleration technique, and provides a step‑by‑step guide to hand‑crafting vhost/virtio, configuring QEMU's VIRTIO backend, and understanding the vhost‑user protocol and vhost‑net implementation.
1. Handwritten Vhost/Virtio
In cloud computing, when many VMs transfer large amounts of data simultaneously, the communication bottleneck becomes evident, severely affecting cloud service performance and user experience. The same problem appears in big‑data analysis where inefficient virtual device communication slows down data processing.
Is there a way to remove this bottleneck? The answer is vhost/virtio, which moves the data path out of the emulator and lets the Guest and Host exchange buffers through shared memory.
1.1 Preparation: Setting up the "stage"
Before starting the handwritten vhost/virtio journey, we need a proper development environment, similar to building a stable stage for a performance.
We must install QEMU, the foundation of virtualization. On Ubuntu, the easiest way is sudo apt-get install qemu-system-x86. For the latest features, download the source from the official website and compile it.
In addition to QEMU, we need development tools and libraries such as GCC, libvirt development libraries, and libpciaccess. Pay attention to version compatibility to avoid mismatched interfaces.
1.2 Exploring Key Data Structures
After the environment is ready, we must understand the key data structures of vhost/virtio. The virtio_ring queue consists of a descriptor table, an available ring, and a used ring. The descriptor table holds the address, length, and flags of each data buffer, similar to a cargo manifest.
The available ring is a list of descriptors that the Guest makes available to the Host, while the used ring notifies the Guest which buffers have been processed.
In network communication, the Guest fills a descriptor with packet information, adds its index to the available ring, and the Host reads the descriptor, sends the packet, and places the index into the used ring.
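The index bookkeeping behind this exchange can be sketched in a few lines. This is a minimal illustration, assuming free-running 16-bit indices as used by the split virtqueue; the names RING_SIZE, ring_slot and ring_pending are hypothetical and exist only for this example.

```c
#include <stdint.h>

#define RING_SIZE 256u /* assumed queue size; must be a power of two */

/* Map a free-running 16-bit index onto a slot in the ring array. */
static inline uint16_t ring_slot(uint16_t idx) {
    return (uint16_t)(idx % RING_SIZE);
}

/* Number of entries the producer has published that the consumer has not
 * yet handled. The subtraction is performed modulo 2^16, so the result
 * stays correct after the producer index wraps from 65535 back to 0. */
static inline uint16_t ring_pending(uint16_t producer_idx, uint16_t consumer_idx) {
    return (uint16_t)(producer_idx - consumer_idx);
}
```

Because the indices only ever increase and wrap naturally, a consumer should test for "not equal" or use the difference above; a plain "less than" comparison gives the wrong answer once the producer index wraps.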
1.3 Core Code Implementation: Building the "communication bridge"
① Create shared memory
Shared memory is the basis of efficient vhost/virtio communication. It works like a common warehouse that both Guest and Host can access directly, avoiding multiple copies.
On Linux we can use shmget to create a shared memory segment:
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <stdio.h>
#define SHM_SIZE (1024 * 1024) /* 1 MiB; parenthesised so the macro expands safely */
int main(void) {
    key_t key;
    int shmid;
    /* Derive a System V IPC key from an existing path and a project id. */
    key = ftok(".", 'a');
    if (key == -1) { perror("ftok"); return 1; }
    /* Create the segment, or open it if it already exists. */
    shmid = shmget(key, SHM_SIZE, IPC_CREAT | 0666);
    if (shmid == -1) { perror("shmget"); return 1; }
    printf("Shared memory created, id: %d\n", shmid);
    /* Mark the segment for removal; it is destroyed once no process is attached. */
    if (shmctl(shmid, IPC_RMID, NULL) == -1) { perror("shmctl"); return 1; }
    return 0;
}
After creation, the memory must be mapped into the process address space with shmat. Synchronisation (e.g., semaphores) is required to ensure safe concurrent access.
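The mapping step can be sketched as follows. This is a minimal example rather than part of the original walkthrough: the function name shm_roundtrip is hypothetical, it uses IPC_PRIVATE instead of an ftok key so it does not depend on any file, and it leaves out the synchronisation a real producer/consumer pair would need.

```c
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Create a private 4 KiB segment, attach it, write and read back a string,
 * then detach and remove it. Returns 0 on success, -1 on failure. */
int shm_roundtrip(void) {
    const char msg[] = "hello from shared memory";
    int rc = -1;

    int shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (shmid == -1) { perror("shmget"); return -1; }

    /* shmat signals failure with (void *)-1, not NULL. */
    char *base = shmat(shmid, NULL, 0);
    if (base == (void *)-1) {
        perror("shmat");
    } else {
        memcpy(base, msg, sizeof(msg));
        rc = (strcmp(base, msg) == 0) ? 0 : -1;
        if (shmdt(base) == -1) { perror("shmdt"); rc = -1; }
    }

    /* Remove the segment so it does not outlive the process. */
    if (shmctl(shmid, IPC_RMID, NULL) == -1) { perror("shmctl"); rc = -1; }
    return rc;
}
```

A second process would call shmat with the same segment id to see the same bytes, which is the property the ring structures in the following steps rely on.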
① Initialise Virtio Ring
Initialising the Virtio Ring sets up the descriptor, available and used structures:
#include <stdint.h>
#include <stdlib.h> /* malloc, free */
#define QUEUE_SIZE 256
/* One buffer: where it is, how long it is, and how it chains to the next. */
typedef struct {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
} virtio_desc;
/* Guest -> Host: descriptor indices that are ready to be consumed. */
typedef struct {
    uint16_t flags;
    uint16_t idx;
    uint16_t ring[QUEUE_SIZE];
} virtio_avail;
/* Host -> Guest: descriptor indices that have been processed. */
typedef struct {
    uint16_t flags;
    uint16_t idx;
    struct { uint32_t id; uint32_t len; } ring[QUEUE_SIZE];
} virtio_used;
typedef struct {
    virtio_desc *desc;
    virtio_avail *avail;
    virtio_used *used;
} virtio_ring;
/* Allocates the three ring parts and resets their indices.
   Returns 0 on success, -1 if any allocation fails. */
int init_virtio_ring(virtio_ring *ring) {
    ring->desc = malloc(QUEUE_SIZE * sizeof(virtio_desc));
    ring->avail = malloc(sizeof(virtio_avail));
    ring->used = malloc(sizeof(virtio_used));
    if (!ring->desc || !ring->avail || !ring->used) {
        free(ring->desc); free(ring->avail); free(ring->used);
        ring->desc = NULL; ring->avail = NULL; ring->used = NULL;
        return -1;
    }
    for (int i = 0; i < QUEUE_SIZE; i++) {
        ring->desc[i].addr = 0;
        ring->desc[i].len = 0;
        ring->desc[i].flags = 0;
        ring->desc[i].next = (uint16_t)((i + 1) % QUEUE_SIZE); /* last entry links back to 0 */
    }
    ring->avail->flags = 0;
    ring->avail->idx = 0;
    ring->used->flags = 0;
    ring->used->idx = 0;
    return 0;
}
② Data Send/Receive Handling
When the Guest wants to send data, it fills the descriptor, adds the index to the available ring, and notifies the Host:
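#include <stddef.h>
#include <stdint.h>
void guest_send_data(virtio_ring *ring, const void *data, size_t len) {
    /* Simplification: descriptor i is always paired with ring slot i, and the
       caller must not have more than QUEUE_SIZE buffers in flight. */
    uint16_t slot = ring->avail->idx % QUEUE_SIZE; /* avail->idx is free-running, so wrap it */
    virtio_desc *desc = &ring->desc[slot];
    desc->addr = (uint64_t)(uintptr_t)data;
    desc->len = (uint32_t)len;
    desc->flags = 0;
    ring->avail->ring[slot] = slot;
    /* The descriptor and ring entry must be visible before the index moves. */
    __sync_synchronize();
    ring->avail->idx++;
    // notify Host (e.g., via interrupt or eventfd)
}
The Host reads the descriptor from the available ring, processes the packet, and places the index into the used ring: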
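void host_receive_data(virtio_ring *ring) {
    /* This sketch completes every entry as soon as it is consumed, so
       used->idx doubles as the Host's read position in the available ring. */
    uint16_t next = ring->used->idx;
    /* Compare with != rather than <: both indices are free-running 16-bit
       counters, so < gives the wrong answer once avail->idx wraps to 0. */
    while (next != ring->avail->idx) {
        uint16_t desc_idx = ring->avail->ring[next % QUEUE_SIZE];
        virtio_desc *desc = &ring->desc[desc_idx];
        // process data from shared memory
        ring->used->ring[next % QUEUE_SIZE].id = desc_idx;
        ring->used->ring[next % QUEUE_SIZE].len = desc->len;
        /* Publish the used entry before advancing the index. */
        __sync_synchronize();
        ring->used->idx = ++next;
    }
    // notify Guest (e.g., via interrupt or eventfd)
}
③ Interrupt Handling Mechanism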
Interrupts act like traffic lights on the bridge, informing the other side of important events. The Guest creates an eventfd and writes a value to it to trigger a Host interrupt:
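#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>
/* Adds 1 to the eventfd counter, which wakes any poller on the Host side.
   The parameter is named efd so it does not shadow the eventfd() function. */
void guest_notify_host(int efd) {
    uint64_t value = 1;
    if (write(efd, &value, sizeof(value)) == -1) {
        perror("write eventfd");
    }
}
The Host polls the eventfd and calls the data-processing routine when a notification arrives: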
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>
void host_listen_interrupt(int efd, virtio_ring *ring) {
    struct pollfd fds[1];
    fds[0].fd = efd;
    fds[0].events = POLLIN;
    while (1) {
        int ret = poll(fds, 1, -1);
        if (ret == -1) {
            if (errno == EINTR) continue; /* interrupted by a signal: retry */
            perror("poll");
            break;
        }
        if (ret > 0 && (fds[0].revents & POLLIN)) {
            uint64_t v;
            /* Reading resets the counter so the next notification raises POLLIN again. */
            if (read(efd, &v, sizeof(v)) == -1) { perror("read eventfd"); continue; }
            host_receive_data(ring);
        }
    }
}
2. QEMU Backend Driver
The VIRTIO front-end is the driver in the Guest kernel; the back-end is implemented by QEMU or by a DPU. The control plane (feature negotiation, configuration) and the data plane (packet transfer) are designed independently of each other.
QEMU implements the control plane, while the data plane is handed over to the VHOST framework. VHOST has two paths: user‑mode vhost‑user and kernel‑mode vhost‑kernel. This article focuses on the vhost‑user path using a net device as an example.
2.1 VIRTIO Device Creation Process
From a command‑line example, we can see how a device is created:
gdb --args ./x86_64-softmmu/qemu-system-x86_64 \
-machine accel=kvm -cpu host -smp sockets=2,cores=2,threads=1 -m 3072M \
-object memory-backend-file,id=mem,size=3072M,mem-path=/dev/hugepages,share=on \
-hda /home/kvm/disk/vm0.img -mem-prealloc -numa node,memdev=mem \
-vnc 0.0.0.0:00 -monitor stdio --enable-kvm \
-netdev type=tap,id=eth0,ifname=tap30,script=no,downscript=no \
-device e1000,netdev=eth0,mac=12:03:04:05:06:08 \
-chardev socket,id=char1,path=/tmp/vhostsock0,server \
-netdev type=vhost-user,id=mynet3,chardev=char1,vhostforce,queues=$QNUM \
-device virtio-net-pci,netdev=mynet3,id=net1,mac=00:00:00:00:00:03,disable-legacy=on
The -device option creates a virtio-net-pci device, which depends on a QEMU netdev object, which in turn depends on a character device.
2.2 Netdev Command‑Line Parsing
QEMU parses command-line options in main(), stores them locally, and processes netdev options with qemu_opts_foreach(qemu_find_opts("netdev"), net_init_netdev, ...). For type=vhost-user, net_init_vhost_user() is called, which matches the associated character device and creates the backend.
2.3 Device Instance Initialisation
When qdev_device_add() is invoked, it creates an Object, then calls the class’s instance_init (e.g., virtio_net_pci_instance_init) to allocate the VirtIONetPCI structure, which contains a VirtIOPCIProxy (PCI side) and a VirtIONet (VIRTIO side).
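The containment relationship can be pictured with a simplified model. The types and functions below (ModelPCIProxy, ModelVirtIONet, ModelVirtIONetPCI, model_instance_init, model_from_vdev) are hypothetical stand-ins and not QEMU's actual definitions; the sketch only shows how one allocation can hold both the PCI-side and the VIRTIO-side state, and how code holding a pointer to the inner device can recover the outer object.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for the real QEMU types. */
typedef struct { uint16_t vendor_id; uint16_t device_id; } ModelPCIProxy;
typedef struct { uint8_t mac[6]; uint16_t max_queue_pairs; } ModelVirtIONet;

typedef struct {
    ModelPCIProxy  parent_obj; /* PCI side: first member, so both share one address */
    ModelVirtIONet vdev;       /* VIRTIO side, embedded in the same allocation */
} ModelVirtIONetPCI;

/* Recover the outer object from a pointer to one of its members. */
#define MODEL_CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Analogue of an instance_init hook: put the object into a known state. */
void model_instance_init(ModelVirtIONetPCI *dev) {
    memset(dev, 0, sizeof(*dev));
    dev->parent_obj.vendor_id = 0x1af4; /* PCI vendor ID used by virtio devices */
    dev->vdev.max_queue_pairs = 1;
}

/* Walk from the embedded VIRTIO device back to the enclosing PCI object. */
ModelVirtIONetPCI *model_from_vdev(ModelVirtIONet *vdev) {
    return MODEL_CONTAINER_OF(vdev, ModelVirtIONetPCI, vdev);
}
```

Embedding rather than pointing means the two halves are created and destroyed together, which matches the single-object lifecycle described above.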
2.4 Realise Flow
After instance creation, the realize method of each class is called, ultimately invoking virtio_net_pci_realize, which triggers virtio_net_device_realize. This sets up the VIRTIO queues, registers BAR regions, and prepares the device for data‑plane operation.
3. QEMU and VHOST Interface Description
3.1 vhost‑user Netdev Creation
The command line -netdev type=vhost-user,... creates a netdev that uses a Unix socket for communication. QEMU parses the option, creates a NetClientState, and later establishes a socket connection.
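The socket setup underneath this can be sketched with plain POSIX calls. This is an illustration of a Unix stream socket rendezvous, not QEMU's chardev code; the helper names unix_listen and unix_connect are hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill addr for the given path. Returns 0, or -1 if the path is too long. */
static int fill_addr(struct sockaddr_un *addr, const char *path) {
    if (strlen(path) >= sizeof(addr->sun_path)) return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    strcpy(addr->sun_path, path);
    return 0;
}

/* Server side: bind and listen on the socket file. Returns the fd or -1. */
int unix_listen(const char *path) {
    struct sockaddr_un addr;
    if (fill_addr(&addr, path) == -1) return -1;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1) return -1;
    unlink(path); /* remove a stale socket file left by an earlier run */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        listen(fd, 1) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Client side: connect to the socket file. Returns the fd or -1. */
int unix_connect(const char *path) {
    struct sockaddr_un addr;
    if (fill_addr(&addr, path) == -1) return -1;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1) return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Which process listens and which connects is a configuration choice; in the command line shown earlier, the chardev's server option puts QEMU on the listening side.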
3.2 QEMU ↔ VHOST Communication
Communication is performed over a Unix socket using a fixed header followed by a payload. The header contains the request type, flags, and payload size. The request enum includes values such as VHOST_USER_GET_FEATURES, VHOST_USER_SET_MEM_TABLE, and VHOST_USER_SET_VRING_NUM. File descriptors, such as the kick and call eventfds, travel with the message as ancillary data on the socket.
typedef struct {
    VhostUserRequest request;
    uint32_t flags;
    uint32_t size; // payload size
} VhostUserHeader;
typedef union {
    uint64_t u64;
    struct vhost_vring_state state;
    struct vhost_vring_addr addr;
    VhostUserMemory memory;
    VhostUserLog log;
    struct vhost_iotlb_msg iotlb;
    VhostUserConfig config;
    VhostUserCryptoSession session;
    VhostUserVringArea area;
} VhostUserPayload;
typedef struct {
    VhostUserHeader hdr;
    VhostUserPayload payload;
} VhostUserMsg;
Key VHOST operations include vhost_set_vring_num (queue size), vhost_set_vring_addr (addresses of the descriptor table, available ring, and used ring), vhost_set_vring_kick (eventfd for notifications), and vhost_set_mem_table (share guest memory with the backend).
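To make the wire layout concrete, the following sketch serialises one SET_VRING_NUM message by hand. It is an illustration under stated assumptions: the request codes and the version bit follow the vhost-user protocol, while the names REQ_*, HDR_VERSION, HDR_SIZE, VringState and encode_set_vring_num are hypothetical and chosen for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Request codes from the vhost-user protocol (a small subset). */
enum { REQ_GET_FEATURES = 1, REQ_SET_MEM_TABLE = 5, REQ_SET_VRING_NUM = 8 };

#define HDR_VERSION 0x1u /* the low bits of flags carry the protocol version */
#define HDR_SIZE    12u  /* request + flags + size, 4 bytes each */

/* Payload of SET_VRING_NUM: which queue, and how many descriptors it has. */
typedef struct { uint32_t index; uint32_t num; } VringState;

/* Serialise a SET_VRING_NUM message into buf. Returns the number of bytes
 * written, or 0 if buf is too small. Fields are copied in host byte order,
 * which both processes share because they run on the same machine. */
size_t encode_set_vring_num(uint8_t *buf, size_t cap, uint32_t index, uint32_t num) {
    const uint32_t request = REQ_SET_VRING_NUM;
    const uint32_t flags = HDR_VERSION;
    const VringState state = { index, num };
    const uint32_t size = (uint32_t)sizeof(state);

    if (cap < HDR_SIZE + sizeof(state)) return 0;
    memcpy(buf,            &request, 4);
    memcpy(buf + 4,        &flags,   4);
    memcpy(buf + 8,        &size,    4);
    memcpy(buf + HDR_SIZE, &state,   sizeof(state));
    return HDR_SIZE + sizeof(state);
}
```

The receiver reads the 12-byte header first, then reads exactly size more bytes, which is why the payload length has to be part of the header.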
4. vhost‑user Framework Design
4.1 vhost Initialisation Flow
DPDK’s rte_vhost_driver_register() creates a vhost_user_socket, opens the socket file, and stores the fd. rte_vhost_driver_callback_register() registers a vhost_device_ops table. rte_vhost_driver_start() launches a client or server thread, registers the socket with the event loop, and waits for messages.
4.2 QEMU ↔ vhost‑user Communication Process
After initialisation, the client registers the socket, creates a virtio_net instance, and adds a read callback. Incoming messages are dispatched via a vhost_message_handlers array based on the request type, populating the memory table, queue configuration, and eventfds.
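Dispatching by request type through an array of handlers can be sketched like this. It is a simplified model, not DPDK's code: ModelDevice, the MSG_* codes, the handler functions and model_dispatch are hypothetical names for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Request codes for this sketch (a small subset). */
typedef enum {
    MSG_NONE = 0,
    MSG_GET_FEATURES = 1,
    MSG_SET_VRING_NUM = 8,
    MSG_MAX = 9
} MsgType;

typedef struct { uint64_t features; uint32_t vring_num; } ModelDevice;
typedef int (*msg_handler)(ModelDevice *dev, uint64_t arg, uint64_t *reply);

static int handle_get_features(ModelDevice *dev, uint64_t arg, uint64_t *reply) {
    (void)arg;
    *reply = dev->features; /* answer with the backend's feature bits */
    return 0;
}

static int handle_set_vring_num(ModelDevice *dev, uint64_t arg, uint64_t *reply) {
    (void)reply;
    dev->vring_num = (uint32_t)arg; /* record the queue size */
    return 0;
}

/* Table indexed by request type; entries not listed stay NULL. */
static const msg_handler handlers[MSG_MAX] = {
    [MSG_GET_FEATURES]  = handle_get_features,
    [MSG_SET_VRING_NUM] = handle_set_vring_num,
};

/* Look up and run the handler; reject unknown or unhandled requests. */
int model_dispatch(ModelDevice *dev, uint32_t type, uint64_t arg, uint64_t *reply) {
    if (type >= MSG_MAX || handlers[type] == NULL) return -1;
    return handlers[type](dev, arg, reply);
}
```

A table keeps the receive loop short and makes adding a request a one-line change, at the cost of requiring a bounds check before every lookup.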
5. Virtio‑net Device in QEMU
The virtio-net device is a virtual Ethernet card that supports multiple TX/RX queue pairs. Empty buffers are placed in the RX virtqueue, while outgoing packets are placed in the TX virtqueue. An additional control virtqueue, numbered after the RX/TX pairs, handles control messages such as MAC address changes.
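The queue numbering can be written down as a small helper. This sketch follows the layout in the VIRTIO specification (receive and transmit queues interleaved, control queue last); the function names are hypothetical, and queue pairs are counted from 0 here.

```c
#include <stdint.h>

/* Virtqueue numbering for a virtio-net device with N queue pairs:
 * receiveq1 = 0, transmitq1 = 1, ..., receiveqN = 2(N-1),
 * transmitqN = 2(N-1)+1, followed by the control queue at index 2N. */
static inline uint16_t rx_vq_index(uint16_t pair) {
    return (uint16_t)(2 * pair);
}

static inline uint16_t tx_vq_index(uint16_t pair) {
    return (uint16_t)(2 * pair + 1);
}

static inline uint16_t ctrl_vq_index(uint16_t num_pairs) {
    return (uint16_t)(2 * num_pairs);
}
```

With a single queue pair this gives RX = 0, TX = 1 and control = 2.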
5.1 vhost Protocol
The vhost protocol offloads the data plane from QEMU to a more efficient handler (either kernel vhost‑net or user‑mode vhost‑user). The hypervisor provides the memory layout and a pair of eventfds (kick/call) for notifications. After the handshake, the handler directly accesses the virtqueues and notifies the Guest via irqfd.
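The behaviour of an eventfd used for kick or call can be seen in isolation. The sketch below is a stand-alone demonstration with a hypothetical function name, kick_and_drain; it shows that several notifications written before the consumer wakes up are coalesced into a single read.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Signal a fresh eventfd n times, then drain it once.
 * Returns the value read (the accumulated count), or -1 on error. */
int64_t kick_and_drain(unsigned n) {
    uint64_t one = 1, count = 0, again = 0;
    int64_t result = -1;

    int fd = eventfd(0, EFD_NONBLOCK);
    if (fd == -1) return -1;

    for (unsigned i = 0; i < n; i++) {
        if (write(fd, &one, sizeof(one)) != (ssize_t)sizeof(one)) goto out;
    }

    /* One read returns the sum of all writes and resets the counter to zero. */
    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count)) goto out;

    /* A second read finds nothing pending: EAGAIN in non-blocking mode. */
    if (read(fd, &again, sizeof(again)) == -1 && errno == EAGAIN) {
        result = (int64_t)count;
    }
out:
    close(fd);
    return result;
}
```

Because notifications coalesce, the handler cannot assume one wake-up per buffer; after each wake-up it has to drain the ring until no entries remain.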
5.2 vhost‑net Kernel Implementation
The vhost-net kernel driver implements the handler side. QEMU opens /dev/vhost-net, issues a series of ioctls to set up the memory table, queue sizes, and eventfds, and then a kernel thread (vhost-$pid) waits on the eventfds. When the Guest writes to a PCI MMIO address, an ioeventfd triggers the kernel thread, which processes the packet without involving the QEMU process.
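The first steps of that ioctl sequence can be sketched as below. This is a minimal probe, assuming a Linux host with the kernel UAPI header linux/vhost.h installed; the function name vhost_net_probe is hypothetical, and the memory table, vring, and eventfd ioctls that a real setup issues afterwards are left out.

```c
#include <fcntl.h>
#include <linux/vhost.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Open /dev/vhost-net, claim ownership, and query the feature bits.
 * Returns 0 and stores the bits in *features on success, -1 otherwise
 * (for example when the vhost_net module is not loaded or access is denied). */
int vhost_net_probe(uint64_t *features) {
    int rc = -1;
    int fd = open("/dev/vhost-net", O_RDWR);
    if (fd == -1) { perror("open /dev/vhost-net"); return -1; }

    /* Associate this vhost instance with the calling process. */
    if (ioctl(fd, VHOST_SET_OWNER) == -1) { perror("VHOST_SET_OWNER"); goto out; }

    /* Ask which virtio feature bits the kernel backend supports. */
    if (ioctl(fd, VHOST_GET_FEATURES, features) == -1) { perror("VHOST_GET_FEATURES"); goto out; }
    rc = 0;
out:
    close(fd);
    return rc;
}
```

On a machine without the device node the function simply reports the failure and returns -1, so it is safe to run as a quick check of whether kernel vhost is available.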
5.3 Using virtio‑net in QEMU
To create a virtio-net device, pair a -netdev backend with a -device virtio-net front-end that references it by id, for example -netdev user,id=net0 -device virtio-net,netdev=net0. Inside the Guest, the device appears as a normal NIC and the driver loads automatically.
Example command line:
qemu-system-x86_64 -netdev user,id=net0 -device virtio-net,netdev=net0
Example bash script:
#!/bin/bash
qemu-system-x86_64 \
-netdev user,id=net0 \
-device virtio-net,netdev=net0 \
[other QEMU options]
For high-performance scenarios, replace the user backend with vhost-user and connect it to an OVS DPDK port:
ovs-vsctl add-port br0 vhost-client-1 \
-- set Interface vhost-client-1 type=dpdkvhostuserclient \
options:vhost-server-path=$VHOST_USER_SOCKET_PATH
Deepin Linux
Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and Linux kernel, etc.
