
Understanding vhost/virtio: Handwritten Implementation and QEMU Backend Integration

This article explains how vhost/virtio solves virtual device communication bottlenecks, walks through a hand‑written implementation with shared memory and ring buffers, and details QEMU's backend driver creation, device realization, and the vhost‑user/vhost‑net interfaces that enable high‑performance virtual networking.

Deepin Linux

1. Handwritten Vhost/Virtio

In cloud environments, simultaneous large data transfers among VMs cause communication bottlenecks that degrade performance. vhost/virtio acts as a communication accelerator to alleviate this.

1.1 Preparation: Building the “stage”

Install QEMU (e.g., via apt or from source) and required development tools such as GCC, libvirt, and libpciaccess, ensuring compatible versions.

1.2 Key data structures

The core of virtio communication is the virtio_ring, composed of a descriptor table, an available ring, and a used ring. Descriptors hold a buffer's address, length, and flags; the available ring lists descriptors ready for the host to consume, and the used ring reports buffers the host has processed.

1.3 Core code: Building the communication bridge

① Create shared memory

Shared memory provides a common area accessible by both Guest and Host, avoiding copies.

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <stdio.h>

#define SHM_SIZE (1024 * 1024) // 1 MiB; parenthesized so the macro expands safely in expressions

int main() {
    key_t key;
    int shmid;
    key = ftok(".", 'a');
    if (key == -1) { perror("ftok"); return 1; }
    shmid = shmget(key, SHM_SIZE, IPC_CREAT | 0666);
    if (shmid == -1) { perror("shmget"); return 1; }
    printf("Shared memory created, id: %d\n", shmid);
    // ... use shared memory ...
    if (shmctl(shmid, IPC_RMID, NULL) == -1) { perror("shmctl"); return 1; }
    return 0;
}

After creating the segment, map it into the process address space with shmat and synchronize access.

② Initialize Virtio Ring

#include <stdint.h>
#include <stdlib.h> // malloc/free used below

#define QUEUE_SIZE 256

typedef struct {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
} virtio_desc;

typedef struct {
    uint16_t flags;
    uint16_t idx;
    uint16_t ring[QUEUE_SIZE];
} virtio_avail;

typedef struct {
    uint16_t flags;
    uint16_t idx;
    struct { uint32_t id; uint32_t len; } ring[QUEUE_SIZE];
} virtio_used;

typedef struct {
    virtio_desc *desc;
    virtio_avail *avail;
    virtio_used *used;
} virtio_ring;

void init_virtio_ring(virtio_ring *ring) {
    ring->desc = (virtio_desc *)malloc(QUEUE_SIZE * sizeof(virtio_desc));
    if (!ring->desc) return;
    ring->avail = (virtio_avail *)malloc(sizeof(virtio_avail));
    if (!ring->avail) { free(ring->desc); return; }
    ring->used = (virtio_used *)malloc(sizeof(virtio_used));
    if (!ring->used) { free(ring->desc); free(ring->avail); return; }
    for (int i = 0; i < QUEUE_SIZE; i++) {
        ring->desc[i].addr = 0;
        ring->desc[i].len = 0;
        ring->desc[i].flags = 0;
        ring->desc[i].next = i + 1;
    }
    ring->desc[QUEUE_SIZE - 1].next = 0;
    ring->avail->flags = 0;
    ring->avail->idx = 0;
    ring->used->flags = 0;
    ring->used->idx = 0;
}

Memory for the descriptor table and the available and used rings is allocated, the ring indices are zeroed, and each descriptor's next field is chained to the following descriptor, forming a circular free list.

③ Data send/receive

void guest_send_data(virtio_ring *ring, const void *data, size_t len) {
    // Simplified: derive the descriptor index from the free-running avail
    // index, masked so it stays in bounds. A real driver pops from the
    // free-descriptor chain built in init_virtio_ring instead.
    uint16_t desc_idx = ring->avail->idx % QUEUE_SIZE;
    virtio_desc *desc = &ring->desc[desc_idx];
    desc->addr = (uint64_t)(uintptr_t)data; // a guest-physical address in practice
    desc->len = (uint32_t)len;
    desc->flags = 0;
    ring->avail->ring[ring->avail->idx % QUEUE_SIZE] = desc_idx;
    ring->avail->idx++;
    // notify host (e.g., via eventfd)
}

The guest fills a descriptor, pushes its index onto the available ring, and notifies the host.

void host_receive_data(virtio_ring *ring) {
    uint16_t used_idx = ring->used->idx;
    // The indices are free-running uint16_t counters, so compare with !=
    // to survive wraparound; only ring slots are taken modulo QUEUE_SIZE.
    while (used_idx != ring->avail->idx) {
        uint16_t desc_idx = ring->avail->ring[used_idx % QUEUE_SIZE];
        virtio_desc *desc = &ring->desc[desc_idx];
        // process data from shared memory
        ring->used->ring[ring->used->idx % QUEUE_SIZE].id = desc_idx;
        ring->used->ring[ring->used->idx % QUEUE_SIZE].len = desc->len;
        ring->used->idx++;
        used_idx++;
    }
    // notify guest (e.g., via eventfd)
}

The host reads descriptor indices from the available ring, processes the buffers, and records the completed indices and lengths in the used ring.

④ Interrupt handling

#include <sys/eventfd.h>
#include <unistd.h>

void guest_notify_host(int eventfd) {
    uint64_t value = 1;
    if (write(eventfd, &value, sizeof(value)) == -1) {
        perror("write eventfd");
    }
}

The guest writes to an eventfd to trigger a host-side notification; the host waits on the eventfd and calls host_receive_data.

2. QEMU Backend Driver

The front‑end of a VIRTIO device runs in the Guest kernel, while the back‑end is implemented by QEMU or a DPU. The control plane negotiates features and configures the device; the data plane is handed off to the VHOST framework (user‑mode vhost‑user or kernel‑mode vhost‑kernel).

2.1 VIRTIO device creation flow

An example command line (run under gdb for inspection) that creates a virtio‑net‑pci device:

gdb --args ./x86_64-softmmu/qemu-system-x86_64 \
    -machine accel=kvm -cpu host -smp sockets=2,cores=2,threads=1 -m 3072M \
    -object memory-backend-file,id=mem,size=3072M,mem-path=/dev/hugepages,share=on \
    -hda /home/kvm/disk/vm0.img -mem-prealloc -numa node,memdev=mem \
    -vnc 0.0.0.0:00 -monitor stdio --enable-kvm \
    -netdev type=tap,id=eth0,ifname=tap30,script=no,downscript=no \
    -device e1000,netdev=eth0,mac=12:03:04:05:06:08 \
    -chardev socket,id=char1,path=/tmp/vhostsock0,server \
    -netdev type=vhost-user,id=mynet3,chardev=char1,vhostforce,queues=$QNUM \
    -device virtio-net-pci,netdev=mynet3,id=net1,mac=00:00:00:00:00:03,disable-legacy=on

QEMU parses the options, creates a netdev of type vhost-user, associates it with a character device, and finally instantiates the PCI device via qdev_device_add. The device class VirtIONetPCI contains a VirtIOPCIProxy and a VirtIONet instance.

2.2 Device realization

During realize, QEMU initializes PCI BARs, registers memory‑region I/O operations, and calls the virtio‑net realize routine, which eventually invokes vhost_net_start once the guest writes VIRTIO_CONFIG_S_DRIVER_OK.

3. QEMU ↔ VHOST Interface

3.1 vhost‑user netdev creation

The -netdev type=vhost-user option creates a NetClientState linked to a UNIX socket. QEMU registers callbacks that translate virtio configuration writes into VHOST messages.

3.2 Communication protocol

Messages consist of a VhostUserHeader (request, flags, size) followed by a payload union. Requests include VHOST_USER_GET_FEATURES, VHOST_USER_SET_MEM_TABLE, VHOST_USER_SET_VRING_ADDR, etc. The host maps guest memory regions via mmap, using file descriptors supplied in the memory table.

When the guest sets VIRTIO_CONFIG_S_DRIVER_OK , QEMU sends VHOST_USER_SET_VRING_* messages, enabling the back‑end to start processing packets.

4. VHOST‑USER Framework Design

4.1 Initialization

DPDK’s rte_vhost_driver_register opens the socket, rte_vhost_driver_callback_register stores the vhost_device_ops, and rte_vhost_driver_start creates a connection and registers a read callback.

4.2 Message handling

Each VHOST request is dispatched to a handler in vhost_message_handlers, which records configuration (features, memory table, vring addresses) for later data‑plane use.

5. Virtio‑net Device in QEMU

Virtio‑net provides multi‑queue TX/RX with checksum offload and GSO/GRO, and interacts with a TAP device or the vhost‑net kernel driver. The data path can be offloaded via the VHOST protocol to improve performance.

5.1 VHOST protocol

The protocol conveys memory layout and event‑fd pairs (kick/call) so the handler can directly read/write guest buffers and generate interrupts without hypervisor mediation.

5.2 vhost‑net kernel driver

vhost‑net creates /dev/vhost-net , spawns a kernel thread, and uses ioeventfd and irqfd to bypass QEMU for virtqueue notifications, achieving near‑native packet throughput.

Overall, the article walks through building a minimal vhost/virtio implementation, explains QEMU’s device creation and realization flow, and details how the vhost‑user and vhost‑net back‑ends enable high‑performance virtual networking.
