
How Virtio Delivers Near‑Native I/O Performance in Linux Virtual Machines

This article explains the architecture, data‑flow mechanisms, and core APIs of Virtio, the semi‑virtualized I/O framework that powers Linux I/O virtualization in KVM/QEMU, detailing ring buffers, virtqueues, driver initialization, and performance optimizations for near‑native throughput.

Deepin Linux

When you transfer files smoothly inside a virtual machine, have you ever wondered who efficiently schedules the I/O resources? When a cloud platform handles millions of concurrent requests, what keeps the data paths stable? The answer lies in a modest yet critical technology: virtio. As the "invisible engine" of Linux I/O virtualization, virtio uses a semi-virtualized (paravirtualized) design to bridge virtual machines and physical devices, sidestepping the performance penalties of traditional device emulation and achieving near-native I/O speeds. It has become the standard I/O core of KVM, QEMU, and other mainstream virtualization stacks.

Part 1 – Overview of Linux I/O Virtualization

Virtualization abstracts physical hardware into logical resources, allowing multiple isolated virtual machines (VMs) to run on a single host. Linux I/O virtualization is crucial because it handles the communication between VMs and physical hardware, aiming to break I/O performance bottlenecks. Think of each VM as a busy factory that constantly needs raw materials (input data) and produces products (output data); Linux I/O virtualization optimizes the transport and handling of these materials to keep the factory running smoothly.

Traditional Linux I/O virtualization relies on QEMU emulation. When a guest driver issues an I/O request, the KVM module intercepts it, stores the request in a shared I/O page, and notifies the user‑space QEMU program. QEMU then simulates the hardware operation and writes the result back to the shared page before notifying KVM, which finally returns the result to the guest. While flexible, this path involves many VMEntry/VMExit transitions and data copies, similar to a relay race with frequent hand‑offs, resulting in high latency and low throughput—unsuitable for data‑intensive or real‑time workloads.

Part 2 – Virtio in Linux I/O Virtualization

2.1 What is Virtio?

Virtio is an I/O virtualization standard originally created by Rusty Russell for his lguest hypervisor. In a semi-virtualized architecture it acts as a bridge between the guest OS and the hypervisor, providing a unified device model and interface for network adapters, block devices, balloon drivers, consoles, and other devices, most commonly transported over PCI. By exposing a common abstraction, virtio enables different hypervisors (e.g., KVM) to implement I/O virtualization uniformly.

Traditional full‑virtualization requires the hypervisor to intercept every I/O instruction, causing many VMEntry/VMExit switches and data copies. Semi‑virtualization lets the guest handle non‑essential instructions directly, while the hypervisor only virtualizes the necessary ones, dramatically reducing overhead.

Because many front‑end devices share similar logic (block, network, PCI, balloon), a generic framework and standard protocol—virtio—eliminate the need for per‑device interfaces and improve cross‑platform compatibility.

Compared with traditional QEMU emulation, virtio offers several advantages:

Provides a generic interface, greatly improving code reuse and portability across platforms.

Reduces VMEntry/VMExit frequency by using virtqueues and ring buffers, achieving I/O performance close to native.

2.2 Virtio Data‑Flow Interaction Mechanism

The vring forwards data between guest and host through two ring buffers, the available ring and the used ring, both built on a shared descriptor table.

A vring consists of three parts: the descriptor array desc, the available ring, and the used ring. desc stores descriptors that point to buffers; the available ring tells the host which descriptors are ready, while the used ring tells the guest which descriptors have been processed.
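
For reference, these three parts map onto the structures below, simplified from the kernel's include/uapi/linux/virtio_ring.h (padding and the later event-index fields are omitted):

struct vring_desc {                    // one entry in the descriptor array
    __virtio64 addr;                   // guest-physical address of the buffer
    __virtio32 len;                    // buffer length in bytes
    __virtio16 flags;                  // VRING_DESC_F_NEXT / _WRITE / _INDIRECT
    __virtio16 next;                   // index of the next chained descriptor
};

struct vring_avail {                   // guest -> host: "ready for you"
    __virtio16 flags;
    __virtio16 idx;                    // slot the guest will fill next
    __virtio16 ring[];                 // indices into the descriptor array
};

struct vring_used_elem {
    __virtio32 id;                     // head of the completed descriptor chain
    __virtio32 len;                    // bytes the host wrote, if any
};

struct vring_used {                    // host -> guest: "finished with these"
    __virtio16 flags;
    __virtio16 idx;                    // slot the host will fill next
    struct vring_used_elem ring[];
};

struct vring {                         // ties the three parts together
    unsigned int num;                  // queue depth (number of descriptors)
    struct vring_desc *desc;
    struct vring_avail *avail;
    struct vring_used *used;
};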

Virtio uses virtqueue objects to implement I/O. For example, the virtio‑net driver creates two queues (receive and transmit), while virtio‑blk uses a single queue.

When a guest wants to send data, it calls virtqueue_add_buf to place the buffer into the virtqueue, then calls virtqueue_kick (which invokes virtqueue_notify) to write to a device register and notify the host. Once the host has consumed the buffer and marked it used, the guest reclaims it with virtqueue_get_buf.

The buffer is a scatter-gather array described by desc entries: each descriptor holds an address and a length, and can chain to the next entry through its next field.

When the guest writes data to the virtqueue, it fills the buffer pointed to by desc, updates the available ring, and notifies the host.

When the host receives the notification, it reads the buffer from the available ring, processes the data, updates the used ring, and notifies the guest.
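
Putting these steps together, a guest-side send can be sketched as follows. This uses the virtqueue_add_buf form the article describes (its exact signature varied across kernel versions; the ~3.x form with a gfp argument is assumed here), and the data cookie passed in is what virtqueue_get_buf returns on completion:

static int send_one_buffer(struct virtqueue *vq, void *buf, size_t len)
{
    struct scatterlist sg;
    int err;

    sg_init_one(&sg, buf, len);        // a one-entry scatter-gather list

    // one "out" buffer (guest -> host), no "in" buffers; buf doubles as cookie
    err = virtqueue_add_buf(vq, &sg, 1, 0, buf, GFP_ATOMIC);
    if (err < 0)
        return err;

    virtqueue_kick(vq);                // notify the host (may trap to the back-end)
    return 0;
}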

2.3 Virtio Buffer Pool

The guest front-end driver interacts with the hypervisor through a buffer pool. A single I/O request can span multiple buffers (e.g., one carrying the request and two receiving response data). Internally this is represented as a scatter-gather list where each entry contains an address and a length.
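
As a concrete illustration of the one-request/two-response split, a virtio-blk read can be queued roughly as below, simplified from the pattern in drivers/block/virtio_blk.c (in real code the header and status byte live in a per-request structure so they survive until completion; the same ~3.x virtqueue_add_buf signature is assumed):

struct vblk_req {
    struct virtio_blk_outhdr hdr;      // request type, priority, start sector
    u8 status;                         // written back by the host
};

static int queue_blk_read(struct virtqueue *vq, struct vblk_req *req,
                          void *data, size_t len, u64 sector)
{
    struct scatterlist sg[3];

    req->hdr.type   = VIRTIO_BLK_T_IN;              // a read request
    req->hdr.sector = sector;

    sg_init_table(sg, 3);
    sg_set_buf(&sg[0], &req->hdr, sizeof(req->hdr));        // guest -> host
    sg_set_buf(&sg[1], data, len);                          // host -> guest
    sg_set_buf(&sg[2], &req->status, sizeof(req->status));  // host -> guest

    // 1 readable (out) entry followed by 2 writable (in) entries
    return virtqueue_add_buf(vq, sg, 1, 2, req, GFP_ATOMIC);
}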

2.4 Core API

Virtio links the guest driver and the hypervisor driver via virtio_device and virtqueue. The virtqueue API consists of five functions: add_buf supplies a request to the hypervisor as a scatter-gather list; kick notifies the host after buffers have been added; get_buf (usually driven by a completion callback) retrieves buffers the host has finished with; and enable_cb and disable_cb switch callback processing on and off.
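
In older kernels these five operations were gathered in a single ops table in include/linux/virtio.h (later kernels flattened them into standalone virtqueue_* functions); simplified:

struct virtqueue_ops {
    // expose a scatter-gather request; 'data' is the opaque cookie
    // that get_buf hands back on completion
    int (*add_buf)(struct virtqueue *vq, struct scatterlist sg[],
                   unsigned int out_num, unsigned int in_num, void *data);

    void (*kick)(struct virtqueue *vq);          // notify the other side

    // reap a completed buffer; *len is how many bytes were written
    void *(*get_buf)(struct virtqueue *vq, unsigned int *len);

    void (*disable_cb)(struct virtqueue *vq);    // suppress completion callbacks
    bool (*enable_cb)(struct virtqueue *vq);     // re-enable; false if work is pending
};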

Buffers are opaque to the host; only the front‑end and back‑end understand their format.

Part 3 – Virtio Architecture Deep Dive

3.1 Overall Architecture Overview

Virtio’s architecture consists of four layers:

Front‑end driver runs inside the VM and implements device‑specific logic (block, network, PCI, balloon, console). It receives user‑space requests, packages them, writes to I/O ports, and notifies QEMU’s back‑end.

Back‑end driver runs in the host (QEMU) and interacts with real hardware. It parses the request, performs the operation (e.g., sending a packet), and notifies the front‑end via interrupts.

Virtio layer implements the virtual queue interface that connects front‑end and back‑end.

Virtio‑ring layer provides the ring‑buffer implementation (descriptor table, available ring, used ring) that enables batch processing of I/O requests.

The virtio-ring layer acts as a staging area: multiple front-end requests accumulate in the available ring so the back-end can process them in bulk, which greatly reduces the number of VM exits.

3.2 Key Component Analysis

The virtual queue interface and ring buffer are analogous to a nervous system and circulatory system, ensuring efficient data transfer.

Each front‑end driver can create zero or more virtqueues. For example, virtio‑net uses two queues (receive and transmit) to avoid contention.

The ring buffer is divided into three parts: descriptor table, available ring, and used ring. Descriptors hold buffer addresses and lengths; the available ring lists descriptors ready for the host; the used ring lists descriptors that the host has finished processing.

When the guest wants to send data, it adds the buffer to the virtqueue, updates the available ring, and notifies the host. The host reads the descriptor, accesses the shared memory, processes the data, writes the result to the used ring, and notifies the guest.

3.3 Virtio Initialization

(1) Front‑end initialization

Virtio devices follow the Linux generic device model and appear on the virtio_bus, similar to PCI devices. Registration is performed in drivers/virtio/virtio.c:

Device registration

int register_virtio_device(struct virtio_device *dev)
{
    int err;

    dev->dev.bus = &virtio_bus;          // attach the device to the virtio bus

    err = ida_simple_get(&virtio_index_ida, 0, 0, GFP_KERNEL); // unique index
    if (err < 0)
        return err;
    dev->index = err;

    dev->config->reset(dev);             // reset the device to a known state

    return device_register(&dev->dev);   // register with the driver core
}

Driver registration

int register_virtio_driver(struct virtio_driver *driver)
{
    driver->driver.bus = &virtio_bus;        // attach the driver to the virtio bus
    return driver_register(&driver->driver); // register with the driver core
}

Device matching and probing

These callbacks are wired up in the bus definition itself:

static struct bus_type virtio_bus = {
    .name  = "virtio",
    .match = virtio_dev_match,   // compare the device id against the driver's id_table
    .probe = virtio_dev_probe,   // negotiate features, then call the driver's probe
};

The probe function obtains device features, negotiates common features, and finally calls the driver’s probe routine.
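
A condensed sketch of that bus-level probe, based on drivers/virtio/virtio.c of the same era (transport-feature handling and some status updates are omitted; add_status is the file's local helper for updating the device status byte):

static int virtio_dev_probe(struct device *_d)
{
    struct virtio_device *dev = container_of(_d, struct virtio_device, dev);
    struct virtio_driver *drv = container_of(dev->dev.driver,
                                             struct virtio_driver, driver);
    u32 device_features;
    unsigned int i;
    int err;

    add_status(dev, VIRTIO_CONFIG_S_DRIVER);          // "a driver has found me"

    device_features = dev->config->get_features(dev); // what the device offers

    // keep only the feature bits both device and driver understand
    memset(dev->features, 0, sizeof(dev->features));
    for (i = 0; i < drv->feature_table_size; i++) {
        unsigned int f = drv->feature_table[i];
        if (device_features & (1u << f))
            set_bit(f, dev->features);
    }

    dev->config->finalize_features(dev);              // commit the negotiation

    err = drv->probe(dev);                            // device-specific probe
    if (err)
        add_status(dev, VIRTIO_CONFIG_S_FAILED);
    else
        add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
    return err;
}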

For example, the virtio_blk driver's probe creates the block device, allocates a gendisk, and registers the request queue.

Key functions in the virtblk_probe flow:

Obtain hardware‑supported segment count via virtio_config_val.

Allocate a virtqueue with virtio_find_single_vq.

Allocate a gendisk with alloc_disk.

Initialize the request queue with blk_init_queue and set do_virtblk_request as the handler.

static int __devinit virtblk_probe(struct virtio_device *vdev)
{
    // ...
    // read the maximum scatter-gather segments per request from config space
    err = virtio_config_val(vdev, VIRTIO_BLK_F_SEG_MAX,
                            offsetof(struct virtio_blk_config, seg_max), &sg_elems);
    // ...
    err = init_vq(vblk);                      // allocate the request virtqueue
    // ...
    vblk->disk = alloc_disk(1 << PART_BITS);  // allocate the gendisk
    // attach a request queue whose handler turns block requests into virtio requests
    q = vblk->disk->queue = blk_init_queue(do_virtblk_request, NULL);
    // ...
    add_disk(vblk->disk);                     // make the disk visible to user space
    return 0;
}

init_vq allocates the virtqueue and sets the completion callback and interrupt handling:

static int init_vq(struct virtio_blk *vblk)
{
    // a single queue named "requests"; blk_done runs on completion interrupts
    vblk->vq = virtio_find_single_vq(vblk->vdev, blk_done, "requests");
    // ...
}

Interrupt handling varies:

Without MSI-X, a single IRQ handles both config-change and all virtqueue interrupts (handled by vp_interrupt; see the sketch after this list).

With MSI‑X and two vectors, one vector handles config changes, the other handles all virtqueues.

With per‑queue MSI‑X, each virtqueue gets its own vector.
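
In the no-MSI-X case, the shared handler looks roughly like this, condensed from drivers/virtio/virtio_pci.c (on this legacy transport, reading the ISR register both reports and acknowledges the interrupt):

static irqreturn_t vp_interrupt(int irq, void *opaque)
{
    struct virtio_pci_device *vp_dev = opaque;
    u8 isr;

    // reading the ISR returns the cause and clears it in one access
    isr = ioread8(vp_dev->ioaddr + VIRTIO_PCI_ISR);
    if (!isr)
        return IRQ_NONE;                        // shared line, not our interrupt

    if (isr & VIRTIO_PCI_ISR_CONFIG)            // configuration-change event
        vp_config_changed(irq, opaque);

    return vp_vring_interrupt(irq, opaque);     // run each virtqueue's callback
}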

Setup of virtqueues and vrings is performed by setup_vq and vring_new_virtqueue:

static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
                                  void (*callback)(struct virtqueue *), const char *name, u16 msix_vec)
{
    // select queue, read depth, write address, then create vring
    vq = vring_new_virtqueue(info->num, VIRTIO_PCI_VRING_ALIGN,
                             vdev, info->queue, vp_notify, callback, name);
    return vq;
}

The vring is initialized with vring_init:

static inline void vring_init(struct vring *vr, unsigned int num, void *p,
                              unsigned long align)
{
    vr->num = num;
    vr->desc = p;                                     // descriptor table comes first
    vr->avail = p + num * sizeof(struct vring_desc);  // available ring follows it
    // the used ring starts at the next align-boundary past the available ring
    vr->used = (void *)(((unsigned long)&vr->avail->ring[num] + align - 1)
                        & ~(align - 1));
}

(2) Back‑end initialization

The back‑end driver creates the PCI device, registers BARs, binds the device to the virtio PCI ops, and negotiates features. Example for virtio‑blk:

static int virtio_blk_init_pci(PCIDevice *pci_dev)
{
    VirtIOPCIProxy *proxy = DO_UPCAST(VirtIOPCIProxy, pci_dev, pci_dev);
    VirtIODevice *vdev;
    // ...
    vdev = virtio_blk_init(&pci_dev->qdev, &proxy->blk); // create the block device model
    virtio_init_pci(proxy, vdev);                        // expose it as a virtio PCI device
    proxy->nvectors = vdev->nvectors;                    // record the MSI-X vector count
    return 0;
}

During registration, the driver sets up I/O ports, memory regions, and interrupt vectors, then exposes the device to the guest.

Part 4 – How Virtio Works End‑to‑End

A virtqueue consists of a descriptor table, an available ring, and a used ring. The descriptor table holds the actual buffer addresses and lengths (like a cargo manifest). The available ring is a list of descriptor indices that the guest makes available to the host (a "to‑do" list). The used ring is a list of indices that the host has finished processing (a "completed" list).

Data flow (sketched in code after the steps):

The guest front‑end writes data into a buffer, creates a descriptor, adds it to the descriptor table, and places its index into the available ring.

The guest notifies the host (e.g., via an interrupt).

The host reads the available ring, fetches the descriptor, accesses the shared buffer, processes the I/O, writes the result, and places the descriptor index into the used ring.

The host notifies the guest; the guest reads the used ring, knows the operation is complete, and can reclaim or reuse the buffer.
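
The host side of steps 3 and 4 can be sketched as below. This is schematic pseudocode rather than actual QEMU or vhost code, and it omits the memory barriers and endianness handling a real implementation needs; it only shows how the two ring indices drive the exchange:

// 'last_avail' is the host's private cursor into the available ring
static void host_process_vring(struct vring *vr, uint16_t *last_avail)
{
    while (*last_avail != vr->avail->idx) {        // guest published new work?
        uint16_t head = vr->avail->ring[*last_avail % vr->num];
        struct vring_desc *d = &vr->desc[head];

        // ... read or write the shared buffer at d->addr for d->len bytes ...

        uint16_t slot = vr->used->idx % vr->num;
        vr->used->ring[slot].id  = head;           // which descriptor chain finished
        vr->used->ring[slot].len = d->len;         // bytes written back, if any
        vr->used->idx++;                           // publish the completion
        (*last_avail)++;
    }
    // finally, inject an interrupt so the guest reaps the used ring
}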

Part 5 – Virtio Code Analysis

5.1 Key Data Structures

virtio_bus is the generic bus model used by all virtio devices. It is registered early via core_initcall and matches devices to drivers via virtio_dev_match.

virtio_device (defined in include/linux/virtio.h) represents a virtio device. It contains an id (device type, e.g., VIRTIO_ID_NET), a config pointer to virtio_config_ops, a list of virtqueue objects, and a features bitmap negotiated with the driver.

virtio_driver also lives in include/linux/virtio.h. It provides an id_table for matching, a feature_table describing the features it supports, and a probe function that is called from the bus-level probe.

virtqueue is the core data structure for I/O. Each virtqueue wraps a vring, i.e., the descriptor table, available ring, and used ring; each descriptor points to a buffer or to a link in a scatter-gather chain.
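
The two central structures, simplified from include/linux/virtio.h of that era (some fields omitted; later kernels widened the feature bitmap to 64 bits):

struct virtio_device {
    int index;                                // unique device number
    struct device dev;                        // embeds into the generic device model
    struct virtio_device_id id;               // device type, e.g. VIRTIO_ID_NET
    const struct virtio_config_ops *config;   // config-space access operations
    struct list_head vqs;                     // this device's virtqueues
    unsigned long features[1];                // negotiated feature bits
    void *priv;
};

struct virtio_driver {
    struct device_driver driver;              // embeds into the generic driver model
    const struct virtio_device_id *id_table;  // devices this driver handles
    const unsigned int *feature_table;        // feature bits the driver supports
    unsigned int feature_table_size;
    int (*probe)(struct virtio_device *dev);
    void (*remove)(struct virtio_device *dev);
    void (*config_changed)(struct virtio_device *dev);
};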

5.2 Code Implementation Details

Taking virtio‑net as an example, the driver’s virtnet_probe function discovers and initializes receive and transmit virtqueues, negotiates features such as VIRTIO_NET_F_MQ, and sets up buffers. When the kernel calls dev_hard_start_xmit, the driver eventually reaches start_xmit, which builds a scatter‑gather list from the skb, adds it to the transmit virtqueue with virtqueue_add_outbuf, and notifies the host via virtqueue_notify.
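
A condensed sketch of that transmit path, based on drivers/net/virtio_net.c (locking, error handling, statistics, and the virtio-net header that precedes the packet are omitted; queue 0 is assumed for brevity):

static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
{
    struct virtnet_info *vi = netdev_priv(dev);
    struct send_queue *sq = &vi->sq[0];        // assume queue 0
    int num_sg;

    // map the skb (linear part plus fragments) into a scatter-gather list
    sg_init_table(sq->sg, MAX_SKB_FRAGS + 2);
    num_sg = skb_to_sgvec(skb, sq->sg, 0, skb->len);

    // expose the chain to the host; the skb itself is the completion cookie
    virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);

    // notify only if the host actually wants a kick (suppression-aware)
    if (virtqueue_kick_prepare(sq->vq))
        virtqueue_notify(sq->vq);

    return NETDEV_TX_OK;
}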

Reception works via interrupt handling: the host raises an interrupt, the driver schedules NAPI, and virtnet_poll pulls completed buffers from the receive virtqueue, converts them to skb, and passes them up the network stack.
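
The receive side under the same caveats; build_rx_skb is a hypothetical stand-in for the driver's real buffer-to-skb conversion (receive_buf in virtio_net.c):

static int virtnet_poll(struct napi_struct *napi, int budget)
{
    struct receive_queue *rq = container_of(napi, struct receive_queue, napi);
    unsigned int len;
    void *buf;
    int received = 0;

    // drain completed buffers from the used ring, up to the NAPI budget
    while (received < budget &&
           (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
        struct sk_buff *skb = build_rx_skb(buf, len); // hypothetical helper
        netif_receive_skb(skb);                       // hand the packet to the stack
        received++;
    }

    if (received < budget) {
        napi_complete(napi);           // all caught up: leave polling mode...
        virtqueue_enable_cb(rq->vq);   // ...and re-enable virtqueue callbacks
    }
    return received;
}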

On the back-end side, vhost-net registers as a misc device (/dev/vhost-net). When a user-space program (typically QEMU) opens it, vhost_net_open allocates the vhost_dev and vhost_virtqueue structures; QEMU then binds a TAP device file descriptor to the queues via the VHOST_NET_SET_BACKEND ioctl, after which a kernel worker thread moves packets between the TAP device and the virtqueues in the same virtqueue-driven fashion, without exiting to user space.
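
The registration itself is ordinary misc-device boilerplate, condensed from drivers/vhost/net.c (some file operations are omitted):

static const struct file_operations vhost_net_fops = {
    .owner          = THIS_MODULE,
    .open           = vhost_net_open,     // allocate vhost_dev and virtqueues
    .release        = vhost_net_release,
    .unlocked_ioctl = vhost_net_ioctl,    // e.g. VHOST_NET_SET_BACKEND
};

static struct miscdevice vhost_net_misc = {
    .minor = VHOST_NET_MINOR,             // shows up as /dev/vhost-net
    .name  = "vhost-net",
    .fops  = &vhost_net_fops,
};

static int __init vhost_net_init(void)
{
    return misc_register(&vhost_net_misc);
}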

5.3 Optimization Techniques

Virtio employs several performance optimizations:

Batch processing : Multiple buffers are queued before a single virtqueue_notify, reducing VM exits (see the sketch after this list).

Cache usage : Frequently accessed configuration data and state are cached in the driver to avoid repeated reads.

Interrupt coalescing : Multiple interrupt events can be merged into one, lowering interrupt overhead.

Memory‑mapped I/O and DMA : Shared memory regions are mapped directly into both guest and host address spaces, and DMA transfers avoid extra copies.
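
A minimal sketch of the batching idea, assuming the caller has prepared nr requests with one scatterlist each (submit_batch and its parameters are illustrative, not a kernel API); virtqueue_kick_prepare also honors host-side notification suppression, so the single kick may be skipped entirely:

static void submit_batch(struct virtqueue *vq, struct scatterlist **sgs,
                         void **reqs, unsigned int nr)
{
    unsigned int i;

    // queue every buffer first; nothing notifies the host here
    for (i = 0; i < nr; i++)
        virtqueue_add_outbuf(vq, sgs[i], 1, reqs[i], GFP_ATOMIC);

    // at most one VM exit for the whole batch
    if (virtqueue_kick_prepare(vq))
        virtqueue_notify(vq);
}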

These techniques together enable virtio to deliver I/O performance that is often indistinguishable from native hardware.
