Observing Virtio‑Net NIC Queues with eBPF: A Practical Guide
This article shows how to use eBPF to make virtio‑net NIC queue metrics observable: it walks through the front‑end send and receive flows, defines the key queue indices, integrates the probes into the rtrace tool, and demonstrates fault detection with real‑time data.
Background and Challenge
In system programming, locating the boundary between components is often the hardest problem, especially for the virtio‑net front‑end and back‑end. The path of packets moving between the kernel and the virtio‑net backend is difficult to observe, making it hard to diagnose queue‑related network jitter.
Virtio‑Net Driver Overview
The virtio‑net device consists of a front‑end driver running in the guest kernel and a back‑end device that can be provided by the host kernel. It supports multiple queues, each implemented with three rings: avail, used and desc. The article focuses on the front‑end packet‑send and packet‑receive processes.
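For orientation, the layout of these three rings is fixed by the VIRTIO specification; the C structs below mirror the split‑virtqueue definitions in <linux/virtio_ring.h> and are reproduced here purely for reference, not taken from the article's code:

```c
#include <stdint.h>

/* Split-virtqueue ring layouts per the VIRTIO spec (mirrors
 * <linux/virtio_ring.h>). Each virtqueue owns one of each ring. */
struct vring_desc {            /* one slot of the desc ring */
    uint64_t addr;             /* guest-physical buffer address */
    uint32_t len;              /* buffer length in bytes */
    uint16_t flags;            /* NEXT / WRITE / INDIRECT */
    uint16_t next;             /* index of the chained descriptor */
};

struct vring_avail {           /* driver -> device */
    uint16_t flags;
    uint16_t idx;              /* avail->idx, bumped by the front end */
    uint16_t ring[];           /* descriptor heads offered to the device */
};

struct vring_used_elem {
    uint32_t id;               /* head of the completed descriptor chain */
    uint32_t len;              /* bytes the device wrote into it */
};

struct vring_used {            /* device -> driver */
    uint16_t flags;
    uint16_t idx;              /* used->idx, bumped by the back end */
    struct vring_used_elem ring[]; /* completions, consumed up to last_used_idx */
};
```

The indices discussed later in this article (avail‑>idx, used‑>idx) are exactly the idx fields of the avail and used rings.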
Front‑End Send Path
a. start_xmit : The entry point first reclaims already‑sent packets by calling free_old_xmit_skbs, which frees completed buffers until the driver's last_used_idx catches up with used‑>idx.
b. xmit_skb : Adds a vnet_hdr header and records the packet’s address and length in a scatter‑gather list.
c. virtqueue_add_outbuf : Performs DMA mapping, places the scatter‑gather entry into the desc ring and increments avail‑>idx.
d. virtqueue_notify : Notifies the back‑end when the send queue contains data.
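The reclaim loop in step (a) and the resulting in‑flight count can be pictured with a small userspace model. The names below are hypothetical stand‑ins for the kernel's, and the 16‑bit indices wrap naturally:

```c
#include <stdint.h>

/* Userspace model of TX completion (step a): the driver consumes
 * completed buffers until its private last_used_idx catches up with the
 * device-written used->idx. A sketch, not kernel code. */
struct txq_model {
    uint16_t avail_idx;     /* buffers the driver has submitted */
    uint16_t used_idx;      /* buffers the back end has completed */
    uint16_t last_used_idx; /* completions already reclaimed by the driver */
};

/* Returns the number of buffers freed, as free_old_xmit_skbs would. */
static unsigned int free_old_xmit_model(struct txq_model *q)
{
    unsigned int freed = 0;
    while (q->last_used_idx != q->used_idx) { /* completions still pending */
        q->last_used_idx++;                   /* reclaim one buffer */
        freed++;
    }
    return freed;
}
```

With avail_idx 10, used_idx 7 and last_used_idx 4, the model frees three buffers and leaves three packets still in flight at the back end ((uint16_t)(avail_idx - used_idx)).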
Front‑End Receive Path
a. NIC hard interrupt : Adds the device's NAPI instance to the per‑CPU poll list, enables interrupt mitigation (suppressing further device interrupts), and raises a soft interrupt.
b. net_rx_action : Soft‑interrupt entry.
c. virtnet_poll : The NAPI poll callback; if the polled queue is a send queue it reclaims completed transmit buffers, otherwise it receives packets.
d. virtnet_receive : Reads packet data from the descriptor ring using used‑>idx, allocates an skb, and passes it to GRO for merging.
e. try_fill_recv : Adds empty memory regions to the desc ring and increments avail‑>idx to keep the receive queue ready.
f. virtqueue_napi_complete : When the number of packets received in one poll is below the budget (typically 64), completes the NAPI cycle and disables interrupt mitigation (re‑enabling device interrupts).
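The budget decision in steps (c) through (f) boils down to a simple rule, sketched here as a userspace model (the names are illustrative, not kernel code):

```c
#include <stdint.h>
#include <stdbool.h>

/* Model of the NAPI poll/budget decision: a poll round drains at most
 * `budget` packets; only when fewer than `budget` were available does
 * the driver complete NAPI and re-enable device interrupts. */
#define NAPI_BUDGET 64

struct poll_result {
    unsigned int received;  /* packets handed to the stack this round */
    bool napi_complete;     /* true -> NAPI done, interrupts re-armed */
};

static struct poll_result virtnet_poll_model(unsigned int pending)
{
    struct poll_result r;
    r.received = pending < NAPI_BUDGET ? pending : NAPI_BUDGET;
    /* queue drained below budget: stop polling, re-enable interrupts */
    r.napi_complete = r.received < NAPI_BUDGET;
    return r;
}
```

If the queue still holds a full budget's worth of packets, NAPI stays scheduled and the soft interrupt polls again instead of re‑enabling interrupts.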
Queue Observability Metrics
From the analysis of the virtio‑net queue we obtain three key indices: avail‑>idx, used‑>idx and last_used_idx. Because each is a 16‑bit counter that wraps, differences are taken modulo 65536. Using them we can compute:
Send‑queue packet count: avail‑>idx - used‑>idx.
Receive‑queue packet count: used‑>idx - last_used_idx.
Back‑end processing progress: last_used_idx.
Queue saturation: (queue packet count) / (queue length).
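Since all three indices are 16‑bit counters that wrap at 65536, the subtractions above must be performed in uint16_t arithmetic; a minimal sketch (helper names are my own, not from rtrace):

```c
#include <stdint.h>

/* Derived queue metrics from the three indices. Indices are 16-bit and
 * wrap, so differences must be taken as uint16_t to stay correct. */
static inline uint16_t tx_queue_pkts(uint16_t avail_idx, uint16_t used_idx)
{
    return (uint16_t)(avail_idx - used_idx);     /* submitted, not yet completed */
}

static inline uint16_t rx_queue_pkts(uint16_t used_idx, uint16_t last_used_idx)
{
    return (uint16_t)(used_idx - last_used_idx); /* delivered, not yet consumed */
}

static inline double queue_saturation(uint16_t pkts, uint16_t qlen)
{
    return qlen ? (double)pkts / qlen : 0.0;     /* fraction of the ring in use */
}
```

The cast matters near a wrap: with avail‑>idx at 3 and used‑>idx at 65534, the send queue correctly holds 5 packets, not a negative count.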
Implementation in rtrace
The observability code is integrated into the rtrace tool, part of the SysAK suite from the OpenAnolis (Longxi) community. The eBPF programs are attached to the kernel functions dev_id_show and dev_port_show, and rtrace periodically reads /sys/class/net/[interface]/dev_id (send‑queue data) and /sys/class/net/[interface]/dev_port (receive‑queue data). Each read triggers the attached eBPF program, which parses the struct net_device to locate the vring, extracts avail‑>idx, used‑>idx, the queue length and last_used_idx, and forwards the data to rtrace for further processing.
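To illustrate the attachment point only, a stripped‑down CO‑RE‑style kprobe might look like the sketch below. This is not the rtrace source (that lives in virtio.bpf.c, linked below), and the walk from net_device down to the virtqueue indices is omitted because the virtio‑net private structs vary by kernel version:

```c
/* Sketch only: demonstrates the attachment idea. Assumes a libbpf
 * CO-RE build with a generated vmlinux.h. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

SEC("kprobe/dev_id_show")
int BPF_KPROBE(on_dev_id_show, struct device *dev)
{
    /* sysfs show() callbacks receive the struct device embedded inside
     * struct net_device, so the net_device can be recovered by
     * subtracting the member offset (what container_of() does). */
    struct net_device *ndev =
        (struct net_device *)((char *)dev -
                              __builtin_offsetof(struct net_device, dev));

    /* The real program would now parse the device's private data to
     * reach the virtqueues, read avail->idx, used->idx, last_used_idx
     * and the ring size, and publish them to user space via a BPF map. */
    bpf_printk("dev_id_show fired for net_device %p", ndev);
    return 0;
}
```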
Source code: https://gitee.com/anolis/sysak/blob/opensource_branch_sync/source/tools/detect/net/rtrace/src/bpf/virtio.bpf.c
Fault Detection Example
rtrace outputs queue metrics at one‑second intervals. In the sample below, at 09:47:24 the saturation of send queue 1 is 0.05% with last_used_idx 3593; two seconds later it has risen to 0.07% while last_used_idx is still 3593, indicating that the back end has stopped consuming the queue (a stall). After the queue is recovered, the 09:48:06 snapshot shows 0.00% saturation and last_used_idx 3599, confirming the issue is resolved.
09:47:24
SendQueue 0.05%/3593 0.00%/852 ...
RecvQueue 0.00%/2805 0.00%/13297 ...
... (subsequent timestamps omitted for brevity) ...
09:48:06
SendQueue 0.00%/3599 0.00%/856 ...
RecvQueue 0.00%/2807 0.00%/13299 ...
Conclusion
By exposing avail idx, used idx, last_used_idx and derived metrics, we can evaluate virtio‑net performance, detect queue stalls, and guide optimization. The approach provides deep insight into NIC queue state, aiding both fault diagnosis and performance tuning. The next article will dissect the rtrace source code that implements these probes.
Linux Code Review Hub