Understanding eBPF Ringbuf: Design, API, and Comparison

The article explains the motivation, design, and API of the new multi‑producer single‑consumer eBPF Ring Buffer, compares it with perf buffers and other alternatives, and provides complete BPF and userspace code examples demonstrating reservation, commit, and polling of events while preserving ordering across CPUs.

Linux Kernel Journey

Sep 27, 2024

Introduction

The eBPF ecosystem now includes an MPSC (multiple-producer, single-consumer) Ring Buffer: several CPUs can submit data to one shared buffer, and the design assumes that a single consumer reads from it.

Motivation

Two requirements motivated this Ring Buffer, neither of which the existing perf buffer can satisfy:

More efficient memory usage: multiple CPUs share a single Ring Buffer.

Event order preservation: events such as fork/exec/exit that occur on different CPUs remain in chronological order.

Because perf buffer allocates a separate buffer per CPU, memory is used inefficiently and events from different CPUs can be delivered out of order. The MPSC Ring Buffer addresses both problems.

Syntax and API

In BPF programs the Ring Buffer is represented by a map of type BPF_MAP_TYPE_RINGBUF, which is more efficient than creating a separate Ring Buffer per CPU.

Core API

bpf_ringbuf_output(): copies data into the Ring Buffer in a single call, similar to bpf_perf_event_output().

bpf_ringbuf_reserve() / bpf_ringbuf_commit() / bpf_ringbuf_discard(): a two‑step submission. First call bpf_ringbuf_reserve() to reserve a fixed‑size slot; if that succeeds, the program writes to the returned pointer and then either commits the record or discards it with bpf_ringbuf_discard(). In the kernel's helper API the commit helper is named bpf_ringbuf_submit(). Every successful reservation must be followed by exactly one commit or discard, otherwise the eBPF verifier rejects the program. Because bpf_ringbuf_reserve() may involve lock contention, it is recommended to call it as late as possible.

Example BPF Program

A minimal BPF program that records the PID and CPU at each scheduler switch point:

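#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

// Record layout shared with the userspace reader.
struct sched_event {
    __u32 pid;
    __u32 cpu;
};

struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    // buffer size in bytes: a power of two and a multiple of the page size
    __uint(max_entries, 256 * 4096);
} my_ringbuf SEC(".maps");

SEC("tracepoint/sched/sched_switch")
int handle_sched_switch(void *ctx) {
    // reserve space for one record; NULL means no space was available
    struct sched_event *event = bpf_ringbuf_reserve(&my_ringbuf, sizeof(*event), 0);
    if (!event)
        return 0; // reserve failed
    // fill data
    event->pid = bpf_get_current_pid_tgid() >> 32; // upper 32 bits: TGID (the PID seen from userspace)
    event->cpu = bpf_get_smp_processor_id();
    // commit the record; the helper is named bpf_ringbuf_submit()
    bpf_ringbuf_submit(event, 0);
    return 0;
}

char _license[] SEC("license") = "GPL";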

User‑Space Consumption

In userspace the data can be read with ring_buffer__poll:

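#include <stdbool.h>
#include <stdio.h>
#include <linux/types.h>
#include <bpf/libbpf.h> // declares the ring_buffer__* API

struct sched_event {
    __u32 pid;
    __u32 cpu;
};

// Invoked by ring_buffer__poll() once for every committed record.
static int handle_event(void *ctx, void *data, size_t size) {
    const struct sched_event *event = data;

    if (size < sizeof(*event))
        return 0; // ignore records that are too short
    printf("pid=%u cpu=%u\n", event->pid, event->cpu);
    return 0;
}

int main(void) {
    struct ring_buffer *ring_buf;
    struct bpf_object *obj = NULL;
    int map_fd, err;
    // load and attach the BPF program so that obj refers to the loaded object (omitted)
    map_fd = bpf_object__find_map_fd_by_name(obj, "my_ringbuf");
    if (map_fd < 0) {
        fprintf(stderr, "Failed to find ring buffer map\n");
        return 1;
    }
    // a sample callback is required; ctx and opts are unused here
    ring_buf = ring_buffer__new(map_fd, handle_event, NULL, NULL);
    if (!ring_buf) {
        fprintf(stderr, "Failed to create ring buffer\n");
        return 1;
    }
    while (true) {
        // wait up to 100 ms; returns the number of records consumed or a negative error
        err = ring_buffer__poll(ring_buf, 100);
        if (err < 0) {
            fprintf(stderr, "Polling error: %d\n", err);
            break;
        }
    }
    ring_buffer__free(ring_buf);
    bpf_object__close(obj);
    return 0;
}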

This example defines a simple BPF program that records the process ID and CPU at a scheduling point, and a userspace loop that polls the Ring Buffer for events.

Design and Implementation Details

The reserve/commit mechanism allows multiple producers—whether on different CPUs or within the same BPF program—to independently reserve records without blocking each other. If one BPF program is pre‑empted by another sharing the same Ring Buffer, both can still reserve space as long as there is capacity, and later commit their data independently.
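The ordering behaviour can be sketched with a small single-threaded model in plain C. This is a toy illustration, not the kernel implementation: the names model_ring, model_reserve, model_commit, model_discard and model_consume, the slot-based layout and the capacity of 8 are all assumptions made for this example, and locking and memory ordering are ignored. It assumes that the consumer walks records in reservation order and stops at the first record that has not been committed or discarded yet.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_SLOTS 8 /* hypothetical capacity of this toy model */

enum slot_state { SLOT_BUSY, SLOT_COMMITTED, SLOT_DISCARDED };

struct model_ring {
    enum slot_state state[MODEL_SLOTS];
    int payload[MODEL_SLOTS];
    size_t producer; /* number of slots reserved so far */
    size_t consumer; /* number of slots consumed so far */
};

/* Reserve the next slot; returns its index, or -1 when the ring is full. */
static int model_reserve(struct model_ring *r)
{
    if (r->producer - r->consumer == MODEL_SLOTS)
        return -1;
    size_t idx = r->producer++ % MODEL_SLOTS;
    r->state[idx] = SLOT_BUSY;
    return (int)idx;
}

/* Fill a reserved slot and mark it ready for the consumer. */
static void model_commit(struct model_ring *r, int idx, int value)
{
    r->payload[idx] = value;
    r->state[idx] = SLOT_COMMITTED;
}

/* Give up a reserved slot; the consumer will skip it. */
static void model_discard(struct model_ring *r, int idx)
{
    r->state[idx] = SLOT_DISCARDED;
}

/*
 * Deliver committed payloads in reservation order into out[].
 * Stops at the first slot that is still busy, so a later record
 * never overtakes an earlier one. Returns the number delivered.
 */
static size_t model_consume(struct model_ring *r, int *out, size_t max)
{
    size_t n = 0;
    while (r->consumer < r->producer && n < max) {
        size_t idx = r->consumer % MODEL_SLOTS;
        if (r->state[idx] == SLOT_BUSY)
            break;
        if (r->state[idx] == SLOT_COMMITTED)
            out[n++] = r->payload[idx];
        r->consumer++;
    }
    return n;
}
```

In this model, if producer A reserves first, producer B reserves second, and B commits before A, model_consume() delivers nothing until A also commits; after that it delivers A's record followed by B's. Neither producer waits for the other, yet the consumer still sees the records in reservation order.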

Internally the Ring Buffer is a power‑of‑two circular buffer that uses two ever‑increasing counters:

Consumer counter: logical position up to which the consumer has read.

Producer counter: total amount of data reserved by all producers. Each reservation advances this counter, but the data is not yet ready for consumption.

Every record has an 8‑byte header containing the length and two flags (busy and discard).
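The counter and header arithmetic can be sketched in a few lines of plain C. The helper names below (ring_offset, ring_used, record_space, hdr_is_busy, hdr_is_discarded, hdr_payload_len) are hypothetical and exist only for this illustration. The MODEL_* constants are intended to mirror the busy bit, discard bit and header size that the kernel exposes in its UAPI header, and the rounding of each record to a multiple of 8 bytes is likewise an assumption about the kernel's layout; check both against your kernel version.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the kernel's ring buffer constants. */
#define MODEL_BUSY_BIT    (1U << 31)
#define MODEL_DISCARD_BIT (1U << 30)
#define MODEL_HDR_SZ      8U

/* Map an ever-increasing position to an offset in a power-of-two buffer. */
static uint64_t ring_offset(uint64_t pos, uint64_t size)
{
    return pos & (size - 1);
}

/* Bytes that have been reserved but not yet consumed. */
static uint64_t ring_used(uint64_t producer_pos, uint64_t consumer_pos)
{
    return producer_pos - consumer_pos;
}

/* Space one record occupies: header plus payload, rounded up to 8 bytes. */
static uint32_t record_space(uint32_t payload_len)
{
    return (payload_len + MODEL_HDR_SZ + 7U) & ~7U;
}

/* The flags share the 32-bit length word with the payload length. */
static bool hdr_is_busy(uint32_t len_word)
{
    return (len_word & MODEL_BUSY_BIT) != 0;
}

static bool hdr_is_discarded(uint32_t len_word)
{
    return (len_word & MODEL_DISCARD_BIT) != 0;
}

static uint32_t hdr_payload_len(uint32_t len_word)
{
    return len_word & ~(MODEL_BUSY_BIT | MODEL_DISCARD_BIT);
}
```

Because the buffer size is a power of two, the masking in ring_offset() replaces a modulo operation, and the counters never have to be reset when they pass the end of the buffer. Under these assumptions the 8-byte sched_event record from the example above occupies record_space(8) = 16 bytes in the buffer.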

Comparison with Alternative Solutions

Before implementing the BPF Ring Buffer, its author evaluated existing kernel mechanisms and found them unsuitable:

Per‑CPU buffers (e.g., perf, ftrace) cannot meet ordering and memory‑efficiency requirements.

Linked‑list based designs, even those supporting multiple producers, are complex for userspace consumption and perform poorly.

io_uring is SPSC and requires fixed‑size elements; converting it to MPSC would degrade performance.

Specialised implementations (e.g., the new printk Ring Buffer) have many constraints that do not fit BPF program needs.

Introducing the MPSC Ring Buffer makes eBPF usage more flexible while retaining efficient memory usage and event ordering.

