Understanding eBPF: A Beginner’s Guide to Linux’s In‑Kernel Virtual Machine

This article introduces eBPF as a register‑based virtual machine inside the Linux kernel, explains its non‑Turing‑complete design, shows how programs are loaded, verified, and attached to events, and provides a complete C example that counts TCP, UDP, and ICMP packets on a loopback interface.

Qingyun Technology Community

Nov 24, 2021

Introduction

This series of blog posts dives into the low‑level details of eBPF, starting from its virtual‑machine mechanism and tools, and eventually covering tracing on resource‑constrained embedded devices. The terms BPF and eBPF are used interchangeably throughout.

Series Overview

Part 1 and Part 2 provide an introduction for newcomers and anyone who wants to understand the underlying details of the eBPF technology stack.

Part 3 gives an overview of user‑space tools, building on the virtual‑machine mechanisms introduced earlier.

Part 4 focuses on running eBPF programs on embedded systems where a full toolchain (BCC/LLVM/Python) is impractical, using a lightweight embedded toolchain on 32-bit ARM instead.

Part 5 covers user‑space tracing, shifting the focus from kernel tracing to user‑process tracing.

What is eBPF?

eBPF is a register-based virtual machine with a custom 64-bit RISC instruction set. It runs BPF programs inside the Linux kernel, where they are normally just-in-time compiled to native code, and gives them access to a subset of kernel functions and memory. It is a full VM implementation, not to be confused with the Kernel-based Virtual Machine (KVM), which is a hypervisor for running guest operating systems. eBPF is part of the mainline kernel, so unlike LTTng or SystemTap it requires no third-party kernel modules. Readers familiar with DTrace may find the DTrace/BPFtrace comparison useful.
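To make "register-based VM with a 64-bit instruction set" concrete, here is a minimal sketch of a toy interpreter. It assumes nothing beyond standard C: struct toy_insn and toy_run are hypothetical names introduced for illustration, the struct mirrors the field layout of struct bpf_insn in <linux/bpf.h>, and only four opcodes are handled. The kernel's interpreter and JIT are far more involved, so treat this as a mental model rather than the actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of struct bpf_insn: each instruction is 8 bytes. */
struct toy_insn {
    uint8_t code;          /* opcode */
    uint8_t dst_reg : 4;   /* destination register */
    uint8_t src_reg : 4;   /* source register */
    int16_t off;           /* signed offset (jumps, memory access) */
    int32_t imm;           /* signed immediate operand */
};

/* Opcode values as produced by the kernel's instruction macros. */
enum {
    OP_ADD64_IMM = 0x07,   /* BPF_ALU64 | BPF_ADD | BPF_K */
    OP_ADD64_REG = 0x0f,   /* BPF_ALU64 | BPF_ADD | BPF_X */
    OP_MOV64_IMM = 0xb7,   /* BPF_ALU64 | BPF_MOV | BPF_K */
    OP_EXIT      = 0x95    /* BPF_JMP | BPF_EXIT */
};

/* Runs a program and returns R0 at exit.
 * Returns -1 on an unknown opcode, a bad register, or a missing exit. */
int64_t toy_run(const struct toy_insn *prog, size_t n)
{
    uint64_t reg[11] = {0};   /* R0..R10, all 64 bits wide */

    assert(prog != NULL);
    for (size_t pc = 0; pc < n; pc++) {
        const struct toy_insn *in = &prog[pc];

        if (in->dst_reg > 10 || in->src_reg > 10)
            return -1;
        switch (in->code) {
        case OP_MOV64_IMM:   /* dst = sign-extended immediate */
            reg[in->dst_reg] = (uint64_t)(int64_t)in->imm;
            break;
        case OP_ADD64_IMM:   /* dst += sign-extended immediate */
            reg[in->dst_reg] += (uint64_t)(int64_t)in->imm;
            break;
        case OP_ADD64_REG:   /* dst += src */
            reg[in->dst_reg] += reg[in->src_reg];
            break;
        case OP_EXIT:        /* R0 carries the return value */
            return (int64_t)reg[0];
        default:
            return -1;
        }
    }
    return -1;
}
```

A three-instruction program that moves 40 into R0, moves 2 into R1, and adds R1 to R0 before exiting would make toy_run return the sum held in R0.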

Running a complete VM in the kernel improves convenience and safety: although the same functionality could be achieved with traditional kernel modules, direct kernel programming is risky and can cause system lock‑ups, memory corruption, or crashes, especially on production devices. Therefore, executing JIT‑compiled kernel code in a safe VM is valuable for security monitoring, sandboxing, network filtering, program tracing, performance analysis, and debugging. Simple examples can be found in the eBPF reference.

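eBPF programs are intentionally not Turing-complete: every program must terminate, and all memory accesses are bounded and type-checked. Loops were originally disallowed, so they had to be unrolled at compile time (for example with #pragma unroll); since Linux 5.3 the verifier also accepts loops whose bound it can prove. Programs cannot contain null dereferences and are limited in size: BPF_MAXINSNS caps a program at 4096 instructions, a limit that still applies to unprivileged programs, while since Linux 5.2 privileged programs are bounded by a verifier budget of 1,000,000 processed instructions. The entry function receives a single context argument. At load time the verifier parses the instructions into a control-flow graph, which historically had to be a directed acyclic graph, and this keeps the correctness checks fast and simple.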
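The acyclic-graph check described above can be sketched as a depth-first walk over a program's control flow. The sketch below uses a made-up instruction format (struct toy_vinsn, toy_walk, and toy_verify are all hypothetical names), and it models only the original "no loops" rule plus the instruction-count limit. The real verifier additionally tracks register types, memory bounds, and provably bounded loops, none of which appear here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_INSNS 4096   /* same value as the classic BPF_MAXINSNS */

/* Toy instruction kinds; this is NOT the real eBPF encoding. */
enum toy_op { TOY_ALU, TOY_JMP, TOY_JCOND, TOY_EXIT };

struct toy_vinsn {
    enum toy_op op;
    int off;               /* jump offset, relative to the next instruction */
};

enum { UNSEEN = 0, ON_PATH = 1, DONE = 2 };

/* Depth-first walk of the control-flow graph, starting at instruction i. */
bool toy_walk(const struct toy_vinsn *p, size_t n, long i, unsigned char *state)
{
    if (i < 0 || (size_t)i >= n)
        return false;      /* jump out of range, or fell off the end */
    if (state[i] == ON_PATH)
        return false;      /* back-edge: the program contains a loop */
    if (state[i] == DONE)
        return true;       /* already explored from here */

    state[i] = ON_PATH;
    bool ok = true;
    long target = i + 1 + p[i].off;
    switch (p[i].op) {
    case TOY_ALU:   ok = toy_walk(p, n, i + 1, state); break;
    case TOY_JMP:   ok = toy_walk(p, n, target, state); break;
    case TOY_JCOND: ok = toy_walk(p, n, i + 1, state) &&
                         toy_walk(p, n, target, state); break;
    case TOY_EXIT:  break; /* every path has to end in an exit */
    }
    state[i] = DONE;
    return ok;
}

/* Accepts a program only if it is small enough, loop-free, ends every
 * path with an exit, and has no unreachable instructions. */
bool toy_verify(const struct toy_vinsn *p, size_t n)
{
    unsigned char state[TOY_MAX_INSNS] = {0};

    assert(p != NULL);
    if (n == 0 || n > TOY_MAX_INSNS)
        return false;
    if (!toy_walk(p, n, 0, state))
        return false;
    for (size_t i = 0; i < n; i++)
        if (state[i] != DONE)
            return false;  /* unreachable instruction */
    return true;
}
```

Because every accepted program is a finite graph without cycles, each instruction can execute at most once per run, which is why termination is guaranteed under this rule.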

How eBPF Works

eBPF programs are event-driven: they run when the kernel reaches a hook such as a kprobe or uprobe, a tracepoint, or a socket event. Through kprobes they can attach to almost any kernel function, which lets them inspect memory, intercept file operations, or examine network packets. Programs can write data to maps or ring buffers and can call a limited set of helper functions. Multiple programs can share maps, and a special "program array" map stores references to other eBPF programs so that one program can tail-call another. The kernel caps the length of such chains, which allows limited nesting while ruling out infinite recursion.
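The way a program array permits chaining without infinite recursion can be modeled in plain C. This is a sketch under loud assumptions: toy_ctx, toy_prog_array, toy_tail_call, and the cap of 32 are all illustrative (the kernel's own constant is MAX_TAIL_CALL_CNT, and its value depends on the kernel version). A real tail call is also a jump that replaces the running program, whereas this model uses an ordinary function call.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_TAIL_CALL_LIMIT 32   /* illustrative cap, not the kernel's value */
#define TOY_PROG_ARRAY_SIZE 4

struct toy_ctx {
    int tail_calls;   /* tail calls made so far in this invocation */
    int work_done;    /* number of program bodies executed */
};

typedef int (*toy_prog_fn)(struct toy_ctx *ctx);

/* Stand-in for a program array map: each slot holds a program or NULL. */
toy_prog_fn toy_prog_array[TOY_PROG_ARRAY_SIZE];

/* Jumps to the program in slot idx. On a bad index, an empty slot, or an
 * exhausted budget it returns -1 and the caller simply carries on. */
int toy_tail_call(struct toy_ctx *ctx, size_t idx)
{
    assert(ctx != NULL);
    if (idx >= TOY_PROG_ARRAY_SIZE || toy_prog_array[idx] == NULL)
        return -1;
    if (ctx->tail_calls >= TOY_TAIL_CALL_LIMIT)
        return -1;
    ctx->tail_calls++;
    return toy_prog_array[idx](ctx);
}

/* A program that always tries to tail-call itself through slot 0. */
int toy_self_caller(struct toy_ctx *ctx)
{
    ctx->work_done++;
    if (toy_tail_call(ctx, 0) < 0)
        return 0;     /* tail call refused: finish normally */
    return 0;
}
```

Even though toy_self_caller never stops asking to run again, the per-invocation budget guarantees that the chain ends after a fixed number of hops.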

The execution steps are:

1. User space sends the bytecode and a program type to the kernel; the type determines which parts of the kernel (helper functions and context data) the program may access.

2. The kernel runs the verifier on the bytecode to ensure it is safe to execute.

3. The kernel JIT-compiles the bytecode to native code and attaches it to the specified hook point.

4. The attached code writes data to a ring buffer or a generic key-value map.

5. User space reads the results from the shared map or ring buffer.

Maps and ring buffers are managed by the kernel, similar to pipes and FIFOs, and persist as long as at least one program holds a reference.

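To simplify eBPF development, the kernel source tree includes the libbpf library (dual-licensed under LGPL 2.1 and BSD 2-Clause). It provides thin wrappers around the bpf() system call, such as bpf_load_program, and structure definitions such as bpf_map. Example code resides in the samples/bpf/ directory of the kernel tree. Note that bpf_create_map and bpf_load_program, which the example below uses, were deprecated in later libbpf releases in favor of bpf_map_create and bpf_prog_load.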

Sample Study

Unlike user space, the kernel has no Glibc, LLVM, or WebAssembly to lean on, so the kernel's eBPF samples often embed raw bytecode directly or use libbpf to load pre-assembled bytecode. Below is a simplified version of sock_example.c from samples/bpf/, which counts the TCP, UDP, and ICMP packets seen on the loopback interface.

/* Excerpt: bpf_log_buf and open_raw_sock() are defined elsewhere in the sample code. */
static int test_sock(void)
{
    int sock = -1, map_fd, prog_fd, i, key;
    /* 64-bit counters: the program updates them with a BPF_DW atomic add,
     * and they are printed with %lld below. */
    long long value = 0, tcp_cnt, udp_cnt, icmp_cnt;

    /* Array map with 256 slots, indexed by IP protocol number. */
    map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, sizeof(key), sizeof(value), 256, 0);
    if (map_fd < 0) {
        printf("failed to create map '%s'\n", strerror(errno));
        goto cleanup;
    }

    struct bpf_insn prog[] = {
        BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),                            /* r6 = ctx */
        BPF_LD_ABS(BPF_B, ETH_HLEN + offsetof(struct iphdr, protocol)), /* r0 = ip->protocol */
        BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4),                  /* *(u32 *)(fp - 4) = r0 */
        BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
        BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),                          /* r2 = fp - 4 (key) */
        BPF_LD_MAP_FD(BPF_REG_1, map_fd),                               /* r1 = map */
        BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
        BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),                          /* skip update on miss */
        BPF_MOV64_IMM(BPF_REG_1, 1),                                    /* r1 = 1 */
        BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0), /* atomic *r0 += r1 */
        BPF_MOV64_IMM(BPF_REG_0, 0),                                    /* r0 = 0 */
        BPF_EXIT_INSN(),
    };
    size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn);

    /* Load the bytecode; the verifier log ends up in bpf_log_buf. */
    prog_fd = bpf_load_program(BPF_PROG_TYPE_SOCKET_FILTER, prog, insns_cnt,
                               "GPL", 0, bpf_log_buf, BPF_LOG_BUF_SIZE);
    if (prog_fd < 0) {
        printf("failed to load prog '%s'\n", strerror(errno));
        goto cleanup;
    }

    /* Attach the program to a raw socket bound to the loopback interface. */
    sock = open_raw_sock("lo");
    if (setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd, sizeof(prog_fd)) < 0) {
        printf("setsockopt %s\n", strerror(errno));
        goto cleanup;
    }

    /* Poll the counters once per second for ten seconds. */
    for (i = 0; i < 10; i++) {
        key = IPPROTO_TCP;
        assert(bpf_map_lookup_elem(map_fd, &key, &tcp_cnt) == 0);
        key = IPPROTO_UDP;
        assert(bpf_map_lookup_elem(map_fd, &key, &udp_cnt) == 0);
        key = IPPROTO_ICMP;
        assert(bpf_map_lookup_elem(map_fd, &key, &icmp_cnt) == 0);
        printf("TCP %lld UDP %lld ICMP %lld packets\n", tcp_cnt, udp_cnt, icmp_cnt);
        sleep(1);
    }

cleanup:
    /* Maps, programs, and raw sockets are released when the process exits. */
    return 0;
}

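The example creates a BPF array map with 256 fixed-size slots, one for each possible value of the 8-bit IP protocol field. It then defines the bytecode with the kernel's instruction macros, loads the program with bpf_load_program, attaches it to a raw socket on the loopback interface, and once per second for ten seconds reads the TCP, UDP, and ICMP counters back from the map.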
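For readers who find the instruction macros opaque, the per-packet logic of the filter can be paraphrased in ordinary C. This is only a user-space model, not code that runs in the kernel: toy_map, toy_map_lookup, and toy_filter are hypothetical names, the two offsets are written out as constants instead of coming from the Linux headers, and __sync_fetch_and_add is a GCC/Clang builtin standing in for the atomic add instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ETH_HLEN      14   /* Ethernet header length (ETH_HLEN) */
#define TOY_IP_PROTO_OFF   9   /* offsetof(struct iphdr, protocol) */
#define TOY_MAP_ENTRIES  256   /* same size as the map in the example */

/* Stand-in for the array map: zero-initialized 64-bit counters. */
long long toy_map[TOY_MAP_ENTRIES];

/* Models a lookup on an array map: pointer to the slot, or NULL when
 * the key is out of range. */
long long *toy_map_lookup(uint32_t key)
{
    return key < TOY_MAP_ENTRIES ? &toy_map[key] : NULL;
}

/* Models what the twelve instructions do for each packet. */
int toy_filter(const uint8_t *pkt, size_t len)
{
    assert(pkt != NULL);
    if (len <= TOY_ETH_HLEN + TOY_IP_PROTO_OFF)
        return 0;                         /* protocol byte not present */

    /* BPF_LD_ABS: load the IP protocol byte into R0, then use it as key. */
    uint32_t key = pkt[TOY_ETH_HLEN + TOY_IP_PROTO_OFF];

    /* BPF_FUNC_map_lookup_elem followed by the JEQ null check. */
    long long *value = toy_map_lookup(key);
    if (value != NULL)
        __sync_fetch_and_add(value, 1);   /* atomic add of 1 to the counter */

    return 0;                             /* R0 = 0, then exit */
}
```

Seen this way, the user-space loop in test_sock is simply reading slots 6 (TCP), 17 (UDP), and 1 (ICMP) of the same array that the filter increments.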

Conclusion

Part 1 introduced the fundamentals of eBPF and used a C example to show how to load bytecode and communicate with the eBPF VM. Due to space constraints, compiling and running the example are left as an exercise. This part deliberately did not analyze the raw bytecode, which is the subject of Part 2; later parts explore higher-level tools and scripting languages that interact with the VM more efficiently.
