
Mastering Linux Kernel Tracing: From Kprobes to eBPF

This article explains Linux kernel tracing tools—including kprobes, kretprobes, uprobes, tracepoints, ftrace, perf, and eBPF—detailing how probe handlers are injected, how events are recorded via TraceFS, and which technique best fits different debugging and performance‑analysis scenarios.

Liangxu Linux

Background

Linux provides many tracing tools, such as ftrace and perf, to debug the kernel and improve observability. The abundance of underlying mechanisms (tracepoints, trace events, kprobes, uprobes, eBPF) creates confusion about their purpose and usage.

Probe Injection Mechanism

To trace a kernel function without recompiling the kernel, a custom function called a probe handler is injected at a hook point. The handler can be a pre-handler (executed before the probed instruction) or a post-handler (executed after it). This injection is known as “instrumentation”.

Kprobes

Kprobes dynamically insert a probe handler at almost any kernel instruction. Two types exist: kprobe (an arbitrary location) and kretprobe (function return). On x86, the kernel replaces the first byte of the target instruction with an int3 breakpoint; when it is hit, the breakpoint handler saves the CPU state and calls the registered probe handler.

The pre-handler runs first. After it finishes, the CPU single-steps the original instruction, which raises a debug exception (int1) that triggers the post-handler.
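The byte-patching step can be illustrated with a small user-space toy model. This is only a sketch of the idea, not the kernel's implementation: struct toy_probe and both functions are hypothetical names, and the "code" is an ordinary byte array rather than live kernel text. The one real detail is that the x86 int3 instruction is the single byte 0xCC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE 0xCC /* x86 one-byte breakpoint instruction */

/* Hypothetical bookkeeping for one probe in this toy model. */
struct toy_probe {
    size_t  offset;     /* where in the code buffer the probe sits */
    uint8_t saved_byte; /* original first byte of the probed instruction */
    int     armed;
};

/* Arm: remember the original byte, then overwrite it with int3. */
void toy_probe_arm(struct toy_probe *p, uint8_t *code, size_t len, size_t offset)
{
    assert(offset < len);
    p->offset = offset;
    p->saved_byte = code[offset];
    code[offset] = INT3_OPCODE;
    p->armed = 1;
}

/* Disarm: put the original byte back so the code runs unmodified again. */
void toy_probe_disarm(struct toy_probe *p, uint8_t *code)
{
    if (!p->armed)
        return;
    code[p->offset] = p->saved_byte;
    p->armed = 0;
}
```

The saved byte is what lets the original instruction still be executed after the pre-handler has run, and what lets the probe be removed cleanly later.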

Uprobes

Uprobes work like kprobes but target user‑space binaries. The offset from the binary’s start address to the probe point is calculated using readelf to obtain symbol and section addresses:

root@host:~# readelf -s hello | grep test
   36: 0000000000001149    31 FUNC    GLOBAL DEFAULT   16 test
root@host:~# readelf -S hello | grep .text
  [16] .text PROGBITS 0000000000001060 00001060

The offset is computed as offset = test_va - .text_va + .text_file_offset. A kernel module then registers the probe at that offset via uprobe_register().
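The offset formula can be written as a small helper. This is a minimal sketch; the function name uprobe_file_offset and its parameter names are made up for illustration, and the inputs are the values read from readelf.

```c
#include <assert.h>
#include <stdint.h>

/*
 * File offset of a symbol: its distance into the section, plus the
 * position of that section inside the file.
 */
uint64_t uprobe_file_offset(uint64_t sym_va, uint64_t sec_va, uint64_t sec_file_off)
{
    assert(sym_va >= sec_va); /* the symbol must lie inside the section */
    return sym_va - sec_va + sec_file_off;
}
```

With the readelf output shown earlier, the call is uprobe_file_offset(0x1149, 0x1060, 0x1060), which yields 0x1149 because the .text section's virtual address and file offset happen to be equal in that binary.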

Tracepoint

Tracepoints are static hook points declared in kernel code. They are disabled by default, in which state the hook is just a nop, and can be enabled at runtime through the static-key (jump label) mechanism, which patches the nop into a jmp to out-of-line code that invokes the registered probe handlers through a static_call.
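The enable/disable behaviour can be modelled in user space as follows. This is a conceptual toy, not kernel code: every name here is hypothetical, and the plain "enabled" flag stands in for the patched nop/jmp, which in the kernel costs no conditional branch on the disabled path.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PROBES 4

typedef void (*toy_probe_fn)(void *data, int arg);

struct toy_tracepoint {
    int          enabled;                /* stands in for the static key */
    size_t       nr_probes;
    toy_probe_fn probes[TOY_MAX_PROBES];
    void        *data[TOY_MAX_PROBES];
};

/* Register a handler; the first registration switches the hook on.
 * Returns 0 on success, -1 if the handler table is full. */
int toy_tracepoint_register(struct toy_tracepoint *tp, toy_probe_fn fn, void *data)
{
    if (tp->nr_probes == TOY_MAX_PROBES)
        return -1;
    tp->probes[tp->nr_probes] = fn;
    tp->data[tp->nr_probes] = data;
    tp->nr_probes++;
    tp->enabled = 1;
    return 0;
}

/* The hook placed in the traced code path. */
void toy_trace(struct toy_tracepoint *tp, int arg)
{
    if (!tp->enabled) /* in the kernel this test is a patched nop or jmp */
        return;
    for (size_t i = 0; i < tp->nr_probes; i++)
        tp->probes[i](tp->data[i], arg);
}

/* Example handler: adds the traced argument to an int accumulator. */
void toy_sum_probe(void *data, int arg)
{
    *(int *)data += arg;
}
```

Calling toy_trace() before any handler is registered does nothing, which mirrors why a disabled tracepoint is nearly free.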

Event Tracing Framework

Event tracing abstracts tracing concepts into:

TraceEvent – a structured record of an occurrence.

Event Provider – kernel code that defines and registers events.

Event Consumer – userspace process that reads events.

Trace Buffer – a kernel‑allocated memory region where events are stored.

Events can be listed via /sys/kernel/debug/tracing/available_events, enabled by writing 1 to the corresponding enable file, and read from /sys/kernel/debug/tracing/trace.
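Enabling an event from a program amounts to writing "1" to its enable file. The sketch below assumes the usual layout events/<subsystem>/<event>/enable under the tracing directory; the function names are hypothetical, and the write itself needs root on a real system.

```c
#include <assert.h>
#include <stdio.h>

/* Build "<tracefs>/events/<subsys>/<event>/enable" into buf.
 * Returns 0 on success, -1 if the path does not fit. */
int event_enable_path(char *buf, size_t len, const char *tracefs,
                      const char *subsys, const char *event)
{
    assert(buf && tracefs && subsys && event);
    int n = snprintf(buf, len, "%s/events/%s/%s/enable", tracefs, subsys, event);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" or "0" to an enable file. Returns 0 on success, -1 on error. */
int event_set_enabled(const char *path, int on)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fputs(on ? "1" : "0", f) >= 0;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}
```

For example, the sched_switch event listed as sched:sched_switch in available_events maps to /sys/kernel/debug/tracing/events/sched/sched_switch/enable.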

TraceFS

TraceFS is a virtual filesystem exposing tracing control files. For example, writing a probe definition to the kprobe_events file creates a kprobe: the kernel parses the line and registers the probe.
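The line written to kprobe_events follows a small grammar: "p:" defines an entry probe and "r:" a return probe, followed by an optional group/event name and the target symbol. The helper below is a sketch that formats only this basic form, without fetch arguments; the function name and the group and event names used in the example are illustrative.

```c
#include <assert.h>
#include <stdio.h>

/* Format "p:<group>/<event> <symbol>" (entry probe) or
 * "r:<group>/<event> <symbol>" (return probe) into buf.
 * Returns 0 on success, -1 if the line does not fit. */
int kprobe_event_line(char *buf, size_t len, int is_return,
                      const char *group, const char *event, const char *symbol)
{
    assert(buf && group && event && symbol);
    int n = snprintf(buf, len, "%c:%s/%s %s",
                     is_return ? 'r' : 'p', group, event, symbol);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Appending the resulting line, for instance "p:demo/open_entry do_sys_open", to kprobe_events makes the new event appear under events/demo/open_entry, where it can be enabled like any other trace event.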

Function Tracing (Ftrace)

Ftrace provides two tracing capabilities:

Dynamic function‑level probes (similar to kprobes).

Static compile-time instrumentation using the -pg option, which inserts a call to mcount at each function entry. During boot, ftrace_init() replaces these calls with nops; when tracing is enabled, the nops in the selected functions are patched back into calls to the tracer.

Since kernel 4.19, -mfentry replaces mcount with a lightweight fentry call.

Perf

Perf is a performance analysis suite that uses hardware performance counters and also supports kernel tracing. It creates perf_event structures with ring buffers, maps them to userspace via mmap, and records events such as syscalls or custom probes.

$ sudo perf probe -k /usr/lib/debug/boot/vmlinux-$(uname -r) do_sys_open
$ sudo perf record -e probe:do_sys_open -aR sleep 1
$ sudo perf report -i perf.data

Perf uses its own lock‑free ring buffer because the original Ftrace buffer cannot be written from NMI context.

eBPF Integration

eBPF extends BPF to a general in‑kernel VM. It can attach to kprobes, tracepoints, perf events, and raw tracepoints. An eBPF program is loaded via bpf() syscalls and runs as the probe handler.

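/* Assumes struct event, the "events" ring-buffer map, and
 * struct sched_process_exec_args (mirroring the tracepoint's format
 * file) are defined elsewhere in the program. */
SEC("tracepoint/sched/sched_process_exec")
int tracepoint_demo(struct sched_process_exec_args *ctx) {
    struct event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
    if (!e) return 0;  /* ring buffer full: drop this event */
    /* __data_loc keeps the string's offset from ctx in its low 16 bits. */
    unsigned short filename_offset = ctx->__data_loc & 0xFFFF;
    char *filename = (char *)ctx + filename_offset;
    /* Copy the NUL-terminated name without reading past its end. */
    bpf_probe_read_kernel_str(&e->filename, sizeof(e->filename), filename);
    e->pid = bpf_get_current_pid_tgid() >> 32;
    bpf_ringbuf_submit(e, 0);
    return 0;
}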
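The masking with 0xFFFF in the program above relies on how tracepoint records encode variable-length fields: a __data_loc field is a 32-bit word whose low 16 bits hold the offset of the data from the start of the record and whose high 16 bits hold its length. The following user-space sketch decodes such a word; struct data_loc and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct data_loc {
    uint16_t offset; /* bytes from the start of the event record */
    uint16_t length; /* size of the dynamic field in bytes */
};

/* Split a __data_loc word into its offset and length halves. */
struct data_loc data_loc_decode(uint32_t word)
{
    struct data_loc loc = {
        .offset = (uint16_t)(word & 0xFFFF),
        .length = (uint16_t)(word >> 16),
    };
    return loc;
}

/* Resolve the field's address inside a raw event record.
 * Returns NULL if the field would extend past the record. */
const char *data_loc_ptr(const void *record, size_t record_len, uint32_t word)
{
    assert(record != NULL);
    struct data_loc loc = data_loc_decode(word);
    if ((size_t)loc.offset + loc.length > record_len)
        return NULL;
    return (const char *)record + loc.offset;
}
```

The bounds check has no counterpart in the eBPF program, where the read helper reports a failed read instead; it is included here because a user-space parser works on untrusted buffers.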

Raw tracepoints skip the per-event argument marshalling that regular tracepoints perform and pass the tracepoint's arguments directly as an array of u64, which lowers the overhead.

BPF Trampoline (Fentry/Fexit)

When the kernel is compiled with -pg -mfentry, each function begins with a call to __fentry__ that can be replaced by a call to a BPF trampoline. The trampoline passes the function's arguments to the eBPF program directly, without an intermediate structure such as pt_regs, which gives lower overhead than kprobe-based attachment.

Data Transfer to Userspace

eBPF programs can use:

BPF ring-buffer maps (kernel ≥5.8) – a single multi-producer, single-consumer buffer shared across all CPUs, so events keep the order in which they were reserved.

Perf event maps – the traditional per‑CPU ring buffers.

TraceFS files – expose trace buffers as virtual files.

The ring‑buffer design solves memory waste and ordering issues present in per‑CPU perf buffers.
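The ordering benefit of a single shared buffer can be seen in a deliberately simplified model. This is a single-threaded toy with fixed-size slots: all names are hypothetical, and it leaves out the commit step, variable-length records, and every bit of synchronization that a real multi-CPU buffer needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_RB_SLOTS 4 /* power of two, so positions wrap with a mask */

struct toy_event {
    uint32_t cpu;   /* which CPU produced the event */
    uint32_t value;
};

struct toy_ringbuf {
    struct toy_event slots[TOY_RB_SLOTS];
    size_t producer_pos; /* total slots ever reserved */
    size_t consumer_pos; /* total slots ever consumed */
};

/* Reserve the next slot, or return NULL when the consumer has fallen
 * behind and the buffer is full (the event is dropped). */
struct toy_event *toy_rb_reserve(struct toy_ringbuf *rb)
{
    if (rb->producer_pos - rb->consumer_pos == TOY_RB_SLOTS)
        return NULL;
    return &rb->slots[rb->producer_pos++ & (TOY_RB_SLOTS - 1)];
}

/* Copy out the oldest event. Returns 0 on success, -1 if empty. */
int toy_rb_consume(struct toy_ringbuf *rb, struct toy_event *out)
{
    if (rb->consumer_pos == rb->producer_pos)
        return -1;
    *out = rb->slots[rb->consumer_pos++ & (TOY_RB_SLOTS - 1)];
    return 0;
}
```

Because every producer reserves from the same position counter, events from different CPUs come out in reservation order, and one buffer sized for the total load replaces one worst-case-sized buffer per CPU.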

Choosing a Tracing Technique

For ad‑hoc debugging and performance analysis, perf tools are convenient. For long‑running, programmable tracing across many nodes, eBPF offers flexibility and low overhead. Kprobes/uprobes are useful for dynamic insertion, while tracepoints provide stable, developer‑maintained hooks.

Conclusion

Kprobes/kretprobes, uprobes, tracepoints, BPF trampolines (fentry/fexit), and eBPF all provide mechanisms to inject probe handlers into the kernel. They differ in how they replace instructions, in their dynamic vs. static nature, and in the way they deliver data to userspace via perf events, trace buffers, or BPF maps.
