
Understanding Linux Kernel Tracing: Probes, Kprobes, Uprobes, Tracepoints, ftrace, Perf, and eBPF

This article explains the concepts and mechanisms behind Linux kernel tracing, including kprobes, uprobes, tracepoints, ftrace, perf events, and eBPF. It shows how probes are injected, how trace data is collected, and which technology to choose for different debugging and performance scenarios.

政采云技术

May 23, 2023


Injecting Probe Mechanism

Probe Handler

Linux provides many tracing tools such as ftrace and perf for kernel debugging and observability. The abundance of tools introduces a variety of concepts—tracepoint, trace events, kprobe, eBPF, etc.—which can be confusing. This article attempts to clarify these concepts.

If we want to trace a kernel function or a specific line of code, the traditional method is to add printk statements before and after the code, rebuild the kernel, and reboot. This approach is cumbersome and unsuitable for production environments.

A more practical method is to define a custom function (probe handler) that is injected before or after the target kernel function. The probe handler can collect context information and store it, and it can be enabled or disabled at runtime to avoid impacting the kernel when not needed.

The custom function is called a probe handler; the injection point is a probe point or hook point. A probe executed before the point is a pre‑handler, after the point is a post‑handler. The injection process is called "instrumentation" and the kernel provides several mechanisms for injecting probe handlers.

Kprobes Mechanism

Kprobes is a dynamic tracing mechanism that can inject probe handlers at any location inside any kernel function without affecting normal execution. There are two types: kprobe (inserts a handler at an arbitrary location) and kretprobe (inserts a handler at function return). For safety, the kernel maintains a blacklist of functions that cannot be instrumented, such as the kprobe code itself.

How Kprobes Implements Probe Injection

The kernel provides a register_kprobe interface. When a kprobe is registered, the kernel copies the instruction at the probe point, replaces the first byte with a breakpoint instruction (INT3 on x86), and stores the original instruction.

When the CPU executes the breakpoint, the kernel's do_int3 handler determines whether it was caused by a kprobe. If so, it saves the CPU state, invokes the registered probe handler via the notifier chain, and passes the saved registers and stack to the handler.

Pre‑handlers run first. After they finish, the kernel sets the trap flag (TF) in the EFLAGS register and single‑steps the saved copy of the original instruction. The single step raises a debug exception (vector 1), which invokes do_debug; do_debug runs the post‑handler and then resumes normal execution.
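The patch-and-dispatch sequence can be sketched as a toy user‑space model. This is not kernel code: the byte array stands in for kernel text, and every name here (toy_probe, toy_register, toy_execute, toy_count_hit) is hypothetical. It only illustrates the order of operations, assuming a single probe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_INT3 0xCC /* x86 breakpoint opcode */

struct toy_probe;
typedef void (*toy_handler)(struct toy_probe *p);

/* Toy stand-in for a registered probe (hypothetical, not struct kprobe). */
struct toy_probe {
    size_t addr;      /* probe point: index into the simulated code buffer */
    uint8_t saved;    /* original byte, kept so it can still be executed */
    toy_handler pre;  /* runs before the original byte */
    toy_handler post; /* runs after the original byte */
    int hits;         /* bumped by the sample handler below */
};

/* Sample handler: just counts how often it was invoked. */
static void toy_count_hit(struct toy_probe *p) { p->hits++; }

/* Save the original byte, then overwrite it with the breakpoint opcode. */
static void toy_register(uint8_t *code, size_t len, struct toy_probe *p)
{
    assert(p->addr < len);
    p->saved = code[p->addr];
    code[p->addr] = TOY_INT3;
}

/* Put the original byte back, removing the probe. */
static void toy_unregister(uint8_t *code, const struct toy_probe *p)
{
    code[p->addr] = p->saved;
}

/* "Execute" the byte at pc and return the byte that actually ran. */
static uint8_t toy_execute(const uint8_t *code, size_t pc, struct toy_probe *p)
{
    if (code[pc] != TOY_INT3 || pc != p->addr)
        return code[pc];         /* no probe here: run the byte as-is */
    if (p->pre)
        p->pre(p);               /* pre-handler */
    uint8_t executed = p->saved; /* models single-stepping the saved copy */
    if (p->post)
        p->post(p);              /* post-handler */
    return executed;
}
```

Calling toy_register on a buffer, then toy_execute at the probed index, runs both handlers and still yields the original byte, which mirrors why a kprobe does not change the behaviour of the probed function.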

Why kretprobe Is Needed

Although a kprobe could theoretically emulate kretprobe by placing a probe at the last line of a function, this is unreliable because functions may have multiple return paths, early exits, or error handling that bypass the last line.

kretprobe guarantees execution of the handler whenever the function returns, regardless of how the return occurs, making it more reliable for post‑execution tracing.

Uprobes

Uprobes (userspace probes) work similarly to kprobes but operate on user‑space binaries. They do not have a blacklist and require the offset of the probe point from the start of the binary. The following example shows how to obtain the offset using readelf and then register a uprobe via a kernel module.

```
root@zfane-maxpower:~/traceing# cat hello.c
#include <stdio.h>
void test(){ printf("hello world"); }
int main(){ test(); return 0; }
root@zfane-maxpower:~/traceing# gcc hello.c -o hello
```

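Using readelf -s and readelf -S we obtain the virtual address of the test symbol (0x1149) and the virtual address and file offset of the .text section (both 0x1060 in this build). The uprobe offset is the symbol address minus the section's virtual address plus the section's file offset:

offset = 0x1149 - 0x1060 + 0x1060 = 0x1149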
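The same arithmetic can be written as a small helper. This is a minimal sketch; the function name uprobe_file_offset and its parameter names are hypothetical, and the three inputs are assumed to be read off the readelf output by hand.

```c
#include <assert.h>
#include <stdint.h>

/*
 * File offset of a symbol, as needed for a uprobe:
 *   symbol virtual address - section virtual address + section file offset
 */
static uint64_t uprobe_file_offset(uint64_t sym_vaddr,
                                   uint64_t sec_vaddr,
                                   uint64_t sec_offset)
{
    /* The symbol must live inside (or at the start of) the section. */
    assert(sym_vaddr >= sec_vaddr);
    return sym_vaddr - sec_vaddr + sec_offset;
}
```

For the hello binary, uprobe_file_offset(0x1149, 0x1060, 0x1060) gives 0x1149. The subtraction only matters when the section's virtual address and file offset differ, as they do in non‑PIE executables.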

With the offset we can write a kernel module that registers the uprobe:

```c
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/namei.h>   /* kern_path() */
#include <linux/uprobes.h>

#define DEBUGGEE_FILE "/home/zfane/hello/hello"
#define DEBUGGEE_FILE_OFFSET (0x1149)

static struct inode *debuggee_inode;

/* Runs when the probed instruction is hit. */
static int uprobe_sample_handler(struct uprobe_consumer *con, struct pt_regs *regs)
{
    /* regs->di is the first argument register on x86-64. Print the raw value:
     * it may be a user-space pointer, which must not be dereferenced with %s. */
    printk(KERN_INFO "handler is executed, arg0: %lx\n", regs->di);
    return 0;
}

/* Runs when the probed function returns. */
static int uprobe_sample_ret_handler(struct uprobe_consumer *con, unsigned long func, struct pt_regs *regs)
{
    printk(KERN_INFO "ret_handler is executed\n");
    return 0;
}

static struct uprobe_consumer uc = {
    .handler = uprobe_sample_handler,
    .ret_handler = uprobe_sample_ret_handler,
};

static int __init init_uprobe_sample(void)
{
    int ret;
    struct path path;

    ret = kern_path(DEBUGGEE_FILE, LOOKUP_FOLLOW, &path);
    if (ret)
        return ret;

    /* Take our own reference on the inode before dropping the path. */
    debuggee_inode = igrab(path.dentry->d_inode);
    path_put(&path);

    ret = uprobe_register(debuggee_inode, DEBUGGEE_FILE_OFFSET, &uc);
    if (ret < 0) {
        iput(debuggee_inode);
        return ret;
    }

    printk(KERN_INFO "insmod uprobe_sample\n");
    return 0;
}

static void __exit exit_uprobe_sample(void)
{
    uprobe_unregister(debuggee_inode, DEBUGGEE_FILE_OFFSET, &uc);
    iput(debuggee_inode); /* drop the reference taken by igrab() */
    printk(KERN_INFO "rmmod uprobe_sample\n");
}

module_init(init_uprobe_sample);
module_exit(exit_uprobe_sample);
MODULE_LICENSE("GPL");
```

Tracepoint

Tracepoints are static hooks placed in kernel source. They are disabled by default (implemented as NOP) and can be enabled at runtime with negligible overhead.

When enabled, the NOP is replaced by a jump to a static call that iterates over registered tracepoint handlers. The handler receives a TraceEvent containing context and arguments, which is stored in the trace buffer.
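The enable/disable behaviour can be pictured with a toy user‑space model. All names below (toy_tracepoint, toy_tp_register, toy_trace, toy_record) are hypothetical, and a plain boolean check stands in for the NOP that the real kernel patches at runtime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_HANDLERS 4

typedef void (*toy_tp_handler)(int arg);

/* Toy stand-in for one static tracepoint in the source. */
struct toy_tracepoint {
    bool enabled;                             /* false models the NOP */
    size_t count;                             /* number of registered handlers */
    toy_tp_handler handlers[TOY_MAX_HANDLERS];
};

/* Sample handler that records what it saw. */
static int toy_calls;
static int toy_last_arg;
static void toy_record(int arg) { toy_calls++; toy_last_arg = arg; }

/* Registering the first handler also switches the tracepoint on. */
static int toy_tp_register(struct toy_tracepoint *tp, toy_tp_handler h)
{
    assert(h != NULL);
    if (tp->count == TOY_MAX_HANDLERS)
        return -1;
    tp->handlers[tp->count++] = h;
    tp->enabled = true;
    return 0;
}

/* The hook placed in the traced code path. */
static void toy_trace(const struct toy_tracepoint *tp, int arg)
{
    if (!tp->enabled)
        return; /* disabled: fall straight through */
    for (size_t i = 0; i < tp->count; i++)
        tp->handlers[i](arg);
}
```

Calling toy_trace before any handler is registered does nothing, which is the cheap disabled path; after toy_tp_register, every call fans out to the registered handlers.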

Injecting Probe via Tracing Tools

Writing kernel modules for tracing is risky because bugs can crash the kernel. Linux therefore provides an event‑tracing framework that does not require kernel modules. The framework defines concepts such as TraceEvent, Event Provider, Event Consumer, Trace Buffer, and Trace Event Format (TEF).

Users can list available events via cat /sys/kernel/debug/tracing/available_events, enable an event (e.g., syscalls:sys_enter_connect) by writing 1 to the corresponding enable file, and read the collected data from /sys/kernel/debug/tracing/trace.
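The enable file for an event lives at events/<subsystem>/<event>/enable under the tracing directory. As a minimal sketch, the helpers below build that path from a subsystem:event name and write 1 or 0 to it. The function names are hypothetical, and the write only succeeds as root on a system where the tracing directory is mounted at the path used above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TRACING_DIR "/sys/kernel/debug/tracing"

/*
 * Turn "syscalls:sys_enter_connect" into
 * "<TRACING_DIR>/events/syscalls/sys_enter_connect/enable".
 * Returns 0 on success, -1 on a malformed name or a too-small buffer.
 */
static int event_enable_path(const char *event, char *out, size_t len)
{
    assert(event != NULL && out != NULL);
    const char *colon = strchr(event, ':');
    if (!colon || colon == event || colon[1] == '\0')
        return -1;
    int n = snprintf(out, len, "%s/events/%.*s/%s/enable",
                     TRACING_DIR, (int)(colon - event), event, colon + 1);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" or "0" to the event's enable file. Returns 0 on success. */
static int set_event_enabled(const char *event, int on)
{
    char path[256];
    if (event_enable_path(event, path, sizeof(path)) != 0)
        return -1;
    FILE *f = fopen(path, "w");
    if (!f)
        return -1; /* not root, or tracing directory not mounted here */
    int ok = fputs(on ? "1" : "0", f) >= 0;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}
```

With this, set_event_enabled("syscalls:sys_enter_connect", 1) corresponds to writing 1 to the enable file by hand, and the collected records can then be read from the trace file as described above.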

Perf

Perf is a performance analysis suite that uses hardware performance counters and the kernel tracing infrastructure. It can sample events based on time, count, or instruction thresholds.

To add a probe with perf:

$ sudo perf probe -k /usr/lib/debug/boot/vmlinux-$(uname -r) do_sys_open

Then record the event:

$ sudo perf record -e probe:do_sys_open -aR sleep 1

And finally view the report:

$ sudo perf report -i perf.data

For each event, perf creates a perf_event structure in the kernel with an associated ring buffer, typically one per CPU. The user‑space tool obtains a file descriptor from perf_event_open and maps that buffer into its address space with mmap.
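To make the perf_event_open step concrete, here is a minimal sketch that opens a software counter (task clock) for the calling process. It uses counting mode and reads the value with read(), a simpler path than the sampled mmap ring buffer that the perf tool uses. The helper names are hypothetical, and the call may fail with a permission error depending on the system's perf_event_paranoid setting.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc has no wrapper for perf_event_open, so go through syscall(2). */
int perf_open(struct perf_event_attr *attr, pid_t pid, int cpu)
{
    return (int)syscall(SYS_perf_event_open, attr, pid, cpu, -1, 0UL);
}

/* Describe the event: a software counter measuring task CPU time in ns. */
void init_task_clock_attr(struct perf_event_attr *attr)
{
    assert(attr != NULL);
    memset(attr, 0, sizeof(*attr));
    attr->type = PERF_TYPE_SOFTWARE;
    attr->size = sizeof(*attr);
    attr->config = PERF_COUNT_SW_TASK_CLOCK;
    attr->disabled = 1;       /* start stopped; enabled explicitly below */
    attr->exclude_kernel = 1; /* count user-space time only */
    attr->exclude_hv = 1;
}

/* Run work() with the counter enabled. Returns 0 and stores the count in
 * *ns_out, or -1 if the counter could not be opened or read. */
int measure_task_clock(void (*work)(void), uint64_t *ns_out)
{
    struct perf_event_attr attr;
    init_task_clock_attr(&attr);

    int fd = perf_open(&attr, 0 /* this process */, -1 /* any CPU */);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    ssize_t n = read(fd, ns_out, sizeof(*ns_out));
    close(fd);
    return n == (ssize_t)sizeof(*ns_out) ? 0 : -1;
}
```

The file descriptor returned here is the same kind of handle that perf record maps with mmap to consume sampled records.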

eBPF Probe Injection

eBPF extends the original BPF packet‑filter VM into a general‑purpose in‑kernel VM. It supports program types such as BPF_PROG_TYPE_KPROBE, BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT, and BPF_PROG_TYPE_RAW_TRACEPOINT. An eBPF program can be attached to a kprobe, tracepoint, or raw tracepoint, and the kernel invokes the eBPF code as the probe handler.

static int kprobe_dispatcher(struct kprobe *kp, struct pt_regs *regs){ /* ... */ }

When a kprobe triggers, the kernel calls kprobe_perf_func, which looks up the associated eBPF program and executes it via trace_call_bpf. The eBPF program receives the registers and can read arguments using helper functions such as bpf_core_read or PT_REGS_* macros.

Raw Tracepoint

Raw tracepoints bypass the perf event infrastructure. The kernel passes a struct bpf_raw_tracepoint_args containing an array of u64 arguments directly to the eBPF program, eliminating the need for a format structure.

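```c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

/* `struct event` and the `events` ring buffer map are defined elsewhere. */
SEC("raw_tracepoint/sched_process_exec")
int raw_tracepoint_demo(struct bpf_raw_tracepoint_args *ctx){
    /* sched_process_exec passes (task_struct *p, pid_t old_pid, linux_binprm *bprm),
     * so the binprm that holds the filename is args[2], not args[0]. */
    struct linux_binprm *bprm = (struct linux_binprm *)ctx->args[2];
    const char *filename = BPF_CORE_READ(bprm, filename);
    struct event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
    if (!e) return 0;
    bpf_probe_read_kernel_str(&e->filename, sizeof(e->filename), filename);
    e->pid = bpf_get_current_pid_tgid() >> 32;
    bpf_get_current_comm(&e->command, sizeof(e->command));
    bpf_ringbuf_submit(e, 0);
    return 0;
}
char _license[] SEC("license") = "GPL";
```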

BTF‑enabled Raw Tracepoint

Since kernel 4.18, BTF (BPF Type Format) provides rich type information, allowing eBPF programs to access struct members directly without helper functions. Kernel 5.5 introduced BPF_PROG_TYPE_TRACING which uses BTF for raw tracepoints.

```c
SEC("tp_btf/sched_process_exec")
int BPF_PROG(sched_process_exec, struct task_struct *p, pid_t old_pid, struct linux_binprm *bprm){
    struct event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
    if (!e) return 0;
    bpf_printk("filename : %s", bprm->filename);
    bpf_core_read(&e->filename, sizeof(e->filename), bprm->filename);
    e->pid = bpf_get_current_pid_tgid() >> 32;
    bpf_get_current_comm(&e->command, sizeof(e->command));
    bpf_ringbuf_submit(e, 0);
    return 0;
}
```

BPF Trampoline (FENTRY/FEXIT)

FENTRY/FEXIT build on compile‑time instrumentation: compiling the kernel with -pg -mfentry makes the compiler emit a call at the entry of every traceable function, and the kernel patches those calls to NOPs at boot. When an eBPF program is attached, the NOP at function entry is replaced by a call to a BPF trampoline, which invokes the program directly. Because no breakpoint exception is involved, this gives the same kind of function‑level tracing as kprobes at lower overhead.

Transferring Data from Kernel to Userspace

eBPF maps are the primary way to share data between kernel and userspace. The newer BPF ring buffer (available since kernel 5.8) is a multi‑producer, single‑consumer (MPSC) ring buffer shared across all CPUs. Because every CPU reserves space in the same buffer, records are delivered in the order in which they were reserved.

Older perf‑event ring buffers are allocated per CPU, which wastes memory when load is uneven and can deliver events out of order across CPUs. The BPF ring buffer avoids both problems by sharing a single buffer among all CPUs.
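The ordering difference can be illustrated with a toy model. This is not how either kernel buffer is implemented: the types and functions below are hypothetical, events carry a made‑up timestamp, and the per‑CPU consumer is assumed to drain one CPU's buffer completely before moving to the next.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NCPU 2
#define TOY_CAP 8

struct toy_event { int cpu; int ts; };

/* A trivially simple append-only buffer. */
struct toy_buf { struct toy_event ev[TOY_CAP]; size_t n; };

static void toy_push(struct toy_buf *b, struct toy_event e)
{
    assert(b->n < TOY_CAP);
    b->ev[b->n++] = e;
}

/* Count adjacent pairs whose timestamps go backwards. */
static size_t toy_inversions(const struct toy_event *ev, size_t n)
{
    size_t bad = 0;
    for (size_t i = 1; i < n; i++)
        if (ev[i].ts < ev[i - 1].ts)
            bad++;
    return bad;
}

/* Drain per-CPU buffers one CPU after another into out[]. */
static size_t toy_drain_percpu(const struct toy_buf bufs[TOY_NCPU],
                               struct toy_event *out)
{
    size_t k = 0;
    for (int c = 0; c < TOY_NCPU; c++)
        for (size_t i = 0; i < bufs[c].n; i++)
            out[k++] = bufs[c].ev[i];
    return k;
}
```

If events with timestamps 1, 2, 3, 4 alternate between CPU 0 and CPU 1, the per‑CPU drain yields 1, 3, 2, 4, while a single shared buffer keeps them as 1, 2, 3, 4. With per‑CPU buffers the consumer has to re‑sort by timestamp to recover that order.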

Choosing a Kernel Tracing Technology

For ad‑hoc debugging and performance analysis, the author prefers the perf suite because it quickly yields actionable results. For long‑running tracing programs, eBPF is the preferred choice due to its programmability and low overhead.

Conclusion

kprobes, uprobes, tracepoints, and fprobe (fentry/fexit) are all mechanisms for injecting probe handlers. kprobes and uprobes replace instructions at runtime, while tracepoints are static hooks defined by kernel developers. fprobe relies on compile‑time hooks that can be enabled and disabled dynamically, and eBPF can also use fentry/fexit via BPF trampolines.

Probe handlers run in kernel space and forward trace data to userspace via perf_event, trace_event_ring_buffer, or eBPF maps (including the modern BPF ring buffer).

