How to Capture the unlink System Call with eBPF kprobe: A Step‑by‑Step Guide
This article explains how to use Linux eBPF kprobe (and kretprobe) to dynamically instrument the unlink system call, covering the underlying concepts, required kernel headers, full eBPF source code, compilation with both eunomia‑bpf and cilium/ebpf, and a detailed comparison with tracepoint probes.
Problem
Developers often need to know whether a specific kernel function such as unlink is called, when it is called, and with what arguments, without recompiling the kernel or inserting permanent logging statements.
Solution Overview
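kprobe is a kernel feature that lets users insert probe points into almost any kernel function. Historically, three probing mechanisms existed:
kprobe – the basic mechanism: a pre_handler runs just before the probed instruction, a post_handler runs just after it, and a fault_handler handles exceptions raised while the probe is being processed.
jprobe – built on kprobe to capture a function's arguments. It was removed in Linux 4.15; eBPF programs now read arguments directly from a plain kprobe.
kretprobe – built on kprobe to capture a function's return value.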
These probes have minimal impact on normal execution and can be removed dynamically once enough data is collected.
Simple Example: Capturing unlink
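The following eBPF program attaches a kprobe to the entry of do_unlinkat and a kretprobe to its exit. do_unlinkat is the kernel function behind both unlink(2) and unlinkat(2) (when AT_REMOVEDIR is not set). The program prints the PID, filename, and return value to /sys/kernel/debug/tracing/trace_pipe.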
// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
/* Copyright (c) 2021 Sartura */
#define BPF_NO_GLOBAL_DATA
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>
#include <bpf/bpf_endian.h>

char LICENSE[] SEC("license") = "Dual BSD/GPL";

/* Entry probe: do_unlinkat(int dfd, struct filename *name) */
SEC("kprobe/do_unlinkat")
int BPF_KPROBE(do_unlinkat, int dfd, struct filename *name)
{
    /* The upper 32 bits of pid_tgid hold the thread group ID (the user-space PID). */
    pid_t pid = bpf_get_current_pid_tgid() >> 32;
    /* CO-RE read of name->name, the kernel's copy of the path. */
    const char *filename = BPF_CORE_READ(name, name);

    bpf_printk("KPROBE ENTRY pid = %d, filename = %s\n", pid, filename);
    return 0;
}

/* Return probe: ret is do_unlinkat's return value (0 or a negative errno). */
SEC("kretprobe/do_unlinkat")
int BPF_KRETPROBE(do_unlinkat_exit, long ret)
{
    pid_t pid = bpf_get_current_pid_tgid() >> 32;

    bpf_printk("KPROBE EXIT: pid = %d, ret = %ld\n", pid, ret);
    return 0;
}
Two probe points are attached: one at the function entry to read the PID and filename, and one at the function exit to read the PID and return value.
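Both probes recover the PID with `bpf_get_current_pid_tgid() >> 32`. The helper packs two IDs into one 64-bit value: the thread group ID (what `ps` calls the PID) in the upper 32 bits and the thread ID in the lower 32 bits. The user-space sketch below mimics that split so the shift is easy to reason about. The function names are hypothetical and are not part of libbpf.

```c
#include <stdint.h>

/* Upper 32 bits: thread group ID, i.e. the process ID seen in user space. */
static inline uint32_t tgid_of(uint64_t pid_tgid)
{
    return (uint32_t)(pid_tgid >> 32);
}

/* Lower 32 bits: the ID of the individual thread (user-space TID). */
static inline uint32_t tid_of(uint64_t pid_tgid)
{
    return (uint32_t)pid_tgid; /* the cast keeps only the low 32 bits */
}

/* Hypothetical helper that packs the value the way
 * bpf_get_current_pid_tgid() lays it out (tgid << 32 | tid). */
static inline uint64_t pack_pid_tgid(uint32_t tgid, uint32_t tid)
{
    return ((uint64_t)tgid << 32) | tid;
}
```

For a single-threaded process such as `rm`, the two halves are equal. They differ only when a secondary thread triggers the probe, which is why the program keeps the upper half.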
The program includes vmlinux.h, a header containing the type definitions of the running kernel. Generate it from the kernel's BTF with bpftool:
bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h
Compilation and Loading Methods
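Both methods below compile against the same vmlinux.h, and bpftool can only generate it when /sys/kernel/btf/vmlinux exists. That file is present only on kernels built with CONFIG_DEBUG_INFO_BTF=y. The pre-flight check below is a minimal sketch under that assumption; the function names are hypothetical.

```c
#include <stdbool.h>
#include <unistd.h>

/* Where kernels built with CONFIG_DEBUG_INFO_BTF=y expose their BTF. */
#define KERNEL_BTF_PATH "/sys/kernel/btf/vmlinux"

/* True when `path` exists and the caller may read it. */
bool btf_readable(const char *path)
{
    return access(path, R_OK) == 0;
}

/* Check to run before
 * `bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h`. */
bool kernel_has_btf(void)
{
    return btf_readable(KERNEL_BTF_PATH);
}
```

If kernel_has_btf() returns false, the running kernel has no built-in BTF, and the bpftool command above has nothing to read.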
Method 1 – eunomia‑bpf
Compile the program with the ecc tool and load it with ecli:
$ ecc kprobeunlink.bpf.c
Compiling bpf object...
Packing ebpf object and config into package.json...
$ sudo ecli run package.json
Trigger the probe by creating and deleting a file in another terminal, then read the output from /sys/kernel/debug/tracing/trace_pipe:
touch test1
rm test1
sudo cat /sys/kernel/debug/tracing/trace_pipe
# Example output:
KPROBE ENTRY pid = 24696, filename = test1
KPROBE EXIT: pid = 24696, ret = 0
Method 2 – cilium/ebpf (Go)
Generate Go bindings with bpf2go and load the program using the cilium/ebpf library:
//go:generate bpf2go -cc clang -cflags "-O2 -g -D__TARGET_ARCH_x86" --go-package main kprobeunlink kprobeunlink.bpf.c -- -I/usr/include/bpf -I/usr/include/linux

package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"

	"github.com/cilium/ebpf/link"
	"github.com/cilium/ebpf/rlimit"
)

func main() {
	// Lift the memlock rlimit; kernels before 5.11 charge BPF memory against it.
	if err := rlimit.RemoveMemlock(); err != nil {
		log.Fatalf("Removing memlock limit: %v", err)
	}

	// Load the programs compiled and embedded by bpf2go into the kernel.
	objs := kprobeunlinkObjects{}
	if err := loadKprobeunlinkObjects(&objs, nil); err != nil {
		log.Fatalf("loading objects: %v", err)
	}
	defer objs.Close()

	// Attach the entry probe to do_unlinkat.
	kp, err := link.Kprobe("do_unlinkat", objs.DoUnlinkat, nil)
	if err != nil {
		log.Fatalf("attaching kprobe: %v", err)
	}
	defer kp.Close()

	// Attach the return probe to the same function.
	krp, err := link.Kretprobe("do_unlinkat", objs.DoUnlinkatExit, nil)
	if err != nil {
		log.Fatalf("attaching kretprobe: %v", err)
	}
	defer krp.Close()

	log.Println("eBPF program successfully attached, press Ctrl+C to exit...")

	// Block until SIGINT/SIGTERM; the deferred Close calls then detach the probes.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
	<-sigs
}
Running this program as root (after go generate and go build) writes the same PID, filename, and return-value lines to trace_pipe as the eunomia-bpf method.
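Whichever loader is used, the bpf_printk output ends up in trace_pipe. The kernel prepends a task/PID/CPU/timestamp prefix to each line, and the exact layout of that prefix varies between kernel versions. The sketch below therefore searches for the message text rather than a fixed column. It decodes the two message formats used by the program above. The struct and function names are hypothetical, and filenames containing spaces would be truncated at the first space.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One decoded line produced by the bpf_printk() calls in the program. */
struct unlink_event {
    bool is_exit;       /* false: entry line, true: exit line */
    int pid;
    long ret;           /* meaningful only when is_exit */
    char filename[256]; /* meaningful only when !is_exit */
};

/* Parse one trace_pipe line; returns true if it is one of our messages. */
bool parse_unlink_line(const char *line, struct unlink_event *ev)
{
    const char *p;

    memset(ev, 0, sizeof(*ev));
    if ((p = strstr(line, "KPROBE ENTRY pid = ")) != NULL) {
        ev->is_exit = false;
        /* %255s stops at whitespace, including the trailing newline. */
        return sscanf(p, "KPROBE ENTRY pid = %d, filename = %255s",
                      &ev->pid, ev->filename) == 2;
    }
    if ((p = strstr(line, "KPROBE EXIT: pid = ")) != NULL) {
        ev->is_exit = true;
        return sscanf(p, "KPROBE EXIT: pid = %d, ret = %ld",
                      &ev->pid, &ev->ret) == 2;
    }
    return false;
}

/* Stream trace_pipe (requires root) and print decoded events.
 * fgets() blocks until new data arrives; stop with Ctrl+C. */
int watch_trace_pipe(const char *path)
{
    char line[512];
    struct unlink_event ev;
    FILE *f = fopen(path, "r");

    if (f == NULL) {
        perror(path);
        return -1;
    }
    while (fgets(line, sizeof(line), f) != NULL) {
        if (!parse_unlink_line(line, &ev))
            continue; /* skip output from other bpf_printk users */
        if (ev.is_exit)
            printf("exit  pid=%d ret=%ld\n", ev.pid, ev.ret);
        else
            printf("entry pid=%d file=%s\n", ev.pid, ev.filename);
    }
    fclose(f);
    return 0;
}
```

Calling watch_trace_pipe("/sys/kernel/debug/tracing/trace_pipe") in place of `sudo cat` gives structured entry/exit records. Note that trace_pipe is a consuming read, so it should not run alongside another reader.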
Discussion – Characteristics and Limitations of kprobes
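Multiple kprobes can be registered at the same address; jprobes could not be stacked this way.
Almost any kernel function can be probed. The exceptions are the kprobe implementation itself (e.g., functions in kernel/kprobes.c), do_page_fault, notifier_call_chain, and other functions the kernel blacklists with NOKPROBE_SYMBOL.
Functions that the compiler inlines have no single entry point, so a probe on such a function may miss calls made through inlined copies or fail to register at all.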
Probe callbacks can modify the probed function’s context (e.g., struct pt_regs), enabling bug‑fix injection or fault injection.
Re‑entering a probed function (e.g., probing printk while its callback calls printk) increments the nmissed counter instead of triggering another callback.
Apart from registration and deregistration, kprobes use no mutexes and allocate no memory.
Probe handlers run with preemption disabled, and possibly with interrupts disabled, so they must never sleep. For example, they must not acquire a mutex or wait on a semaphore.
A kretprobe replaces the function's return address with the address of a trampoline, so __builtin_return_address() called inside the probed function returns the trampoline address rather than the real caller.
If a function's calls and returns do not balance, kretprobe results can be misleading; do_exit(), for example, never returns.
On x86_64, registering a kretprobe (or a kprobe) on __switch_to fails with -EINVAL, because the CPU may be running on a stack other than the current task's when that function is entered or exited.
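Because register_kprobe resolves the target by name through the kernel symbol table, checking /proc/kallsyms before attaching shows whether do_unlinkat (or any other target) exists under that exact name on the running kernel. A function that was fully inlined, or that only survives as a compiler clone with a suffix such as .isra.0 or .constprop.0, will not match. The sketch below is a hypothetical helper that assumes the usual "address type name [module]" line format. Unprivileged readers see zeroed addresses, but the names are still listed.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Scan a /proc/kallsyms-style file and report whether `symbol` appears
 * as an exact name. Returns false if the file cannot be opened. */
bool kernel_symbol_listed(const char *path, const char *symbol)
{
    char line[512], name[256];
    bool found = false;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return false;
    while (!found && fgets(line, sizeof(line), f) != NULL) {
        /* Skip the address and type columns, then read the name. */
        if (sscanf(line, "%*s %*s %255s", name) == 1 &&
            strcmp(name, symbol) == 0)
            found = true;
    }
    fclose(f);
    return found;
}
```

For example, kernel_symbol_listed("/proc/kallsyms", "do_unlinkat") should be checked before the loader calls link.Kprobe or ecli attaches the program, which turns a confusing attach error into a clear "symbol not found" message.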
Extension – kprobe vs tracepoint
Working principle: kprobe inserts dynamic hooks at any function address, while tracepoint uses static hooks placed in the source by kernel developers.
Flexibility: kprobe can monitor virtually any kernel function without source changes; tracepoint is limited to predefined locations.
Overhead: a kprobe replaces the probed instruction with a breakpoint (or, when the kernel can optimize it, a jump), so every hit pays for a trap or detour. A tracepoint is compiled into the kernel and costs almost nothing while disabled.
Use cases: kprobe suits fine-grained debugging or performance analysis of arbitrary functions; tracepoint is better for high-frequency monitoring of known critical paths.
Safety and stability: misplaced kprobes can destabilize the system. Because kprobes target internal functions that may be renamed or inlined between kernel versions, kprobe-based tools can also break on upgrade. Tracepoints are static, deliberately maintained hooks, which makes them safer and more reliable.
In summary, kprobe offers greater flexibility and dynamic capability, whereas tracepoint provides lower‑cost, safer monitoring for predefined points. Choose kprobe for deep, ad‑hoc analysis; choose tracepoint for stable, high‑performance observability.
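Because tracepoints are predefined, tracefs lists each one as a directory under events/<category>/<event>. For the unlink case, the closest equivalent is syscalls/sys_enter_unlinkat, which requires CONFIG_FTRACE_SYSCALLS. An eBPF program would attach to it with SEC("tracepoint/syscalls/sys_enter_unlinkat"). The sketch below, with hypothetical function names, checks whether a given tracepoint exists on the running kernel before choosing between the two approaches.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Mount point used throughout this article; newer kernels also expose
 * tracefs directly at /sys/kernel/tracing. */
#define TRACEFS_ROOT "/sys/kernel/debug/tracing"

/* Write "<root>/events/<category>/<event>" into buf. Names containing
 * '/' are rejected so the path cannot escape events/; truncation fails. */
bool tracepoint_dir(char *buf, size_t size, const char *root,
                    const char *category, const char *event)
{
    int n;

    if (strchr(category, '/') != NULL || strchr(event, '/') != NULL)
        return false;
    n = snprintf(buf, size, "%s/events/%s/%s", root, category, event);
    return n >= 0 && (size_t)n < size;
}

/* True when the running kernel defines the tracepoint category/event. */
bool tracepoint_exists(const char *root, const char *category,
                       const char *event)
{
    char path[512];

    return tracepoint_dir(path, sizeof(path), root, category, event) &&
           access(path, F_OK) == 0;
}
```

A tool could call tracepoint_exists(TRACEFS_ROOT, "syscalls", "sys_enter_unlinkat") and fall back to the kprobe on do_unlinkat only when it returns false. This follows the trade-off above: prefer the stable static hook, and use the dynamic one when nothing predefined fits.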
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
BirdNest Tech Talk
Author of the rpcx microservice framework, original book author, and chair of Baidu's Go CMC committee.
