Mastering eBPF with BCC: A Step‑by‑Step Guide to Building the opensnoop Tool
This article outlines the standard BCC workflow for building eBPF tools, then walks through the opensnoop source code to show how that workflow applies to monitoring open system calls: requirement analysis, writing the kernel‑space program, configuring BPF maps, and integrating the user‑space Python program, with argument handling, testing, optimization, and deployment outlined as follow‑on steps.
Standard Process
To follow a logical order, the article first presents the standard workflow for developing eBPF programs with BCC, then applies it to the opensnoop tool's source code.
1.1 Requirement Analysis and Design
Identify the system call or kernel function to monitor.
Design data structures and filter criteria.
Choose appropriate probe types (kprobe, uprobe, tracepoint, etc.).
A future article will explain probe and tracepoint concepts.
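As a design‑stage sketch, the filter criteria can be written down as a small data structure before any BPF code exists. The `OpenFilter` class and its fields are hypothetical, loosely modeled on the kinds of filters opensnoop offers (PID, TID, UID, process name, open flags); it is plain Python and does not touch the kernel.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OpenFilter:
    """Hypothetical filter spec for open() events; None means "no filter"."""

    pid: Optional[int] = None          # match one process (TGID)
    tid: Optional[int] = None          # match one thread
    uid: Optional[int] = None          # match the calling user
    comm_substr: Optional[str] = None  # substring of the process name
    flags_mask: Optional[int] = None   # event must have all of these flag bits

    def matches(self, pid: int, tid: int, uid: int, comm: str, flags: int) -> bool:
        """Return True if an event passes every filter that is set."""
        if self.pid is not None and pid != self.pid:
            return False
        if self.tid is not None and tid != self.tid:
            return False
        if self.uid is not None and uid != self.uid:
            return False
        if self.comm_substr is not None and self.comm_substr not in comm:
            return False
        if self.flags_mask is not None and (flags & self.flags_mask) != self.flags_mask:
            return False
        return True


if __name__ == "__main__":
    root_creates = OpenFilter(uid=0, flags_mask=os.O_CREAT)
    print(root_creates.matches(pid=42, tid=42, uid=0, comm="bash",
                               flags=os.O_WRONLY | os.O_CREAT))  # True
```

Integer checks such as PID, TID, and UID translate directly into early `return 0;` statements in BPF C, whereas substring matching is usually easier in user space; deciding that split is part of the design step.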
1.2 Write the BPF Kernel‑Space Program
Define data structures that describe the information to trace.
Write the eBPF program in C.
Implement data collection and filtering logic (details omitted here).
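Much of the data‑collection logic is bit manipulation on values returned by BPF helpers: `bpf_get_current_pid_tgid()` packs the thread‑group ID (what user space calls the PID) in the upper 32 bits and the thread ID in the lower 32 bits, `bpf_get_current_uid_gid()` packs the GID high and the UID low, and `bpf_ktime_get_ns()` returns nanoseconds. The Python sketch below mirrors the C arithmetic used later in opensnoop (`id >> 32`, assigning to a `u32`, `tsp / 1000`) so the layout can be checked without a kernel; the function names are made up for illustration.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def split_pid_tgid(pid_tgid: int) -> Tuple[int, int]:
    """Mirror `u32 pid = id >> 32; u32 tid = id;` from the entry probe."""
    return pid_tgid >> 32, pid_tgid & MASK32


def uid_from_uid_gid(uid_gid: int) -> int:
    """Storing the u64 in a u32 field keeps the low 32 bits, i.e. the UID."""
    return uid_gid & MASK32


def ns_to_us(ts_ns: int) -> int:
    """Mirror `data.ts = tsp / 1000;` (integer division: ns -> µs)."""
    return ts_ns // 1000


if __name__ == "__main__":
    packed = (1234 << 32) | 1235                   # thread 1235 inside process 1234
    print(split_pid_tgid(packed))                  # (1234, 1235)
    print(uid_from_uid_gid((100 << 32) | 1000))    # 1000 (GID 100 is discarded)
    print(ns_to_us(1_500_999))                     # 1500
```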
1.3 Choose and Configure BPF Maps
Select storage types (Hash, Array, Stack, etc.) based on requirements.
Configure data transfer mechanisms (Perf Buffer, Ring Buffer).
BPF maps are the core data structures of eBPF programs: they let the kernel and user space exchange data, let multiple eBPF programs share data, and let a program keep state between invocations.
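One property of BPF hash maps that shapes the kernel code is their fixed capacity, set when the map is created (BCC's `BPF_HASH` defaults to 10240 entries unless a size is given). The class below is a toy pure‑Python stand‑in, not a BCC API, that mimics the `lookup`/`update`/`delete` calls opensnoop uses and shows why inserting a new key can fail once the map is full, which is one reason a return probe may find no matching entry.

```python
from typing import Any, Dict, Optional


class BoundedHashMap:
    """Toy stand-in for a fixed-capacity BPF hash map (not a BCC API)."""

    def __init__(self, max_entries: int = 10240) -> None:
        self.max_entries = max_entries   # 10240 mirrors BCC's BPF_HASH default
        self._data: Dict[int, Any] = {}

    def lookup(self, key: int) -> Optional[Any]:
        """Like map.lookup(&key): None plays the role of a NULL pointer."""
        return self._data.get(key)

    def update(self, key: int, value: Any) -> bool:
        """Insert or overwrite; inserting a new key fails when the map is full."""
        if key not in self._data and len(self._data) >= self.max_entries:
            return False
        self._data[key] = value
        return True

    def delete(self, key: int) -> bool:
        """Remove a key; returns False if it was not present."""
        if key in self._data:
            del self._data[key]
            return True
        return False

    def __len__(self) -> int:
        return len(self._data)


if __name__ == "__main__":
    m = BoundedHashMap(max_entries=2)
    print(m.update(1, "a"), m.update(2, "b"), m.update(3, "c"))  # True True False
```

The same model explains why the return probe deletes its entry: without the delete, stale entries (for example from threads that have exited) would accumulate and eat into the map's capacity.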
1.4 Write the User‑Space Python Program
Use the convenient APIs provided by the BCC framework.
Initialize the BPF object.
Attach probes to target functions.
Configure event‑handling callbacks.
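Attaching probes usually starts with working out exact kernel function names. On recent kernels the syscall handlers carry an architecture‑specific prefix (for example `__x64_sys_` on x86_64), which BCC's `get_syscall_prefix()` detects. The sketch below takes the prefix and a set of known kernel symbols as plain inputs, so it runs without BCC, and returns (kernel function, entry handler, return handler) triples for the open family. The handler names follow opensnoop; `plan_open_probes` itself is hypothetical, and it skips any missing symbol, which is slightly more defensive than opensnoop's excerpted code (that only checks openat2).

```python
from typing import Iterable, List, Tuple

# (kernel function, entry handler, return handler)
Probe = Tuple[str, str, str]


def plan_open_probes(prefix: str, kernel_symbols: Iterable[str]) -> List[Probe]:
    """Build kprobe/kretprobe pairs for open(), openat() and openat2().

    Any syscall whose handler is missing from kernel_symbols is skipped, so a
    kernel without openat2() still gets the other probes.
    """
    symbols = set(kernel_symbols)
    plan: List[Probe] = []
    for syscall in ("open", "openat", "openat2"):
        fnname = prefix + syscall
        if fnname not in symbols:
            continue  # the running kernel cannot provide this probe point
        plan.append((fnname, "syscall__trace_entry_" + syscall, "trace_return"))
    return plan


if __name__ == "__main__":
    # Pretend symbol table of a kernel that predates openat2()
    syms = {"__x64_sys_open", "__x64_sys_openat"}
    for fnname, entry, ret in plan_open_probes("__x64_sys_", syms):
        print("attach_kprobe(event=%r, fn_name=%r)" % (fnname, entry))
        print("attach_kretprobe(event=%r, fn_name=%r)" % (fnname, ret))
```

With BCC available, each triple maps onto `b.attach_kprobe(event=fnname, fn_name=entry)` and `b.attach_kretprobe(event=fnname, fn_name=ret)`, and the symbol check corresponds to `BPF.ksymname(fnname) != -1`.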
1.5 Implement Data Processing Logic
Define data structures that correspond to the kernel‑space structures.
Write event callback functions.
Format and output the collected data.
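In user space the kernel's `data_t` arrives as raw bytes. BCC can build the matching ctypes type automatically (opensnoop's `b["events"].event(data)` call relies on that), but writing the mirror by hand shows what has to line up: field order, sizes, and C alignment. This sketch assumes the `data_t` layout shown later in the article, `TASK_COMM_LEN` = 16 and `NAME_MAX` = 255 (their usual Linux values), and a 4‑byte enum; `DataT` and `parse_event` are illustrative names, and the demo fakes a kernel record in user space.

```python
import ctypes

TASK_COMM_LEN = 16   # usual Linux value
NAME_MAX = 255       # usual Linux value


class DataT(ctypes.Structure):
    """Hand-written mirror of the kernel-side struct data_t (native alignment)."""
    _fields_ = [
        ("id", ctypes.c_uint64),                  # PID/TID combination
        ("ts", ctypes.c_uint64),                  # timestamp (µs)
        ("uid", ctypes.c_uint32),
        ("ret", ctypes.c_int),                    # fd on success, -errno on failure
        ("comm", ctypes.c_char * TASK_COMM_LEN),
        ("name", ctypes.c_char * NAME_MAX),
        ("flags", ctypes.c_int),
        ("mode", ctypes.c_uint32),
        ("type", ctypes.c_int),                   # enum event_type, assumed 4 bytes
    ]


def parse_event(raw: bytes) -> DataT:
    """Copy a raw perf-buffer record into a DataT, as BCC's event() does for you."""
    if len(raw) < ctypes.sizeof(DataT):
        raise ValueError("record shorter than struct data_t")
    return DataT.from_buffer_copy(raw)


if __name__ == "__main__":
    # Fake what the kernel would submit, then parse it back
    rec = DataT(id=(1234 << 32) | 1234, ts=42, uid=1000, ret=3,
                comm=b"cat", name=b"/etc/hostname", flags=0, mode=0, type=0)
    ev = parse_event(bytes(rec))
    print(ev.id >> 32, ev.comm.decode(), ev.ret, ev.name.decode())
```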
1.6 Add Command‑Line Arguments and Filtering
Make the program accept user input for more flexibility and robustness.
Use argparse to handle command‑line options.
Implement dynamic filter conditions.
Add debugging and help information.
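A minimal sketch of this step, in the same spirit as opensnoop: parse options with `argparse`, then specialize the BPF C text by replacing placeholder tokens with filter code before compiling it. The options echo opensnoop's `-p`, `-t`, `-u`, `-d`, and `-b` flags but are a reduced, hypothetical subset (`opensnoop -h` gives the real list), `PID_TID_FILTER`/`UID_FILTER` stand for marker tokens in the C template, and `BPF_TEMPLATE` is a trimmed illustration. Converting `--duration` to a `timedelta` is what lets the polling loop compare `datetime.now() - start_time < args.duration`.

```python
import argparse
from datetime import timedelta
from typing import List, Optional

BPF_TEMPLATE = """
int syscall__trace_entry_open(struct pt_regs *ctx, const char __user *filename,
                              int flags, u32 mode)
{
    u64 id = bpf_get_current_pid_tgid();
    u32 pid = id >> 32;
    u32 tid = id;
    u32 uid = bpf_get_current_uid_gid();
    PID_TID_FILTER
    UID_FILTER
    /* ... store val_t in infotmp ... */
    return 0;
}
"""


def power_of_two(text: str) -> int:
    """argparse type: perf buffer page counts must be a power of two."""
    value = int(text)
    if value <= 0 or value & (value - 1):
        raise argparse.ArgumentTypeError("must be a power of two")
    return value


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Trace open() family syscalls (sketch)")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-p", "--pid", type=int, help="trace this PID only")
    group.add_argument("-t", "--tid", type=int, help="trace this TID only")
    parser.add_argument("-u", "--uid", type=int, help="trace this UID only")
    parser.add_argument("-d", "--duration", type=int, help="seconds to trace")
    parser.add_argument("-b", "--buffer-pages", type=power_of_two, default=64,
                        help="perf buffer size in pages (power of two)")
    args = parser.parse_args(argv)
    # The polling loop compares elapsed time against a timedelta
    args.duration = timedelta(seconds=args.duration) if args.duration else None
    return args


def specialize(template: str, args: argparse.Namespace) -> str:
    """Replace filter placeholders with C checks (or nothing) before compiling."""
    if args.tid is not None:
        pid_tid = "if (tid != %d) { return 0; }" % args.tid
    elif args.pid is not None:
        pid_tid = "if (pid != %d) { return 0; }" % args.pid
    else:
        pid_tid = ""
    uid = "if (uid != %d) { return 0; }" % args.uid if args.uid is not None else ""
    return template.replace("PID_TID_FILTER", pid_tid).replace("UID_FILTER", uid)


if __name__ == "__main__":
    args = parse_args(["-p", "1234", "-d", "10"])
    print(specialize(BPF_TEMPLATE, args))
```

Doing the filtering in the kernel this way means unwanted events never reach the perf buffer, at the cost of recompiling the program for each set of options.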
1.7 Test and Optimize
Validate functional correctness.
Optimize performance and memory usage.
Handle exceptional cases.
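One practical way to validate a tracer like this is to run it against a workload whose open calls are known in advance and compare the reported paths and return values. The hypothetical helper below generates such a workload: one successful open of a temporary file and one open of a path that does not exist. The tracer itself still needs root and BCC; this code only produces the expected results. A failed call returns a negative errno, which is what the `ret` field (filled from `PT_REGS_RC(ctx)`) should show for the second path. With glibc, Python's `os.open()` is typically issued as `openat()`, so the check also exercises the openat probe.

```python
import errno
import os
import tempfile
from typing import List, Tuple


def known_open_workload(tmpdir: str) -> List[Tuple[str, int]]:
    """Issue open calls with predictable outcomes inside tmpdir.

    Returns (path, ret) pairs where ret is the file descriptor on success or
    -errno on failure, the same convention as a syscall return value.
    """
    existing = os.path.join(tmpdir, "present.txt")
    missing = os.path.join(tmpdir, "missing.txt")
    results: List[Tuple[str, int]] = []

    fd = os.open(existing, os.O_WRONLY | os.O_CREAT, 0o644)  # should succeed
    results.append((existing, fd))
    os.close(fd)

    try:
        fd = os.open(missing, os.O_RDONLY)  # should fail with ENOENT
        os.close(fd)
        results.append((missing, fd))
    except OSError as exc:
        results.append((missing, -exc.errno))
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for path, ret in known_open_workload(d):
            label = errno.errorcode.get(-ret, "?") if ret < 0 else "fd"
            print(path, ret, label)
```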
1.8 Package and Deploy
Write usage documentation.
Manage dependencies.
Prepare distribution packages.
opensnoop Tool Source Code Breakdown
The opensnoop tool monitors file open events in real time by tracing the open(), openat(), and openat2() system calls. It reports the PID, process name, UID, file path, flags, and mode of each call, and supports filtering on several dimensions. Typical uses include:
Security auditing: monitor sensitive file accesses.
Performance tuning: analyze application file‑access patterns.
Fault diagnosis: locate permission issues.
Compliance checking: record system file‑access logs.
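The tool pairs a kprobe on each syscall's entry with a kretprobe on its return. (On kernels with kfunc support, opensnoop attaches kfunc probes instead; this article follows the kprobe path.)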
<code>// Declare a BPF hash map: key = u64 PID/TID, value = struct val_t
// (struct val_t and struct data_t must be defined before this line)
BPF_HASH(infotmp, u64, struct val_t);

// Attach as kretprobe to the syscall
int trace_return(struct pt_regs *ctx)
{
    u64 id = bpf_get_current_pid_tgid();
    struct val_t *valp;
    struct data_t data = {};
    u64 tsp = bpf_ktime_get_ns();
    valp = infotmp.lookup(&id);
    if (valp == 0) {
        // missed entry (e.g. filtered out at entry, or the map was full)
        return 0;
    }
    bpf_probe_read_kernel(&data.comm, sizeof(data.comm), valp->comm);
    bpf_probe_read_user_str(&data.name, sizeof(data.name), (void *)valp->fname);
    data.id = valp->id;
    data.ts = tsp / 1000;                  // ns -> µs
    data.uid = bpf_get_current_uid_gid();  // low 32 bits hold the UID
    data.flags = valp->flags; // EXTENDED_STRUCT_MEMBER
    data.mode = valp->mode; // EXTENDED_STRUCT_MEMBER
    data.ret = PT_REGS_RC(ctx);            // syscall return value: fd or -errno
    // SUBMIT_DATA is a text placeholder that the Python script replaces before
    // compiling (in the simple case with events.perf_submit(ctx, &data, sizeof(data));)
    SUBMIT_DATA
    infotmp.delete(&id);
    return 0;
}

// Attach as kprobe to the syscall entry
int syscall__trace_entry_open(struct pt_regs *ctx, const char __user *filename,
                              int flags, u32 mode)
{
    struct val_t val = {};
    u64 id = bpf_get_current_pid_tgid();
    u32 pid = id >> 32; // PID is higher part
    u32 tid = id;       // lower part
    u32 uid = bpf_get_current_uid_gid();
    // pid, tid and uid feed the optional filter checks, omitted from this excerpt
    if (bpf_get_current_comm(&val.comm, sizeof(val.comm)) == 0) {
        val.id = id;
        val.fname = filename;
        val.flags = flags; // EXTENDED_STRUCT_MEMBER
        val.mode = mode; // EXTENDED_STRUCT_MEMBER
        infotmp.update(&id, &val);
    }
    return 0;
}
</code>
Data structures:
<code>// Temporary storage (filled by the kprobe, read by the kretprobe)
struct val_t {
    u64 id;                     // PID/TID combination
    char comm[TASK_COMM_LEN];   // Process name
    const char *fname;          // File name pointer (user-space address)
    int flags;                  // Open flags
    u32 mode;                   // Permission mode
};
// Event output structure
struct data_t {
    u64 id;                     // PID/TID combination
    u64 ts;                     // Timestamp (µs)
    u32 uid;                    // User ID
    int ret;                    // Syscall return value
    char comm[TASK_COMM_LEN];   // Process name
    char name[NAME_MAX];        // File path
    int flags;                  // Open flags
    u32 mode;                   // Permission mode
    enum event_type type;       // Event type (full-path feature; enum defined elsewhere in the source)
};
</code>
BPF map definitions:
<code>// Perf output for kernel‑to‑user data transfer
BPF_PERF_OUTPUT(events);
// Temporary hash map for kprobe data
BPF_HASH(infotmp, u64, struct val_t);
</code>
The custom entry functions (syscall__trace_entry_open and its openat/openat2 counterparts) and the shared return function trace_return are attached to the kernel's open‑family syscall functions with the attach_kprobe and attach_kretprobe APIs. When a syscall is entered, the entry function saves the relevant information in the temporary infotmp map, keyed by the current PID/TID; when the syscall returns, trace_return looks up that record, fills a data_t structure, sends it to user space via events.perf_submit, and deletes the map entry.
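The temporary map exists because a kprobe sees the syscall's arguments but not its result, while a kretprobe sees the result but cannot reliably recover the arguments. Keying the saved arguments by the 64‑bit PID/TID value is safe because a thread can be inside only one syscall at a time. The pure‑Python simulation below replays that handoff, with a dict standing in for `infotmp` and a list standing in for the perf buffer; the function names and tuples are invented for illustration, and a return without a saved entry is dropped, like the "missed entry" branch in `trace_return`.

```python
from typing import Dict, List, NamedTuple, Optional


class Val(NamedTuple):
    """Plays the role of struct val_t (arguments saved at entry)."""
    comm: str
    fname: str
    flags: int


class Event(NamedTuple):
    """Plays the role of struct data_t (what reaches user space)."""
    pid: int
    tid: int
    comm: str
    name: str
    flags: int
    ret: int


infotmp: Dict[int, Val] = {}  # stands in for BPF_HASH(infotmp, u64, struct val_t)
events: List[Event] = []      # stands in for the perf buffer


def on_entry(pid_tgid: int, comm: str, fname: str, flags: int) -> None:
    """Like syscall__trace_entry_open: save the arguments keyed by PID/TID."""
    infotmp[pid_tgid] = Val(comm, fname, flags)


def on_return(pid_tgid: int, ret: int) -> Optional[Event]:
    """Like trace_return: join saved arguments with the return value."""
    val = infotmp.pop(pid_tgid, None)  # lookup and delete in one step
    if val is None:
        return None  # missed entry: nothing saved for this thread
    ev = Event(pid_tgid >> 32, pid_tgid & 0xFFFFFFFF, val.comm, val.fname, val.flags, ret)
    events.append(ev)  # like submitting to the perf buffer
    return ev


if __name__ == "__main__":
    t1 = (100 << 32) | 100  # main thread of PID 100
    t2 = (100 << 32) | 101  # second thread of the same process
    on_entry(t1, "app", "/etc/passwd", 0)
    on_entry(t2, "app", "/missing", 0)  # overlaps t1's call
    on_return(t2, -2)  # -ENOENT
    on_return(t1, 3)
    on_return(t1, 4)   # no saved entry: dropped
    for ev in events:
        print(ev)
```

Because the two threads have different keys, their overlapping calls never overwrite each other's saved arguments.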
<code>from bcc import BPF

# A throwaway BPF object, used only to detect the kernel's syscall prefix
b = BPF(text='')
fnname_open = b.get_syscall_prefix().decode() + 'open'
fnname_openat = b.get_syscall_prefix().decode() + 'openat'
fnname_openat2 = b.get_syscall_prefix().decode() + 'openat2'
# openat2() only exists on newer kernels; disable it when the symbol is missing
if b.ksymname(fnname_openat2) == -1:
    fnname_openat2 = None

# Initialize BPF (bpf_text and is_support_kfunc, from BPF.support_kfunc(),
# are set earlier in the script)
b = BPF(text=bpf_text)
if not is_support_kfunc:
    b.attach_kprobe(event=fnname_open, fn_name="syscall__trace_entry_open")
    b.attach_kretprobe(event=fnname_open, fn_name="trace_return")
    b.attach_kprobe(event=fnname_openat, fn_name="syscall__trace_entry_openat")
    b.attach_kretprobe(event=fnname_openat, fn_name="trace_return")
    if fnname_openat2:
        b.attach_kprobe(event=fnname_openat2, fn_name="syscall__trace_entry_openat2")
        b.attach_kretprobe(event=fnname_openat2, fn_name="trace_return")
</code>
The entry probe records the process name, PID/TID, file‑name pointer, flags, and mode; the return probe adds the timestamp, UID, and return value, copies the file name string from user memory, and submits the event to user space.
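The flags value collected here is the raw integer passed to the syscall, so decoding it helps when reading events. A sketch using Python's `os.O_*` constants; their values are platform‑specific, so this assumes the decoder runs on the same Linux host as the tracer. `decode_open_flags` is a hypothetical helper, not part of opensnoop, and it names only a handful of common bits.

```python
import os
from typing import List

# The access mode lives in the low two bits on Linux (O_ACCMODE == 3)
_ACCMODE = os.O_RDONLY | os.O_WRONLY | os.O_RDWR
_ACCESS_NAMES = {os.O_RDONLY: "O_RDONLY", os.O_WRONLY: "O_WRONLY", os.O_RDWR: "O_RDWR"}
# A few common modifier bits; extend as needed
_FLAG_NAMES = [
    (os.O_CREAT, "O_CREAT"),
    (os.O_EXCL, "O_EXCL"),
    (os.O_TRUNC, "O_TRUNC"),
    (os.O_APPEND, "O_APPEND"),
    (os.O_NONBLOCK, "O_NONBLOCK"),
    (os.O_DIRECTORY, "O_DIRECTORY"),
    (os.O_CLOEXEC, "O_CLOEXEC"),
]


def decode_open_flags(flags: int) -> str:
    """Render an open() flags integer as 'O_WRONLY|O_CREAT|...'."""
    access = flags & _ACCMODE
    parts: List[str] = [_ACCESS_NAMES.get(access, "O_ACCMODE=%d" % access)]
    rest = flags & ~_ACCMODE
    for bit, name in _FLAG_NAMES:
        if rest & bit:
            parts.append(name)
            rest &= ~bit
    if rest:
        parts.append(hex(rest))  # bits this table does not name
    return "|".join(parts)


if __name__ == "__main__":
    print(decode_open_flags(os.O_WRONLY | os.O_CREAT | os.O_TRUNC))  # O_WRONLY|O_CREAT|O_TRUNC
```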
<code>// In the entry probe
if (bpf_get_current_comm(&val.comm, sizeof(val.comm)) == 0) {
    val.id = id;
    val.fname = filename;
    val.flags = flags;
    val.mode = mode;
    infotmp.update(&id, &val);
}
</code>
<code>// In the return probe (tsp holds the bpf_ktime_get_ns() value read at the start of trace_return)
bpf_probe_read_kernel(&data.comm, sizeof(data.comm), valp->comm);
bpf_probe_read_user_str(&data.name, sizeof(data.name), (void *)valp->fname);
data.id = valp->id;
data.ts = tsp / 1000;
data.uid = bpf_get_current_uid_gid();
data.flags = valp->flags;
data.mode = valp->mode;
data.ret = PT_REGS_RC(ctx);
SUBMIT_DATA
infotmp.delete(&id);
</code>
On the user side, the print_event callback processes events received from the perf buffer:
<code>import sys
from datetime import datetime

def print_event(cpu, data, size):
    # Cast the raw perf-buffer record to the ctypes type BCC derives from data_t
    event = b["events"].event(data)
    # Output formatting logic omitted for brevity

# page_cnt is the per-CPU buffer size in pages (must be a power of two)
b["events"].open_perf_buffer(print_event, page_cnt=args.buffer_pages)
start_time = datetime.now()
# args.duration is a datetime.timedelta, or None to trace until Ctrl-C
while not args.duration or datetime.now() - start_time < args.duration:
    try:
        b.perf_buffer_poll()
    except KeyboardInterrupt:
        sys.exit()
</code>
At this point, steps 1.1 through 1.5 are covered; steps 1.6 through 1.8 involve further application‑side work such as argument handling, testing, optimization, and packaging.
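The formatting inside `print_event` was omitted above. One way to fill it in, kept separate from BCC so it can be tested on its own, is to split the return value into a file descriptor or an error number, mirroring the PID, COMM, FD, ERR, and PATH columns of opensnoop's default output. `OpenEvent`, `HEADER`, and `format_open_event` are hypothetical names, and the column widths are arbitrary.

```python
from typing import NamedTuple


class OpenEvent(NamedTuple):
    pid: int
    comm: bytes  # BCC exposes char arrays as bytes
    ret: int     # fd on success, -errno on failure
    name: bytes


HEADER = "%-7s %-16s %4s %3s %s" % ("PID", "COMM", "FD", "ERR", "PATH")


def format_open_event(ev: OpenEvent) -> str:
    """Format one event as a fixed-width line under HEADER."""
    if ev.ret >= 0:
        fd, err = ev.ret, 0
    else:
        fd, err = -1, -ev.ret
    return "%-7d %-16s %4d %3d %s" % (
        ev.pid,
        ev.comm.decode("utf-8", "replace"),
        fd,
        err,
        ev.name.decode("utf-8", "replace"),
    )


if __name__ == "__main__":
    print(HEADER)
    print(format_open_event(OpenEvent(1234, b"cat", 3, b"/etc/hostname")))
    print(format_open_event(OpenEvent(1234, b"cat", -2, b"/nope")))
```

Inside `print_event`, the call would be along the lines of `print(format_open_event(OpenEvent(event.id >> 32, event.comm, event.ret, event.name)))`.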
Big Data Technology Tribe
Focused on computer science and cutting‑edge tech, we distill complex knowledge into clear, actionable insights. We track tech evolution, share industry trends and deep analysis, helping you keep learning, boost your technical edge, and ride the digital wave forward.