eBPF Talk: Manually Performing Backtrace in arm64 fentry
The article explains why backtracing with eBPF fentry on arm64 is harder than on x86, details the stack layout differences, shows how recent commits changed register saving, and provides a practical detection routine to locate the frame pointer and retrieve the tracee's instruction pointer.
When using fentry for eBPF programs, backtracing on arm64 is harder than on x86 because the usable frame pointer (FP) is not directly exposed.
Register differences
On x86, the BPF R10 register maps to the hardware frame pointer (RBP), so a backtrace can start from it directly. On arm64, BPF R10 maps to x25, while the hardware FP is x29, so the real FP is not visible to the program: its saved copy sits on the stack behind the callee‑saved registers.
Stack layout of trampoline and fentry
/* Stack layout on arm64:
 * |  r9  |
 * |  fp  | FP of tracee's caller
 * |  lr  | IP of tracee
 * |  fp  | FP of tracee
 * +------+ FP of trampoline  <--------+
 * |  ..  | padding                    |
 * |  ..  | callee saved regs          |
 * | retv | retval of tracee           |
 * | regs | regs of tracee             |
 * | nreg | number of regs             |
 * |  ip  | IP of tracee if needed     | possible range of
 * | rctx | bpf_tramp_run_ctx          | detection
 * |  lr  | IP of trampoline           |
 * |  fp  | FP of trampoline  <--------- detect it
 * +------+ FP of current prog         |
 * | regs | callee saved regs          |
 * +------+ R10 of bpf prog   <--------+
 * |  ..  |
 * +------+ SP of current prog
 */
In this layout, the slot holding the saved trampoline FP sits above the BPF program's R10, separated from it by the callee‑saved registers that the fentry program pushed.
Impact of recent commits
Before commit 5d4fa9ec5643 ("bpf, arm64: Avoid blindly saving/restoring all callee‑saved registers"), the number of registers saved by fentry was fixed at 5 or 6 (depending on whether the kernel includes commit 66ff4d61dc12). After that change, the number is determined by the fentry program itself: it saves exactly the callee‑saved registers it uses, so the distance between R10 and the saved FP is no longer constant.
Detecting the trampoline frame pointer
A simple heuristic reads each candidate slot above the current R10 (the BPF program's frame pointer), in 16‑byte steps, and checks whether the stored value lies less than 256 bytes above R10. If it does, the value is most likely the trampoline's FP.
static __always_inline u64 detect_tramp_fp(void) {
    static const int range_of_detection = 256;
    u64 fp = 0, r10;

    r10 = get_tracing_fp(); /* R10 of current bpf prog */
    /* Probe r10 + 96 down to r10 in 16-byte steps. */
    for (int i = 6; i >= 0; i--) {
        bpf_probe_read_kernel(&fp, sizeof(fp), (void *)(r10 + i * 16));
        /* A value just above R10 is taken to be the trampoline's FP. */
        if (r10 < fp && fp < r10 + range_of_detection)
            return fp;
    }
    return r10; /* fall back to R10 if nothing matched */
}
The heuristic is not perfectly accurate, but it works in practice.
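To make the scan order concrete, here is a minimal user-space sketch of the same range check, run against a mocked stack instead of kernel memory. The names `mock_stack`, `mock_read_u64`, and `detect_tramp_fp_model` are hypothetical and exist only for this illustration; `mock_read_u64` stands in for `bpf_probe_read_kernel()`. Unlike the routine above, the model skips slots it cannot read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a stack region: slots[i] holds the 8-byte word
 * stored at address base + 8 * i. */
struct mock_stack {
    uint64_t base;
    const uint64_t *slots;
    size_t nslots;
};

/* Stand-in for bpf_probe_read_kernel(): returns 0 on success, -1 if the
 * address is unaligned or outside the modelled region. */
static int mock_read_u64(const struct mock_stack *s, uint64_t addr,
                         uint64_t *out)
{
    assert(s != NULL && out != NULL);
    if (addr < s->base || (addr - s->base) % 8 != 0)
        return -1;
    size_t idx = (size_t)((addr - s->base) / 8);
    if (idx >= s->nslots)
        return -1;
    *out = s->slots[idx];
    return 0;
}

/* Same scan as the routine above: probe r10 + 96, r10 + 80, ..., r10 and
 * return the first value that lies strictly inside (r10, r10 + 256). */
static uint64_t detect_tramp_fp_model(const struct mock_stack *s,
                                      uint64_t r10)
{
    const uint64_t range_of_detection = 256;

    for (int i = 6; i >= 0; i--) {
        uint64_t fp;

        if (mock_read_u64(s, r10 + (uint64_t)i * 16, &fp) != 0)
            continue; /* unreadable slot: skip it (assumption of the model) */
        if (r10 < fp && fp < r10 + range_of_detection)
            return fp;
    }
    return r10; /* nothing plausible found */
}
```

With a mocked stack in which two pairs of callee‑saved registers (32 bytes) sit above R10, the saved FP is found at `r10 + 32`. One likely source of the inaccuracy mentioned above: the check cannot distinguish a saved FP from a callee‑saved register that happens to hold a nearby stack address.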
Obtaining the tracee’s instruction pointer
Once the trampoline’s FP is known, the saved LR at FP + 8 bytes holds an address inside the tracee, just past the patched call site. Subtracting 12 bytes from that value yields the tracee’s original IP, that is, its entry address.
Full details are in commit b2ad54e1533e ("bpf, arm64: Implement bpf_arch_text_poke() for arm64").
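The arithmetic above, plus the usual next step of a manual backtrace, can be sketched in user space as follows. This is only an illustration on a mocked stack: `mock_stack`, `mock_read_u64`, `tracee_ip_model`, and `walk_frames_model` are hypothetical names, the constant 12 is taken from the text, and the walk assumes that every frame follows the standard AArch64 frame record convention (the word at FP is the caller's FP, the word at FP + 8 is the return address).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a stack region: slots[i] holds the 8-byte word
 * stored at address base + 8 * i. */
struct mock_stack {
    uint64_t base;
    const uint64_t *slots;
    size_t nslots;
};

/* Stand-in for bpf_probe_read_kernel(): 0 on success, -1 on a bad address. */
static int mock_read_u64(const struct mock_stack *s, uint64_t addr,
                         uint64_t *out)
{
    assert(s != NULL && out != NULL);
    if (addr < s->base || (addr - s->base) % 8 != 0)
        return -1;
    size_t idx = (size_t)((addr - s->base) / 8);
    if (idx >= s->nslots)
        return -1;
    *out = s->slots[idx];
    return 0;
}

/* Read the saved LR at tramp_fp + 8 and subtract 12 (value from the text)
 * to get the tracee's entry address. */
static int tracee_ip_model(const struct mock_stack *s, uint64_t tramp_fp,
                           uint64_t *ip)
{
    uint64_t lr;

    if (mock_read_u64(s, tramp_fp + 8, &lr) != 0)
        return -1;
    *ip = lr - 12;
    return 0;
}

/* Follow the chain of frame records starting at fp, collecting up to max
 * return addresses. Stops on an unreadable record or when the next FP does
 * not move to a higher address (the stack grows down). */
static size_t walk_frames_model(const struct mock_stack *s, uint64_t fp,
                                uint64_t *ips, size_t max)
{
    size_t n = 0;

    while (n < max) {
        uint64_t next_fp, lr;

        if (mock_read_u64(s, fp, &next_fp) != 0 ||
            mock_read_u64(s, fp + 8, &lr) != 0)
            break;
        ips[n++] = lr;
        if (next_fp <= fp)
            break;
        fp = next_fp;
    }
    return n;
}
```

In a BPF program the reads would go through `bpf_probe_read_kernel()` and the loop would need a fixed upper bound that the verifier can check; the `max` parameter plays that role here.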
Conclusion
Backtracing via fentry on arm64 is challenging because the frame pointer is not directly exposed. By analyzing the stack layout, accounting for register‑saving changes introduced in recent commits, and applying the range‑check heuristic, a usable FP and the tracee’s IP can be recovered.
References
https://github.com/cilium/pwru/pull/468
