Understanding Linux ftrace Function Graph Tracer on ARM64
The article details how Linux’s function‑graph ftrace tracer works on ARM64: the required kernel configs, how -pg inserts _mcount calls, the runtime patching that enables ftrace_graph_caller, register usage for argument passing, return handling, and why shadow‑call‑stack must be disabled. The result is a tracer that yields precise call graphs and timing.
In Android performance and power analysis, tools such as Systrace and Perfetto are frequently used. A typical Perfetto trace shows that thread tid=1845 stays in the runnable state for 503 µs before running for 498 µs, which is derived from the kernel sched_waking and sched_switch events.
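As a rough illustration of how such durations fall out of scheduler events, the sketch below computes the runnable and running times for one thread from a list of sched_waking and sched_switch records. The SchedEvent type, the field names, and the timestamps are hypothetical and chosen only to reproduce the 503 µs and 498 µs figures; this is not Perfetto's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchedEvent:
    ts_us: int           # timestamp in microseconds (made-up values)
    name: str            # "sched_waking" or "sched_switch"
    tid: int             # thread woken (sched_waking) or switched in (sched_switch)
    prev_tid: int = -1   # thread switched out (sched_switch only)


def runnable_and_running_us(events, tid):
    """Return (runnable_us, running_us) for the first wake/run/stop cycle of tid."""
    woken = switched_in = switched_out = None
    for ev in sorted(events, key=lambda e: e.ts_us):
        if ev.name == "sched_waking" and ev.tid == tid and woken is None:
            woken = ev.ts_us                      # thread becomes runnable
        elif ev.name == "sched_switch":
            if ev.tid == tid and woken is not None and switched_in is None:
                switched_in = ev.ts_us            # thread starts running
            elif ev.prev_tid == tid and switched_in is not None:
                switched_out = ev.ts_us           # thread stops running
                break
    if None in (woken, switched_in, switched_out):
        raise ValueError(f"incomplete wake/run/stop cycle for tid {tid}")
    return switched_in - woken, switched_out - switched_in


# Hypothetical events for tid 1845; tid 0 stands in for the idle thread.
EVENTS = [
    SchedEvent(1_000_000, "sched_waking", tid=1845),
    SchedEvent(1_000_503, "sched_switch", tid=1845, prev_tid=0),
    SchedEvent(1_001_001, "sched_switch", tid=0, prev_tid=1845),
]

if __name__ == "__main__":
    print(runnable_and_running_us(EVENTS, 1845))
```

The runnable time is the gap between the wake-up and the switch-in, and the running time is the gap between the switch-in and the next switch-out.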
The article explains how the Linux kernel implements the function‑graph tracer (a type of ftrace) to capture function call relationships and execution times.
Enabling the tracer
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_NOP_TRACER=y
# CONFIG_SHADOW_CALL_STACK is not set
After setting the above kernel configuration options, the available tracers can be listed:
/sys/kernel/tracing # cat available_tracers
blk function_graph preemptirqsoff preemptoff irqsoff function nop
The current tracer is switched by writing to current_tracer:
/sys/kernel/tracing # echo function_graph > current_tracer
When the function‑graph tracer is active, the kernel dynamically replaces a nop instruction at ftrace_graph_call with a branch to ftrace_graph_caller. This is done in ftrace_enable_ftrace_graph_caller.
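The tracefs steps in this section (reading available_tracers, then writing current_tracer) can also be scripted. The helper below is a minimal sketch: select_tracer is a hypothetical name, and only the two tracefs files already shown above are assumed to exist. On a real device it would need root and a mounted tracefs.

```python
from pathlib import Path


def select_tracer(tracing_dir, tracer="function_graph"):
    """Check that `tracer` is offered by the kernel, then make it current.

    tracing_dir is the tracefs mount point, e.g. /sys/kernel/tracing.
    Returns the list of available tracers.
    """
    root = Path(tracing_dir)
    available = (root / "available_tracers").read_text().split()
    if tracer not in available:
        # Typically means the matching CONFIG_* option was not enabled.
        raise ValueError(f"{tracer!r} not in available_tracers: {available}")
    (root / "current_tracer").write_text(tracer + "\n")
    return available


if __name__ == "__main__":
    print(select_tracer("/sys/kernel/tracing"))
```

Taking the directory as a parameter keeps the helper usable against a scratch directory, so the check-then-write logic can be tried without touching a live kernel.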
Key implementation steps
During compilation, the -pg option inserts a call to _mcount at the entry of every traceable function (functions marked notrace are skipped).
_mcount saves the frame pointer (FP) and link register (LR) on the stack, then calls ftrace_caller.
If the function‑graph tracer is enabled, the nop at ftrace_graph_call (inside ftrace_caller) is patched into a branch to ftrace_graph_caller, so execution reaches the graph tracer.
ftrace_graph_caller collects the address where the caller’s LR is saved, the traced function’s PC, and the parent’s FP, then calls the C function prepare_ftrace_return with these three arguments (passed in X0, X1, and X2).
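The control flow in these steps can be pictured with a toy model. The sketch below is not kernel code: FtraceCallerModel and its string "instructions" are hypothetical stand-ins, and only the names _mcount, ftrace_caller, ftrace_graph_call, ftrace_graph_caller, and prepare_ftrace_return come from the article.

```python
NOP = "nop"
BRANCH_GRAPH = "b ftrace_graph_caller"


class FtraceCallerModel:
    """Toy model of a patchable slot (ftrace_graph_call) inside ftrace_caller."""

    def __init__(self):
        self.ftrace_graph_call = NOP   # as built, the slot does nothing
        self.log = []

    def enable_graph_caller(self):
        # Stands in for ftrace_enable_ftrace_graph_caller: nop -> branch.
        self.ftrace_graph_call = BRANCH_GRAPH

    def disable_graph_caller(self):
        # Restore the nop so the graph tracer is no longer reached.
        self.ftrace_graph_call = NOP

    def mcount(self, func_name):
        # Stands in for the compiler-inserted call at function entry.
        self.log.append(f"_mcount from {func_name}")
        self.ftrace_caller(func_name)

    def ftrace_caller(self, func_name):
        self.log.append("ftrace_caller")
        if self.ftrace_graph_call == BRANCH_GRAPH:
            self.ftrace_graph_caller(func_name)

    def ftrace_graph_caller(self, func_name):
        self.log.append(
            f"ftrace_graph_caller -> prepare_ftrace_return({func_name})"
        )


if __name__ == "__main__":
    model = FtraceCallerModel()
    model.mcount("vfs_read")           # slot is a nop: graph tracer not reached
    model.enable_graph_caller()
    model.mcount("vfs_read")           # slot is a branch: graph tracer reached
    print("\n".join(model.log))
```

The point of the model is that every traced function takes the same path at entry, and a single patched slot decides whether the graph tracer is on it.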
The ABI for ARM64 (AArch64) is crucial here: registers X0–X7 are used for argument passing, X29 is the frame pointer, and X30 is the link register. prepare_ftrace_return stores the original LR (the return address) in the task’s ret_stack and replaces it with return_hooker, which holds the address of return_to_handler, so the tracer can record the function exit time before the real return occurs.
When the function finishes, it therefore returns into return_to_handler, which saves X0–X7, lets the tracer record the exit and look up the original return address, restores the saved registers, and finally returns to the original caller. This ensures that the function’s return value (held in X0–X7) is preserved while the tracer records the exit.
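The entry and exit hooks can be sketched as a small simulation. This is a simplified model under stated assumptions: the Task class, the dict standing in for a stack frame, the string "addresses", and the counter used as a clock are all hypothetical. Only the names prepare_ftrace_return, return_to_handler, and ret_stack are taken from the article.

```python
import itertools

# Marker standing in for the address of the return trampoline.
RETURN_TO_HANDLER = "return_to_handler"


class Task:
    def __init__(self):
        self.ret_stack = []   # entries: (original_lr, func, entry_time)
        self.trace = []       # entries: (func, depth, duration)


def prepare_ftrace_return(task, frame, self_addr, now):
    """Entry hook: save the real return address, then divert the return."""
    task.ret_stack.append((frame["lr"], self_addr, now))
    frame["lr"] = RETURN_TO_HANDLER


def return_to_handler(task, now):
    """Exit hook: record the duration and hand back the real return address."""
    original_lr, func, entry_time = task.ret_stack.pop()
    task.trace.append((func, len(task.ret_stack), now - entry_time))
    return original_lr


def traced_call(task, func, return_address, clock, body=None):
    """Simulate one traced function call; returns where execution resumes."""
    frame = {"lr": return_address}          # LR as saved by the prologue
    prepare_ftrace_return(task, frame, func, next(clock))
    if body is not None:
        body()                              # nested calls happen here
    assert frame["lr"] == RETURN_TO_HANDLER  # the function "returns" to the hook
    return return_to_handler(task, next(clock))


def demo():
    task = Task()
    clock = itertools.count(0, 10)          # fake clock: 0, 10, 20, ...
    resumed_at = traced_call(
        task, "outer", "caller+0x8", clock,
        body=lambda: traced_call(task, "inner", "outer+0x10", clock),
    )
    return resumed_at, task.trace


if __name__ == "__main__":
    print(demo())
```

Because entries are pushed on entry and popped on exit, the depth of ret_stack at pop time gives the nesting level, which is what lets the tracer print an indented call graph with per-function durations.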
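Two restrictions follow from this design. The return hook works by overwriting the saved return address, which conflicts with a shadow call stack, so CONFIG_SHADOW_CALL_STACK must be disabled. And because the tracer patches kernel text at runtime, the article notes that older kernels also need CONFIG_STRICT_MEMORY_RWX and KERNEL_TEXT_RDONLY disabled so that the read‑only text sections can be written. Similar techniques are used in Linux’s live‑patching infrastructure.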
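One way to see how a shadow call stack interacts with the return hook is the toy model below. It assumes, as a simplification, that with a shadow call stack the function epilogue reloads the return address from the shadow stack rather than from the regular stack frame. The function name and string addresses are hypothetical.

```python
HOOK = "return_to_handler"


def call_with_hook(use_scs):
    """Return the address a traced function ends up returning to."""
    real_return = "caller+0x8"            # hypothetical caller address
    frame = {"lr": real_return}           # LR saved on the regular stack
    # With a shadow call stack, the prologue also pushes LR to the shadow stack.
    shadow_stack = [real_return] if use_scs else []

    # The tracer's entry hook rewrites the copy on the regular stack only.
    frame["lr"] = HOOK

    if use_scs:
        return shadow_stack.pop()         # epilogue ignores the rewritten copy
    return frame["lr"]                    # epilogue uses the rewritten copy


if __name__ == "__main__":
    print(call_with_hook(use_scs=False))  # hook is reached
    print(call_with_hook(use_scs=True))   # hook is bypassed
```

In this model the exit hook never runs when the shadow stack is in use, so exit times would not be recorded.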
The article concludes that the function‑graph tracer provides a powerful way to visualize call graphs and precise execution times, which is valuable for performance debugging and optimization on ARM64 platforms. The analysis is based on the Linux 4.19 source tree.
OPPO Kernel Craftsman
Sharing Linux kernel-related cutting-edge technology, technical articles, technical news, and curated tutorials