Exploring KPROBE_OVERRIDE for Kernel Error Injection
This article examines how the KPROBE_OVERRIDE feature, combined with eBPF, enables precise kernel‑level error injection, discusses its configuration requirements, demonstrates a practical example on a Mellanox NIC driver, and evaluates the associated security and performance implications.
0. Introduction
Motivation: explore whether eBPF can hook a kernel function at entry and override its return value, even though the eBPF verifier normally prevents programs from altering kernel behavior.
1. Error‑Injection Concept
Error injection means deliberately introducing faults to evaluate a system's robustness, reliability, and fault tolerance. In Linux kernel development, KPROBE_OVERRIDE provides a mechanism to force selected kernel functions to fail at runtime.
2. Overview of Error‑Injection Techniques
Hardware‑based injection
Example: voltage glitches, clock faults.
Pros: precise hardware fault simulation.
Cons: high cost, complex setup, potential hardware wear.
User‑space software injection
Example: Chaos Mesh for cloud‑native fault orchestration.
Pros: flexible, covers many cloud‑native scenarios.
Cons: limited support for non‑cloud‑native applications, higher learning curve.
Kernel‑based injection
Example: kernel “fault injection” feature.
Pros: precise simulation of hardware faults and kernel function errors.
Cons: requires kernel knowledge and privileges; limited built‑in injection points.
3. KPROBE_OVERRIDE Mechanism
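Enabled by the kernel config option CONFIG_BPF_KPROBE_OVERRIDE and implemented via the helper bpf_override_return(). The helper rewrites the instruction pointer saved in the probed function's register context so that execution resumes in a small stub that returns immediately; the original function body never runs. The helper's documentation describes it as follows: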
Used for error injection, this helper uses kprobes to override the return value of the probed function and set it to rc. The first argument is the context regs on which the kprobe works.
This helper is only available when the kernel is compiled with CONFIG_BPF_KPROBE_OVERRIDE and the target function is marked with ALLOW_ERROR_INJECTION. It is limited to architectures that enable CONFIG_FUNCTION_ERROR_INJECTION. The helper documentation names x86 as the only supported architecture, but that statement dates from the feature's introduction; later kernels have added support on other architectures such as arm64.
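The control flow described above can be pictured with a small userspace analogy. This is only a sketch of the idea (a hook runs at function entry and may short-circuit the call), not how the kernel implements it; the names target_fn, probe_hook, call_probed, and always_fail are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a probed kernel function. */
typedef int (*target_fn)(int arg);

/* Hypothetical entry hook: returns true and fills *rc to request an override. */
typedef bool (*probe_hook)(int arg, int *rc);

static int calls_to_original;	/* counts how often the real body ran */

static int original(int arg)
{
	calls_to_original++;
	return arg * 2;
}

/* Hook that always asks for -EIO, loosely mimicking bpf_override_return(ctx, -EIO). */
static bool always_fail(int arg, int *rc)
{
	(void)arg;
	*rc = -EIO;
	return true;
}

/*
 * Entry-point dispatch: run the hook first. If it requests an override,
 * return its value and skip the original body entirely.
 */
static int call_probed(target_fn fn, probe_hook hook, int arg)
{
	int rc;

	assert(fn != NULL);
	if (hook && hook(arg, &rc))
		return rc;	/* original body never runs */
	return fn(arg);
}
```

Calling call_probed(original, always_fail, 21) returns -EIO and leaves calls_to_original unchanged, which is the property that makes the technique useful for exercising a caller's error path without the callee's side effects.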
3.1 ALLOW_ERROR_INJECTION Macro
#define ALLOW_ERROR_INJECTION(fname, _etype)				\
static struct error_injection_entry __used				\
	__section("_error_injection_whitelist")				\
	_eil_addr_##fname = {						\
		.addr = (unsigned long)fname,				\
		.etype = EI_ETYPE_##_etype,				\
	};
The macro creates a static error_injection_entry holding the function's address and error type, places it in the ELF section _error_injection_whitelist, and marks it __used so the compiler does not discard the otherwise unreferenced entry.
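On kernels built with function error injection, debugfs exposes the collected entries at /sys/kernel/debug/error_injection/list. The following is a minimal sketch of checking one line of that file for a given function. The helper name ei_line_matches is hypothetical, and the line layout it expects (symbol, an optional module name in brackets, a tab, then the type) is an assumption about the file's output rather than a documented guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Returns true if `line` names `func` and copies its error type into `etype`.
 * Assumed layout: "<symbol>\t<TYPE>" or "<symbol> [module]\t<TYPE>".
 */
static bool ei_line_matches(const char *line, const char *func,
			    char *etype, size_t etype_len)
{
	size_t flen = strlen(func);
	const char *tab;
	size_t n;

	assert(etype_len > 0);

	/* The symbol must match exactly and be followed by a tab or a space. */
	if (strncmp(line, func, flen) != 0)
		return false;
	if (line[flen] != '\t' && line[flen] != ' ')
		return false;

	tab = strchr(line + flen, '\t');
	if (!tab)
		return false;

	/* Copy the type, stopping at the newline or end of string. */
	n = strcspn(tab + 1, "\r\n");
	if (n >= etype_len)
		n = etype_len - 1;
	memcpy(etype, tab + 1, n);
	etype[n] = '\0';
	return true;
}
```

A caller would read the debugfs file line by line (for example with fgets) and stop at the first match; a function that never matches cannot be targeted by bpf_override_return().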
3.2 Error‑Type Whitelist
enum {
	EI_ETYPE_NONE,		// not injectable
	EI_ETYPE_NULL,		// return NULL on failure
	EI_ETYPE_ERRNO,		// return -ERRNO on failure
	EI_ETYPE_ERRNO_NULL,	// return -ERRNO or NULL on failure
	EI_ETYPE_TRUE,		// return true on failure
};
Select the type that matches how the function reports failure to its callers.
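To make the meaning of each type concrete, here is a hedged sketch of a check that an injected value is a plausible failure for a given type. The helper rc_fits_etype and the local enum ei_etype are hypothetical and exist only so the example compiles on its own; the kernel's own validation differs in detail.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ERRNO 4095	/* largest errno magnitude, as in the kernel's err.h */

/* Local copy of the error types, redeclared so the sketch is self-contained. */
enum ei_etype {
	EI_ETYPE_NONE,
	EI_ETYPE_NULL,
	EI_ETYPE_ERRNO,
	EI_ETYPE_ERRNO_NULL,
	EI_ETYPE_TRUE,
};

/* Hypothetical helper: is `rc` a plausible failure value for this type? */
static bool rc_fits_etype(enum ei_etype etype, long rc)
{
	bool is_errno = rc < 0 && rc >= -MAX_ERRNO;

	switch (etype) {
	case EI_ETYPE_NULL:
		return rc == 0;			/* NULL pointer */
	case EI_ETYPE_ERRNO:
		return is_errno;		/* negative errno only */
	case EI_ETYPE_ERRNO_NULL:
		return is_errno || rc == 0;	/* either convention */
	case EI_ETYPE_TRUE:
		return rc == 1;			/* boolean true signals failure */
	case EI_ETYPE_NONE:
	default:
		return false;			/* not injectable */
	}
}
```

For a function tagged ERRNO, such as the driver example later in this article, -5 (-EIO) passes this check while 0 does not.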
3.3 CONFIG_BPF_KPROBE_OVERRIDE Dependencies
Prompt: Enable BPF programs to override a kprobed function
Type: bool
Depends on: CONFIG_BPF_EVENTS and CONFIG_FUNCTION_ERROR_INJECTION
Defined in: kernel/trace/Kconfig
Available in: Linux kernels 4.16–4.20, 5.0–5.19, 6.0–6.10, 6.11‑rc+HEAD
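Before writing any eBPF code it is worth confirming that the running kernel has these options enabled. The sketch below scans the text of a kernel config (for instance the contents of /boot/config-$(uname -r), where a distribution ships it) for exact "=y" lines. The helper names config_has_y and override_supported are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Returns true if `config` (the text of a kernel .config) contains the
 * exact line "<opt>=y". Lines such as "# <opt> is not set" do not match.
 */
static bool config_has_y(const char *config, const char *opt)
{
	size_t olen = strlen(opt);
	const char *p = config;

	while (p && *p) {
		const char *eol = strchr(p, '\n');

		if (strncmp(p, opt, olen) == 0 &&
		    strncmp(p + olen, "=y", 2) == 0 &&
		    (p[olen + 2] == '\n' || p[olen + 2] == '\0'))
			return true;
		p = eol ? eol + 1 : NULL;	/* advance to the next line */
	}
	return false;
}

/* The override option and both of its dependencies must be enabled. */
static bool override_supported(const char *config)
{
	return config_has_y(config, "CONFIG_BPF_EVENTS") &&
	       config_has_y(config, "CONFIG_FUNCTION_ERROR_INJECTION") &&
	       config_has_y(config, "CONFIG_BPF_KPROBE_OVERRIDE");
}
```

Matching whole lines rather than substrings avoids false positives from longer option names that share a prefix.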
4. Practical Application
Target: error‑handling path in Mellanox NIC driver function mlx5_cmd_exec.
// drivers/net/ethernet/mellanox/mlx5/core/cmd.c
int mlx5_cmd_exec(struct mlx5_core_dev *dev, void *in, int in_size,
		  void *out, int out_size);
4.1 Adding the Whitelist Macro
#include <linux/error-injection.h>

int mlx5_cmd_exec(struct mlx5_core_dev *dev, void *in, int in_size,
		  void *out, int out_size)
{
	int err;

	err = cmd_exec(dev, in, in_size, out, out_size, NULL, NULL, false);
	return err ? : mlx5_cmd_check(dev, in, out);
}
/* Placed after the definition, as is conventional in the kernel tree. */
ALLOW_ERROR_INJECTION(mlx5_cmd_exec, ERRNO);
4.2 eBPF Program
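#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>

/* Fallback for vmlinux.h files that do not provide TASK_COMM_LEN. */
#ifndef TASK_COMM_LEN
#define TASK_COMM_LEN 16
#endif

/* Not referenced by the probe below. */
struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__type(key, u32);
	__type(value, u32);
	__uint(max_entries, 1024);
} fail_count SEC(".maps");

SEC("kprobe/mlx5_cmd_exec")
int Override_mlx5_cmd_exec(struct pt_regs *ctx)
{
	u32 pid = bpf_get_current_pid_tgid() >> 32;	/* upper 32 bits: TGID */
	char comm[TASK_COMM_LEN];
	u64 ts = bpf_ktime_get_ns();
	u64 rand = ts % 100;	/* timestamp as a cheap pseudo-random sample */

	bpf_get_current_comm(comm, sizeof(comm));
	if (rand < 10) {	/* roughly 10% of calls */
		bpf_printk("[ERR_MLX_CMD_INJECTION] ---- intercept cmd %s : %d ----", comm, pid);
		bpf_override_return(ctx, -5);	/* -EIO */
	}
	return 0;
}

char _license[] SEC("license") = "GPL";
Compile with clang, including -g so that the BTF information libbpf needs is emitted (e.g., clang -g -O2 -target bpf -c prog.c -o prog.o), and load with bpftool prog load prog.o /sys/fs/bpf/override or via libbpf-bootstrap. Note that bpftool prog load only loads and pins the program; it still has to be attached to the kprobe, for example by a libbpf-based loader (recent bpftool releases also accept an autoattach keyword on prog load).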
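The sampling decision in the probe can be isolated as a pure function and checked in userspace. This is a sketch under the assumption that the sample values are spread evenly modulo 100, which a nanosecond timestamp only approximates; the helpers should_inject and count_injected are hypothetical. Where a better-distributed source is wanted, the bpf_get_prandom_u32() helper is an alternative to the timestamp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: inject when `sample % 100` falls below `percent`.
 * `sample` stands in for the bpf_ktime_get_ns() value used by the probe.
 */
static bool should_inject(uint64_t sample, uint32_t percent)
{
	assert(percent <= 100);
	return sample % 100 < percent;
}

/* Count how many of the samples 0..n-1 would trigger an injection. */
static uint64_t count_injected(uint64_t n, uint32_t percent)
{
	uint64_t hits = 0;

	for (uint64_t s = 0; s < n; s++)
		if (should_inject(s, percent))
			hits++;
	return hits;
}
```

Over evenly spread samples the hit rate equals the configured percentage, so raising or lowering the threshold in the probe changes the failure rate proportionally.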
4.3 Observed Effect
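Running ip link set ethX down exercises mlx5_cmd_exec; roughly 10% of the calls are intercepted and return -EIO without the original function body running. Each interception is reported by bpf_printk, whose output is read from the kernel trace pipe (/sys/kernel/debug/tracing/trace_pipe) rather than from dmesg.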
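When a test run produces many trace lines, counting the injections is more useful than reading them. bpf_printk output is read from /sys/kernel/debug/tracing/trace_pipe; the sketch below counts occurrences of the tag used by the probe in captured trace text. The helper name count_injections is hypothetical, and the code deliberately matches only the tag rather than assuming a particular trace line layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Tag emitted by the probe's bpf_printk() call. */
#define INJECT_TAG "[ERR_MLX_CMD_INJECTION]"

/* Count occurrences of the tag in captured trace text. */
static size_t count_injections(const char *trace)
{
	size_t hits = 0;
	const char *p = trace;

	assert(trace != NULL);
	while ((p = strstr(p, INJECT_TAG)) != NULL) {
		hits++;
		p += strlen(INJECT_TAG);	/* continue after this match */
	}
	return hits;
}
```

Comparing this count with the number of commands issued during the test gives a rough check that the observed injection rate is near the intended 10%.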
5. Security and Performance Impact
System stability: error injection can destabilize or crash the system, so tests should run in controlled environments.
Additional overhead: the kprobe adds a hook at the entry of the target function and runs the attached eBPF program on every call, so frequently called functions pay a per-call cost even when no error is injected.
6. Future Outlook
KPROBE_OVERRIDE aligns with eBPF’s purpose of dynamic kernel modification for testing, despite concerns about misuse.
Only a limited set of kernel functions are marked with ALLOW_ERROR_INJECTION; expanding the whitelist requires recompiling the kernel, though pattern‑based macro generation could automate the process.
