Exploring KPROBE_OVERRIDE for Kernel Error Injection

This article examines how the KPROBE_OVERRIDE feature, combined with eBPF, enables precise kernel‑level error injection, discusses its configuration requirements, demonstrates a practical example on a Mellanox NIC driver, and evaluates the associated security and performance implications.

Linux Kernel Journey

Oct 29, 2024

0. Introduction

Motivation: explore how an eBPF program can intercept a kernel function and override its return value, something the eBPF verifier's safety guarantees would seem to rule out.

1. Error‑Injection Concept

Error injection means deliberately introducing faults to evaluate a system's robustness, reliability, and fault tolerance. In Linux kernel development, KPROBE_OVERRIDE provides a mechanism to modify kernel behavior at runtime for this purpose.

2. Overview of Error‑Injection Techniques

Hardware‑based injection

Example: voltage glitches, clock faults.

Pros: precise hardware fault simulation.

Cons: high cost, complex setup, potential hardware wear.

User‑space software injection

Example: Chaos Mesh for cloud‑native fault orchestration.

Pros: flexible, covers many cloud‑native scenarios.

Cons: limited support for non-cloud-native applications, steeper learning curve.

Kernel‑based injection

Example: kernel “fault injection” feature.

Pros: precise simulation of hardware faults and kernel function errors.

Cons: requires kernel knowledge and privileges; limited built‑in injection points.

3. KPROBE_OVERRIDE Mechanism

The feature is enabled by the kernel config option CONFIG_BPF_KPROBE_OVERRIDE and exposed through the helper bpf_override_return(). The helper sets the return value and redirects the probed function's program counter to an override function that returns immediately, so the body of the original function never runs.

Used for error injection, this helper uses kprobes to override the return value of the probed function and set it to rc. The first argument is the context regs on which the kprobe works.

This helper is only available when the kernel is compiled with CONFIG_BPF_KPROBE_OVERRIDE and the target function is marked with ALLOW_ERROR_INJECTION. It is also limited to architectures that support CONFIG_FUNCTION_ERROR_INJECTION; the helper's documentation names x86 as the only such architecture at the time it was written.
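The mechanism described above can be modeled in ordinary user-space C. The sketch below is not kernel code: struct fake_regs, just_return_stub, model_override_return and call_with_probe are hypothetical names invented for illustration. It only shows the idea of writing rc into the return-value register and pointing the instruction pointer at a stub, so that the probed body is skipped.

```c
typedef long (*func_t)(void);

/* Hypothetical, simplified register state: only the two fields the override touches. */
struct fake_regs {
    func_t ip;  /* where execution continues */
    long ax;    /* return-value register */
};

int body_runs;  /* counts how often the probed body actually executes */

/* The function under test: always succeeds when its body runs. */
long probed_function(void)
{
    body_runs++;
    return 0;
}

/* Stand-in for the override stub. In this model only its address matters. */
long just_return_stub(void)
{
    return 0;
}

/* Model of the override: store rc and redirect execution to the stub. */
void model_override_return(struct fake_regs *regs, long rc)
{
    regs->ax = rc;
    regs->ip = just_return_stub;
}

/* Simulates calling `target` with a probe attached at its entry. */
long call_with_probe(func_t target, int inject, long rc)
{
    struct fake_regs regs = { .ip = target, .ax = 0 };

    if (inject)                       /* the "eBPF program" decides to inject */
        model_override_return(&regs, rc);

    if (regs.ip == just_return_stub)  /* body skipped: caller sees rc */
        return regs.ax;
    return regs.ip();                 /* no override: body runs as usual */
}
```

Calling call_with_probe(probed_function, 1, -5) yields -5 while body_runs stays at zero, which mirrors the claim that the original function never runs when the override fires.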

3.1 ALLOW_ERROR_INJECTION Macro

#define ALLOW_ERROR_INJECTION(fname, _etype) \
static struct error_injection_entry __used \
 __section("_error_injection_whitelist") \
 _eil_addr_##fname = { \
  .addr = (unsigned long)fname, \
  .etype = EI_ETYPE_##_etype, \
 };

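The macro creates a static error_injection_entry that records the function's address and error type, and places it in the ELF section _error_injection_whitelist. The __used attribute keeps the otherwise unreferenced entry from being optimized away.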
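The same section-based registration pattern can be tried in user space. The following is a minimal sketch, assuming a GCC or Clang toolchain targeting ELF with a GNU-style linker, which provides __start_<section> and __stop_<section> symbols for sections whose names are valid C identifiers. All demo_* and DEMO_* names are hypothetical and are not the kernel's.

```c
/* Hypothetical user-space analogue of error_injection_entry. */
struct demo_entry {
    unsigned long addr;  /* address of the whitelisted function */
    int etype;           /* which kind of error value it may return */
};

enum { DEMO_ETYPE_NULL, DEMO_ETYPE_ERRNO };

/* Emits one entry per function into the ELF section "demo_whitelist". */
#define DEMO_ALLOW_INJECTION(fname, _etype)                     \
static struct demo_entry                                        \
    __attribute__((used, section("demo_whitelist")))            \
    demo_entry_##fname = {                                      \
        .addr = (unsigned long)fname,                           \
        .etype = DEMO_ETYPE_##_etype,                           \
    }

int demo_open(void) { return 0; }
DEMO_ALLOW_INJECTION(demo_open, ERRNO);

void *demo_alloc(void) { return 0; }
DEMO_ALLOW_INJECTION(demo_alloc, NULL);

int demo_plain(void) { return 0; }  /* deliberately not whitelisted */

/* Section bounds supplied by the linker (assumption: GNU-style ELF linker). */
extern struct demo_entry __start_demo_whitelist[];
extern struct demo_entry __stop_demo_whitelist[];

/* Walks the section to decide whether addr was registered. */
int demo_is_whitelisted(unsigned long addr)
{
    const struct demo_entry *e;

    for (e = __start_demo_whitelist; e != __stop_demo_whitelist; e++)
        if (e->addr == addr)
            return 1;
    return 0;
}

/* Counts the registered entries by walking the section. */
int demo_whitelist_len(void)
{
    const struct demo_entry *e;
    int n = 0;

    for (e = __start_demo_whitelist; e != __stop_demo_whitelist; e++)
        n++;
    return n;
}
```

The point of the design is that registration needs no central table: each annotated function contributes its own entry at link time, and the consumer discovers them by walking the section.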

3.2 Error‑Type Whitelist

enum {
  EI_ETYPE_NONE,       // dummy value for the undefined case
  EI_ETYPE_NULL,       // return NULL on failure
  EI_ETYPE_ERRNO,      // return -ERRNO on failure
  EI_ETYPE_ERRNO_NULL, // return -ERRNO or NULL on failure
  EI_ETYPE_TRUE        // return true on failure
};

Select the appropriate type based on the function’s expected error return.
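One way to sanity-check a chosen override value against the declared error type is sketched below. This is an illustrative user-space helper, not something the source attributes to bpf_override_return: rc_matches_etype, is_errno_value and the DEMO_* names are hypothetical, and the value 4095 mirrors the kernel's MAX_ERRNO constant.

```c
#include <stdbool.h>

#define DEMO_MAX_ERRNO 4095  /* mirrors the kernel's MAX_ERRNO */

/* Hypothetical mirror of the error types listed above (NONE omitted). */
enum demo_etype {
    DEMO_ETYPE_NULL,
    DEMO_ETYPE_ERRNO,
    DEMO_ETYPE_ERRNO_NULL,
    DEMO_ETYPE_TRUE
};

/* True if rc lies in the negative errno range [-MAX_ERRNO, -1]. */
bool is_errno_value(long rc)
{
    return rc < 0 && rc >= -DEMO_MAX_ERRNO;
}

/* True if rc is a plausible failure value for a function of this type. */
bool rc_matches_etype(enum demo_etype etype, long rc)
{
    switch (etype) {
    case DEMO_ETYPE_NULL:
        return rc == 0;
    case DEMO_ETYPE_ERRNO:
        return is_errno_value(rc);
    case DEMO_ETYPE_ERRNO_NULL:
        return rc == 0 || is_errno_value(rc);
    case DEMO_ETYPE_TRUE:
        return rc == 1;
    }
    return false;
}
```

For example, -5 (-EIO) passes for an ERRNO function but not for a NULL function, where callers test the result against NULL and would misread a negative integer as a valid pointer.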

3.3 CONFIG_BPF_KPROBE_OVERRIDE Dependencies

Prompt: Enable BPF programs to override a kprobed function

Type: bool

Depends on: CONFIG_BPF_EVENTS and CONFIG_FUNCTION_ERROR_INJECTION

Defined in: kernel/trace/Kconfig

Available in: Linux kernels 4.16–4.20, 5.0–5.19, 6.0–6.10, 6.11-rc+HEAD
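Before writing any eBPF code it can help to confirm that the running kernel was built with these options. The sketch below assumes the kernel configuration is already available as text (distributions typically ship it as /boot/config-<release> or /proc/config.gz, but that varies); kconfig_enabled and override_prerequisites_met are hypothetical helper names.

```c
#include <stdbool.h>
#include <string.h>

/* True if config_text contains a line that is exactly "<option>=y". */
bool kconfig_enabled(const char *config_text, const char *option)
{
    size_t opt_len = strlen(option);
    const char *line = config_text;

    while (line && *line) {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        /* Require the whole line to match so that longer option names
         * sharing the same prefix are not mistaken for this one. */
        if (len == opt_len + 2 &&
            strncmp(line, option, opt_len) == 0 &&
            line[opt_len] == '=' && line[opt_len + 1] == 'y')
            return true;

        line = eol ? eol + 1 : NULL;
    }
    return false;
}

/* Checks the option itself plus the two dependencies listed above. */
bool override_prerequisites_met(const char *config_text)
{
    return kconfig_enabled(config_text, "CONFIG_BPF_EVENTS") &&
           kconfig_enabled(config_text, "CONFIG_FUNCTION_ERROR_INJECTION") &&
           kconfig_enabled(config_text, "CONFIG_BPF_KPROBE_OVERRIDE");
}
```

A line such as "# CONFIG_BPF_KPROBE_OVERRIDE is not set" does not match, so the check fails for kernels where the option is disabled.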

4. Practical Application

Target: error‑handling path in Mellanox NIC driver function mlx5_cmd_exec.

// drivers/net/ethernet/mellanox/mlx5/core/cmd.c
int mlx5_cmd_exec(struct mlx5_core_dev *dev, void *in, int in_size,
                 void *out, int out_size);

4.1 Adding the Whitelist Macro

#include <linux/error-injection.h> // provides ALLOW_ERROR_INJECTION

int mlx5_cmd_exec(struct mlx5_core_dev *dev, void *in, int in_size,
                 void *out, int out_size) {
    int err;

    err = cmd_exec(dev, in, in_size, out, out_size, NULL, NULL, false);
    return err ? : mlx5_cmd_check(dev, in, out);
}
// Whitelist the function; ERRNO means it reports failure as a negative errno.
ALLOW_ERROR_INJECTION(mlx5_cmd_exec, ERRNO);
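After rebuilding the driver, it is worth confirming that the function actually landed on the whitelist. The kernel exposes the list through debugfs at /sys/kernel/debug/error_injection/list. The sketch below checks text read from that file; it assumes each line starts with the function name, followed by a tab or a space (for instance before a module name) and then the error type. That line format is an assumption here, and injection_list_has is a hypothetical helper.

```c
#include <stdbool.h>
#include <string.h>

/* True if some line of list_text begins with func followed by a tab or space. */
bool injection_list_has(const char *list_text, const char *func)
{
    size_t n = strlen(func);
    const char *line = list_text;

    while (line && *line) {
        const char *eol = strchr(line, '\n');

        /* Compare the whole name, then require a separator so that a
         * shorter name does not match a longer function's line. */
        if (strncmp(line, func, n) == 0 &&
            (line[n] == '\t' || line[n] == ' '))
            return true;

        line = eol ? eol + 1 : NULL;
    }
    return false;
}
```

If the function is missing from the list, attaching a program that calls bpf_override_return to it is expected to fail, since the helper only works on whitelisted functions.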

4.2 eBPF Program

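#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>

// TASK_COMM_LEN may be a plain macro in the kernel, in which case
// vmlinux.h does not carry it.
#ifndef TASK_COMM_LEN
#define TASK_COMM_LEN 16
#endif

// Declared by the original example but not referenced by the program below.
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __type(key, u32);
    __type(value, u32);
    __uint(max_entries, 1024);
} fail_count SEC(".maps");

SEC("kprobe/mlx5_cmd_exec")
int Override_mlx5_cmd_exec(struct pt_regs *ctx) {
    // Upper 32 bits hold the TGID, i.e. the PID as seen from user space.
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    char comm[TASK_COMM_LEN];
    u64 ts, rand;

    bpf_get_current_comm(&comm, sizeof(comm));
    // Use the low digits of the timestamp as a cheap pseudo-random draw.
    ts = bpf_ktime_get_ns();
    rand = ts % 100;
    if (rand < 10) { // ~10% probability
        bpf_printk("[ERR_MLX_CMD_INJECTION] ---- intercept cmd %s : %d ----", comm, pid);
        bpf_override_return(ctx, -5); // -EIO
    }
    return 0;
}
char _license[] SEC("license") = "GPL";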

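Compile with clang (e.g., clang -g -O2 -target bpf -c prog.c -o prog.o; the -g flag emits the BTF that libbpf needs for the map definition). Running bpftool prog load prog.o /sys/fs/bpf/override only loads and pins the program; the kprobe must still be attached, for example with bpftool's autoattach option where the installed version supports it, or from a libbpf-based loader such as libbpf-bootstrap.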

4.3 Observed Effect

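Running ip link set ethX down exercises mlx5_cmd_exec. Roughly 10% of the calls are intercepted and return -EIO without executing the function body. Each interception is logged by bpf_printk and can be read from the kernel trace buffer (typically /sys/kernel/debug/tracing/trace_pipe).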

5. Security and Performance Impact

System stability: error injection can destabilize or crash the system; tests should run in controlled environments.

Additional overhead: KPROBE_OVERRIDE inserts breakpoints and runs custom code in the eBPF VM, incurring runtime overhead.

6. Future Outlook

KPROBE_OVERRIDE aligns with eBPF’s purpose of dynamic kernel modification for testing, despite concerns about misuse.

Only a limited set of kernel functions is marked with ALLOW_ERROR_INJECTION. Adding a function to the whitelist means editing the source and recompiling the kernel or module, though pattern-based macro generation could automate the process.
