
eBPF-Based Cross-Language Non-Intrusive Traffic Recording for Cloud-Native Services

The article describes an eBPF‑based, language‑agnostic traffic recording framework that hooks low‑level socket operations and thread identifiers to capture complete request‑response flows across Java, PHP, and Go services without modifying application code, dramatically lowering implementation and maintenance costs for cloud‑native traffic replay.

Didi Tech

Testing is a crucial step before any release, but as business scale and complexity grow, the set of functions that must be regression-tested grows with them, putting enormous pressure on test teams. Many teams therefore adopt traffic replay for service regression testing.

Before building traffic replay capability, online service traffic must be recorded. The choice of recording method depends on traffic characteristics, implementation cost, and invasiveness to the business.

For Java and PHP, mature solutions such as jvm-sandbox-repeater and rdebug provide low‑cost, non‑intrusive recording. Go lacks a comparable middle layer; existing solutions like sharingan require modifying the Go source and business code, leading to high maintenance cost.

Given Didi’s multi‑language stack, a cross‑language, non‑intrusive traffic recording approach based on eBPF can dramatically reduce usage and maintenance costs.

Recording Principle

During traffic replay, downstream services must be mocked, so a complete recorded flow must contain both the entry request/response and the requests/responses of all downstream calls.

Implementation Idea

The traditional approach instruments every framework and RPC SDK send/receive function, which requires extensive code changes for each language and library. Instead, we track socket-level operations (accept, connect, send, recv, close), which yields language-agnostic recording without any knowledge of application-layer protocols.

Because the recording point is at a lower layer, additional context (e.g., thread IDs) is needed to reconstruct complete flows.

Distinguishing Different Requests

Each request is typically handled on its own thread, and that thread may spawn sub-threads for downstream calls. By attributing sub-thread data back to the originating request thread and keying everything on thread IDs, concurrent requests can be separated cleanly.

Distinguishing Data Types

Entry traffic uses the socket obtained by accept (recv = request, send = response). Downstream traffic uses the socket obtained by connect (send = request, recv = response). Thus, socket type and identifier allow us to classify data.

Overall Architecture

All containers on a host share the same kernel, so a single eBPF program can record traffic for all processes. The architecture consists of:

Recording agent: runs in the same container as the target process; controls recording start/stop, receives raw data from the recording server, parses it into complete flows, and writes them to logs.

Recording server: runs on the host; loads and attaches the eBPF program and reads raw data from eBPF maps.

eBPF program: intercepts socket send/receive events, extracts the data, and stores it in maps.

Instrumentation Points

We hook the following kernel functions via kprobe/kretprobe:

inet_accept

inet_stream_connect

inet_sendmsg

inet_recvmsg

inet_release

For Go, we also use uprobe on runtime.newproc1 to obtain goroutine IDs and parent‑child relationships.

Example: Recording inet_sendmsg

Function signature:

int inet_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)

Entry program (kprobe):

SEC("kprobe/inet_sendmsg")
int BPF_KPROBE(inet_sendmsg_entry, struct socket *sock, struct msghdr *msg) {
    struct probe_ctx pctx = {
        .bpf_ctx = ctx,
        .version = EVENT_VERSION,
        .source = EVENT_SOURCE_SOCKET,
        .type = EVENT_SOCK_SENDMSG,
        .sr.sock = sock,
    };
    if (pid_filter(&pctx)) return 0;
    // read socket info, save context, etc.
    return 0;
}

Return program (kretprobe):

SEC("kretprobe/inet_sendmsg")
int BPF_KRETPROBE(inet_sendmsg_exit, int retval) {
    struct probe_ctx pctx = {
        .bpf_ctx = ctx,
        .version = EVENT_VERSION,
        .source = EVENT_SOURCE_SOCKET,
        .type = EVENT_SOCK_SENDMSG,
    };
    if (pid_filter(&pctx)) return 0;
    if (retval <= 0) goto out;
    // read saved context, construct event, output data
out:
    delete_context(pctx.pid);
    return 0;
}

Getting Go goroutine ID:

// getg returns the pointer to the current g.
func getg() *g

// GOID_OFFSET is the offset of the goid field within runtime.g;
// it depends on the Go version of the traced binary.
static __always_inline u64 get_goid() {
    struct task_struct *task = (struct task_struct *)bpf_get_current_task();
    unsigned long fsbase = 0;
    void *g = NULL;
    u64 goid = 0;
    // On x86-64, Go keeps the current g pointer in TLS at fsbase-8.
    bpf_probe_read(&fsbase, sizeof(fsbase), &task->thread.fsbase);
    bpf_probe_read(&g, sizeof(g), (void *)fsbase - 8);
    bpf_probe_read(&goid, sizeof(goid), (void *)g + GOID_OFFSET);
    return goid;
}

Challenges in eBPF Development

No global variables or constant strings; use maps.

No function calls; use inline functions.

Stack size limited to 512 bytes; use array maps for buffers.

Cannot directly access user‑space memory; must use bpf helper functions.

Instruction limit of 1 000 000; keep logic simple.

Loops must have a static upper bound.

Structure alignment is critical to avoid verifier errors.

As clang and kernel support for eBPF improves, many of these issues are gradually being resolved.

Security Mechanism

To protect recorded traffic data, it is encrypted at collection time rather than desensitized inline, which reduces the performance impact of data desensitization on the recording path.

Conclusion

This article presents an eBPF‑based traffic recording solution that lowers implementation and integration costs for cloud‑native services. The detailed design, instrumentation points, and code examples aim to help engineers quickly build traffic replay capabilities.

Tags: cloud-native, observability, Go, eBPF, traffic recording, kernel tracing, socket