How eBPF is Revolutionizing Cloud‑Native Observability and Continuous Profiling
This article explains how eBPF is applied in a cloud‑native environment to extend observability from the application layer down to the kernel, enabling real‑time traffic analysis, low‑overhead continuous profiling of C++ services, and scalable metric collection across thousands of nodes.
Background
Rapid growth of micro‑service architectures in cloud‑native environments creates a need for observability that can pinpoint the source of traffic spikes and CPU/memory pressure, even when the caller is unknown. Traditional metrics, logs and tracing are insufficient for real‑time, language‑agnostic traffic analysis and continuous profiling of C++ services.
eBPF Overview
eBPF (extended Berkeley Packet Filter) allows sandboxed programs to be attached to kernel hook points without modifying the kernel source or loading modules. It can monitor network packets, trace system calls, collect performance statistics and perform security audits. Two main development models are used:
BCC (BPF Compiler Collection): provides examples and language bindings (Python, Go) but compiles programs on the node at load time, which requires kernel headers and a compiler toolchain there and costs significant CPU and memory.
BPF CO‑RE (Compile‑Once‑Run‑Everywhere): relies on BTF (BPF Type Format) type information so that a pre‑compiled ELF program can be adjusted to the running kernel's data layouts at load time, letting one binary run on multiple kernel versions without installing kernel headers.
Because of its low overhead and ability to collect data directly from the kernel, eBPF is adopted by major internet companies and is the foundation of the solutions described below.
Traffic Analysis Solution
An eBPF agent is deployed as a DaemonSet on each node. The workflow is:
Configuration service delivers a list of target processes (PIDs) to the agent.
The agent loads eBPF programs that hook tcp_sendmsg and tcp_cleanup_rbuf (L4) as well as relevant socket‑read/write syscalls (L7).
Kernel‑side eBPF programs capture packet size, IP/port, protocol payload and store aggregated records in eBPF maps.
User‑space agent reads the maps, enriches records with service metadata (IP ↔ service name, region, Kubernetes cluster) from a CMDB cache, aggregates them and exports Prometheus‑compatible metrics.
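The user‑space half of this workflow can be sketched roughly as follows. This is a minimal illustration rather than the agent's actual code: the record fields (src_ip, dst_ip, dst_port, bytes), the CMDB_CACHE contents and the metric name ebpf_traffic_bytes_total are all assumptions made for the example. It enriches raw records with service names, sums bytes per service pair, and renders the result in the Prometheus text exposition format.

```python
from collections import defaultdict

# Hypothetical CMDB cache: IP -> service metadata (illustrative values only).
CMDB_CACHE = {
    "10.0.0.1": {"service": "checkout", "region": "eu-west"},
    "10.0.0.2": {"service": "redis-cart", "region": "eu-west"},
}


def enrich(ip):
    """Return the service name for an IP, or 'unknown' when the cache has no entry."""
    return CMDB_CACHE.get(ip, {}).get("service", "unknown")


def aggregate(records):
    """Sum bytes per (source service, destination service, destination port)."""
    totals = defaultdict(int)
    for rec in records:
        key = (enrich(rec["src_ip"]), enrich(rec["dst_ip"]), rec["dst_port"])
        totals[key] += rec["bytes"]
    return dict(totals)


def to_prometheus(totals, metric="ebpf_traffic_bytes_total"):
    """Render totals in the Prometheus text exposition format."""
    lines = [f"# TYPE {metric} counter"]
    for (src, dst, port), value in sorted(totals.items()):
        lines.append(
            f'{metric}{{src_service="{src}",dst_service="{dst}",dst_port="{port}"}} {value}'
        )
    return "\n".join(lines) + "\n"
```

Traffic from an IP that is missing from the cache is kept under the "unknown" service instead of being dropped, so an unidentified caller stays visible in the metrics.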
Key techniques:
PID‑based filtering to avoid unrelated traffic.
Configurable sampling to reduce load on high‑traffic services.
Metadata cache that batches CMDB lookups; a dedicated cache server isolates cache traffic from metric collectors.
Pre‑aggregation in the collector reduces query pressure, achieving >10× faster queries and extending the query window from 1 day to ≥1 week.
Resource consumption is modest: average CPU ≈ 0.1 core and memory ≈ 200 MiB per node.
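The first two techniques, PID filtering and sampling, can be illustrated with a small sketch. In the described system this logic would run inside the kernel‑side eBPF program; the Python below only shows the decision logic. The class name TrafficFilter, the 1‑in‑N counter scheme and the scaling of sampled events by N are assumptions of this example, since the article does not say how sampled values are corrected.

```python
class TrafficFilter:
    """Keeps events from target PIDs only and samples 1 in `sample_every` of them."""

    def __init__(self, target_pids, sample_every=1):
        if sample_every < 1:
            raise ValueError("sample_every must be >= 1")
        self.target_pids = set(target_pids)
        self.sample_every = sample_every
        self._seen = 0  # number of events seen from target PIDs

    def accept(self, pid):
        """Return the weight for an accepted event, or 0 if the event is dropped."""
        if pid not in self.target_pids:
            return 0  # unrelated process: filtered out
        self._seen += 1
        if self._seen % self.sample_every != 0:
            return 0  # not selected by the sampler
        return self.sample_every  # scale up to compensate for skipped events


def estimated_bytes(events, flt):
    """Estimate total bytes for target PIDs from (pid, size) events."""
    return sum(size * flt.accept(pid) for pid, size in events)
```

With sample_every=1 every event from a target PID is counted exactly; larger values trade accuracy for lower per‑event cost on high‑traffic services.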
Continuous Profiling Solution
Traditional Linux perf profiling of C++ services incurs high CPU, memory and latency overhead, especially when unwinding with DWARF information. The eBPF‑based profiler moves stack unwinding and aggregation into the kernel, dramatically reducing the amount of data moved to user space.
Agent loads eBPF C programs and associated maps for the target PIDs.
Kernel‑side programs are attached to CPU‑cycle perf events. On each event they unwind the stack using either frame pointers or DWARF CFI (Call Frame Information) rules, then increment a counter for the observed stack in a map.
User‑space agent periodically reads the aggregated stacks, converts them to pprof format and serves them via HTTP.
Collector fetches the pprof blobs, resolves symbols (kernel symbols locally, user symbols via generated DWARF indexes), builds flame graphs and stores the data in ClickHouse.
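The aggregation idea behind these steps, counting identical stacks instead of shipping every sample, can be sketched as below. Note the substitution: the article's agent emits pprof, whereas this example writes the simpler "folded stacks" text format that flame‑graph tools commonly accept, purely to keep the sketch short. The function names and the sample data are hypothetical.

```python
from collections import Counter


def aggregate_stacks(samples):
    """Count identical stacks; each sample is a sequence of frames, root first."""
    counts = Counter()
    for frames in samples:
        counts[tuple(frames)] += 1
    return counts


def to_folded(counts):
    """Render counts as folded stacks: one 'root;child;leaf <count>' line per stack."""
    return "\n".join(
        f"{';'.join(stack)} {n}" for stack, n in sorted(counts.items())
    )
```

Because the counter is keyed by the whole stack, the amount of data handed to the next stage grows with the number of distinct stacks rather than with the number of samples, which is the same property the in‑kernel map provides.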
Unwind implementation details:
DWARF unwinding uses a back‑trace table generated from .eh_frame CFI instructions. The table is stored in a two‑map design (data shards plus index shards), which keeps each map's size bounded and lets the back‑trace tables be stored efficiently across kernel versions.
When unwinding a deep stack would exceed the verifier‑allowed instruction limit for a single program, tail‑calls are used to continue unwinding.
Symbol resolution builds a DWARF index that maps address ranges to functions, inline sub‑routines, source files and line numbers. The index is cached with >99.9 % hit rate, reducing per‑sample lookup cost.
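To make the data‑map plus index‑map split concrete, here is a rough sketch of how a lookup could work. It is an assumption‑heavy simplification: each row is reduced to (start_pc, cfa_offset), SHARD_SIZE is a made‑up value, and plain Python containers stand in for eBPF maps. The index holds the first program counter of every shard, so a lookup needs one binary search over the index and one inside a single bounded shard.

```python
from bisect import bisect_right

SHARD_SIZE = 4  # hypothetical; a real table would use far larger shards


def build_maps(rows):
    """Split sorted (start_pc, cfa_offset) rows into data shards plus an index."""
    rows = sorted(rows)
    data = {
        i // SHARD_SIZE: rows[i:i + SHARD_SIZE]
        for i in range(0, len(rows), SHARD_SIZE)
    }
    # index[k] is the first start_pc stored in shard k.
    index = [shard[0][0] for _, shard in sorted(data.items())]
    return index, data


def lookup(index, data, pc):
    """Return the last row whose start_pc is <= pc, or None if pc precedes the table."""
    shard_id = bisect_right(index, pc) - 1
    if shard_id < 0:
        return None
    shard = data[shard_id]
    pos = bisect_right([row[0] for row in shard], pc) - 1
    return shard[pos]
```

Both searches run over bounded arrays, which is the kind of fixed upper bound on loop iterations that an eBPF verifier needs to see.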
Kernel Compatibility and Deployment
CO‑RE needs BTF for the running kernel, which is normally available when the kernel is built with CONFIG_DEBUG_INFO_BTF=y. For kernels lacking BTF (e.g., 5.4, 5.10), the agent ships pre‑generated BTF files matching the kernel version. At start‑up the agent detects the kernel version and BTF availability, then loads the appropriate BTF blob automatically.
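The detection step might look something like the sketch below. Kernels built with BTF expose it at /sys/kernel/btf/vmlinux; everything else here is an assumption for illustration, including the function name select_btf, the shipped_dir layout and the "<release>.btf" file naming. How the chosen path is handed to the eBPF loader is outside the scope of the sketch.

```python
import os
import platform


def select_btf(shipped_dir, kernel_release=None, sys_btf="/sys/kernel/btf/vmlinux"):
    """Return the BTF path to use: the kernel's own if present, else a shipped file."""
    if os.path.exists(sys_btf):
        return sys_btf  # kernel exposes its own BTF
    release = kernel_release or platform.release()
    # Hypothetical naming scheme for the pre-generated files bundled with the agent.
    candidate = os.path.join(shipped_dir, f"{release}.btf")
    if os.path.exists(candidate):
        return candidate
    raise FileNotFoundError(f"no BTF available for kernel {release}")
```

Failing loudly when neither source exists is a deliberate choice in this sketch: loading a CO‑RE program against mismatched type information would be worse than not loading it at all.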
Metadata Enrichment and Query Optimization
Metrics contain upstream/downstream IP ↔ service mappings. To avoid overwhelming the CMDB, a two‑layer cache is used:
Local in‑memory cache on each collector instance.
External cache server that stores the same data and serves all collectors, eliminating duplicate CMDB queries when the collector tier scales horizontally.
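A minimal sketch of the two layers follows, assuming a plain dictionary as a stand‑in for the external cache server and a hypothetical cmdb_lookup callable for the CMDB. Expiry and invalidation, which a real deployment would need, are left out.

```python
class TwoLayerCache:
    """Local dict in front of a shared cache in front of the CMDB."""

    def __init__(self, shared, cmdb_lookup):
        self.local = {}                 # layer 1: per-collector memory
        self.shared = shared            # layer 2: stands in for the cache server
        self.cmdb_lookup = cmdb_lookup  # hypothetical callable: ip -> service name

    def get(self, ip):
        if ip in self.local:
            return self.local[ip]
        if ip in self.shared:
            value = self.shared[ip]
        else:
            value = self.cmdb_lookup(ip)  # only reached when both layers miss
            self.shared[ip] = value
        self.local[ip] = value
        return value
```

When two collectors share the same second layer, only the first request for a given IP reaches the CMDB; adding collectors therefore adds local caches but no extra CMDB queries for IPs that are already cached.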
Metrics are pre‑aggregated using a configurable PromQL‑based operator before being written to storage, which reduces the volume of data stored in ClickHouse and speeds up queries by an order of magnitude.
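The effect of such an operator can be approximated by the sketch below, which mimics what a PromQL sum by (...) expression does: it keeps a chosen set of labels and sums away the rest. The function name sum_by and the label names used in the usage note are hypothetical, and the real operator's configuration format is not described in the article.

```python
from collections import defaultdict


def sum_by(samples, keep):
    """Sum sample values grouped by the labels in `keep`, dropping all others."""
    out = defaultdict(float)
    for labels, value in samples:
        # A label missing from a sample is treated as the empty string.
        key = tuple((name, labels.get(name, "")) for name in keep)
        out[key] += value
    return dict(out)
```

For example, summing by ["src", "dst"] collapses samples that differ only in a per‑pod label into one series per service pair, so both the stored volume and the work done at query time shrink with the number of dropped label values.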
Production Usage and Case Studies
The solution is language‑agnostic and can be enabled or disabled per service within seconds. Real‑world cases include:
Identifying, by tracing Thrift method calls, a legacy Node.js frontend that was still generating sporadic traffic to a service ahead of its shutdown.
Pinpointing an upstream service that caused a sudden Redis traffic surge, enabling rapid mitigation.
Profiling provides near‑real‑time flame graphs, version‑to‑version performance diffs, and automated daily degradation detection for thousands of C++ pods.
Future Work
Extend traffic analysis to generate service topology graphs and support additional protocols (e.g., MySQL).
Add more profiling event types such as off‑CPU time and memory‑leak detection.
Support on‑demand flame‑graph queries over arbitrary time ranges.
dbaplus Community
