Unlock Real‑Time Container Network Monitoring with KubeSkoop’s eBPF Probes

This article explains how KubeSkoop leverages eBPF to provide low‑overhead, pod‑level network monitoring and real‑time diagnostics for Kubernetes clusters, covering packet flow fundamentals, traditional troubleshooting tool limitations, the exporter’s probe architecture, daily monitoring practices, and future development plans.

Alibaba Cloud Native

Sep 7, 2023

Packet flow inside a container

In Linux, a network packet traverses several kernel layers before reaching an application. The receive path is: NIC driver interrupt → softirq processing (handed to ksoftirqd when deferred) → network layer and netfilter processing → transport-layer handling (TCP/UDP) → payload placed in the socket receive queue, after which the application is woken up. The send path runs roughly in reverse: the application writes to the socket → the transport layer assembles the segment → network layer → netfilter → TC egress → qdisc → NIC driver transmission.

The number of layers makes troubleshooting difficult, especially because kernel modules, sysctl parameters, resource contention (disk I/O, CPU scheduling) and application bugs can all affect packet handling.

Traditional network‑troubleshooting tools

net‑tools (netstat) : reads /proc/net/snmp and /proc/net/netstat; very low overhead but provides only a limited set of counters.

iproute2 (ip, ss) : uses netlink to obtain richer state information; higher overhead than procfs.

bcc‑tools : eBPF‑based observability for network, disk and scheduler; powerful but requires more expertise.

These tools operate at the host or network-namespace level, which makes it hard to isolate a single container, and they often miss short-lived anomalies.
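To make the procfs counters mentioned above more concrete, here is a minimal sketch of how the paired header/value lines in /proc/net/snmp and /proc/net/netstat can be parsed. The function name and the abbreviated sample text are assumptions made for illustration; this is not code from net-tools or KubeSkoop.

```python
def parse_proc_net_counters(text):
    """Parse paired 'Proto: names' / 'Proto: values' lines into nested dicts."""
    counters = {}
    lines = [line for line in text.splitlines() if line.strip()]
    # Lines come in pairs: a header row of counter names, then a row of values.
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, names = header.split(":", 1)
        value_proto, nums = values.split(":", 1)
        if proto != value_proto:
            raise ValueError(f"mismatched pair: {proto!r} vs {value_proto!r}")
        counters[proto] = dict(zip(names.split(), (int(n) for n in nums.split())))
    return counters


# Abbreviated, made-up sample; the real file carries many more fields per protocol.
# On a Linux host the input could instead be open("/proc/net/snmp").read().
SAMPLE = """\
Tcp: ActiveOpens PassiveOpens RetransSegs OutRsts
Tcp: 120 45 7 3
Udp: InDatagrams NoPorts InErrors
Udp: 900 2 1
"""

for proto, fields in parse_proc_net_counters(SAMPLE).items():
    print(proto, fields)
```

Because these files describe a whole network namespace, a per-pod view requires reading them from inside each pod's namespace, which is the gap the rest of the article addresses.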

Why KubeSkoop was created

KubeSkoop targets the complexity of kernel packet processing and the shortcomings of traditional tools in cloud‑native environments. It provides automatic diagnosis for persistent connectivity failures (e.g., DNS) and real‑time monitoring for intermittent problems such as latency spikes, packet loss or TCP resets.

KubeSkoop exporter architecture

The exporter collects data from eBPF programs, procfs and netlink, then exposes pod‑level metrics to Prometheus and events to Loki (or stdout). Probes are hot‑pluggable; users can enable only the probes they need to balance observability and overhead.
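The hot-pluggable idea can be illustrated with a small sketch. The class, method names and placeholder values below are hypothetical and do not reflect KubeSkoop's actual implementation; the sketch only shows why disabled probes add no collection cost.

```python
class ProbeRegistry:
    """Hypothetical registry whose probes can be switched on and off at runtime."""

    def __init__(self):
        self._probes = {}      # probe name -> zero-argument collect function
        self._enabled = set()  # names of probes that currently run

    def register(self, name, collect):
        self._probes[name] = collect

    def enable(self, name):
        if name not in self._probes:
            raise KeyError(f"unknown probe: {name}")
        self._enabled.add(name)

    def disable(self, name):
        self._enabled.discard(name)

    def collect(self):
        # Only enabled probes are invoked; disabled ones are never called.
        return {name: self._probes[name]() for name in sorted(self._enabled)}


registry = ProbeRegistry()
registry.register("sock", lambda: {"inuse": 12})          # placeholder values
registry.register("tcpreset", lambda: {"rst_events": 0})  # placeholder values
registry.enable("sock")
print(registry.collect())
```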

Probe, metric and event overview

socketlatency : measures latency from socket readiness to user‑space read/write; high overhead; useful for diagnosing slow application processing.

sock : parses /proc/net/sockstat; low overhead; monitors socket counts and memory usage.

tcp/udp/tcpext : reads /proc/net/netstat and /proc/net/snmp; low overhead; provides connection counts, retransmissions and UDP error counters.

tcpsummary : uses netlink SOCK_DIAG_BY_FAMILY to aggregate per‑pod TCP state; medium overhead; helps analyse connection‑failure patterns.

tcpreset : traces kernel functions that send/receive RST packets; low overhead; emits events only on reset occurrences.

ip : reads IP‑level counters from /proc/net/snmp; low overhead; detects unreachable routes and malformed packets.

kernellatency : records timestamps at several kernel processing points to compute end‑to‑end packet latency; high overhead; for deep latency investigations.

netdev : collects /proc/net/dev statistics; low overhead; useful for device‑level packet‑loss detection.

softnet : aggregates per‑CPU soft‑interrupt stats from /proc/net/softnet_stat; low overhead; highlights CPU contention affecting networking.

conntrack : retrieves the conntrack table via netlink; high overhead; monitors NAT/connection‑tracking state.

fd : enumerates open file descriptors per pod; high overhead; detects descriptor leaks.

io : reads per-process I/O counters from /proc/<pid>/io for pod‑level filesystem I/O rates; low overhead; correlates heavy I/O with network slowdown.

biolatency : tracks block‑device read/write latency; medium overhead; links storage latency to network jitter.
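As an illustration of what a procfs-based probe such as softnet reads, the following sketch parses the first three hexadecimal columns of /proc/net/softnet_stat (packets processed, packets dropped, and time-squeeze count). The function name and sample rows are assumptions for illustration, not KubeSkoop code, and the row index is not guaranteed to equal the CPU id on every kernel.

```python
def parse_softnet_stat(text):
    """Return one dict per row of softnet_stat with the first three counters."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated lines
        # All columns are hexadecimal without a 0x prefix.
        processed, dropped, time_squeeze = (int(f, 16) for f in fields[:3])
        rows.append({
            "row": len(rows),
            "processed": processed,
            "dropped": dropped,
            "time_squeeze": time_squeeze,
        })
    return rows


# Made-up sample rows; a real file has more columns per line.
SOFTNET_SAMPLE = """\
0000a3f1 00000000 00000002 00000000
0000b210 00000005 00000000 00000000
"""

for row in parse_softnet_stat(SOFTNET_SAMPLE):
    print(row)
```

A growing dropped or time_squeeze value on some rows but not others is the kind of per-CPU contention signal the softnet entry above refers to.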

Using KubeSkoop exporter

Daily monitoring : Deploy the exporter, scrape its Prometheus metrics and (optionally) forward events to Loki. Enable low‑overhead probes (procfs‑based and lightweight eBPF probes) by default. Grafana dashboards supplied by the project can visualise the metrics.

Incident investigation : When an alert indicates abnormal network behaviour, classify the problem (e.g., TCP connection failures, latency spikes) and enable the relevant probes such as socketlatency or kernellatency. Observe the newly exposed metrics/events in Grafana or via the exporter’s inspector command. If the root cause is not yet clear, enable additional probes iteratively.
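The classify-then-enable workflow can be sketched as a simple lookup. The symptom names, the symptom-to-probe mapping and the baseline set are assumptions chosen for illustration, using probe names from the overview above; they are not an official KubeSkoop recommendation.

```python
# Hypothetical mapping from a symptom class to probes worth enabling.
PROBES_BY_SYMPTOM = {
    "latency_spike": ["socketlatency", "kernellatency"],
    "tcp_connection_failure": ["tcpsummary", "tcpreset"],
    "packet_loss": ["netdev", "softnet"],
}

# Assumed always-on, low-overhead baseline for daily monitoring.
BASELINE = ["sock", "tcp", "udp", "tcpext", "ip", "netdev", "softnet"]


def probes_to_enable(symptoms, already_enabled):
    """Return probes to switch on for the given symptoms, skipping active ones."""
    wanted = []
    for symptom in symptoms:
        for probe in PROBES_BY_SYMPTOM.get(symptom, []):
            if probe not in already_enabled and probe not in wanted:
                wanted.append(probe)
    return wanted


print(probes_to_enable(["latency_spike", "packet_loss"], BASELINE))
```

Keeping the high-overhead probes out of the baseline and adding them only for the duration of an investigation follows the overhead trade-off described earlier.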

Future roadmap

KubeSkoop 0.1.0 introduced support for k3s clusters, automatic host‑interface selection and major performance improvements. Planned work includes a full probe refactor for better extensibility, flow‑level metric collection, event export to local files or ELK stacks, and an enhanced UI that shows cluster‑wide traffic topology.

GitHub repository: https://github.com/alibaba/kubeskoop
