Cloud Native 13 min read

How Kindling Leverages eBPF for Minute‑Level Fault Diagnosis in Cloud‑Native Environments

The interview with Kindling founder Cheng Chan explores how eBPF‑based Kindling tackles the overwhelming metrics, high expertise barrier, and lack of real‑time protocol parsing in cloud‑native observability, detailing its probe architecture, protocol analysis, and roadmap for faster, standardized root‑cause detection.

ITPUB
ITPUB
ITPUB
How Kindling Leverages eBPF for Minute‑Level Fault Diagnosis in Cloud‑Native Environments

Background

eBPF is rapidly maturing and is being used to improve container networking, security, and observability. Compared with traditional kernel modules, eBPF programs are easier to write, but developers still face a steep learning curve when building cloud‑native observability solutions.

Typical Observability Challenges

Excessive metrics make it hard to decide which indicators to monitor.

Teams rely on ad‑hoc tools (logs, traces, DB consoles) and manual experience, leading to unpredictable troubleshooting time.

Production systems are often black boxes; intermittent issues lack reproducible logs or trace points.

APM tools have a high entry barrier, and diagnostic accuracy depends heavily on operator expertise.

Kindling Probe Architecture

Kindling follows an edge‑centric design: data collection and preliminary processing occur on each Kubernetes node, reducing the volume of data sent to the central analyzer.

Kindling‑probe : migrated from falco‑libs, intercepts kernel events and gathers raw eBPF data.

Kindling‑driver : native C component that parses the raw kernel data.

Kindling‑collector : Go service that analyses the data stream, filters valuable information, and formats it for downstream consumption.

Kindling‑Java : integration layer for APM ecosystems such as SkyWalking, Pinpoint, and OpenTelemetry.

Real‑Time Protocol Parsing

The probe implements a request‑response model where request and response can be matched by a unique ID even when they appear in different execution contexts. Parsing the protocol yields three “golden” metrics:

Throughput

Latency

Error rate

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Cloud NativeKuberneteseBPFKindlingTrace Profiling
ITPUB
Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.