How Kelemetry Transforms Kubernetes Observability with Object‑Centric Tracing
Kelemetry, an open‑source tracing system from ByteDance, visualizes Kubernetes control‑plane events by treating each object as a span, linking audit logs, events, and component interactions to provide a unified, searchable view that simplifies debugging, performance analysis, and multi‑cluster observability.
Kelemetry is a tracing system developed by ByteDance for the Kubernetes control plane, linking the behavior of multiple components to trace the full lifecycle of a Kubernetes object and the interactions between objects.
By visualizing event chains inside the K8s system, it makes the system easier to observe, understand, and debug.
Background
Traditional distributed tracing tracks internal calls during a user request, forming a tree of spans. In Kubernetes, the API is asynchronous and declarative: components update the desired state in the apiserver, and other components reconcile toward that state. There is no single request to follow from start to finish, so request‑scoped tracing is a poor fit.
Existing component‑specific tracing only captures individual reconcile loops, leading to isolated observability islands.
Design
1. Objects as spans – Kelemetry creates a span for each object; events on the object become child spans. Object ownership links spans into a hierarchy, providing both a tree structure for object relationships and a timeline for event order.
Example: a single‑pod Deployment trace shows interactions among the deployment controller, ReplicaSet controller, and kubelet using audit logs and events.
2. Audit log collection – Audit logs from the apiserver provide rich data about controller actions. Kelemetry offers an audit webhook and a plugin API to consume logs from various sources.
3. Event collection – Events emitted by controllers are captured; heuristics avoid duplicate spans (e.g., persisting the last event timestamp, checking resourceVersion changes).
4. Linking object state with audit logs – By monitoring object create/update/delete events and matching them with audit logs via resourceVersion, Kelemetry correlates state changes with the responsible audit entries.
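The resourceVersion matching in step 4 can be sketched as a lookup table. This is a minimal illustration, not Kelemetry's implementation: the type names (AuditEntry, ObjectChange, AuditIndex), the describe helper, and the shape of the object key are all hypothetical, and it assumes each audit entry carries the resourceVersion of the object as returned by the apiserver.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical key shape: (cluster, resource, namespace, name).
ObjectKey = Tuple[str, str, str, str]


@dataclass(frozen=True)
class AuditEntry:
    key: ObjectKey
    verb: str              # e.g. "update" or "patch"
    username: str          # who issued the request
    resource_version: str  # assumed: resourceVersion of the object in the response


@dataclass(frozen=True)
class ObjectChange:
    key: ObjectKey
    resource_version: str  # resourceVersion seen on the watched object
    diff: str              # human-readable summary of what changed


class AuditIndex:
    """Indexes audit entries by (object, resourceVersion) for later lookup."""

    def __init__(self) -> None:
        self._by_version: Dict[Tuple[ObjectKey, str], AuditEntry] = {}

    def add(self, entry: AuditEntry) -> None:
        self._by_version[(entry.key, entry.resource_version)] = entry

    def responsible_for(self, change: ObjectChange) -> Optional[AuditEntry]:
        # resourceVersion is an opaque string, so compare for equality only.
        return self._by_version.get((change.key, change.resource_version))


def describe(change: ObjectChange, index: AuditIndex) -> str:
    """Annotate an observed state change with the request that caused it."""
    entry = index.responsible_for(change)
    if entry is None:
        return f"{change.diff} (no matching audit entry)"
    return f"{change.diff} (by {entry.username} via {entry.verb})"
```

A real collector would also have to cope with audit entries and watch events arriving in either order, which this sketch ignores.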
Front‑end Trace Conversion
Kelemetry intercepts results between the Jaeger front‑end and storage, applying custom conversion pipelines such as:
tree – simplified original trace tree.
timeline – flattens pseudo‑spans, placing all event spans under the root.
tracing – flattens non‑object spans into related object span logs.
grouping – creates pseudo‑spans per data source for easier cross‑component inspection.
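As an illustration of what such a conversion might do, the sketch below flattens a tree of object spans in the manner described for the timeline mode. The Span type and the timeline function are hypothetical stand-ins, not Kelemetry's or Jaeger's data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    name: str
    start: float              # seconds since an arbitrary epoch
    is_object: bool = False   # True for an object pseudo-span
    children: List["Span"] = field(default_factory=list)


def timeline(root: Span) -> Span:
    """Return a new root whose children are all event spans, in time order."""
    events: List[Span] = []

    def collect(span: Span) -> None:
        for child in span.children:
            if child.is_object:
                collect(child)  # descend into nested object pseudo-spans
            else:
                events.append(Span(child.name, child.start))

    collect(root)
    events.sort(key=lambda s: s.start)
    return Span(root.name, root.start, is_object=True, children=events)
```

Applied to a Deployment span that owns a ReplicaSet span that owns a Pod span, the result is a single root with every event from all three objects listed chronologically beneath it.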
Breaking Duration Limits
Traces are limited to 30‑minute windows to avoid storage issues. Kelemetry merges consecutive 30‑minute traces with the same object tags, presenting a seamless story to the user.
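The merging step can be pictured roughly as follows. This is a sketch under assumptions: the Trace type, the representation of object tags as sorted pairs, and the alignment of windows to multiples of 30 minutes are all hypothetical choices made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

WINDOW_SECONDS = 30 * 60  # the 30-minute limit described above

Tags = Tuple[Tuple[str, str], ...]  # hypothetical: sorted (tag, value) pairs


@dataclass(frozen=True)
class Trace:
    object_tags: Tags
    window_start: int        # epoch seconds, a multiple of WINDOW_SECONDS
    spans: Tuple[str, ...]


def window_start(ts: int) -> int:
    """Round a timestamp down to the start of its 30-minute window."""
    return ts - ts % WINDOW_SECONDS


def merge_consecutive(traces: List[Trace]) -> List[List[Trace]]:
    """Group traces with identical object tags into runs of adjacent windows."""
    by_tags: Dict[Tags, List[Trace]] = defaultdict(list)
    for t in traces:
        by_tags[t.object_tags].append(t)

    runs: List[List[Trace]] = []
    for group in by_tags.values():
        group.sort(key=lambda t: t.window_start)
        run = [group[0]]
        for t in group[1:]:
            if t.window_start - run[-1].window_start == WINDOW_SECONDS:
                run.append(t)     # directly follows the previous window
            else:
                runs.append(run)  # gap found: close the current run
                run = [t]
        runs.append(run)
    return runs
```

Each returned run would then be displayed as one continuous trace, while a gap of more than one window starts a new one.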
Multi‑Cluster Support
Kelemetry can monitor events from multiple clusters; objects can be linked across clusters, enabling cross‑cluster tracing.
Future Enhancements
Custom trace sources beyond audit and events.
Batch analysis to aggregate metrics across large numbers of spans.
Use Cases
ReplicaSet controller anomaly – Tracing reveals that the controller created Pods but never updated its status, indicating a possible informer consistency issue.
Floating minReadySeconds – Kelemetry helped identify a temporary increase of minReadySeconds to 3600 caused by a federation component, explaining a slow rolling update.
Kelemetry is open‑source on GitHub, with quick‑start documentation and an online preview.
Volcano Engine Developer Services
The Volcano Engine Developer Community, Volcano Engine's TOD community, connects the platform with developers, offering cutting-edge tech content and diverse events, nurturing a vibrant developer culture, and co-building an open-source ecosystem.
