How Inferred Spans Boost Distributed Tracing Accuracy and Coverage

This article examines inferred spans, an observability technique that enriches traditional distributed tracing by automatically generating additional spans. It shows how they improve trace coverage, pinpoint latency sources, and support performance optimisation, and it discusses practical integration, algorithmic details, and the associated trade‑offs.

AsiaInfo Technology: New Tech Exploration

Nov 22, 2024

Introduction

In modern micro‑service architectures, traditional distributed tracing often captures only coarse‑grained request start and end times, leaving many performance problems undetected. Inferred spans are an observability technique that combines stack‑trace analysis with existing trace data to create new spans automatically, extending both the coverage and the precision of a trace.

Distributed Tracing Overview

Conventional tracing records explicit spans and context propagation across services, but it typically lacks detailed function‑call information, resulting in coarse‑grained visibility and missed latency sources.

Principle of Inferred Spans

Inferred spans are generated by fusing stack‑trace data collected via async‑profiler (which provides low‑overhead wall‑clock sampling) with instrumentation‑based trace data. An interleaving algorithm merges the two data streams, establishes parent‑child relationships, and produces additional spans named after the class and method they represent, in the form Class#method (e.g., ApiServlet#handleDelay).
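To make the principle concrete, here is a minimal sketch of how periodic stack samples could be turned into spans. It is not the profiler's or the agent's actual implementation. The class `StackSampleInference`, its nested types, and the `infer` and `spanName` methods are hypothetical names introduced for illustration. The assumption is that a frame seen at the same stack depth in consecutive samples belongs to one continuous method invocation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: derives inferred spans from periodic wall-clock stack samples. */
class StackSampleInference {

    /** One sample: when it was taken and the call stack, outermost frame first. */
    static final class Sample {
        final long timeMs;
        final List<String> frames;

        Sample(long timeMs, List<String> frames) {
            this.timeMs = timeMs;
            this.frames = frames;
        }
    }

    /** A span covering the samples during which one frame stayed on the stack. */
    static final class InferredSpan {
        final String name;
        final int depth;
        final long startMs;
        final long endMs;

        InferredSpan(String name, int depth, long startMs, long endMs) {
            this.name = name;
            this.depth = depth;
            this.startMs = startMs;
            this.endMs = endMs;
        }
    }

    /** Builds a Class#method span name from a stack frame. */
    static String spanName(StackTraceElement frame) {
        String cls = frame.getClassName();
        return cls.substring(cls.lastIndexOf('.') + 1) + "#" + frame.getMethodName();
    }

    /** Assumes samples are ordered by time and all come from one thread. */
    static List<InferredSpan> infer(List<Sample> samples) {
        List<InferredSpan> finished = new ArrayList<>();
        List<String> openNames = new ArrayList<>(); // frame currently open at each depth
        List<Long> openStarts = new ArrayList<>();  // time of the first sample that saw it
        long lastTime = 0;
        for (Sample s : samples) {
            // Length of the stack prefix shared with the previous sample.
            int common = 0;
            while (common < openNames.size() && common < s.frames.size()
                    && openNames.get(common).equals(s.frames.get(common))) {
                common++;
            }
            // Frames below the shared prefix have returned: close them, deepest first,
            // ending each span at the last sample that still observed it.
            for (int d = openNames.size() - 1; d >= common; d--) {
                finished.add(new InferredSpan(openNames.remove(d), d, openStarts.remove(d), lastTime));
            }
            // Frames that are new in this sample open new spans.
            for (int d = common; d < s.frames.size(); d++) {
                openNames.add(s.frames.get(d));
                openStarts.add(s.timeMs);
            }
            lastTime = s.timeMs;
        }
        // Close whatever is still open when the samples run out.
        for (int d = openNames.size() - 1; d >= 0; d--) {
            finished.add(new InferredSpan(openNames.get(d), d, openStarts.get(d), lastTime));
        }
        return finished;
    }
}
```

Start and end times produced this way are only accurate to within one sampling interval, and two back‑to‑back calls to the same method that fall between samples would be merged into a single span.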

Demo Application

A Java demo named inferred-demo contains a queryOrder endpoint that calls Redis and includes an artificial delay. With the standard Java agent configuration, the trace shows only the endpoint and the Redis calls, along with the total request time (6.02 ms), but gives no insight into where most of the latency occurs.

Enabling Inferred Spans

Setting the option otel.inferred.spans.enabled to true in the APM agent’s configuration activates the inferred‑spans feature. The resulting trace includes automatically generated spans for internal methods such as ApiServlet#handleDelay, revealing that the artificial delay accounts for 4.22 s of the total 4.55 s request time.
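As a configuration sketch, the option can be passed as a JVM system property when the demo is started with the agent attached. The agent path and the jar name inferred-demo.jar below are placeholders, not values taken from the demo. Only the option name comes from the text above.

```
java -javaagent:/path/to/agent.jar \
     -Dotel.inferred.spans.enabled=true \
     -jar inferred-demo.jar
```

Agents that follow the OpenTelemetry Java agent conventions usually accept the same setting as an environment variable (here that would be OTEL_INFERRED_SPANS_ENABLED=true), but this should be checked against the documentation of the agent in use.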

Data Collection and Interleaving Algorithm

Regular trace data: explicit spans recorded by the application.

Inferred span data: wall‑clock timings collected by async‑profiler.

The interleaving algorithm aligns timestamps from both sources, merges them into a unified span hierarchy, and records activation/deactivation timestamps and thread IDs for each generated span.
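The merging step can be sketched as a containment check on timestamps. This is a simplified illustration under stated assumptions, not the algorithm used by any particular agent. The `SpanInterleaver` class and its `Span` type are hypothetical, all timestamps are in milliseconds on a shared clock, and a span's parent is taken to be the tightest span on the same thread that fully encloses it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: merges regular and inferred spans into one hierarchy. */
class SpanInterleaver {

    static final class Span {
        final String name;
        final long startMs;     // activation timestamp
        final long endMs;       // deactivation timestamp
        final long threadId;
        final boolean inferred; // true if the span came from profiler samples
        Span parent;            // filled in by interleave(); null for a root span

        Span(String name, long startMs, long endMs, long threadId, boolean inferred) {
            this.name = name;
            this.startMs = startMs;
            this.endMs = endMs;
            this.threadId = threadId;
            this.inferred = inferred;
        }
    }

    static List<Span> interleave(List<Span> regular, List<Span> inferred) {
        List<Span> merged = new ArrayList<>(regular);
        merged.addAll(inferred);
        // Order by start time; on ties the longer span comes first so it can become the parent.
        merged.sort((a, b) -> {
            if (a.startMs != b.startMs) return Long.compare(a.startMs, b.startMs);
            if (a.endMs != b.endMs) return Long.compare(b.endMs, a.endMs);
            return Boolean.compare(a.inferred, b.inferred); // regular span first on exact ties
        });
        // One stack of currently open spans per thread.
        Map<Long, Deque<Span>> openByThread = new HashMap<>();
        for (Span s : merged) {
            Deque<Span> open = openByThread.computeIfAbsent(s.threadId, id -> new ArrayDeque<>());
            // Drop spans that end before this one does: they cannot enclose it.
            while (!open.isEmpty() && open.peek().endMs < s.endMs) {
                open.pop();
            }
            s.parent = open.peek(); // tightest enclosing span, or null
            open.push(s);
        }
        return merged;
    }
}
```

With illustrative numbers loosely modelled on the demo, a regular root span GET /queryOrder (0 to 4550), a regular Redis span (10 to 40), and inferred spans ApiServlet#queryOrder (5 to 4545) and ApiServlet#handleDelay (300 to 4520) on the same thread would produce a tree in which the inferred ApiServlet#queryOrder span sits between the root and the Redis call. That insertion is the interleaving. The span names other than ApiServlet#handleDelay and all timestamps here are assumptions.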

Technical Performance Challenges

The feature relies on async‑profiler, whose overhead is small but depends on the sampling interval and the trace‑sampling rate. A longer sampling interval reduces overhead but may miss short‑lived methods, while a 50% trace‑sampling rate roughly halves the analysis load and still provides useful visibility.
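The trade‑off can be made concrete with a little arithmetic. The sketch below assumes that samples are taken at a fixed interval with a random phase relative to the method call, so a method shorter than the interval is caught with probability duration / interval. The class `SamplingTradeoff`, its method names, and the 50 ms interval used in the usage note are illustrative assumptions, not measured values.

```java
/** Hypothetical sketch of the sampling trade-off described above. */
class SamplingTradeoff {

    /**
     * Probability that at least one sample lands inside a method call,
     * assuming fixed-interval sampling with a uniformly random phase.
     */
    static double hitProbability(double methodMs, double intervalMs) {
        return Math.min(1.0, methodMs / intervalMs);
    }

    /** Number of traces that are profiled at a given trace-sampling rate (0.0 to 1.0). */
    static long profiledTraces(long totalTraces, double traceSamplingRate) {
        return Math.round(totalTraces * traceSamplingRate);
    }
}
```

Under these assumptions, with a 50 ms interval a 5 ms method is sampled in roughly one call out of ten, whereas the 4.22 s delay from the demo is always sampled. At a trace‑sampling rate of 0.5, 1,000 traces yield 500 profiled traces.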

Conclusion

Inferred spans significantly enhance observability for distributed applications, enabling developers and operators to locate root causes of latency more accurately, accelerate troubleshooting, and improve system stability. When properly tuned, the technique offers a cost‑effective solution with controllable performance impact.

References

[1] Revealing unknowns in your tracing data with inferred spans in OpenTelemetry

[2] Special cases for spans and traces in Splunk APM
