Understanding Distributed Tracing and Its Use at Liulishuo

This article explains what distributed tracing is, why it is needed alongside logging and metrics for observability, how it works with trace and span IDs, and describes Liulishuo's implementation using OpenTelemetry, W3C Trace Context, and tail‑based sampling to improve backend debugging.

Liulishuo Tech Team

1. What is Distributed Tracing?

According to the OpenTracing definition, distributed tracing (also called distributed request tracing) is a method used to profile and monitor applications, especially those built with a micro‑services architecture, helping pinpoint failures and performance problems.

In simple terms, it is a technique for troubleshooting application issues, particularly in distributed systems.

2. Why Do We Need Distributed Tracing?

Developers often rely on logging and metrics alone, but distributed tracing is also needed to improve observability: the ability to answer questions about a system's runtime behavior. The more observable a system is, the more operational questions its owners can answer.

Examples of questions:

If a request is slow, where is the bottleneck?

If a request fails, where did the error occur?

When an error happens, whose component is responsible?

Logging, metrics, and distributed tracing are the three pillars of observability; the following sections compare them.

2.1 Metrics

Metrics can tell you that something bad happened (e.g., high error rate or resource usage) but cannot explain why or how to fix it. Metrics aggregate data and lack request‑level context, making it hard to trace errors for individual requests.

Metrics are cheap to collect and do not require sampling, so the aggregated numbers are accurate. However, they offer limited insight when debugging depends on the context of a specific request.

2.2 Logging

Logs provide detailed runtime information and can include a unique request ID, allowing extraction of a single request’s context. However, in distributed and highly concurrent environments, correlating logs across services becomes difficult.

Developers must use additional tools to query logs across all services and filter them by request ID. Even then, they may struggle to reconstruct the execution path, because logs are linear and carry no explicit caller/callee relationships.
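A minimal sketch of this limitation, assuming a hypothetical in-memory list of structured log records (the field names `ts`, `service`, `request_id`, and `msg` are illustrative, not taken from any real logging system). Filtering by request ID gives a timeline, but nothing in it says which service called which.

```python
# Hypothetical log records gathered from several services.
logs = [
    {"ts": 3, "service": "lesson", "request_id": "r1", "msg": "loading lesson"},
    {"ts": 1, "service": "gateway", "request_id": "r1", "msg": "received GET /lesson"},
    {"ts": 2, "service": "gateway", "request_id": "r2", "msg": "received GET /profile"},
    {"ts": 2, "service": "auth", "request_id": "r1", "msg": "token ok"},
    {"ts": 4, "service": "db", "request_id": "r1", "msg": "query timeout"},
]


def logs_for_request(records, request_id):
    """Return the records of one request, ordered by timestamp."""
    matching = (r for r in records if r["request_id"] == request_id)
    return sorted(matching, key=lambda r: r["ts"])


timeline = [r["service"] for r in logs_for_request(logs, "r1")]
print(timeline)
# The result is a flat sequence of services. It does not record whether
# "db" was called by "lesson", by "auth", or directly by "gateway".
```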

2.3 Distributed Tracing

To solve missing context and relationship issues, tracing assigns a globally unique TraceID to a request and propagates it across all services (metadata propagation). Each operation is recorded as a Span with its own SpanID and a ParentID, forming a tree of spans.

Spans contain timestamps, status, and additional metadata, allowing calculation of latency between operations and identification of error locations.

With this data, the three earlier questions can be answered:

Identify the slowest Span to locate bottlenecks.

Inspect Spans that contain errors to find failure points.

Use Span metadata to determine which service is responsible.
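The three answers above can be sketched with a small span tree. This is an illustrative model only: the `Span` class, its field names, and the sample data are assumptions made for the example, not the OpenTelemetry data model. The self-time calculation also assumes that child spans run one after another without overlapping.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    service: str
    name: str
    start_ms: int
    end_ms: int
    error: bool = False

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def self_time_ms(span, spans):
    """Time spent in the span itself, excluding its direct children."""
    children = [s for s in spans if s.parent_id == span.span_id]
    return span.duration_ms - sum(c.duration_ms for c in children)


def path_from_root(span, spans):
    """Follow ParentID links upward to recover the call path."""
    by_id = {s.span_id: s for s in spans}
    path = [span]
    while path[-1].parent_id is not None:
        path.append(by_id[path[-1].parent_id])
    return [s.name for s in reversed(path)]


# One trace: every span shares the same TraceID.
spans = [
    Span("t1", "a", None, "gateway", "GET /lesson", 0, 500),
    Span("t1", "b", "a", "auth", "verify_token", 10, 40),
    Span("t1", "c", "a", "lesson", "load_lesson", 50, 480),
    Span("t1", "d", "c", "db", "SELECT lesson", 60, 450),
    Span("t1", "e", "a", "audio", "fetch_audio", 485, 495, error=True),
]

# 1. Bottleneck: the span with the largest self time.
bottleneck = max(spans, key=lambda s: self_time_ms(s, spans))
# 2. Failure points: spans flagged with an error.
failed = [s for s in spans if s.error]
# 3. Responsibility: the service recorded in each failed span's metadata.
responsible = {s.service for s in failed}

print(bottleneck.name, path_from_root(bottleneck, spans))
print(responsible)
```

Unlike a flat log timeline, the ParentID links make the call path explicit, so the slow database query can be attributed to the `load_lesson` operation that issued it.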

3. Application at Liulishuo

Liulishuo uses OpenTelemetry as its tracing SDK and adopts the W3C Trace Context standard. To ensure compatibility with third‑party services, the SDK also supports the B3 specification.

W3C Trace Context defines standard HTTP headers for propagating trace metadata, solving format‑compatibility issues.
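As an illustration, the `traceparent` header defined by W3C Trace Context has four dash-separated, lowercase hexadecimal fields: version, trace ID (32 characters), parent span ID (16 characters), and trace flags (2 characters). The sketch below parses and re-emits that header by hand. It handles only version `00`, ignores the companion `tracestate` header, and uses hypothetical helper names (`parse_traceparent`, `child_traceparent`). In practice an SDK such as OpenTelemetry performs this propagation automatically.

```python
import re
import secrets

# version - trace-id - parent-id - trace-flags, all lowercase hex
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(header):
    """Parse a version-00 traceparent header; return None if it is invalid."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero IDs are defined as invalid by the specification.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),  # lowest bit is the sampled flag
    }


def child_traceparent(context):
    """Build the header for an outgoing call: same TraceID, new SpanID."""
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    flags = "01" if context["sampled"] else "00"
    return f"00-{context['trace_id']}-{span_id}-{flags}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
context = parse_traceparent(incoming)
outgoing = child_traceparent(context)
print(context)
print(outgoing)
```

The trace ID stays constant across every hop, while each service replaces the parent ID with the ID of its own span before calling downstream.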

OpenTelemetry, formed by merging OpenCensus and OpenTracing, aims to support all three observability pillars. At the time of writing, only its tracing support is production-ready, but its language-agnostic specifications give developers a consistent experience across services.

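Tracing can generate large volumes of data, so some form of sampling is needed. Head-based sampling decides at the start of a request whether to record it, before anyone knows if the request will fail or run slowly. Liulishuo disables head-based sampling and uses tail-based sampling instead: all traces are collected, and a processing service then decides which traces to retain (e.g., those with errors or high latency) based on configurable rules. This reduces storage costs while preserving the most valuable data.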
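A minimal sketch of such a retention decision, assuming spans arrive as plain dictionaries and that the rules are "any span has an error" and "the whole trace took longer than a threshold". The rule names, the 1000 ms threshold, and the data layout are hypothetical and do not describe Liulishuo's actual processing service.

```python
from collections import defaultdict


def group_by_trace(spans):
    """Collect all spans of each trace before deciding anything."""
    traces = defaultdict(list)
    for span in spans:
        traces[span["trace_id"]].append(span)
    return dict(traces)


def has_error(trace):
    return any(span["error"] for span in trace)


def slower_than(threshold_ms):
    """Build a rule that matches traces longer than threshold_ms."""
    def rule(trace):
        start = min(span["start_ms"] for span in trace)
        end = max(span["end_ms"] for span in trace)
        return end - start > threshold_ms
    return rule


# Configurable rules, checked in order; the first match wins.
RULES = [("error", has_error), ("slow", slower_than(1000))]


def decide(trace, rules=RULES):
    """Return the name of the rule that retains the trace, or None to drop it."""
    for name, rule in rules:
        if rule(trace):
            return name
    return None


collected = [
    {"trace_id": "t1", "start_ms": 0, "end_ms": 120, "error": False},
    {"trace_id": "t2", "start_ms": 0, "end_ms": 80, "error": False},
    {"trace_id": "t2", "start_ms": 10, "end_ms": 60, "error": True},
    {"trace_id": "t3", "start_ms": 0, "end_ms": 1500, "error": False},
    {"trace_id": "t3", "start_ms": 100, "end_ms": 1400, "error": False},
]

decisions = {}
for trace_id, trace in group_by_trace(collected).items():
    reason = decide(trace)
    if reason is not None:
        decisions[trace_id] = reason

print(decisions)
```

The decision is made only after a trace's spans have been gathered, which is what lets the rules look at outcomes such as errors and total latency. The fast, healthy trace `t1` is dropped.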

4. References

https://opentelemetry.io

https://opentracing.io/docs/overview/what-is-tracing

https://www.w3.org/TR/trace-context

https://github.com/open-telemetry/opentelemetry-specification
