
Distributed Tracing and Observability: Principles, OpenTracing Standard, and Open‑Source Solutions Comparison

This article explains how microservice complexity drives the need for observability, outlines its three pillars—logging, metrics, and tracing—describes OpenTracing concepts and APIs, and compares major open‑source distributed tracing systems to help engineers choose the right solution for fault localization, performance analysis, and capacity planning.

Sohu Tech Products

With the rise of containerization and micro-service architectures, services are increasingly developed and deployed via tools such as Docker and Kubernetes, but the resulting inter-service dependencies make debugging, performance analysis, and capacity planning difficult.

Observability addresses these challenges. Unlike traditional monitoring, which focuses on detecting failures, observability emphasizes exposing the internal state of an application so that its throughput, latency, and behavior can be observed directly. It is built on three pillars:

Logging: records discrete events, typically written to files and aggregated by solutions like ELK. It provides the most detailed view but can be storage-heavy.

Metrics: aggregates numeric data (e.g., counters, histograms). Metrics are lightweight and ideal for alerting, with Prometheus as the de-facto standard.

Tracing: links spans across services to form a directed acyclic graph of a request's path, preserving timing information while avoiding the noise of raw logs.
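As a rough illustration of how the three pillars differ, the sketch below emits one signal of each kind for the same request, using only the Python standard library. The service name, metric name, and record layout are hypothetical; a real deployment would ship these signals to systems such as ELK, Prometheus, and a tracing backend instead of keeping them in memory.

```python
import logging
import time
import uuid
from collections import Counter

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("checkout")  # hypothetical service name

request_count = Counter()  # metrics: aggregated numbers, no per-request detail
finished_spans = []        # tracing: one timed record per operation


def handle_request(trace_id, parent_span_id=None):
    """Handle one (hypothetical) request and emit all three signals."""
    span_id = uuid.uuid4().hex[:16]
    start = time.monotonic()

    # Logging: a discrete event carrying full detail.
    log.info("order accepted trace_id=%s span_id=%s", trace_id, span_id)

    # Metrics: only a counter changes; the individual event is not stored.
    request_count["checkout.requests"] += 1

    # Tracing: what ran, under which trace, under which parent, and for how long.
    finished_spans.append({
        "trace_id": trace_id,
        "span_id": span_id,
        "parent_span_id": parent_span_id,
        "operation": "handle_request",
        "duration_s": time.monotonic() - start,
    })
    return span_id


trace_id = uuid.uuid4().hex
root_span_id = handle_request(trace_id)
handle_request(trace_id, parent_span_id=root_span_id)
```

The log line is the most detailed but grows with every request, the counter stays a single number, and the span records keep just enough structure (trace, parent, duration) to reconstruct the request path.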

Tracing enables fault localization, dependency mapping, performance analysis, and capacity planning, and it strengthens monitoring when combined with logging and metrics.
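To make dependency mapping and performance analysis concrete, here is a minimal sketch over a handful of finished spans. The span tuples, service names, and durations are invented for illustration, and the self-time calculation assumes that child spans run sequentially without overlapping.

```python
from collections import defaultdict

# Hypothetical finished spans: (span_id, parent_id, service, duration_ms)
spans = [
    ("a", None, "gateway", 120),
    ("b", "a", "orders", 30),
    ("c", "a", "payments", 80),
    ("d", "c", "database", 65),
]

by_id = {span_id: (parent, service, duration)
         for span_id, parent, service, duration in spans}

edges = set()                  # dependency map: (caller service, callee service)
child_time = defaultdict(int)  # total time spent in direct children, per span

for span_id, parent, service, duration in spans:
    if parent is not None:
        edges.add((by_id[parent][1], service))
        child_time[parent] += duration

# Self time = own duration minus time spent waiting on direct children.
# Assumption: children do not overlap; parallel calls would need interval math.
self_time = {service: duration - child_time[span_id]
             for span_id, (_, service, duration) in by_id.items()}

slowest = max(self_time, key=self_time.get)
```

With this sample data the dependency edges are gateway to orders, gateway to payments, and payments to database, and the largest self time belongs to the database span, which is where an investigation of the 120 ms request would start.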

Distributed tracing became popular after Google published its Dapper paper, which inspired open-source projects such as Zipkin and, later, a variety of other tracing systems.

The OpenTracing standard defines the following core concepts:

Trace: a directed acyclic graph of Span objects.

Span: a single timed operation (e.g., RPC, DB call) that may have child spans.

SpanContext: carries trace_id, span_id, and baggage items for propagation across process boundaries.

References: describe relationships between spans (e.g., ChildOf, FollowsFrom).
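The concepts above can be sketched as a small data model. This is not the OpenTracing API itself: the class layout, the `start_span` helper, and the string labels `"child_of"` and `"follows_from"` are simplifications chosen for illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SpanContext:
    trace_id: str
    span_id: str
    baggage: Dict[str, str] = field(default_factory=dict)


@dataclass
class Reference:
    kind: str                # "child_of" or "follows_from" (illustrative labels)
    referenced: SpanContext  # context of the span being referred to


@dataclass
class Span:
    operation_name: str
    context: SpanContext
    references: List[Reference] = field(default_factory=list)


def start_span(name: str,
               child_of: Optional[Span] = None,
               follows_from: Optional[Span] = None) -> Span:
    """Create a span that joins its parent's trace, or starts a new trace."""
    parent = child_of if child_of is not None else follows_from
    trace_id = parent.context.trace_id if parent is not None else uuid.uuid4().hex
    # Baggage is copied so that it travels with every descendant span.
    baggage = dict(parent.context.baggage) if parent is not None else {}
    references = []
    if child_of is not None:
        references.append(Reference("child_of", child_of.context))
    if follows_from is not None:
        references.append(Reference("follows_from", follows_from.context))
    context = SpanContext(trace_id, uuid.uuid4().hex[:16], baggage)
    return Span(name, context, references)


span_a = start_span("A")                   # root span: no references
span_a.context.baggage["user"] = "42"      # hypothetical baggage item
span_c = start_span("C", child_of=span_a)
span_f = start_span("F", child_of=span_c)
span_g = start_span("G", follows_from=span_f)
```

All four spans share one `trace_id`, each has its own `span_id`, and the references are the edges that turn the set of spans into a directed acyclic graph.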

Propagation is typically done via HTTP headers or message‑queue headers. The OpenTracing API provides Tracer.Inject(...) and Tracer.Extract(...) for this purpose.

        [Span A]  ←←←(the root span)
            |
     +------+------+
     |             |
 [Span B]      [Span C] ←←←(Span C is child of Span A)
     |             |
 [Span D]      +---+-------+
               |           |
           [Span E]    [Span F] >>> [Span G] >>> [Span H]
                                       ^
                                       ^
                                       ^
                         (Span G follows from Span F)

Example of injecting and extracting a SpanContext:

# Caller side: inject the SpanContext into the outbound request's headers
import opentracing

tracer = opentracing.global_tracer()  # assumes a tracer was registered at startup
span_context = ...
outbound_request = ...
carrier = {}
tracer.inject(span_context, opentracing.Format.HTTP_HEADERS, carrier)
# HTTP_HEADERS carriers hold header-safe strings, so no extra escaping is applied
for key, value in carrier.items():  # .items() yields (key, value) pairs
    outbound_request.headers[key] = value

# Callee side: extract the SpanContext and continue the trace as a child span
inbound_request = ...
carrier = inbound_request.headers
span_context = tracer.extract(opentracing.Format.HTTP_HEADERS, carrier)
span = tracer.start_span("...", child_of=span_context)
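To show what an inject/extract round trip does to the carrier, the following dependency-free sketch implements a toy text-map format. The `x-demo-` header names are hypothetical and do not match the wire format of any real tracer; Jaeger, Zipkin, and others each define their own headers.

```python
from typing import Dict, NamedTuple, Optional

PREFIX = "x-demo-"                    # hypothetical header prefix
BAGGAGE_PREFIX = PREFIX + "baggage-"  # one header per baggage item


class SpanContext(NamedTuple):
    trace_id: str
    span_id: str
    baggage: Dict[str, str]


def inject(ctx: SpanContext, carrier: Dict[str, str]) -> None:
    """Write the context into the carrier (for example, outbound HTTP headers)."""
    carrier[PREFIX + "trace-id"] = ctx.trace_id
    carrier[PREFIX + "span-id"] = ctx.span_id
    for key, value in ctx.baggage.items():
        carrier[BAGGAGE_PREFIX + key] = value


def extract(carrier: Dict[str, str]) -> Optional[SpanContext]:
    """Rebuild the context from the carrier, or return None if none is present."""
    # HTTP header names are case-insensitive, so normalize them before lookup.
    # As a side effect, baggage keys come back lowercased.
    headers = {key.lower(): value for key, value in carrier.items()}
    trace_id = headers.get(PREFIX + "trace-id")
    span_id = headers.get(PREFIX + "span-id")
    if trace_id is None or span_id is None:
        return None
    baggage = {key[len(BAGGAGE_PREFIX):]: value
               for key, value in headers.items()
               if key.startswith(BAGGAGE_PREFIX)}
    return SpanContext(trace_id, span_id, baggage)


# Caller side: the context is added alongside the request's ordinary headers.
outbound_headers = {"Content-Type": "application/json"}
inject(SpanContext("abc123", "def456", {"tenant": "acme"}), outbound_headers)

# Callee side: the same headers arrive and the context is recovered.
received = extract(outbound_headers)
```

Returning `None` when the headers are missing lets the callee start a new trace for requests that arrive from uninstrumented clients.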

The article compares several open-source tracing solutions (data collected in June 2019): Jaeger, Zipkin, Apache SkyWalking, CAT, Pinpoint, and Elastic APM. Selection criteria include low performance overhead, minimal code intrusion, and extensibility. Recommendations:

For Java-centric stacks with low cross-language needs, consider Apache SkyWalking.

For multi-language environments and a strong tracing focus, Jaeger (compatible with the Zipkin protocol) is preferred.

For pure web applications already using ELK, Elastic APM offers a low-cost entry.

For large‑scale log‑centric collection, CAT is popular in China but lacks OpenTracing support.

Pinpoint provides low‑intrusion APM and tracing for Java/PHP but does not implement OpenTracing.

The article concludes with a summary of references covering Dapper, observability pillars, OpenTracing semantics, and various tracing implementations.

