
How OpenTelemetry and Jaeger Power Cloud‑Native Tracing

This article explains cloud‑native observability, defines its three pillars—metrics, tracing, and logging—details the OpenTelemetry tracing data model and Span structure, reviews industry implementations such as Jaeger and Alibaba Eagle Eye, and shares practical challenges and solutions from real‑world production use.

Baidu Geek Talk

Concept Introduction

Observability in cloud‑native systems is the ability to infer internal state from external outputs. The three foundational pillars are Metrics, Tracing, and Logging.

Metrics: Aggregatable atomic values such as counters or histograms (e.g., number of incoming HTTP requests).

Tracing: Captures request‑scoped data and metadata (e.g., the actual SQL query sent to a database).

Logging: Handles discrete events such as debug or error messages, typically written to rotated log files that are then shipped to a cluster for centralized processing.

Metrics consume the least resources because they are highly compressible; logging can dominate traffic volume, while tracing falls between the two in overhead.
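To make the difference between the three signals concrete, the following minimal sketch records all three for the same request using only the Python standard library. The names handle_request, request_count, and finished_spans are hypothetical and chosen for illustration; a real system would use a metrics library, a tracing SDK, and a log pipeline instead.

```python
import logging
import time
import uuid
from collections import Counter

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("demo")

request_count = Counter()  # metric: aggregated, no per-request detail
finished_spans = []        # tracing: one record per request-scoped operation


def handle_request(route: str, sql: str) -> None:
    # Metric: a single counter increment, cheap to store and aggregate.
    request_count[route] += 1

    # Trace: a request-scoped record carrying timing and metadata.
    span = {
        "trace_id": uuid.uuid4().hex,
        "name": f"GET {route}",
        "start": time.time(),
        "attributes": {"db.statement": sql},
    }
    # ... the real work would happen here ...
    span["end"] = time.time()
    finished_spans.append(span)

    # Log: a discrete event written as a line of text.
    log.info("handled %s trace_id=%s", route, span["trace_id"])


handle_request("/users", "SELECT * FROM users WHERE id = 1")
handle_request("/users", "SELECT * FROM users WHERE id = 2")
```

After two requests the metric is still a single number per route, while tracing and logging have each grown by one record per request, which is why their storage cost scales with traffic.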

Tracing Data Model (OpenTelemetry Example)

OpenTelemetry defines a trace as a directed acyclic graph (DAG) of Span objects. Each Span encapsulates the following state:

Name

Start and End Timestamps

Span Context

Carries the identifiers that tie the span to its trace: the Trace ID (identifies the overall trace) and the Span ID (uniquely identifies the span within the trace), together with trace flags and trace state.

Attributes

Key‑value metadata that annotates the span with additional information about the operation.

Span Events

Structured log‑like messages representing meaningful points in time within the span.

Span Links

Associations to one or more other spans, describing upstream/downstream relationships, useful for asynchronous workflows.

Span Status

Status code indicating the outcome of the operation: Unset, Ok, or Error.
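The fields listed above can be sketched as plain Python dataclasses. This is an illustrative model only, not the OpenTelemetry SDK's classes: the type names, the start_span helper, and the hex-string encoding of IDs are assumptions made for the example.

```python
import secrets
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Optional


class StatusCode(Enum):
    UNSET = 0
    OK = 1
    ERROR = 2


@dataclass(frozen=True)
class SpanContext:
    trace_id: str  # 16 random bytes, hex-encoded (32 characters)
    span_id: str   # 8 random bytes, hex-encoded (16 characters)


@dataclass
class Event:
    name: str
    timestamp_ns: int
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Link:
    context: SpanContext
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Span:
    name: str
    context: SpanContext
    parent_span_id: Optional[str] = None
    start_ns: int = field(default_factory=time.time_ns)
    end_ns: Optional[int] = None
    attributes: Dict[str, Any] = field(default_factory=dict)
    events: List[Event] = field(default_factory=list)
    links: List[Link] = field(default_factory=list)
    status: StatusCode = StatusCode.UNSET

    def add_event(self, name: str, **attributes: Any) -> None:
        # An event is a timestamped, log-like annotation inside the span.
        self.events.append(Event(name, time.time_ns(), attributes))

    def end(self) -> None:
        self.end_ns = time.time_ns()


def start_span(name: str, parent: Optional[Span] = None) -> Span:
    # A child reuses the parent's trace ID; a root span creates a new one.
    trace_id = parent.context.trace_id if parent else secrets.token_hex(16)
    context = SpanContext(trace_id, secrets.token_hex(8))
    parent_id = parent.context.span_id if parent else None
    return Span(name, context, parent_span_id=parent_id)


root = start_span("GET /orders")
child = start_span("SELECT orders", parent=root)
child.attributes["db.system"] = "mysql"
child.add_event("rows fetched", count=3)
child.status = StatusCode.OK
child.end()
root.end()
```

The parent/child edge is stored on the child as parent_span_id, which is what lets a backend rebuild the trace graph from spans that arrive independently and out of order.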

Industry Tracing Implementations

Uber Jaeger

Jaeger is an open‑source, cloud‑native tracing platform originally developed at Uber. It joined the CNCF in 2017 and reached graduated status in 2019, and it can ingest OpenTelemetry trace data.

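Jaeger’s classic architecture consists of the following components:

jaeger-client: SDK that collects spans inside the application and supports dynamically adjusted sampling, so that the reported volume stays in line with storage pressure.

jaeger-agent: Daemon deployed alongside the application that receives spans from the client, forwards them to the collector, and relays sampling policies.

jaeger-collector: Aggregates, processes, and stores tracing data.

jaeger-query and jaeger-ui: Provide query capabilities and a user interface.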

Jaeger integrates with middleware instrumentation, supports multiple protocols (e.g., HTTP), and can store data in Cassandra, Elasticsearch, or other open‑source back‑ends.
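Carrying a trace across an HTTP hop means serializing the span context into a request header on the caller and parsing it on the callee. The sketch below uses the W3C Trace Context traceparent header, which is the default propagation format in OpenTelemetry. The inject and extract helper names and the RemoteContext type are hypothetical, and the parser deliberately handles only the version 00 layout.

```python
import re
from typing import Dict, NamedTuple, Optional

# version "-" trace-id (32 hex) "-" parent-id (16 hex) "-" flags (2 hex)
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class RemoteContext(NamedTuple):
    trace_id: str
    span_id: str
    sampled: bool


def inject(headers: Dict[str, str], trace_id: str, span_id: str, sampled: bool) -> None:
    """Write the caller's span context into outgoing HTTP headers."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"


def extract(headers: Dict[str, str]) -> Optional[RemoteContext]:
    """Read a span context from incoming headers, or None if absent or invalid."""
    match = _TRACEPARENT_RE.match(headers.get("traceparent", "").strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    if version != "00":
        return None  # simplification: this sketch ignores other versions
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None  # all-zero IDs are invalid
    return RemoteContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


outgoing: Dict[str, str] = {}
inject(outgoing, "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", sampled=True)
incoming = extract(outgoing)
```

On the receiving side, the extracted trace ID is reused for every span the service creates, and the extracted span ID becomes the parent of the first one.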

Official site: https://www.jaegertracing.io/

Alibaba Eagle Eye

Eagle Eye is Alibaba’s log‑based distributed tracing system built for high‑traffic events such as Double‑11. It addresses fault localization, capacity estimation, and resource waste by providing real‑time link analysis and visualized monitoring.

Key characteristics:

Lightweight architecture with real‑time streaming data presentation.

Visualized monitoring pipelines that lower integration cost for developers.

Selective sampling based on analysis scenarios to reduce data volume.

The platform supports HTTP/TCP protocols, middleware or bytecode‑enhanced instrumentation, and stores data in HDFS, HBase, HStore, or MPP databases.
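Sampling is the main lever for reducing trace volume. The source does not describe Eagle Eye's actual algorithm, so the sketch below shows a different, widely used technique instead: deterministic ratio sampling keyed on the trace ID. Because the decision depends only on the trace ID, every service that sees the same trace reaches the same verdict without coordination. The function name should_sample is an assumption for illustration.

```python
TRACE_ID_LIMIT = 1 << 64  # the decision uses the low 64 bits of the trace ID


def should_sample(trace_id_hex: str, ratio: float) -> bool:
    """Return True for roughly `ratio` of all trace IDs, deterministically."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    # Trace IDs below this bound are kept; ratio=1.0 keeps everything.
    bound = round(ratio * TRACE_ID_LIMIT)
    low64 = int(trace_id_hex, 16) & (TRACE_ID_LIMIT - 1)
    return low64 < bound
```

A scenario-based policy such as the one described above could be layered on top by choosing a different ratio per route or per tenant before calling the function.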

Practical Challenges and Solutions (Baidu Experience)

Large‑scale tracing in production faces several difficulties:

High data volume: Requires high‑performance SDKs, efficient sampling strategies, optimized encoding/mapping algorithms, and tiered storage based on data type and usage.

Low integration cost: SDKs must be easy to adopt, with simple APIs and minimal developer effort; automatic instrumentation should cover most use‑cases, while custom hooks remain straightforward.

Stability requirements: Use local persistence as a buffer, send tracing data from background tasks rather than on the request path, and implement robust retry mechanisms.

Advanced feature demands: Include confidence analysis of metrics, real‑time multi‑window aggregation for short‑term and long‑term trends, and concise visualizations that convey maximum information with minimal indicators.
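The stability item can be illustrated with a small exporter that buffers spans locally and retries failed sends with exponential backoff. This is a sketch under stated assumptions rather than Baidu's implementation: the BufferedExporter class and its parameters are hypothetical, and an in-memory bounded deque stands in for the on-disk persistence the text describes.

```python
import time
from collections import deque
from typing import Callable, Deque, List


class BufferedExporter:
    def __init__(
        self,
        send: Callable[[List[dict]], None],
        max_buffered: int = 1000,
        max_attempts: int = 3,
        base_delay_s: float = 0.1,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._send = send
        # Bounded buffer: when full, the oldest span is dropped so tracing
        # can never exhaust the host application's memory.
        self._buffer: Deque[dict] = deque(maxlen=max_buffered)
        self._max_attempts = max_attempts
        self._base_delay_s = base_delay_s
        self._sleep = sleep

    @property
    def pending(self) -> int:
        return len(self._buffer)

    def enqueue(self, span: dict) -> None:
        self._buffer.append(span)

    def flush(self, batch_size: int = 100) -> int:
        """Send buffered spans in batches; return how many were delivered."""
        sent = 0
        while self._buffer:
            count = min(batch_size, len(self._buffer))
            batch = [self._buffer.popleft() for _ in range(count)]
            if not self._send_with_retry(batch):
                # Put the batch back in its original order and stop for now.
                self._buffer.extendleft(reversed(batch))
                break
            sent += len(batch)
        return sent

    def _send_with_retry(self, batch: List[dict]) -> bool:
        for attempt in range(self._max_attempts):
            try:
                self._send(batch)
                return True
            except Exception:
                if attempt + 1 < self._max_attempts:
                    # Exponential backoff: base, 2x base, 4x base, ...
                    self._sleep(self._base_delay_s * (2 ** attempt))
        return False
```

In practice flush would be called from a background thread or timer so that a slow or unavailable collector never blocks request handling.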

References

OpenTelemetry – Spans: https://opentelemetry.io/docs/concepts/signals/traces/#spans-in-opentelemetry

Benjamin H. Sigelman, Luiz André Barroso et al., “Dapper: A Large‑Scale Distributed Systems Tracing Infrastructure”, 2010.

Uber Jaeger engineering blog: https://eng.uber.com/distributed-tracing/
