
From Dapper to OpenTelemetry: A Practical Guide to Distributed Tracing and Observability

This article explains the challenges of long request chains in micro‑service architectures, reviews Google’s Dapper tracing requirements, introduces OpenTracing and OpenCensus standards, compares their strengths, and details how OpenTelemetry unifies tracing, metrics and logs with practical integration steps and best‑practice guidance.

Alibaba Cloud Native

In micro‑service and serverless applications, a single user request often passes through a long chain of services. This makes problems hard to diagnose and performance hard to monitor, and a failure in any one service along the chain can noticeably degrade the user experience.

Google’s Dapper paper ("Dapper, a Large‑Scale Distributed Systems Tracing Infrastructure") defines four essential requirements for a distributed tracing system: low performance overhead, application‑level transparency (minimal intrusion into application code), scalability, and fast availability of trace data for analysis.
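Dapper kept overhead low largely through sampling: the paper reports that tracing only about 1 in 1,024 requests still provides enough data for many common analyses. The sketch below illustrates the idea with a hypothetical `should_sample` helper, assuming 128-bit random trace IDs; it is not Dapper's implementation. Because the decision depends only on the trace ID, every service in a call chain reaches the same keep-or-drop verdict without coordinating, which is also the idea behind OpenTelemetry's TraceIdRatioBased sampler (whose exact algorithm may differ).

```python
import random


def should_sample(trace_id: int, rate: float) -> bool:
    """Hypothetical ratio sampler: keep roughly `rate` of all traces."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    # Treat the low 64 bits of the (assumed random) trace ID as a uniform
    # number and keep the trace if it falls below the threshold.
    low_bits = trace_id & 0xFFFFFFFFFFFFFFFF
    threshold = int(rate * (1 << 64))
    return low_bits < threshold


if __name__ == "__main__":
    rng = random.Random(42)
    ids = [rng.getrandbits(128) for _ in range(100_000)]
    kept = sum(should_sample(t, 1 / 1024) for t in ids)
    # The expected count is about 100_000 / 1024.
    print(f"kept {kept} of {len(ids)} traces")
```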

OpenTracing

OpenTracing was created to provide a vendor‑agnostic, low‑intrusion API that standardizes trace data collection across languages and frameworks. Because instrumentation is written against a common set of APIs, developers can switch tracing back‑ends (e.g., Jaeger, Zipkin, LightStep) simply by swapping the tracer implementation. Core concepts include:

Backend‑agnostic API: services call standardized APIs, so any compliant backend can consume the data.

Span management: start and finish spans, record their duration, and attach tags.

Inter‑process propagation: defines how trace context is passed between processes.

Multi‑language support: Go, Python, JavaScript, Java, C#, Objective‑C, C++, Ruby, PHP, etc.
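To make span management and inter-process propagation concrete, here is a toy tracer in plain Python. It is a minimal sketch of the concepts, not the `opentracing` package's API: the class names and the `x-trace-id`/`x-span-id` header keys are hypothetical. The caller injects its span context into a dict of outgoing headers (the carrier), and the callee extracts it to start a child span in the same trace.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SpanContext:
    trace_id: str  # shared by every span in one request
    span_id: str   # unique to this span


@dataclass
class Span:
    operation: str
    context: SpanContext
    parent_id: Optional[str] = None
    tags: Dict[str, object] = field(default_factory=dict)
    start: float = field(default_factory=time.monotonic)
    end: Optional[float] = None

    def set_tag(self, key: str, value: object) -> "Span":
        self.tags[key] = value
        return self

    def finish(self) -> None:
        self.end = time.monotonic()

    @property
    def duration(self) -> Optional[float]:
        # Seconds between start and finish, or None while the span is open.
        return None if self.end is None else self.end - self.start


class ToyTracer:
    TRACE_KEY = "x-trace-id"  # hypothetical header names
    SPAN_KEY = "x-span-id"

    def start_span(self, operation: str,
                   child_of: Optional[SpanContext] = None) -> Span:
        # A root span starts a new trace; a child reuses the parent's trace ID.
        trace_id = child_of.trace_id if child_of else secrets.token_hex(16)
        context = SpanContext(trace_id, secrets.token_hex(8))
        parent_id = child_of.span_id if child_of else None
        return Span(operation, context, parent_id)

    def inject(self, context: SpanContext, carrier: Dict[str, str]) -> None:
        # Write the context into outgoing request headers.
        carrier[self.TRACE_KEY] = context.trace_id
        carrier[self.SPAN_KEY] = context.span_id

    def extract(self, carrier: Dict[str, str]) -> Optional[SpanContext]:
        # Rebuild the caller's context from incoming headers, if present.
        if self.TRACE_KEY in carrier and self.SPAN_KEY in carrier:
            return SpanContext(carrier[self.TRACE_KEY], carrier[self.SPAN_KEY])
        return None


if __name__ == "__main__":
    tracer = ToyTracer()
    client = tracer.start_span("GET /orders").set_tag("http.method", "GET")
    headers: Dict[str, str] = {}
    tracer.inject(client.context, headers)  # travels with the HTTP request
    server = tracer.start_span("handle /orders", child_of=tracer.extract(headers))
    server.finish()
    client.finish()
    print(server.context.trace_id == client.context.trace_id)  # True
```

A real tracer would also report finished spans to a backend; here they simply keep their timings and tags in memory.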

OpenTracing became a CNCF project in 2016, highlighting its importance in the cloud‑native ecosystem.

OpenCensus

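OpenCensus, which grew out of Google’s internal Census library, was a separate project rather than an extension of OpenTracing: in addition to tracing, it provided built‑in metrics collection and a unified data model. Its architecture consists of an Agent (a daemon that runs alongside applications) and a Collector; libraries in each language use Exporters to send data to the Agent, the Collector, or a backend directly. Core terminology includes: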

Tags: key‑value pairs attached to metrics.

Stats: measurements recorded against measures and aggregated through views.

Trace: spans with additional fields such as parent span ID, attributes, annotations, message events, and links.
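The Stats terminology above can be illustrated with a small in-memory model: each measurement is recorded with tags, and a view states which measure to aggregate, which tag keys to group by, and which buckets to use. This is a hypothetical sketch; the `View` and `StatsRecorder` names and the bucket layout are illustrative, not the OpenCensus library's API.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class View:
    name: str
    measure: str                # which measure this view aggregates
    tag_keys: Tuple[str, ...]   # tags to group by
    bounds: Tuple[float, ...]   # sorted bucket boundaries


class StatsRecorder:
    def __init__(self, views: Iterable[View]) -> None:
        self.views = list(views)
        # view name -> tuple of tag values -> per-bucket counts
        self.data: Dict[str, Dict[Tuple[str, ...], List[int]]] = {}

    def record(self, measure: str, value: float, tags: Dict[str, str]) -> None:
        for view in self.views:
            if view.measure != measure:
                continue
            # Group by the view's tag keys; missing tags become "".
            key = tuple(tags.get(k, "") for k in view.tag_keys)
            rows = self.data.setdefault(view.name, {})
            counts = rows.setdefault(key, [0] * (len(view.bounds) + 1))
            # Bucket i holds values in [bounds[i-1], bounds[i]); the last
            # bucket holds everything at or above the final boundary.
            counts[bisect_right(view.bounds, value)] += 1


if __name__ == "__main__":
    latency = View("latency_by_method", "latency_ms", ("method",), (10.0, 100.0))
    stats = StatsRecorder([latency])
    for ms, method in [(5, "GET"), (50, "GET"), (500, "POST")]:
        stats.record("latency_ms", ms, {"method": method})
    print(stats.data["latency_by_method"])
    # {('GET',): [1, 1, 0], ('POST',): [0, 0, 1]}
```

Keeping views separate from recording means the same raw measurement can feed several aggregations (for example, by method and by region) without instrumenting the code twice.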

OpenCensus supports a wide range of languages (Go, Java, Python, etc.) and can export to Prometheus, Jaeger, Zipkin, Stackdriver, and others.

OpenTelemetry

OpenTelemetry merges OpenTracing and OpenCensus into a single CNCF incubating project, providing a unified specification, API, SDK, and tooling for traces, metrics, and logs. It defines the OpenTelemetry Protocol (OTLP) as its native data format and offers language‑specific SDKs for C++, .NET, Go, Java, JavaScript, PHP, Python, Ruby, Rust, Swift, and more.
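OTLP is most often carried as Protobuf over gRPC or HTTP, but the specification also defines a JSON encoding over HTTP, which makes the data model easy to see. The hypothetical `make_otlp_json` function below builds a one-span trace payload; field names follow the OTLP trace protobuf (`resourceSpans`, `scopeSpans`, `spans`) as mapped to JSON, but treat this as a sketch and check it against the opentelemetry-proto definitions. The service and scope names are placeholders.

```python
import json
import secrets
import time


def make_otlp_json(service_name: str, span_name: str, duration_ms: float) -> str:
    """Build a single-span OTLP/JSON trace payload (hypothetical helper)."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex characters
    end_ns = time.time_ns()
    start_ns = end_ns - int(duration_ms * 1_000_000)
    payload = {
        "resourceSpans": [{
            # Resource attributes describe the emitting service.
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{
                # The scope identifies the instrumentation library.
                "scope": {"name": "example-instrumentation"},
                "spans": [{
                    "traceId": trace_id,  # OTLP/JSON uses hex, not base64
                    "spanId": span_id,
                    "name": span_name,
                    "kind": 2,  # SPAN_KIND_SERVER
                    # Proto3 JSON represents 64-bit integers as strings.
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }],
    }
    return json.dumps(payload)


if __name__ == "__main__":
    print(make_otlp_json("checkout", "GET /cart", 12.5))
```

A payload like this would be POSTed with `Content-Type: application/json` to a Collector's OTLP/HTTP traces endpoint, which by default listens on port 4318 at the path `/v1/traces`.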

Key data types:

Metrics: numeric measurements recorded through instruments such as counters, up‑down counters, histograms, and gauges.

Logs: timestamped text or structured records, which can be correlated with traces through the trace and span IDs they carry.

Traces: a tree of spans representing a single request, identified by a TraceID.

Baggage: key‑value pairs propagated alongside the trace context.
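By default, OpenTelemetry SDKs propagate trace context and baggage using the W3C Trace Context (`traceparent` header) and W3C Baggage (`baggage` header) formats. A `traceparent` value carries the TraceID shared by every span of the request, the ID of the calling span, and a sampled flag. The helpers below sketch both encodings in plain Python; they are illustrative rather than the SDK's propagator API, handle only version `00` of `traceparent`, and assume baggage keys are already valid tokens.

```python
import re
from typing import Dict, Optional, Tuple
from urllib.parse import quote, unquote

# version "00" - 32-hex trace ID - 16-hex parent span ID - 2-hex flags
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def make_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def parse_traceparent(value: str) -> Optional[Tuple[str, str, bool]]:
    match = _TRACEPARENT.match(value.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero trace or span IDs are invalid.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    sampled = bool(int(flags, 16) & 0x01)  # lowest bit = sampled
    return trace_id, span_id, sampled


def encode_baggage(items: Dict[str, str]) -> str:
    # Values are percent-encoded; keys are assumed to be valid tokens.
    return ",".join(f"{k}={quote(v, safe='')}" for k, v in items.items())


def decode_baggage(header: str) -> Dict[str, str]:
    result: Dict[str, str] = {}
    for member in header.split(","):
        member = member.split(";", 1)[0].strip()  # drop optional properties
        if "=" in member:
            key, value = member.split("=", 1)
            result[key.strip()] = unquote(value.strip())
    return result


if __name__ == "__main__":
    header = make_traceparent("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")
    print(header)  # 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
    print(decode_baggage(encode_baggage({"tenant": "acme", "region": "cn hangzhou"})))
```

A service receiving these headers keeps the TraceID, records the incoming span ID as its new span's parent, and forwards the same baggage on its own outgoing calls.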

The OpenTelemetry Collector can run in Agent mode (as a sidecar or DaemonSet) or in Gateway mode (as a stand‑alone service). It ingests data through receivers for OTLP, Jaeger, Prometheus, and other formats, and can export it to any supported backend.
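In both modes the Collector is organized as pipelines of receivers, processors, and exporters. The toy class below models one traces pipeline in plain Python: it enriches each incoming span with extra attributes, buffers spans into batches (roughly the role of the Collector's batch processor), and fans every batch out to all configured exporters. It is a hypothetical illustration of the data flow, not Collector code; `ToyPipeline` and its parameters are invented for this example.

```python
from typing import Callable, Dict, List, Optional

Span = Dict[str, object]
Exporter = Callable[[List[Span]], None]


class ToyPipeline:
    """Hypothetical model of one Collector traces pipeline."""

    def __init__(self, exporters: List[Exporter], batch_size: int = 100,
                 extra_attrs: Optional[Dict[str, object]] = None) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.exporters = exporters
        self.batch_size = batch_size
        self.extra_attrs = dict(extra_attrs or {})
        self._buffer: List[Span] = []

    def receive(self, span: Span) -> None:
        # Receiver + processor step: copy the span and add attributes,
        # e.g. cluster or node metadata known to an agent-mode Collector.
        self._buffer.append({**span, **self.extra_attrs})
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Exporter step: send the current batch to every exporter.
        if not self._buffer:
            return
        batch, self._buffer = self._buffer, []
        for export in self.exporters:
            export(list(batch))  # each exporter gets its own list


if __name__ == "__main__":
    backend_a: List[List[Span]] = []  # stand-ins for real backends
    backend_b: List[List[Span]] = []
    pipeline = ToyPipeline([backend_a.append, backend_b.append], batch_size=2,
                           extra_attrs={"k8s.cluster": "demo"})
    for name in ["s1", "s2", "s3"]:
        pipeline.receive({"name": name})
    pipeline.flush()  # push out the partial final batch
    print(len(backend_a), len(backend_b))  # 2 2
```

Batching trades a little latency for far fewer network calls, and fan-out is what lets one Collector feed several backends from a single instrumentation point.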

Integration with Alibaba Cloud ARMS

ARMS (Application Real‑Time Monitoring Service) offers three ways to ingest OpenTelemetry trace data:

Direct reporting: use the ARMS Java Agent, which automatically instruments common libraries and emits OpenTelemetry‑compatible traces.

Collector forwarding: deploy the ARMS‑for‑OpenTelemetry Collector in an ACK cluster, then point your SDK’s exporter endpoint at otel-collector-service:Port.

OpenTelemetry Collector forwarding: modify the exporters.otlp section of your OpenTelemetry Collector configuration, replacing <endpoint> and <token> with your ARMS endpoint and token.

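exporters:
  otlp:
    endpoint: <endpoint>:8090
    tls:
      insecure: true
    headers:
      Authentication: <token>
# The exporter is only used once a pipeline references it
# (this assumes an otlp receiver is already defined)
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]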

After deployment, ARMS provides trace detail panels, pre‑aggregated dashboards, a Trace Explorer for post‑aggregation analysis, and the ability to correlate traces with business logs.
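The source does not describe how ARMS links traces to business logs, but the general technique is to stamp every log line with the current trace ID so the two can be joined later. The sketch below shows that idea with Python's standard `logging` module; the `current_trace_id` context variable and the `orders` logger name are hypothetical stand-ins for the trace context an OpenTelemetry SDK would maintain.

```python
import contextvars
import io
import logging

# Hypothetical stand-in for the active trace context; "-" means no trace.
current_trace_id = contextvars.ContextVar("current_trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Copy the current trace ID onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = current_trace_id.get()
        return True  # never drop records, only annotate them


def build_logger(stream) -> logging.Logger:
    logger = logging.getLogger("orders")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(TraceIdFilter())
    handler.setFormatter(
        logging.Formatter("trace_id=%(trace_id)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    buffer = io.StringIO()
    log = build_logger(buffer)
    token = current_trace_id.set("4bf92f3577b34da6a3ce929d0e0e4736")
    log.info("order created")
    current_trace_id.reset(token)
    log.info("health check")
    print(buffer.getvalue(), end="")
```

Using a context variable rather than a global keeps the trace ID correct when many requests are handled concurrently in threads or asyncio tasks.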

Conclusion

OpenTelemetry unifies the collection of traces, metrics, and logs under a single, vendor‑neutral standard, simplifying instrumentation across languages and platforms. While it solves the data‑generation problem, choosing storage, analysis, and alerting solutions (e.g., Prometheus, Jaeger, custom AIOps platforms) remains a separate architectural decision.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
