
Mastering Jaeger: A Complete Guide to Distributed Tracing and Deployment

Jaeger is an open‑source, CNCF‑graduated distributed tracing system built by Uber, and this guide explains its core concepts, architecture, sampling strategies, and various deployment options—including all‑in‑one, Kubernetes, and OpenTelemetry—plus how it compares with other tracing tools.

MaGe Linux Operations

1. Introduction

1.1 What is Jaeger

Jaeger is an open‑source distributed tracing system inspired by Dapper and OpenZipkin. It was released by Uber Technologies and is now a CNCF graduated project. Its frontend is implemented in React and its backend in Go. It is used for request tracing, distributed transaction monitoring, performance analysis, service‑dependency analysis, and optimization.

Jaeger's data model is built on three core concepts:

Trace : The execution of a transaction or workflow across a distributed system. It is a directed acyclic graph (DAG) of spans. Each span has a unique Span ID, and the trace as a whole has a unique Trace ID that is propagated from service to service (e.g., a→b→c→d).

Span : The logical work unit of tracing, which can be a service, a method call, or even a simple code block. Each span records a name, start time, and duration, and spans are nested to show relationships between services.

Span Context : A data structure that carries additional trace information such as Trace ID, Span ID, and any other data that needs to be passed downstream.

In short, a trace represents the full call chain of a request, while a span represents a single unit of work within that chain, such as one service handling a call from another. A trace can be seen as a directed acyclic graph of spans.
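
The relationship between trace, span, and span context can be sketched in a few lines of Python. This is a simplified illustration, not Jaeger's implementation: the Span class and the inject, start_span, and call_chain helpers are hypothetical names, and the header value follows the general shape of Jaeger's native uber-trace-id format ({trace-id}:{span-id}:{parent-span-id}:{flags}) without handling every detail of it.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional

HEADER = "uber-trace-id"  # name of Jaeger's native propagation header


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    name: str
    start: float = field(default_factory=time.time)
    duration: float = 0.0


def inject(span: Span, carrier: Dict[str, str]) -> None:
    """Write the span context into the outgoing request headers."""
    parent = span.parent_id or "0"
    carrier[HEADER] = f"{span.trace_id}:{span.span_id}:{parent}:1"


def start_span(name: str, carrier: Optional[Dict[str, str]] = None) -> Span:
    """Start a child span if the carrier holds a context, else a root span."""
    span_id = secrets.token_hex(8)
    if carrier and HEADER in carrier:
        trace_id, caller_span_id, _, _ = carrier[HEADER].split(":")
        return Span(trace_id, span_id, caller_span_id, name)
    return Span(secrets.token_hex(16), span_id, None, name)


def call_chain(services: List[str]) -> List[Span]:
    """Simulate one request passing through the services in order."""
    spans: List[Span] = []
    headers: Dict[str, str] = {}
    for service in services:
        span = start_span(service, headers)
        headers = {}
        inject(span, headers)  # context handed to the next service
        spans.append(span)
    return spans


trace = call_chain(["a", "b", "c", "d"])
for s in trace:
    print(s.name, s.trace_id, s.span_id, s.parent_id)
```

Every span printed shares the same Trace ID, and each span's parent ID is the Span ID of the service that called it, which is what lets a backend reassemble the chain.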

1.2 Distributed Tracing Terminology

APM

With the rise of micro‑service architectures, a single request often involves many services, making performance monitoring and troubleshooting more complex:

Different services may be developed by different teams and written in different programming languages.

Services may be spread across thousands of servers in multiple data centers.

Therefore, Application Performance Management (APM) tools are needed to understand system behavior and quickly locate problems. Many modern distributed tracing systems, Jaeger included, trace their design back to Google’s Dapper paper.

Tracing

In monolithic applications, call stacks can be used to debug issues. In distributed systems, a request may trigger many networked service calls, making debugging difficult. Jaeger can be viewed as a distributed call‑stack, recording all interactions of a request for analysis.

OpenTracing

OpenTracing is a lightweight standardization layer that sits between applications/libraries and tracing or logging systems, providing a vendor‑agnostic API.

In short, OpenTracing offers a standard API that allows developers to add or swap tracing implementations easily; it has been adopted by CNCF.

Note: OpenTracing has been superseded by OpenTelemetry, which was formed by merging OpenTracing with OpenCensus and is now the recommended standard for instrumentation.

1.3 Comparison of Jaeger with Other Tracing Tools

Many distributed tracing tools exist, such as Uber’s Jaeger, Twitter’s Zipkin, and Apache SkyWalking. The original article compares their capabilities across multiple dimensions in a diagram (not reproduced here).

2. Jaeger Architecture Design

2.1 Jaeger Architecture

Jaeger consists of the following components:

Tracing SDK : Language‑specific libraries that instrument applications to emit trace data.

Jaeger Collector : Receives traces, validates and enriches them, and stores them in the backend storage.

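DB : Backend storage, supporting in‑memory, Cassandra, or Elasticsearch. Kafka is not a storage backend itself, but it can be placed between the Collector and storage as an intermediate buffer.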

Jaeger Query : Handles query requests, retrieves data from storage, and serves it to the UI.

Jaeger UI : A React‑based web interface for visualizing traces.

In Jaeger’s design, the Collector receives data from instrumented applications and writes it directly to storage. The storage must handle both average and peak traffic; the Collector uses an in‑memory queue to smooth short‑term spikes, but prolonged peaks can cause data loss if the storage cannot keep up.
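
The buffering behavior described above can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not the Collector's actual code: simulate_collector is a hypothetical function, time is divided into discrete ticks, and storage is assumed to absorb a fixed number of spans per tick.

```python
from collections import deque
from typing import Deque, List, Tuple


def simulate_collector(
    arrivals: List[int], queue_size: int, write_rate: int
) -> Tuple[int, int]:
    """Return (stored, dropped) span counts.

    arrivals[i] is the number of spans received in tick i; write_rate is the
    number of spans the storage backend can absorb per tick.
    """
    queue: Deque[int] = deque()
    stored = dropped = 0
    next_id = 0
    for incoming in arrivals:
        for _ in range(incoming):
            if len(queue) < queue_size:
                queue.append(next_id)
            else:
                dropped += 1  # queue full: the span is lost
            next_id += 1
        for _ in range(min(write_rate, len(queue))):
            queue.popleft()
            stored += 1
    stored += len(queue)  # remaining spans drain once traffic stops
    return stored, dropped


# A short spike fits in the queue; a sustained peak overflows it.
print(simulate_collector([5, 5, 30, 5, 5], queue_size=50, write_rate=10))
print(simulate_collector([30] * 10, queue_size=50, write_rate=10))
```

In the first run the single burst of 30 spans is absorbed and nothing is dropped. In the second run the arrival rate stays above the write rate, so the queue fills after two ticks and every tick after that loses spans.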

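Note: The architecture described here corresponds to version 1.47, the latest stable version at the time of writing. Older versions included a jaeger‑agent component, which is now deprecated; applications instrumented with OpenTelemetry SDKs can send traces to the Collector directly.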

2.2 Jaeger Sampling Rate

Tracing every request can impose performance overhead, so sampling is typically used.

Four sampling strategies are currently supported:

Constant sampling (sampler.type=const): the same decision for every trace; param=1 samples all traces, param=0 samples none.

Probabilistic sampling (sampler.type=probabilistic): param=0.1 samples 10% of traces randomly.

Rate‑limiting sampling (sampler.type=ratelimiting): param=2.0 samples two traces per second.

Remote sampling (sampler.type=remote): the sampler fetches its strategy from the Jaeger backend, so sampling rates can be changed centrally without redeploying applications.
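
The first three strategies can be sketched as follows. These classes are simplified illustrations with hypothetical names, not the Jaeger client implementations: the probabilistic sampler here draws a random number per trace rather than deriving the decision from the Trace ID, and the rate limiter is a basic token bucket with an injectable clock so its behavior is easy to test.

```python
import random
from typing import Callable


class ConstSampler:
    """sampler.type=const: the same decision for every trace."""

    def __init__(self, param: int) -> None:
        self.decision = param == 1

    def is_sampled(self) -> bool:
        return self.decision


class ProbabilisticSampler:
    """sampler.type=probabilistic: sample a fraction `param` of traces."""

    def __init__(self, param: float, rng: random.Random) -> None:
        self.rate = param
        self.rng = rng

    def is_sampled(self) -> bool:
        return self.rng.random() < self.rate


class RateLimitingSampler:
    """sampler.type=ratelimiting: at most `param` traces per second."""

    def __init__(self, param: float, clock: Callable[[], float]) -> None:
        self.rate = param
        self.clock = clock
        self.max_balance = max(param, 1.0)
        self.balance = self.max_balance
        self.last = clock()

    def is_sampled(self) -> bool:
        now = self.clock()
        # Refill tokens for the elapsed time, capped at the bucket size.
        refill = (now - self.last) * self.rate
        self.balance = min(self.max_balance, self.balance + refill)
        self.last = now
        if self.balance >= 1.0:
            self.balance -= 1.0
            return True
        return False


prob = ProbabilisticSampler(0.1, random.Random(42))
sampled = sum(prob.is_sampled() for _ in range(10_000))
print("probabilistic: sampled", sampled, "of 10000")

now = [0.0]  # fake clock so the example does not depend on wall time
limiter = RateLimitingSampler(2.0, lambda: now[0])
first_second = sum(limiter.is_sampled() for _ in range(5))
now[0] = 1.0
next_second = sum(limiter.is_sampled() for _ in range(5))
print("ratelimiting:", first_second, "then", next_second)
```

With param=2.0 the rate limiter admits two of the five requests made at the same instant, then two more once a second has elapsed and the bucket has refilled.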

3. Jaeger Deployment Methods

Jaeger can be deployed in several ways:

All‑in‑one : Quick demo deployment with in‑memory storage; not suitable for production.

Kubernetes : Deploy each Jaeger component as separate manifests; highly configurable and can integrate with existing Elasticsearch or Kafka services.

OpenTelemetry : Instrument applications with OpenTelemetry SDKs and export traces to Jaeger over OTLP, optionally through an OpenTelemetry Collector.

3.1 Deploy Jaeger as an Istio Component in a Kubernetes Cluster

Istio does not enable Jaeger by default; it must be installed manually.

(1) Modify Istio config to set trace sampling rate

The sampling configuration determines whether all requests are traced or only a randomly selected fraction of them.

kubectl -n istio-system get cm jaeger-sampling-configuration -o yaml
  ...
  sampling: '{"default_strategy":{"param":1,"type":"probabilistic"}}'  # 100% random sampling
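
To lower the rate, the strategy string can be edited by hand or generated. The following is a minimal sketch, assuming the same single default_strategy structure shown in the ConfigMap above; with_sampling_rate is a hypothetical helper, and applying the result back to the cluster is left out.

```python
import json

# The strategy string as it appears in the ConfigMap above.
current = '{"default_strategy":{"param":1,"type":"probabilistic"}}'


def with_sampling_rate(strategies_json: str, rate: float) -> str:
    """Return the strategies document with a new default probabilistic rate."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    doc = json.loads(strategies_json)
    doc["default_strategy"] = {"type": "probabilistic", "param": rate}
    return json.dumps(doc, separators=(",", ":"))


# Sample roughly 10% of requests instead of all of them.
print(with_sampling_rate(current, 0.1))
```

A lower rate reduces tracing overhead and storage volume, at the cost of capturing fewer individual requests.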

(2) Deploy Jaeger

Run kubectl apply -f samples/addons/jaeger.yaml to install Jaeger in the istio-system namespace.

[root@106 ~]# kubectl get pods -n=istio-system |grep jaeger
jaeger-collector-85b686d849-cmv9h        1/1     Running     0          99d
jaeger-operator-868d5f975d-5prhx         1/1     Running     0          27d
jaeger-query-7cff7c84f4-k7bs8            2/2     Running     0          167m

Note: It is recommended to use the jaeger-operator for deploying Jaeger components.

(3) Access Jaeger Dashboard

Find the service’s NodePort (e.g., 30693) and open http://<IP>:30693 in a browser.

# kubectl get svc -n=istio-system |grep query
jaeger-query                NodePort       10.233.41.95    <none>        16686:30693/TCP,16685:30363/TCP              500d
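
The PORT(S) column pairs each service port with its NodePort, and 16686 is the port the Jaeger UI listens on. As a small sketch, the mapping can be extracted like this; node_ports is a hypothetical helper, and the node address is left as the same <IP> placeholder used above.

```python
from typing import Dict


def node_ports(ports_column: str) -> Dict[int, int]:
    """Map each service port to its NodePort.

    Example: '16686:30693/TCP' becomes {16686: 30693}.
    """
    mapping: Dict[int, int] = {}
    for entry in ports_column.split(","):
        pair, _, _protocol = entry.strip().partition("/")
        port, _, node_port = pair.partition(":")
        if node_port:  # entries without ':<nodePort>' are skipped
            mapping[int(port)] = int(node_port)
    return mapping


ports = node_ports("16686:30693/TCP,16685:30363/TCP")
node_ip = "<IP>"  # placeholder: substitute a reachable node address
print(f"http://{node_ip}:{ports[16686]}")
```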

Original article: https://www.cnblogs.com/zhangmingcheng/p/17602568.html

(Copyright belongs to the original author, please delete if infringed)
