Master Distributed Tracing with SkyWalking: Principles, Architecture & Practices

This article explains the fundamentals of distributed tracing in microservice architectures, details the OpenTracing standard, examines SkyWalking’s design, sampling strategies, context propagation, and plugin development, and shares practical implementation experiences and performance comparisons, helping engineers choose and integrate effective tracing solutions.

Su San Talks Tech

Aug 27, 2025

Introduction

In a micro‑service architecture a single request often traverses multiple modules, middleware, and machines. Determining which applications, modules, and nodes are involved, as well as their execution order and performance bottlenecks, is essential for troubleshooting.

What the Article Covers

Principles and benefits of distributed tracing systems

SkyWalking’s architecture and design

Our company’s practice on distributed call chains

Principles and Role of Distributed Tracing

An interface's health is typically judged by three things: its response time (RT), whether it throws exceptions, and where most of its latency comes from.

Monolithic Architecture

In early stages many companies adopt a monolithic architecture. The simplest way to collect the three metrics is by using AOP to log timestamps before and after business logic execution and to catch exceptions.
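The AOP approach can be sketched in plain Java with a dynamic proxy instead of an AOP framework. This is a minimal illustration, not the setup used in any particular project: OrderService, TimingAspect, and the in-memory LOG list are hypothetical names chosen for the example.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical business interface used only for this illustration.
interface OrderService {
    String placeOrder(String item);
}

final class TimingAspect {
    // Log lines are kept in memory so the sketch needs no logging library.
    static final List<String> LOG = new ArrayList<>();

    // Wraps an implementation of the given interface with "around" advice.
    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> type, T target) {
        return (T) Proxy.newProxyInstance(
            type.getClassLoader(), new Class<?>[] {type},
            (proxy, method, args) -> {
                long start = System.nanoTime();           // timestamp before the business logic
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    // The business method threw: record it, then rethrow the original exception.
                    LOG.add(method.getName() + " failed: " + e.getCause());
                    throw e.getCause();
                } finally {
                    long rtMicros = (System.nanoTime() - start) / 1_000;
                    LOG.add(method.getName() + " rt=" + rtMicros + "us");
                }
            });
    }
}
```

Wrapping a service with TimingAspect.wrap(OrderService.class, impl) records one response-time line per call, plus a failure line when the call throws.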

Microservice Architecture

As the business grows, monoliths evolve into microservices, introducing multiple services (A, B, C, D) deployed on several machines. Tracing the exact path of a request becomes difficult, leading to three main pain points:

Hard to locate problems, long debugging cycles

Difficult to reproduce specific scenarios

Challenging performance‑bottleneck analysis

Distributed tracing addresses these issues by automatically collecting data, providing a complete call chain, and visualizing component performance.

OpenTracing Standard

OpenTracing offers a lightweight, vendor‑agnostic API layer between applications and tracing systems, enabling developers to add tracing without being tied to a specific implementation.

Its data model consists of three core concepts:

Trace : a complete request chain

Span : a single operation with start and end timestamps

SpanContext : global context (e.g., traceId) propagated across spans
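The relationship between the three concepts can be sketched as a small set of Java classes. This is an illustrative model only, not the OpenTracing API: the field names, the integer span IDs, and the use of a random UUID as the traceId are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Identifiers that travel with a request; names are illustrative.
final class SpanContext {
    final String traceId;
    final String spanId;

    SpanContext(String traceId, String spanId) {
        this.traceId = traceId;
        this.spanId = spanId;
    }
}

// A single operation with start and end timestamps.
final class Span {
    final SpanContext context;
    final String parentSpanId;   // null for the root span
    final String operation;
    final long startMillis;
    long endMillis;

    Span(SpanContext context, String parentSpanId, String operation, long startMillis) {
        this.context = context;
        this.parentSpanId = parentSpanId;
        this.operation = operation;
        this.startMillis = startMillis;
    }

    void finish(long endMillis) { this.endMillis = endMillis; }

    long durationMillis() { return endMillis - startMillis; }
}

// A complete request chain: every span shares the same traceId.
final class Trace {
    final String traceId = UUID.randomUUID().toString();
    final List<Span> spans = new ArrayList<>();
    private int nextSpanId = 0;

    Span startSpan(Span parent, String operation, long nowMillis) {
        SpanContext ctx = new SpanContext(traceId, String.valueOf(nextSpanId++));
        String parentId = (parent == null) ? null : parent.context.spanId;
        Span span = new Span(ctx, parentId, operation, nowMillis);
        spans.add(span);
        return span;
    }
}
```

The parent pointer on each span is what lets a tracing backend rebuild the call tree and its timing from a flat list of spans.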

How SkyWalking Solves Common Tracing Challenges

Automatic Span Collection

SkyWalking uses a plugin‑based Java agent to instrument code without source changes, achieving non‑intrusive span collection.

Cross‑Process Context Propagation

Context is transmitted via message headers (e.g., Dubbo attachment) rather than the body, ensuring seamless propagation across services.
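Header-based propagation boils down to an inject step on the caller and an extract step on the callee. The sketch below uses a plain Map as a stand-in for Dubbo attachments or HTTP headers; the key x-trace-id is a hypothetical name for this example, not the header SkyWalking actually sends.

```java
import java.util.Map;

final class TracePropagation {
    // Hypothetical header key; real tracing systems define their own name and format.
    static final String TRACE_HEADER = "x-trace-id";

    // Caller side: copy the trace ID into the outgoing headers or attachments.
    static void inject(String traceId, Map<String, String> carrier) {
        carrier.put(TRACE_HEADER, traceId);
    }

    // Callee side: read the trace ID back, or return null if the caller sent none.
    static String extract(Map<String, String> carrier) {
        return carrier.get(TRACE_HEADER);
    }
}
```

Because the ID rides in the headers, the message body and the business method signatures stay unchanged.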

Ensuring Globally Unique traceId

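SkyWalking generates IDs locally using the Snowflake algorithm, so no central ID service is involved. To handle clock rollback, it records the timestamp of the last generated ID; if the current time is earlier than that timestamp, it falls back to a random number instead of the time value.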
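The rollback handling can be illustrated with a simplified generator. This is a sketch of the idea under stated assumptions, not SkyWalking's actual implementation: the ID layout (instance ID, time part, sequence) and the class name TraceIdGenerator are made up for the example, and the clock value is passed in so the rollback case is easy to exercise.

```java
import java.util.concurrent.ThreadLocalRandom;

final class TraceIdGenerator {
    private final String instanceId;   // identifies this process (assumed to be unique)
    private long lastTimestamp = 0L;   // timestamp of the last ID generated from the clock
    private int sequence = 0;          // distinguishes IDs created in the same millisecond

    TraceIdGenerator(String instanceId) {
        this.instanceId = instanceId;
    }

    synchronized String nextId(long nowMillis) {
        long timePart;
        if (nowMillis < lastTimestamp) {
            // Clock moved backwards: the time is not trustworthy, so use a random value.
            timePart = ThreadLocalRandom.current().nextLong(Long.MAX_VALUE);
        } else {
            if (nowMillis > lastTimestamp) {
                sequence = 0;          // new millisecond, restart the sequence
            }
            lastTimestamp = nowMillis;
            timePart = nowMillis;
        }
        return instanceId + "." + timePart + "." + (sequence++);
    }

    String nextId() {
        return nextId(System.currentTimeMillis());
    }
}
```

In the rollback branch the sequence keeps counting, so an ID generated during the rollback cannot repeat one issued earlier in the same millisecond even if the random value happens to match it.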

Sampling Impact on Performance

SkyWalking samples three times per three‑second window by default. To avoid missing data from other components (e.g., Redis, MySQL), it supports group‑based sampling, ensuring each component type gets sampled.
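Group-based sampling can be pictured as one counter per component type inside a shared three-second window. The class below is a minimal sketch of that idea, not SkyWalking's sampler; GroupSampler, the group names, and the explicit nowMillis parameter are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class GroupSampler {
    private static final long WINDOW_MILLIS = 3_000L;

    private final int maxPerWindow;                              // samples allowed per group per window
    private final Map<String, Integer> counts = new HashMap<>(); // samples used, keyed by group
    private long windowStart = 0L;

    GroupSampler(int maxPerWindow) {
        this.maxPerWindow = maxPerWindow;
    }

    // Returns true if this request should be traced for the given component group.
    synchronized boolean trySample(String group, long nowMillis) {
        if (nowMillis - windowStart >= WINDOW_MILLIS) {
            counts.clear();                                      // a new window begins
            windowStart = nowMillis;
        }
        int used = counts.getOrDefault(group, 0);
        if (used >= maxPerWindow) {
            return false;                                        // this group's quota is spent
        }
        counts.put(group, used + 1);
        return true;
    }
}
```

With a single shared counter, a burst of Dubbo calls could use up the whole quota before any Redis or MySQL call is seen; separate counters keep each group represented.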

Performance Evaluation

Benchmarks at 5000 TPS show SkyWalking adds negligible CPU, memory, and latency overhead compared to a baseline. Compared with Zipkin (117 ms) and Pinpoint (201 ms), SkyWalking achieves 22 ms response time.

Key Advantages

Multi‑language support (Java, .NET Core, PHP, NodeJS, Go, Lua) and many components (Dubbo, MySQL, etc.)

Extensible plugin system allowing custom instrumentation without code intrusion

Our Company’s Practice with Distributed Tracing

Using Only SkyWalking Agent

We adopted only the SkyWalking agent for sampling, keeping existing Marvin monitoring for data collection, storage, and visualization to avoid unnecessary replacement costs.

Custom Enhancements

Force sampling in pre‑release environments by adding a force_flag=true cookie, which the gateway propagates via Dubbo attachment.

Fine‑grained group sampling to ensure each component type (Redis, Dubbo, MySQL) gets sampled within the three‑second window.

Embedding traceId into logs using a custom Log4j plugin that defines a %traceId placeholder.

Developing custom plugins for Memcached and Druid, which are not provided by SkyWalking out‑of‑the‑box.
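The forced-sampling enhancement can be sketched as a cookie check at the gateway followed by a copy into the outgoing attachments. The code below is an assumption-laden illustration, not our gateway's code: the ForceSampling class is hypothetical, and a plain Map stands in for the Dubbo attachment.

```java
import java.util.Map;

final class ForceSampling {
    static final String FLAG = "force_flag";

    // Returns true when the Cookie header contains force_flag=true.
    static boolean isForced(String cookieHeader) {
        if (cookieHeader == null) {
            return false;
        }
        for (String pair : cookieHeader.split(";")) {
            String[] kv = pair.trim().split("=", 2);
            if (kv.length == 2 && kv[0].equals(FLAG) && kv[1].equals("true")) {
                return true;
            }
        }
        return false;
    }

    // Gateway side: copy the flag into the outgoing attachments so downstream services see it.
    static void propagate(String cookieHeader, Map<String, String> attachments) {
        if (isForced(cookieHeader)) {
            attachments.put(FLAG, "true");
        }
    }
}
```

A downstream service would then check the attachment and sample the request regardless of how much of the window's quota is left.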

Plugin Definition Example

# skywalking-plugin.def file
dubbo=org.apache.skywalking.apm.plugin.asf.dubbo.DubboInstrumentation

Plugin Implementation Overview

A SkyWalking plugin consists of three parts: a definition entry in skywalking-plugin.def that registers the plugin, an instrumentation class (specifying the target class and method), and an interceptor (defining the before/after logic). For example, the Dubbo plugin enhances MonitorFilter.invoke so that the global traceId is injected into the invocation’s attachment.
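The division of labor between instrumentation and interceptor can be shown with self-contained stand-ins. None of the types below are SkyWalking's API: Invocation, AroundInterceptor, TraceIdInjector, and Enhancer are hypothetical names, and the real agent rewrites bytecode at load time rather than wrapping a Function.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Stand-in for an RPC invocation that carries attachments (headers).
final class Invocation {
    final Map<String, String> attachments = new HashMap<>();
}

// Hypothetical interceptor contract: logic to run before and after the target method.
interface AroundInterceptor {
    void before(Invocation invocation);
    void after(Invocation invocation, Object result);
}

// Interceptor that injects the trace ID into the attachment before the call.
final class TraceIdInjector implements AroundInterceptor {
    private final String traceId;

    TraceIdInjector(String traceId) {
        this.traceId = traceId;
    }

    @Override
    public void before(Invocation invocation) {
        invocation.attachments.put("traceId", traceId);
    }

    @Override
    public void after(Invocation invocation, Object result) {
        // A real plugin would finish the span here; nothing to do in this sketch.
    }
}

final class Enhancer {
    // Plays the role of the instrumentation: wraps the target method with the interceptor.
    static Function<Invocation, Object> enhance(Function<Invocation, Object> target,
                                                AroundInterceptor interceptor) {
        return invocation -> {
            interceptor.before(invocation);
            Object result = target.apply(invocation);
            interceptor.after(invocation, result);
            return result;
        };
    }
}
```

The target method never references the interceptor, which is the sense in which this style of plugin leaves business code untouched.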

Conclusion

This article covered distributed tracing principles, the role of OpenTracing, SkyWalking’s architecture and sampling strategies, and our practical customizations. The choice of tracing solution should follow from the existing architecture and performance requirements: there is no universally best technology, only the most suitable one.

Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.