
How Distributed Tracing with SkyWalking Solves Microservice Performance Mysteries

This article explains the principles, architecture, and practical implementation of distributed tracing in microservice environments, with a focus on SkyWalking. It shows how tracing identifies call chains, isolates performance bottlenecks, and integrates with existing monitoring systems while keeping overhead low and instrumentation non-intrusive.

Su San Talks Tech

Jan 13, 2023


Preface

In a microservice architecture, a single request often involves multiple modules, middleware, and machines. Some calls are serial, some parallel. How can we determine which applications, modules, and nodes are involved and their order, and locate performance issues? This article provides the answers.

The article covers:

Principles and role of distributed tracing systems

SkyWalking's principles and architecture

Our company's practice on distributed call chains

Principles and role of distributed tracing systems

To evaluate an interface's performance, we usually ask three questions:

What is the interface's response time (RT)?

Are there abnormal responses?

Where is the main slowdown?

Monolithic architecture

In the early stages, many companies adopt a monolithic architecture. How can the three questions be answered there?

The simplest approach is AOP: record a timestamp before and after the business logic, and catch exceptions around it.

The difference between the two timestamps gives the overall call time, and the caught exception shows where the error originated.
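The AOP idea can be sketched without any framework. The example below uses a plain JDK dynamic proxy as a stand-in for Spring AOP or AspectJ; OrderService and TimingProxy are hypothetical names introduced only for illustration.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

// Hypothetical business interface, used only for this illustration.
interface OrderService {
    String placeOrder(String item);
}

final class TimingProxy {
    // Wraps an interface implementation so every call is timed and failures are logged.
    @SuppressWarnings("unchecked")
    static <T> T wrap(T target, Class<T> type) {
        return (T) Proxy.newProxyInstance(
            type.getClassLoader(),
            new Class<?>[] {type},
            (proxy, method, args) -> {
                long start = System.nanoTime();              // "before" advice
                try {
                    return method.invoke(target, args);      // the business logic
                } catch (InvocationTargetException e) {
                    // "after throwing" advice: log the real cause, then rethrow it
                    System.err.println(method.getName() + " failed: " + e.getCause());
                    throw e.getCause();
                } finally {
                    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                    System.out.println(method.getName() + " took " + elapsedMs + " ms");
                }
            });
    }
}
```

A caller would wrap the real implementation once, for example TimingProxy.wrap(realService, OrderService.class), and use the returned object everywhere else. This works inside one process, which is why it stops being enough once a request crosses several services.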

Microservice architecture

In a monolith, all modules run on one machine, which makes monitoring easier. As the business grows, the monolith evolves into microservices.

When a user reports a slow page, the request chain might be A → C → B → D, with each service running multiple instances. How do we know which specific machine handled each call?

Because exact paths are hard to locate, microservices face several pain points:

Difficulty and long cycles in troubleshooting

Hard to reproduce specific scenarios

Challenging performance bottleneck analysis

Distributed tracing addresses these problems by:

Automatically collecting data

Analyzing data to produce a complete call chain, making issues reproducible

Visualizing component performance to locate bottlenecks

Through distributed tracing, each request's exact path can be tracked, enabling performance analysis of each module.

Distributed call chain standard – OpenTracing

To implement a call chain, OpenTracing provides a lightweight, vendor‑agnostic API layer between applications and tracing systems.

OpenTracing is similar to JDBC in Java: a standard interface that implementations can plug into, enabling pluggable components.

OpenTracing’s data model includes:

Trace : a complete request chain

Span : a single call with start and end times

SpanContext : global context information such as traceId

To illustrate these concepts: a full order request corresponds to one Trace, each call within it is a Span, and the traceId is passed from call to call via the SpanContext.
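The three concepts can be written down as a small data model. This is a simplified sketch, not the OpenTracing API itself; the class and field names are assumptions chosen for readability.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// SpanContext: the part that travels with the request (trace ID plus the span's own ID).
final class SpanContext {
    final String traceId;
    final int spanId;

    SpanContext(String traceId, int spanId) {
        this.traceId = traceId;
        this.spanId = spanId;
    }
}

// Span: a single call, with start and end times and a link to its parent.
final class Span {
    final SpanContext context;
    final int parentSpanId;   // -1 marks the root span
    final String operation;
    final long startMillis;
    long endMillis;

    Span(SpanContext context, int parentSpanId, String operation, long startMillis) {
        this.context = context;
        this.parentSpanId = parentSpanId;
        this.operation = operation;
        this.startMillis = startMillis;
    }

    void finish(long endMillis) { this.endMillis = endMillis; }

    long durationMillis() { return endMillis - startMillis; }
}

// Trace: every span that shares one trace ID.
final class Trace {
    final String traceId = UUID.randomUUID().toString();
    final List<Span> spans = new ArrayList<>();

    Span startSpan(String operation, Span parent, long startMillis) {
        int parentId = parent == null ? -1 : parent.context.spanId;
        Span span = new Span(new SpanContext(traceId, spans.size()), parentId, operation, startMillis);
        spans.add(span);
        return span;
    }
}
```

With parent links and durations recorded per span, a tracing backend can rebuild the call tree and show which call consumed the time.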

SkyWalking’s principles and architecture design

How to automatically collect span data

SkyWalking uses a plugin‑based javaagent approach to collect span data without code intrusion. Plugins are pluggable and extensible.
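The javaagent mechanism can be sketched with the standard java.lang.instrument API. The example below only records which classes a plugin would want to enhance; a real agent such as SkyWalking's rewrites the bytecode at that point. TracingAgent, MatchingTransformer, and the target class name are hypothetical.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class TracingAgent {
    // Internal names (slash-separated) of classes that plugins want to enhance.
    static final Set<String> TARGETS = ConcurrentHashMap.newKeySet();
    // Classes that were actually seen while loading; kept only for illustration.
    static final Set<String> MATCHED = ConcurrentHashMap.newKeySet();

    // The JVM calls premain before the application's main method when the process
    // is started with -javaagent and the jar manifest names this class as Premain-Class.
    public static void premain(String agentArgs, Instrumentation inst) {
        TARGETS.add("com/example/OrderService"); // hypothetical target
        inst.addTransformer(new MatchingTransformer());
    }

    static final class MatchingTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain, byte[] classfileBuffer) {
            if (className != null && TARGETS.contains(className)) {
                MATCHED.add(className); // a real agent would return rewritten bytecode here
            }
            return null; // null tells the JVM to keep the original class bytes
        }
    }
}
```

Because the hook sits in the class-loading path, the application code itself does not change, which is what makes this approach non-intrusive.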

How to propagate context across processes

Context should travel in protocol headers rather than in the request body, so that business payloads are left untouched. In Dubbo, the invocation attachment plays the role of a header, so the context is placed there.

Tip: The context propagation is handled by the Dubbo plugin, invisible to business code.
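The inject and extract steps can be sketched with a plain map standing in for Dubbo's attachments. The key name and the "traceId:spanId" encoding below are assumptions for illustration; SkyWalking's Dubbo plugin uses its own header name and format.

```java
import java.util.Map;

final class ContextCarrier {
    // Hypothetical attachment key, chosen for this example only.
    static final String KEY = "x-trace-context";

    // Consumer side: write the trace ID and current span ID into the outgoing attachments.
    static void inject(Map<String, String> attachments, String traceId, int spanId) {
        attachments.put(KEY, traceId + ":" + spanId);
    }

    // Provider side: read them back. Returns null when the caller sent no context.
    static String[] extract(Map<String, String> attachments) {
        String raw = attachments.get(KEY);
        if (raw == null) {
            return null;
        }
        int sep = raw.lastIndexOf(':');
        if (sep < 0) {
            return null;
        }
        return new String[] {raw.substring(0, sep), raw.substring(sep + 1)};
    }
}
```

The provider uses the extracted trace ID for its own spans and the extracted span ID as the parent, which is how spans from different processes end up in one chain.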

Ensuring globally unique traceId

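SkyWalking generates IDs locally with a Snowflake-style algorithm, so no central ID service has to be called and generation stays fast.

Snowflake-style IDs depend on the local clock. If the clock is rolled back, the same timestamp can occur twice and produce duplicate IDs. SkyWalking therefore records the last timestamp it used; if the current time is earlier than that, it falls back to a random number when building the traceId.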

Additional validation would add overhead; the probability of collision is low, so extra checks are unnecessary.
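The rollback handling can be sketched as follows. This is a minimal illustration of the idea, assuming an ID made of an instance prefix plus a time-and-sequence number; it is not SkyWalking's actual generator, and TraceIdGenerator is a hypothetical name.

```java
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;

final class TraceIdGenerator {
    // Unique per process, so IDs from different instances cannot clash.
    private static final String INSTANCE_ID = UUID.randomUUID().toString().replace("-", "");

    private long lastTimestamp = -1L;
    private int sequence = 0;

    // nowMillis is passed in so that a clock rollback can be simulated.
    synchronized String next(long nowMillis) {
        long timePart;
        if (nowMillis < lastTimestamp) {
            // Clock moved backwards: use a random value instead of waiting or failing.
            timePart = ThreadLocalRandom.current().nextLong(Long.MAX_VALUE / 10_000);
        } else {
            timePart = nowMillis;
            lastTimestamp = nowMillis;
        }
        sequence = (sequence + 1) % 10_000; // separates IDs issued within the same millisecond
        return INSTANCE_ID + "." + (timePart * 10_000 + sequence);
    }

    String next() {
        return next(System.currentTimeMillis());
    }
}
```

The random fallback trades a very small collision risk for never blocking a request, which matches the reasoning above that extra validation is not worth its cost.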

Impact of tracing on performance

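Collecting every request would generate a huge volume of data, so SkyWalking supports sampling, configured through the agent setting agent.sample_n_per_3_secs. The setup described here takes at most 3 samples in every 3-second window. If the upstream service has already sampled a request (the incoming call carries trace context), the downstream service is forced to collect it as well, so the chain stays complete.

Each service samples independently, so without this rule a downstream service could skip a request that its upstream had sampled, leaving a gap in the chain. Forced downstream collection prevents that.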
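The sampling rule can be sketched as a counter that is reset every window. WindowSampler is a hypothetical class written for this article, not SkyWalking's own sampling service.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class WindowSampler {
    private final int maxPerWindow;
    private final AtomicInteger taken = new AtomicInteger();

    WindowSampler(int maxPerWindow) {
        this.maxPerWindow = maxPerWindow;
    }

    // Called by a timer at the start of every window (every 3 seconds in the text above).
    void resetWindow() {
        taken.set(0);
    }

    // upstreamSampled is true when the incoming request already carries trace context.
    boolean shouldSample(boolean upstreamSampled) {
        if (upstreamSampled) {
            return true; // forced collection keeps the chain complete
        }
        return taken.incrementAndGet() <= maxPerWindow;
    }
}
```

In a running service, resetWindow would typically be driven by a ScheduledExecutorService at a fixed 3-second rate. Forced samples deliberately do not consume the local budget in this sketch.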

SkyWalking’s basic architecture

Agents sample data and report it periodically to the backend, which stores it in Elasticsearch, MySQL, or another supported store and serves it to the UI for visualization.

SkyWalking performance

In the benchmark cited here, at 5000 TPS the agent added negligible CPU, memory, and latency overhead compared with running without tracing.

In the same comparison, Zipkin and Pinpoint showed response times of 117 ms and 201 ms, while SkyWalking showed 22 ms, a clear advantage in that test.

Beyond raw performance, SkyWalking has other advantages:

Low code intrusion, thanks to the javaagent and plugin mechanism

Support for multiple languages (Java, .Net Core, PHP, NodeJS, Go, Lua) and many components (Dubbo, MySQL, etc.)

Extensibility: custom plugins can be written without touching business code

Our company’s practice on distributed call chains

SkyWalking in our architecture

We only use SkyWalking’s agent for sampling, not the data reporting, storage, or visualization components, because our existing monitoring system (Marvin) already satisfies most needs.

This illustrates that the best solution is the one that fits the current business scenario.

Our modifications and practices

Force sampling in pre‑release environment for debugging

Fine‑grained sampling per component (Redis, Dubbo, MySQL, etc.)

Embedding traceId into logs via Log4j custom plugin

Developed custom SkyWalking plugins for Memcached and Druid

Force sampling in pre‑release

We add a cookie flag force_flag=true that the gateway propagates to Dubbo attachment, prompting the SkyWalking Dubbo plugin to force sampling.

Fine‑grained sampling

Default sampling may miss non‑Dubbo calls; we implemented grouped sampling to ensure each type gets sampled within the 3‑second window.
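Grouped sampling can be sketched as one counter per component type. This is an assumption about how such a scheme might look, not the code used in production; GroupedSampler and the group names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class GroupedSampler {
    private final int maxPerGroup;
    private final Map<String, AtomicInteger> counters = new ConcurrentHashMap<>();

    GroupedSampler(int maxPerGroup) {
        this.maxPerGroup = maxPerGroup;
    }

    // Each component type (for example "Dubbo", "Redis", "MySQL") has its own budget.
    boolean shouldSample(String group) {
        AtomicInteger counter = counters.computeIfAbsent(group, g -> new AtomicInteger());
        return counter.incrementAndGet() <= maxPerGroup;
    }

    // Called by a timer at the start of every sampling window.
    void resetWindow() {
        counters.clear();
    }
}
```

With a single shared counter, a burst of Dubbo calls could use up the whole window before any Redis or MySQL call is seen. Separate counters guarantee that each type still gets its share.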

Embedding traceId in logs

Using Log4j’s plugin mechanism, we define a placeholder %traceId and implement a converter that injects the traceId into log messages.
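The converter idea can be shown without a Log4j dependency. The sketch below keeps the current traceId in a ThreadLocal and expands a %traceId placeholder by hand; in a real setup the same lookup would live inside a Log4j pattern converter. TraceIdLogFormat and the "N/A" default are assumptions for illustration.

```java
final class TraceIdLogFormat {
    // Holds the trace ID of the request currently handled by this thread.
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    static void bind(String traceId) {
        CURRENT.set(traceId);
    }

    static void clear() {
        CURRENT.remove();
    }

    // Expands %traceId and %msg in the pattern; prints "N/A" when no trace is active.
    static String format(String pattern, String message) {
        String traceId = CURRENT.get();
        return pattern
            .replace("%traceId", traceId == null ? "N/A" : traceId)
            .replace("%msg", message);
    }
}
```

Once every log line carries the traceId, the logs of all services involved in one request can be found with a single search.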

Custom plugins

We built plugins for Memcached and Druid following SkyWalking’s plugin definition, instrumentation, and interceptor pattern.

Plugins consist of a definition class, instrumentation specifying the target class and method, and an interceptor that adds logic before/after the method.

Define plugin class

Specify instrumentation (pointcut)

Implement interceptor (beforeMethod, afterMethod, etc.)

For example, the Dubbo plugin enhances MonitorFilter’s invoke method to inject the global traceId into the invocation attachment before business logic runs.
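The before/after interceptor pattern can be sketched in plain Java. AroundInterceptor below is a simplified stand-in, not SkyWalking's real interceptor interface, and the "traceId" attachment key is hypothetical.

```java
import java.util.Map;
import java.util.function.Function;

// Simplified stand-in for an around-interceptor; not SkyWalking's actual interface.
interface AroundInterceptor {
    void beforeMethod(Map<String, String> attachments);

    void afterMethod(Object result);
}

// Puts the trace ID into the attachments before the target method runs.
final class TraceIdInjector implements AroundInterceptor {
    private final String traceId;

    TraceIdInjector(String traceId) {
        this.traceId = traceId;
    }

    @Override
    public void beforeMethod(Map<String, String> attachments) {
        attachments.put("traceId", traceId);
    }

    @Override
    public void afterMethod(Object result) {
        // A real plugin would close the span here.
    }
}

final class Enhanced {
    // Roughly what the agent's enhancement does around the instrumented method.
    static <R> R invoke(AroundInterceptor interceptor, Map<String, String> attachments,
                        Function<Map<String, String>, R> target) {
        interceptor.beforeMethod(attachments);
        R result = target.apply(attachments);
        interceptor.afterMethod(result);
        return result;
    }
}
```

The target method never references the interceptor. The agent weaves the two together at class-load time, so business code stays unaware of tracing.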

Finally, the plugin is declared in skywalking-plugin.def:

# skywalking-plugin.def file
dubbo=org.apache.skywalking.apm.plugin.asf.dubbo.DubboInstrumentation

This results in a silent, non‑intrusive enhancement of the code.

Conclusion

The article introduced the principles of distributed tracing and demonstrated how SkyWalking works. It emphasized choosing technology that fits the existing architecture, as there is no universally best solution, only the most suitable one.

