How Distributed Tracing with SkyWalking Solves Microservice Performance Mysteries
This article explains the principles, architecture, and practical implementation of distributed tracing—especially SkyWalking—in microservice environments, showing how it identifies call chains, isolates performance bottlenecks, and integrates with existing monitoring systems while maintaining low overhead and non‑intrusive instrumentation.
Preface
In a microservice architecture, a single request often involves multiple modules, middleware, and machines. Some calls are serial, others parallel. How can we determine which applications, modules, and nodes a request passes through and in what order, and how can we locate its performance issues? This article provides the answers.
The article covers:
Principles and role of distributed tracing systems
SkyWalking's principles and architecture
Our company's practice on distributed call chains
Principles and role of distributed tracing systems
To evaluate an interface's performance, we usually focus on three questions:
What is the interface's response time (RT)?
Are there abnormal responses?
Where is most of the time spent?
Monolithic architecture
In the early stages, many companies adopt a monolithic architecture. How can we answer these three questions there?
The simplest approach is AOP. Record a timestamp before and after the business logic to compute the overall call time, and catch exceptions in the same aspect to reveal where an error originated.
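Below is a minimal sketch of the AOP idea in plain Java. It uses a JDK dynamic proxy (java.lang.reflect.Proxy) instead of an AOP framework such as Spring AOP, so it stays self-contained. OrderService and TimingProxy are hypothetical names used only for illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical business interface, used only for illustration.
interface OrderService {
    String placeOrder(String item);
}

// Wraps an interface implementation so that every call is timed and every exception is recorded.
final class TimingProxy {
    // In-memory records so the results can be inspected; a real system would log or report them.
    static final List<String> RECORDS = new ArrayList<>();

    @SuppressWarnings("unchecked")
    static <T> T wrap(T target, Class<T> iface) {
        InvocationHandler handler = (proxy, method, args) -> {
            long start = System.nanoTime();                        // "before" advice
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                // "after throwing" advice: record where the error came from, then rethrow the original exception
                RECORDS.add(method.getName() + " threw " + e.getCause());
                throw e.getCause();
            } finally {
                long micros = (System.nanoTime() - start) / 1_000;  // "after" advice: overall call time
                RECORDS.add(method.getName() + " took " + micros + " us");
            }
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }
}
```

For example, `TimingProxy.wrap(impl, OrderService.class).placeOrder("book")` records one timing entry. If the call throws, it also records an error entry. This only works because the whole call stays inside one process, and that assumption is exactly what breaks once a request spans several services.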
Microservice architecture
In a monolith, all modules run on one machine, which makes monitoring easy. As the business grows, the monolith evolves into microservices.
When a user reports a slow page, the request chain might be A → C → B → D, with each service running multiple instances. How do we know which specific machine handled each call?
Because exact paths are hard to locate, microservices face several pain points:
Troubleshooting is difficult and slow
Specific scenarios are hard to reproduce
Performance bottlenecks are hard to analyze
Distributed tracing addresses these problems by:
Automatically collecting data
Analyzing data to produce a complete call chain, making issues reproducible
Visualizing component performance to locate bottlenecks
Through distributed tracing, each request's exact path can be tracked, enabling performance analysis of each module.
Distributed call chain standard – OpenTracing
To implement a call chain, OpenTracing provides a lightweight, vendor‑agnostic API layer between applications and tracing systems.
OpenTracing is similar to JDBC in Java: a standard interface that implementations can plug into, enabling pluggable components.
OpenTracing’s data model includes:
Trace : a complete request chain
Span : a single call with start and end times
SpanContext : global context information such as traceId
For example, a complete order request corresponds to one Trace. Each call within it is a Span, and the traceId is passed between calls via the SpanContext.
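To make the three concepts concrete, here is a minimal in-memory sketch. The classes are hypothetical and deliberately simplified, and they are not the OpenTracing API types. In a real system the spans of one trace are recorded in different processes and joined later by traceId. Timestamps are passed in explicitly to keep the example deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// SpanContext: the information that travels with the request (here only the IDs).
final class SpanContext {
    final String traceId;
    final int spanId;

    SpanContext(String traceId, int spanId) {
        this.traceId = traceId;
        this.spanId = spanId;
    }
}

// Span: a single call with a start time, an end time, and a link to the span that caused it.
final class Span {
    final String operationName;
    final SpanContext context;
    final int parentSpanId;            // -1 marks the root span
    final long startMillis;
    long endMillis;

    Span(String operationName, SpanContext context, int parentSpanId, long startMillis) {
        this.operationName = operationName;
        this.context = context;
        this.parentSpanId = parentSpanId;
        this.startMillis = startMillis;
    }

    void finish(long endMillis) { this.endMillis = endMillis; }

    long durationMillis() { return endMillis - startMillis; }
}

// Trace: every span that shares one traceId, i.e. one complete request chain.
final class Trace {
    final String traceId = UUID.randomUUID().toString();
    final List<Span> spans = new ArrayList<>();
    private int nextSpanId = 0;

    // Starts a child of `parent`, or a root span when `parent` is null.
    Span startSpan(String operationName, Span parent, long startMillis) {
        int parentId = (parent == null) ? -1 : parent.context.spanId;
        Span span = new Span(operationName, new SpanContext(traceId, nextSpanId++), parentId, startMillis);
        spans.add(span);
        return span;
    }
}
```

For an order request, the root span could be `createOrder` with a child span `deductStock`. Both share the same traceId, and the parent link is what lets a UI draw the call tree.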
SkyWalking’s principles and architecture design
How to automatically collect span data
SkyWalking uses a plugin‑based javaagent approach to collect span data without code intrusion. Plugins are pluggable and extensible.
How to propagate context across processes
Context should be transmitted in headers, not in the body. In Dubbo, the attachment works like a header, so context is placed there.
Tip: The context propagation is handled by the Dubbo plugin, invisible to business code.
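Here is a minimal sketch of header-style propagation. It assumes the RPC framework exposes its attachments as a `Map<String, String>`. The PropagatedContext class, the `trace-context` key, and the `traceId:spanId` encoding are all hypothetical. SkyWalking's own cross-process protocol (the sw8 header in recent versions) carries more fields.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical propagated context: the traceId plus the caller's span ID.
final class PropagatedContext {
    static final String KEY = "trace-context";       // assumed attachment key

    final String traceId;
    final int parentSpanId;

    PropagatedContext(String traceId, int parentSpanId) {
        this.traceId = traceId;
        this.parentSpanId = parentSpanId;
    }

    // Consumer side: write the context into the attachments (the "headers"), never into the request body.
    void inject(Map<String, String> attachments) {
        attachments.put(KEY, traceId + ":" + parentSpanId);
    }

    // Provider side: read the context back; returns null when the caller sent no (valid) context.
    static PropagatedContext extract(Map<String, String> attachments) {
        String value = attachments.get(KEY);
        if (value == null) {
            return null;
        }
        int sep = value.lastIndexOf(':');
        if (sep <= 0) {
            return null;
        }
        try {
            return new PropagatedContext(value.substring(0, sep), Integer.parseInt(value.substring(sep + 1)));
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

Because the context lives beside the payload rather than inside it, the business request and response types never have to change. This is also why the plugin can handle propagation invisibly.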
Ensuring globally unique traceId
SkyWalking generates IDs locally using the Snowflake algorithm for high performance.
Snowflake can suffer from clock rollback, potentially causing duplicate IDs. SkyWalking records the last timestamp; if the current time is earlier, it generates a random number as traceId.
Additional validation would add overhead; the probability of collision is low, so extra checks are unnecessary.
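Below is a sketch of a Snowflake-style generator that follows the fallback described above. It assumes the classic Snowflake layout: timestamp in the high bits, then a 10-bit instance ID and a 12-bit per-millisecond sequence. That layout is an illustrative assumption, not necessarily SkyWalking's exact ID format, and TraceIdGenerator is a hypothetical name.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.LongSupplier;

// Hypothetical Snowflake-style trace ID generator with a random fallback on clock rollback.
final class TraceIdGenerator {
    private final long instanceId;          // 10 bits: 0..1023, unique per agent instance
    private final LongSupplier clock;       // injectable clock (milliseconds) so rollback can be simulated
    private long lastTimestamp = -1L;
    private long sequence = 0L;             // 12 bits: IDs issued within the current millisecond

    TraceIdGenerator(long instanceId, LongSupplier clock) {
        if (instanceId < 0 || instanceId > 1023) {
            throw new IllegalArgumentException("instanceId must fit in 10 bits");
        }
        this.instanceId = instanceId;
        this.clock = clock;
    }

    synchronized long nextId() {
        long now = clock.getAsLong();
        if (now < lastTimestamp) {
            // Clock rollback: return a random non-negative ID instead of validating against past IDs,
            // accepting a small collision probability to keep ID generation cheap.
            return ThreadLocalRandom.current().nextLong(Long.MAX_VALUE);
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & 0xFFF;
            if (sequence == 0) {
                // All 4096 sequence values used in this millisecond: spin until the clock advances.
                while (now <= lastTimestamp) {
                    now = clock.getAsLong();
                }
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = now;
        // Raw epoch milliseconds in the high bits; real implementations usually subtract a custom epoch.
        return (now << 22) | (instanceId << 12) | sequence;
    }
}
```

Production code would pass `System::currentTimeMillis` as the clock. Injecting it here makes the rollback branch easy to exercise.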
Impact of tracing on performance
Collecting every request would generate a huge volume of data, so SkyWalking uses sampling: by default, 3 samples per 3 seconds. If an upstream service has sampled a request, downstream services are forced to collect it too, which keeps the chain complete.
In production, each service makes its sampling decision independently. Without this rule, a sampled trace could be missing some components. Forcing collection downstream closes that gap.
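The sampling rule above can be sketched as a counter that resets every 3 seconds, plus an override for requests whose upstream was already sampled. WindowSampler is hypothetical. In the SkyWalking agent, the per-window limit is configured through `agent.sample_n_per_3_secs`.

```java
import java.util.function.LongSupplier;

// Hypothetical window-based sampler: at most `samplesPer3Secs` traces may start in each 3-second window.
final class WindowSampler {
    private static final long WINDOW_MILLIS = 3_000;
    private final int samplesPer3Secs;
    private final LongSupplier clock;       // injectable clock (milliseconds)
    private long windowStart;
    private int used;

    WindowSampler(int samplesPer3Secs, LongSupplier clock) {
        this.samplesPer3Secs = samplesPer3Secs;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    // upstreamSampled: whether the incoming request carries a context from a sampled upstream span.
    synchronized boolean shouldSample(boolean upstreamSampled) {
        long now = clock.getAsLong();
        if (now - windowStart >= WINDOW_MILLIS) {   // a new 3-second window starts with a fresh budget
            windowStart = now;
            used = 0;
        }
        if (upstreamSampled) {
            used++;            // forced samples also consume budget (a simplification chosen here)
            return true;       // never break a chain that upstream already started recording
        }
        if (used < samplesPer3Secs) {
            used++;
            return true;
        }
        return false;
    }
}
```

The forced branch always returns true, regardless of the remaining budget. That single rule is what guarantees the downstream half of a sampled chain is never dropped.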
SkyWalking’s basic architecture
The agent samples call data and periodically reports it to the SkyWalking backend. The backend stores it in Elasticsearch, MySQL, or another storage option, where it can be visualized.
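On the agent side, periodic reporting usually means buffering finished data and shipping it in batches, off the request thread. The sketch below assumes segments are already serialized to strings and models the backend as a `Consumer`. SegmentReporter and its methods are hypothetical, not SkyWalking's reporter classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical agent-side reporter: buffers finished segments and ships them to the backend in batches.
final class SegmentReporter implements AutoCloseable {
    private final BlockingQueue<String> buffer;
    private final Consumer<List<String>> backend;   // stands in for the network client to the collector
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    SegmentReporter(int capacity, Consumer<List<String>> backend) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
        this.backend = backend;
    }

    // Called on the request thread: never blocks; returns false (segment dropped) when the buffer is full.
    boolean report(String segment) {
        return buffer.offer(segment);
    }

    // Starts periodic flushing on a background thread.
    void start(long periodMillis) {
        scheduler.scheduleAtFixedRate(this::flush, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Drains everything currently buffered and sends it as one batch.
    void flush() {
        List<String> batch = new ArrayList<>();
        buffer.drainTo(batch);
        if (!batch.isEmpty()) {
            backend.accept(batch);
        }
    }

    // Stops the background thread and sends whatever is left.
    @Override
    public void close() {
        scheduler.shutdownNow();
        flush();
    }
}
```

This sketch deliberately drops data when the buffer is full. Losing a few trace segments is preferable to slowing down or blocking business requests.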
SkyWalking performance
Benchmarks at 5000 TPS show negligible CPU, memory, and latency overhead compared to no tracing.
Compared with Zipkin and Pinpoint (response times 117 ms and 201 ms), SkyWalking achieves 22 ms, demonstrating superior performance.
Beyond raw performance, SkyWalking offers:
Low code intrusion, thanks to the javaagent and plugins
Support for multiple languages (Java, .NET Core, PHP, Node.js, Go, Lua) and many components (Dubbo, MySQL, etc.)
Extensibility: custom plugins can be written without intruding on business code
Our company’s practice on distributed call chains
SkyWalking in our architecture
We only use SkyWalking’s agent for sampling, not the data reporting, storage, or visualization components, because our existing monitoring system (Marvin) already satisfies most needs.
This illustrates that the best solution is the one that fits the current business scenario.
Our modifications and practices
Force sampling in pre‑release environment for debugging
Fine‑grained sampling per component (Redis, Dubbo, MySQL, etc.)
Embedding traceId into logs via Log4j custom plugin
Developed custom SkyWalking plugins for Memcached and Druid
Force sampling in pre‑release
We add a cookie flag, force_flag=true. The gateway propagates it into the Dubbo attachment, which prompts the SkyWalking Dubbo plugin to force sampling.
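The sketch below splits that flow into the gateway step and the sampling check. Only the cookie name force_flag comes from the text above. The attachment key, the `preRelease` switch, and the ForceSampling helper are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helpers for the pre-release force-sampling flag.
final class ForceSampling {
    static final String ATTACHMENT_KEY = "force_flag";   // assumed attachment key

    // Gateway side: honor the cookie only in the pre-release environment and copy it into the RPC attachments.
    static void propagateFromCookie(String cookieHeader, boolean preRelease, Map<String, String> attachments) {
        if (!preRelease || cookieHeader == null) {
            return;
        }
        for (String pair : cookieHeader.split(";")) {
            String[] kv = pair.trim().split("=", 2);
            if (kv.length == 2 && kv[0].equals("force_flag") && kv[1].equals("true")) {
                attachments.put(ATTACHMENT_KEY, "true");
            }
        }
    }

    // Plugin side: force sampling when the flag is present, otherwise keep the normal sampling decision.
    static boolean shouldSample(Map<String, String> attachments, boolean normalDecision) {
        return "true".equals(attachments.get(ATTACHMENT_KEY)) || normalDecision;
    }
}
```

Ignoring the cookie outside pre-release is a safety choice in this sketch. It prevents external users from forcing full tracing in production.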
Fine‑grained sampling
Default sampling uses one budget for all calls, so it may miss non-Dubbo calls. We implemented grouped sampling to ensure that each call type (Redis, Dubbo, MySQL, etc.) gets sampled within the 3-second window.
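Grouped sampling can be sketched by giving each call type its own counter within the 3-second window. GroupedSampler, the group names, and the per-group budget are hypothetical choices for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical grouped sampler: every call type has its own budget per 3-second window.
final class GroupedSampler {
    private static final long WINDOW_MILLIS = 3_000;
    private final int samplesPerGroup;
    private final LongSupplier clock;       // injectable clock (milliseconds)
    private final Map<String, Integer> used = new HashMap<>();
    private long windowStart;

    GroupedSampler(int samplesPerGroup, LongSupplier clock) {
        this.samplesPerGroup = samplesPerGroup;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    // group: the component type of the call, e.g. "dubbo", "redis", "mysql".
    synchronized boolean trySampling(String group) {
        long now = clock.getAsLong();
        if (now - windowStart >= WINDOW_MILLIS) {   // new window: every group gets a fresh budget
            windowStart = now;
            used.clear();
        }
        int count = used.getOrDefault(group, 0);
        if (count >= samplesPerGroup) {
            return false;                           // this group's budget is spent; other groups are unaffected
        }
        used.put(group, count + 1);
        return true;
    }
}
```

Even when Dubbo traffic exhausts its own budget, the first Redis call in the same window is still sampled.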
Embedding traceId in logs
Using Log4j’s plugin mechanism, we define a placeholder %traceId and implement a converter that injects the traceId into log messages.
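At its core, the converter replaces the %traceId placeholder with the current trace ID. In Log4j 2, such a converter is a `LogEventPatternConverter` subclass registered with `@Plugin` and `@ConverterKeys`. The stand-alone sketch below avoids the Log4j dependency and keeps the current trace ID in a ThreadLocal. That storage choice is an assumption: a real converter would obtain the ID from the tracing agent. TraceIdLayout is hypothetical.

```java
// Stand-alone stand-in for a Log4j pattern converter that expands %traceId (not the Log4j API).
final class TraceIdLayout {
    // Assumption: the current trace ID is kept in a ThreadLocal; a real converter would ask the tracing agent.
    static final ThreadLocal<String> CURRENT_TRACE_ID = new ThreadLocal<>();

    private final String pattern;

    TraceIdLayout(String pattern) {
        this.pattern = pattern;
    }

    // Expands %traceId first and the message (%m) last, so text inside the message is never re-expanded.
    String format(String message) {
        String traceId = CURRENT_TRACE_ID.get();
        return pattern.replace("%traceId", traceId == null ? "N/A" : traceId)
                      .replace("%m", message);
    }
}
```

With the pattern `[%traceId] %m`, every log line of a traced request carries its traceId. Searching the logs for that ID then returns the full story of one request.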
Custom plugins
We built plugins for Memcached and Druid following SkyWalking’s plugin definition, instrumentation, and interceptor pattern.
Plugins consist of a definition class, instrumentation specifying the target class and method, and an interceptor that adds logic before/after the method.
Define plugin class
Specify instrumentation (pointcut)
Implement interceptor (beforeMethod, afterMethod, etc.)
For example, the Dubbo plugin enhances MonitorFilter’s invoke method to inject the global traceId into the invocation attachment before business logic runs.
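The split into definition, instrumentation, and interceptor can be sketched without an agent. Here the interceptor runs around a reflective method call, which stands in for the bytecode weaving the agent performs. AroundInterceptor is a simplified, hypothetical contract loosely modeled on SkyWalking's InstanceMethodsAroundInterceptor, whose real methods take more parameters. FakeInvocation and FakeFilter are stand-ins for Dubbo's Invocation and MonitorFilter.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical interceptor contract (the real SkyWalking interface has richer signatures).
interface AroundInterceptor {
    void beforeMethod(Object target, Object[] args);
    void afterMethod(Object target, Object[] args, Object result);
    void handleMethodException(Object target, Object[] args, Throwable t);
}

// Stand-in for the agent's bytecode weaving: runs the interceptor around one method call.
final class InterceptingInvoker {
    static Object invoke(Object target, Method method, Object[] args, AroundInterceptor interceptor) throws Throwable {
        interceptor.beforeMethod(target, args);
        try {
            Object result = method.invoke(target, args);
            interceptor.afterMethod(target, args, result);
            return result;
        } catch (InvocationTargetException e) {
            interceptor.handleMethodException(target, args, e.getCause());
            throw e.getCause();
        }
    }
}

// Stand-in for Dubbo's Invocation: only the attachments map matters here.
final class FakeInvocation {
    final Map<String, String> attachments = new HashMap<>();
}

// Stand-in for the enhanced class (MonitorFilter in the Dubbo plugin); its invoke method is the pointcut.
final class FakeFilter {
    public String invoke(FakeInvocation invocation) {
        return "handled with traceId=" + invocation.attachments.get("traceId");
    }
}

// Interceptor that writes the global traceId into the attachments before the business logic runs.
final class TraceIdInjectingInterceptor implements AroundInterceptor {
    private final String traceId;

    TraceIdInjectingInterceptor(String traceId) { this.traceId = traceId; }

    @Override
    public void beforeMethod(Object target, Object[] args) {
        ((FakeInvocation) args[0]).attachments.put("traceId", traceId);
    }

    @Override
    public void afterMethod(Object target, Object[] args, Object result) {
        // A real plugin would finish the span here.
    }

    @Override
    public void handleMethodException(Object target, Object[] args, Throwable t) {
        // A real plugin would mark the span as failed and attach the exception.
    }
}
```

FakeFilter contains no tracing code at all; the traceId appears in its attachments only because the interceptor ran first. The real agent achieves the same effect by rewriting bytecode when the class is loaded.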
Finally, the plugin is declared in the skywalking-plugin.def file:
dubbo=org.apache.skywalking.apm.plugin.asf.dubbo.DubboInstrumentation
The result is a silent, non-intrusive enhancement of the code.
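Each line of the .def file maps a plugin name to its instrumentation class. The minimal sketch below reads that format. PluginDef and its parse method are hypothetical. The comment-skipping rule is an assumption of this sketch, not a documented feature of SkyWalking's own loader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for "pluginName=instrumentationClass" lines.
final class PluginDef {
    final String name;
    final String className;

    PluginDef(String name, String className) {
        this.name = name;
        this.className = className;
    }

    static List<PluginDef> parse(Reader source) throws IOException {
        List<PluginDef> defs = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#")) {
                    continue;                        // skip blank lines and (assumed) comment lines
                }
                int eq = line.indexOf('=');
                if (eq <= 0 || eq == line.length() - 1) {
                    throw new IllegalArgumentException("malformed plugin definition: " + line);
                }
                defs.add(new PluginDef(line.substring(0, eq).trim(), line.substring(eq + 1).trim()));
            }
        }
        return defs;
    }
}
```

An agent would then load each className (for example with `Class.forName`) and apply the instrumentation it describes.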
Conclusion
The article introduced the principles of distributed tracing and demonstrated how SkyWalking works. It emphasized choosing technology that fits the existing architecture, as there is no universally best solution, only the most suitable one.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
