
Unveiling Distributed Tracing: How SkyWalking Tackles Microservice Performance

This article explains the principles and benefits of distributed tracing, introduces OpenTracing and SkyWalking architecture, and shares practical implementations and performance comparisons that help identify bottlenecks in microservice systems.

Su San Talks Tech

Introduction

In a microservice architecture, a single request often traverses multiple modules, middleware components, and machines. Knowing which applications, modules, and nodes the request touches, and in what order, is essential for locating performance problems.

Principles and Role of Distributed Tracing

Three metrics matter most when evaluating an interface: its response time (RT), whether it returned an abnormal response, and where most of the latency was spent.

Monolithic Architecture

In the early stages, many companies run a monolith. The simplest way to collect the three metrics is aspect‑oriented programming (AOP): an aspect logs a timestamp before and after the business logic and catches any exception it throws.
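As a rough illustration of that idea, the sketch below uses a JDK dynamic proxy as a stand-in for an AOP framework. OrderService, TimingHandler, and the record format are hypothetical names invented for this example; the article does not say which AOP mechanism was used.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical business interface, used only for this illustration. */
interface OrderService {
    String placeOrder(String item);
}

/** Records elapsed time and failures around every call on the wrapped target. */
final class TimingHandler implements InvocationHandler {
    final List<String> records = new ArrayList<>();
    private final Object target;

    TimingHandler(Object target) {
        this.target = target;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        long start = System.nanoTime();
        String outcome = "ok";
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            // The business exception is wrapped by reflection; note it and rethrow the original.
            outcome = "error:" + e.getCause().getClass().getSimpleName();
            throw e.getCause();
        } finally {
            long elapsedMicros = (System.nanoTime() - start) / 1_000;
            records.add(method.getName() + " " + outcome + " " + elapsedMicros + "us");
        }
    }

    /** Creates a proxy that routes every call on the interface through the handler. */
    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> type, TimingHandler handler) {
        return (T) Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler);
    }
}
```

Callers use the proxy exactly like the real service, so the timing and exception records accumulate without any change to the business code itself.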

Microservice Architecture

When services are split across machines, monitoring becomes harder. A request may follow a chain such as A → C → B → D, with each service running several instances. Without precise tracing, locating the problematic module is difficult.

Distributed tracing addresses these challenges by automatically collecting data, constructing a complete call chain, and visualizing component performance.

OpenTracing Standard

OpenTracing provides a vendor‑agnostic API that sits between applications/libraries and tracing systems, enabling interchangeable implementations.

Its data model consists of:

Trace: a complete request chain.

Span: a single call with start and end timestamps.

SpanContext: global context (e.g., traceId) propagated across services.
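The three concepts can be sketched as plain Java types. This is a simplified illustration, not the OpenTracing API itself: the class names, fields, and the use of UUIDs for identifiers are assumptions made for the example.

```java
import java.util.UUID;

/** Identifiers propagated across services (a minimal stand-in for SpanContext). */
final class SpanContextSketch {
    final String traceId;
    final String spanId;

    SpanContextSketch(String traceId, String spanId) {
        this.traceId = traceId;
        this.spanId = spanId;
    }
}

/** One timed call; parentSpanId is null for the root span of a trace. */
final class SpanSketch {
    final SpanContextSketch context;
    final String parentSpanId;
    final String operation;
    final long startMillis;
    private long endMillis = -1;

    private SpanSketch(SpanContextSketch context, String parentSpanId,
                       String operation, long startMillis) {
        this.context = context;
        this.parentSpanId = parentSpanId;
        this.operation = operation;
        this.startMillis = startMillis;
    }

    /** Starts a new trace: the root span gets a fresh traceId. */
    static SpanSketch root(String operation, long startMillis) {
        String traceId = UUID.randomUUID().toString();
        return new SpanSketch(new SpanContextSketch(traceId, newSpanId()), null, operation, startMillis);
    }

    /** Starts a child span that shares this span's traceId and points back to it. */
    SpanSketch child(String operation, long startMillis) {
        SpanContextSketch childContext = new SpanContextSketch(context.traceId, newSpanId());
        return new SpanSketch(childContext, context.spanId, operation, startMillis);
    }

    void finish(long endMillis) {
        this.endMillis = endMillis;
    }

    long durationMillis() {
        if (endMillis < 0) {
            throw new IllegalStateException("span not finished");
        }
        return endMillis - startMillis;
    }

    private static String newSpanId() {
        return UUID.randomUUID().toString().substring(0, 8);
    }
}
```

In this model a trace is simply the set of spans that share one traceId, and the parentSpanId links are what allow a backend to rebuild the call tree.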

SkyWalking Architecture and Design

Automatic Span Collection

SkyWalking combines plug‑ins with a Java agent (loaded through the JVM's -javaagent option) to collect spans without any change to application code. Support for additional frameworks is added by writing new plug‑ins.
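For readers unfamiliar with the javaagent mechanism, here is a minimal sketch of its entry point using the standard java.lang.instrument API. The class names and the package prefix are hypothetical, and the transformer only records which classes match; a real tracing agent would rewrite the bytecode of matching classes at this point instead.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Entry point the JVM calls before main() when started with -javaagent:agent.jar. */
final class TracingAgentSketch {
    public static void premain(String agentArgs, Instrumentation instrumentation) {
        // "com/example/" is a hypothetical package prefix chosen for this sketch.
        instrumentation.addTransformer(new MatchingTransformer("com/example/"));
    }
}

/** Sees every class as it is loaded; this sketch only notes which ones it would enhance. */
final class MatchingTransformer implements ClassFileTransformer {
    private final String internalNamePrefix;
    private final List<String> matched = new CopyOnWriteArrayList<>();

    MatchingTransformer(String internalNamePrefix) {
        this.internalNamePrefix = internalNamePrefix;
    }

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        if (className != null && className.startsWith(internalNamePrefix)) {
            matched.add(className);
        }
        return null; // returning null tells the JVM to keep the original bytecode
    }

    List<String> matched() {
        return matched;
    }
}
```

To be picked up by the JVM, the agent jar's manifest must name the entry class in a Premain-Class attribute. Because the hook runs at class-load time, the application's source never needs to reference the agent.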

Cross‑Process Context Propagation

Context is transmitted in message headers (for example, Dubbo attachments) rather than in the body, so the business payload is untouched and propagation stays transparent to application code.
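The caller and callee sides of header-based propagation might look like the sketch below. The header name x-trace-context and the "traceId:spanId" encoding are invented for this example; SkyWalking defines its own header and format, which are not reproduced here.

```java
import java.util.Map;

final class HeaderPropagationSketch {
    /** Hypothetical header key; real tracers define their own name and encoding. */
    static final String HEADER = "x-trace-context";

    /** Caller side: write the traceId and the calling span's id into the outgoing headers. */
    static void inject(String traceId, String spanId, Map<String, String> headers) {
        headers.put(HEADER, traceId + ":" + spanId);
    }

    /**
     * Callee side: returns {traceId, parentSpanId}, or null when the request
     * carries no usable context and a new trace should be started instead.
     */
    static String[] extract(Map<String, String> headers) {
        String value = headers.get(HEADER);
        if (value == null) {
            return null;
        }
        int separator = value.lastIndexOf(':');
        if (separator <= 0 || separator == value.length() - 1) {
            return null; // malformed header: treat as missing
        }
        return new String[] {value.substring(0, separator), value.substring(separator + 1)};
    }
}
```

The map stands in for whatever header carrier the transport offers, such as RPC attachments or HTTP headers. The request body is never read or modified.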

Ensuring a Globally Unique traceId

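SkyWalking generates IDs locally on each node with a Snowflake‑style algorithm, so no central ID service is required. Because such IDs depend on the system clock, a clock rollback could produce duplicates. To guard against this, the generator records the last timestamp it used; if the current time is earlier than that, it falls back to a random number when generating the traceId.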
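The rollback guard can be sketched as follows. This illustrates the idea described above and is not SkyWalking's actual generator: the ID layout (instance ID, then a time-based number combined with a sequence) and all names are assumptions, and the clock is injected so the rollback path can be exercised.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.LongSupplier;

final class TraceIdSketch {
    private static final int SEQUENCE_LIMIT = 10_000;

    private final String instanceId;  // hypothetical per-process identifier
    private final LongSupplier clock; // injected so tests can simulate a rollback
    private long lastTimestamp;
    private int sequence;

    TraceIdSketch(String instanceId, LongSupplier clock) {
        this.instanceId = instanceId;
        this.clock = clock;
    }

    /** Returns instanceId + "." + (timePart * 10000 + sequence). */
    synchronized String next() {
        long now = clock.getAsLong();
        long timePart;
        if (now < lastTimestamp) {
            // Clock moved backwards: do not reuse an old timestamp, use a random value instead.
            timePart = ThreadLocalRandom.current().nextLong(1_000_000_000_000L);
        } else {
            lastTimestamp = now;
            timePart = now;
        }
        sequence = (sequence + 1) % SEQUENCE_LIMIT;
        return instanceId + "." + (timePart * SEQUENCE_LIMIT + sequence);
    }
}
```

In production the clock would be System::currentTimeMillis. In this sketch the random fallback makes a collision after a rollback unlikely rather than impossible.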

Sampling Strategy

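Collecting every request would overwhelm the system, so SkyWalking can cap sampling at a fixed number of traces per three‑second window, here three (the agent setting agent.sample_n_per_3_secs controls this limit). If the upstream request was sampled, downstream services are forced to sample it too, regardless of their own quota, which guarantees a complete trace.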
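A minimal sketch of such a sampler is shown below, assuming a fixed three-second window and an injected clock. The class name and the choice not to count forced samples against the quota are assumptions of this example, not details taken from SkyWalking.

```java
import java.util.function.LongSupplier;

final class WindowSamplerSketch {
    private static final long WINDOW_MILLIS = 3_000;

    private final int maxPerWindow;
    private final LongSupplier clock; // injected so the window can be advanced in tests
    private long windowStart;
    private int sampledInWindow;

    WindowSamplerSketch(int maxPerWindow, LongSupplier clock) {
        this.maxPerWindow = maxPerWindow;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    /**
     * @param upstreamSampled true when the incoming context says the caller sampled this request
     */
    synchronized boolean shouldSample(boolean upstreamSampled) {
        if (upstreamSampled) {
            return true; // always follow the upstream decision so the trace stays complete
        }
        long now = clock.getAsLong();
        if (now - windowStart >= WINDOW_MILLIS) {
            windowStart = now; // open a new window and reset the quota
            sampledInWindow = 0;
        }
        if (sampledInWindow < maxPerWindow) {
            sampledInWindow++;
            return true;
        }
        return false;
    }
}
```

With maxPerWindow set to 3, the fourth locally originated request in a window is dropped, while a request already sampled upstream is always kept.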

Performance Evaluation

Benchmarks at 5000 TPS show that attaching the SkyWalking agent adds negligible CPU, memory, and latency overhead. In a response‑time comparison, SkyWalking measured about 22 ms, against 117 ms for Zipkin and 201 ms for Pinpoint.

SkyWalking also offers multi‑language support, a rich plug‑in ecosystem, and low intrusion into application code.

Company Practices with SkyWalking

Agent‑Only Deployment

Our team uses only the SkyWalking agent for sampling, keeping existing monitoring solutions for storage and visualization to avoid migration costs.

Custom Enhancements

Force sampling in pre‑release environments via a cookie flag.

Fine‑grained group sampling for Redis, Dubbo, MySQL, etc., within each three‑second window.

Embedding traceId into log4j output through a custom plug‑in.

Proprietary plug‑ins for Memcached and Druid, developed following SkyWalking specifications.
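The first two enhancements could be combined roughly as in the sketch below: each component group gets its own quota per window, and a force flag bypasses the quota. The class, the group names, and the way the flag reaches the sampler (for example, parsed from a cookie by the caller) are assumptions for illustration; the team's actual implementation is not shown in the article.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class GroupSamplerSketch {
    private final int maxPerGroup;
    private final Map<String, AtomicInteger> counters = new ConcurrentHashMap<>();

    GroupSamplerSketch(int maxPerGroup) {
        this.maxPerGroup = maxPerGroup;
    }

    /**
     * @param group  component group such as "redis", "dubbo", or "mysql"
     * @param forced true when the request carries the force-sampling flag
     */
    boolean shouldSample(String group, boolean forced) {
        if (forced) {
            return true; // forced requests bypass the quota entirely
        }
        AtomicInteger counter = counters.computeIfAbsent(group, key -> new AtomicInteger());
        return counter.incrementAndGet() <= maxPerGroup;
    }

    /** Call once per three-second window, for example from a scheduled task. */
    void resetWindow() {
        counters.clear();
    }
}
```

Keeping a separate counter per group prevents a high-volume component from using up the whole quota and leaving other components unsampled.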

Plug‑in Development Example

A plug‑in consists of three parts: a definition class, an instrumentation (the pointcut that selects which classes and methods to enhance), and an interceptor (the logic that runs before and after the method executes). For the Dubbo plug‑in, the interceptor injects the global traceId into the invocation attachment before the business method runs.

// skywalking-plugin.def file
dubbo=org.apache.skywalking.apm.plugin.asf.dubbo.DubboInstrumentation

This approach requires no changes to application code while still ensuring trace propagation.
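To make the interceptor's role concrete, here is a self-contained sketch of the before/after pattern. It deliberately does not use SkyWalking's plug-in API: RpcInvocationSketch, AroundInterceptorSketch, and the traceId attachment key are hypothetical stand-ins, and InstrumentedCallSketch imitates by hand what the agent's instrumentation would weave around the target method.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for an RPC invocation that carries string attachments. */
final class RpcInvocationSketch {
    final Map<String, String> attachments = new HashMap<>();
}

/** Hypothetical interceptor contract: hooks that run before and after the target method. */
interface AroundInterceptorSketch {
    void beforeMethod(RpcInvocationSketch invocation);

    void afterMethod(RpcInvocationSketch invocation, Object result);
}

/** Injects the traceId into the attachments before the call goes out. */
final class TraceIdInterceptorSketch implements AroundInterceptorSketch {
    static final String TRACE_ID_KEY = "traceId"; // assumed key, for illustration only

    private final String traceId;
    int completedCalls;

    TraceIdInterceptorSketch(String traceId) {
        this.traceId = traceId;
    }

    @Override
    public void beforeMethod(RpcInvocationSketch invocation) {
        invocation.attachments.put(TRACE_ID_KEY, traceId);
    }

    @Override
    public void afterMethod(RpcInvocationSketch invocation, Object result) {
        completedCalls++; // a real interceptor would close the span here
    }
}

/** Imitates what instrumentation weaves around the intercepted method. */
final class InstrumentedCallSketch {
    static Object call(RpcInvocationSketch invocation, AroundInterceptorSketch interceptor,
                       Function<RpcInvocationSketch, Object> target) {
        interceptor.beforeMethod(invocation);
        Object result = target.apply(invocation);
        interceptor.afterMethod(invocation, result);
        return result;
    }
}
```

The target function only ever sees an invocation that already carries the traceId, which is why the business code needs no tracing logic of its own.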

Conclusion

The article introduced the fundamentals of distributed tracing, explained how SkyWalking implements automatic span collection, context propagation, unique traceId generation, and sampling, and shared practical experiences of applying these techniques in a real‑world microservice environment.

Written by Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.