
Distributed Tracing Overview and SkyWalking Architecture

This article explains the fundamentals of distributed tracing, introduces the Dapper and OpenTracing models, and details SkyWalking's data collection, cross‑process propagation, bytecode enhancement, architecture components, monitoring, alerting, and performance characteristics for microservice environments.

政采云技术

As microservice architectures grow, a single business request often spans multiple services, which makes rapid fault diagnosis essential. Distributed tracing reconstructs the full request path so that teams can locate failures, visualize performance, and analyze system behavior.

The original Dapper model, described in Google's paper, uses a globally unique traceId to link spans across services, enabling reconstruction of call relationships and system metrics.
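
As a rough illustration of how a shared traceId lets a backend rebuild call relationships, the sketch below groups spans by parent and prints them as an indented tree. The SpanRecord class, its fields, and the sample operation names are assumptions made for this example; they are not taken from Dapper or SkyWalking.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified span record; real tracers also carry timestamps and tags.
final class SpanRecord {
    final String traceId;
    final String spanId;
    final String parentSpanId; // null for the root span
    final String operation;

    SpanRecord(String traceId, String spanId, String parentSpanId, String operation) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
        this.operation = operation;
    }
}

final class TraceTreeBuilder {
    private static final String ROOT = "<root>"; // map key used for spans without a parent

    // Returns the spans of one trace as indented lines, parents before children.
    static List<String> render(List<SpanRecord> spans, String traceId) {
        Map<String, List<SpanRecord>> children = new HashMap<>();
        for (SpanRecord span : spans) {
            if (!span.traceId.equals(traceId)) {
                continue; // span belongs to a different request
            }
            String key = span.parentSpanId == null ? ROOT : span.parentSpanId;
            children.computeIfAbsent(key, k -> new ArrayList<>()).add(span);
        }
        List<String> lines = new ArrayList<>();
        appendChildren(children, ROOT, 0, lines);
        return lines;
    }

    // Depth-first walk: each level of the call chain adds two spaces of indentation.
    private static void appendChildren(Map<String, List<SpanRecord>> children,
                                       String parentId, int depth, List<String> out) {
        for (SpanRecord child : children.getOrDefault(parentId, Collections.<SpanRecord>emptyList())) {
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < depth; i++) {
                line.append("  ");
            }
            out.add(line.append(child.operation).toString());
            appendChildren(children, child.spanId, depth + 1, out);
        }
    }
}
```

Spans may arrive in any order and from any service; only the traceId and the parent reference are needed to restore the call chain.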

OpenTracing provides a vendor‑neutral API with three core concepts: Trace (a directed acyclic graph of spans), Span (a timed logical operation such as a DB call or RPC), and SpanContext (carrying traceId, spanId, and baggage across process boundaries).
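
The three concepts can be modeled in a few lines. The following is a minimal sketch, not the io.opentracing interfaces: DemoSpan and DemoSpanContext are hypothetical names, and ids are generated with UUIDs purely for convenience.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Illustrative model only; these are not the io.opentracing interfaces.
final class DemoSpanContext {
    final String traceId;
    final String spanId;
    final Map<String, String> baggage; // key/value pairs that travel with the trace

    DemoSpanContext(String traceId, String spanId, Map<String, String> baggage) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.baggage = Collections.unmodifiableMap(new HashMap<>(baggage));
    }
}

final class DemoSpan {
    final String operationName;
    final DemoSpanContext context;
    final String parentSpanId; // null for the root span of a trace
    final long startNanos = System.nanoTime();
    private long durationNanos = -1; // -1 until finish() is called

    private DemoSpan(String operationName, DemoSpanContext context, String parentSpanId) {
        this.operationName = operationName;
        this.context = context;
        this.parentSpanId = parentSpanId;
    }

    // Starts a new trace: fresh traceId, fresh spanId, no parent.
    static DemoSpan startRoot(String operationName, Map<String, String> baggage) {
        return new DemoSpan(operationName, new DemoSpanContext(newId(), newId(), baggage), null);
    }

    // A child keeps the traceId and baggage but gets its own spanId.
    DemoSpan startChild(String operationName) {
        DemoSpanContext child = new DemoSpanContext(context.traceId, newId(), context.baggage);
        return new DemoSpan(operationName, child, context.spanId);
    }

    void finish() {
        durationNanos = System.nanoTime() - startNanos;
    }

    long durationNanos() {
        return durationNanos;
    }

    private static String newId() {
        return UUID.randomUUID().toString();
    }
}
```

A trace is then simply the set of spans that share one traceId, connected through their parent references.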

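SkyWalking implements tracing with three span types: Entry, Local, and Exit. An Entry span marks a request arriving at a service, a Local span marks an in-process operation, and an Exit span marks an outgoing call to another service or resource. Plugins loaded by the agent collect this data non-intrusively, and the agent reports it over HTTP or gRPC.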
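
One way to picture the three span types is by what each does with the tracing context. The enum below is a sketch under that reading; DemoSpanType and the classify helper are hypothetical and are not SkyWalking's own classes.

```java
// Illustrative model of the three span types; not SkyWalking's own classes.
enum DemoSpanType {
    ENTRY(true, false),  // request arrives at this service: read the upstream context
    LOCAL(false, false), // in-process work: no carrier involved
    EXIT(false, true);   // call leaves this service: write context for the downstream side

    final boolean extractsContext;
    final boolean injectsContext;

    DemoSpanType(boolean extractsContext, boolean injectsContext) {
        this.extractsContext = extractsContext;
        this.injectsContext = injectsContext;
    }
}

final class SpanTypeDemo {
    // Hypothetical classifier: picks a span type from what the instrumented code is doing.
    static DemoSpanType classify(boolean handlesIncomingRequest, boolean callsRemotePeer) {
        if (handlesIncomingRequest) {
            return DemoSpanType.ENTRY;
        }
        if (callsRemotePeer) {
            return DemoSpanType.EXIT;
        }
        return DemoSpanType.LOCAL;
    }
}
```

For a service that receives an HTTP request, runs a local method, and then queries a database, the recorded spans would be Entry, Local, and Exit, in that order.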

Cross‑process data is transmitted in headers (e.g., HTTP headers or Dubbo attachments) rather than bodies, ensuring that tracing context travels with the request without affecting payload.
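
The inject and extract steps can be sketched with a plain map standing in for HTTP headers or Dubbo attachments. The header name x-demo-trace and the traceId:spanId encoding are assumptions for this example only; SkyWalking defines its own header name and encoding.

```java
import java.util.Map;

final class HeaderPropagation {
    // Hypothetical header name for this sketch; SkyWalking uses its own header and format.
    static final String TRACE_HEADER = "x-demo-trace";

    // Caller side: write the context into the outgoing headers, leaving the body untouched.
    static void inject(String traceId, String parentSpanId, Map<String, String> headers) {
        headers.put(TRACE_HEADER, traceId + ":" + parentSpanId);
    }

    // Callee side: returns {traceId, parentSpanId}, or null when no valid context is present.
    static String[] extract(Map<String, String> headers) {
        String value = headers.get(TRACE_HEADER);
        if (value == null) {
            return null; // no upstream context: the callee would start a new trace
        }
        String[] parts = value.split(":", 2);
        if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            return null; // malformed header: treat it as missing
        }
        return parts;
    }
}
```

Because the context lives only in the header map, the request payload is the same whether or not tracing is enabled.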

Bytecode enhancement relies on the Java agent mechanism, which is built on JVMTI. The SkyWalking agent is attached at JVM startup with the -javaagent option and loads plugins that instrument target classes, create spans, and propagate context.
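
The hook that -javaagent provides can be shown with the standard java.lang.instrument API. The sketch below only records which classes pass through the transformer and leaves their bytecode unchanged; the class names, the com/example/ prefix, and the jar name in the comment are hypothetical, and a real agent such as SkyWalking's would rewrite the bytes at the marked point.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Sees every class as it is loaded; this sketch only records matching class names.
final class LoggingTransformer implements ClassFileTransformer {
    private final String packagePrefix; // JVM internal form, e.g. "com/example/"
    final List<String> seen = new CopyOnWriteArrayList<>();

    LoggingTransformer(String packagePrefix) {
        this.packagePrefix = packagePrefix;
    }

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        if (className != null && className.startsWith(packagePrefix)) {
            seen.add(className);
            // A real agent would rewrite classfileBuffer here to wrap methods with span logic.
        }
        return null; // null tells the JVM to keep the original bytecode
    }
}

// In a real agent jar this class would be public, in its own file, and named by the
// Premain-Class attribute of the jar manifest.
final class DemoAgent {
    // Invoked by the JVM before main(), e.g. java -javaagent:demo-agent.jar=com/example/ ...
    public static void premain(String agentArgs, Instrumentation inst) {
        String prefix = (agentArgs == null || agentArgs.isEmpty()) ? "com/example/" : agentArgs;
        inst.addTransformer(new LoggingTransformer(prefix));
    }
}
```

Since the transformer runs at class-load time, application source code does not need to change, which is what makes this style of instrumentation non-intrusive.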

The architecture consists of data collection (agents), transmission (Kafka, gRPC, HTTP), analysis (OAP server), storage (e.g., Elasticsearch), a GraphQL‑based UI built with Vue, and alerting integrations such as DingTalk.

Monitoring features include service alarm dashboards, dependency graphs, trace detail views, and JVM metric panels. Alerting is configured in alarm-settings.yml, where each rule sets a metric, a comparison operator, a threshold, an evaluation period, and a match count, and webhooks deliver the notifications. For example:

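rules:
  service_cpm_rule:
    # Metric to evaluate; service_cpm (calls per minute) is assumed from the rule name.
    metrics-name: service_cpm
    op: ">"
    threshold: 1
    # Evaluation window, in minutes.
    period: 1
    # Number of matches within the window that triggers the alarm.
    count: 1
    # Minutes to stay silent after an alarm fires.
    silence-period: 1
    message: Service request rate is too high
dingtalkHooks:
  textTemplate: |-
    {
      "msgtype": "text",
      "text": {"content": "Apache SkyWalking Alarm: \n %s."}
    }
  webhooks:
    - url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxx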
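
To make the interaction between threshold, period, count, and silence-period concrete, here is a simplified evaluator that is fed one metric value per minute. It is a sketch of the idea, assuming only the ">" operator; AlarmRuleSketch is a hypothetical class, and the OAP server's actual evaluation logic may differ in detail.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified sliding-window rule check; not the OAP server's implementation.
final class AlarmRuleSketch {
    private final double threshold;
    private final int period;        // window length in minutes
    private final int count;         // matches needed within the window
    private final int silencePeriod; // minutes to stay quiet after firing
    private final Deque<Double> window = new ArrayDeque<>();
    private int silenceRemaining = 0;

    AlarmRuleSketch(double threshold, int period, int count, int silencePeriod) {
        this.threshold = threshold;
        this.period = period;
        this.count = count;
        this.silencePeriod = silencePeriod;
    }

    // Feed one metric value per minute; returns true when a notification should be sent.
    boolean onMinute(double value) {
        window.addLast(value);
        if (window.size() > period) {
            window.removeFirst(); // keep only the last `period` minutes
        }
        int matches = 0;
        for (double v : window) {
            if (v > threshold) { // only the ">" operator is modeled here
                matches++;
            }
        }
        if (silenceRemaining > 0) {
            silenceRemaining--; // still silenced: suppress the notification
            return false;
        }
        if (matches >= count) {
            silenceRemaining = silencePeriod;
            return true;
        }
        return false;
    }
}
```

With the sample values above (threshold 1, period 1, count 1, silence-period 1), a service that stays above one call per minute would fire, stay silent for the next minute, and then fire again.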

In performance tests of the SkyWalking Java agent on a 4-core i5 machine, the agent handled more than 5,000 trace segments per second while adding little CPU overhead and little response latency.

In summary, SkyWalking offers a lightweight, non‑intrusive tracing solution with strong extensibility, making distributed monitoring and rapid fault isolation straightforward for Java microservice applications.
