Full-Link Monitoring: Concepts, Architecture, and Comparison of Zipkin, SkyWalking, and Pinpoint
This article explains the fundamentals of full‑link (distributed) monitoring, describes its core components such as spans, traces and annotations, outlines typical system architecture, and provides a detailed performance and feature comparison of three popular APM solutions—Zipkin, SkyWalking, and Pinpoint.
With the rise of micro‑service architectures, a single request often traverses many services, possibly written in different languages and deployed across thousands of servers. To quickly locate and resolve failures, engineers need tools that can observe system behavior and analyze performance problems.
Full‑link monitoring addresses this need; the best‑known implementation is Google’s Dapper. By tracing interactions across applications and machines, such a system reconstructs the complete call chain of a request.
A typical request call chain fans out across many services (diagram omitted). As chains grow longer, the following tasks become harder:
Rapid problem discovery
Impact scope determination
Service‑dependency analysis
Link‑level performance analysis and capacity planning
Key performance metrics collected during request processing include throughput (TPS), response time, and error counts.
Full‑link monitoring provides end‑to‑end visibility, enabling fast fault localization, visual timing analysis, dependency optimization, and data‑driven capacity planning.
1. Goal Requirements
Probe performance overhead
Code intrusiveness
Scalability
Data analysis capabilities
2. Functional Modules
Instrumentation and log generation
Log collection and storage
Call‑chain data analysis and aggregation
Visualization and decision support
3. Google Dapper Model
A Span is the basic unit of work. Each span has a 64‑bit ID, a name, timestamps, annotations, and a parent ID that links it to its parent span.
type Span struct {
    TraceID    int64        // identifies the whole request
    Name       string
    ID         int64        // current span ID
    ParentID   int64        // parent span ID; left unset (zero value) for the root span
    Annotation []Annotation // timestamped events
    Debug      bool
}
A Trace is a tree of spans representing a complete request lifecycle, identified by a unique TraceID.
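To make the tree structure concrete, the following sketch links a flat list of spans into a tree by following parent IDs. It reuses only the identifying fields of the struct above; the `Node` type, the `BuildTrace` and `Print` functions, and the convention that a `ParentID` of 0 marks the root are assumptions made for this example.

```go
package main

import "fmt"

// Span keeps only the fields needed to rebuild the tree.
type Span struct {
	TraceID  int64
	Name     string
	ID       int64
	ParentID int64 // assumption: 0 marks the root span
}

// Node is a span together with its child spans.
type Node struct {
	Span     Span
	Children []*Node
}

// BuildTrace links the spans of one trace into a tree and returns the root.
// It returns nil if no root span is present. Spans whose parent is missing
// are skipped.
func BuildTrace(spans []Span) *Node {
	nodes := make(map[int64]*Node, len(spans))
	for _, s := range spans {
		nodes[s.ID] = &Node{Span: s}
	}
	var root *Node
	for _, s := range spans {
		n := nodes[s.ID]
		if s.ParentID == 0 {
			root = n
			continue
		}
		if p, ok := nodes[s.ParentID]; ok {
			p.Children = append(p.Children, n)
		}
	}
	return root
}

// Print writes the tree with two spaces of indentation per level.
func Print(n *Node, depth int) {
	if n == nil {
		return
	}
	fmt.Printf("%*s%s\n", depth*2, "", n.Span.Name)
	for _, c := range n.Children {
		Print(c, depth+1)
	}
}

func main() {
	// Hypothetical service names for illustration.
	spans := []Span{
		{TraceID: 1, Name: "gateway", ID: 1},
		{TraceID: 1, Name: "order-service", ID: 2, ParentID: 1},
		{TraceID: 1, Name: "user-service", ID: 3, ParentID: 1},
		{TraceID: 1, Name: "mysql", ID: 4, ParentID: 2},
	}
	Print(BuildTrace(spans), 0)
}
```

Because spans usually arrive at the collector out of order, the sketch indexes all spans first and links them in a second pass.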
Each span can contain multiple Annotation entries, typically four types:
cs – Client Send
sr – Server Receive
ss – Server Send
cr – Client Receive
type Annotation struct {
    Timestamp int64
    Value     string
    Host      Endpoint
    Duration  int32
}
4. Solution Comparison
The three open‑source APM components examined are:
Zipkin (Twitter)
SkyWalking (Apache; originated in China)
Pinpoint (Naver)
Key comparison dimensions include probe performance impact, collector scalability, depth of call‑chain analysis, developer transparency, and topology visualization.
Probe Performance: In load tests with 500, 750, and 1000 concurrent users, SkyWalking’s probe had the smallest impact on throughput and Zipkin’s was moderate, while Pinpoint’s reduced throughput noticeably at 500 users.
Collector Scalability: All three support single‑node and cluster deployments. SkyWalking uses gRPC, Pinpoint uses Thrift over UDP, and Zipkin relies on HTTP/JSON.
Call‑Chain Data Analysis: Pinpoint offers the richest detail (method level, SQL statements), SkyWalking provides extensive middleware support, and Zipkin’s granularity stops at the service‑to‑service level.
Developer Transparency: Zipkin often requires code changes, whereas SkyWalking and Pinpoint use byte‑code instrumentation, which allows deployment without code changes.
Topology Visualization: All three can render full application topology; Pinpoint’s UI shows richer details (e.g., DB names), while Zipkin’s view is limited to service‑level links.
Other Considerations
Community support: Zipkin benefits from a large, active community; Pinpoint’s community is smaller.
Integration cost: Zipkin’s API‑based approach is easier to adopt for new languages, while Pinpoint’s byte‑code agents require deeper knowledge of target frameworks.
Sampling: Zipkin and Pinpoint both support configurable sampling; Pinpoint’s default is 20 %.
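One simple way to implement configurable sampling is a counting sampler that traces one request out of every N. The sketch below is a generic illustration and does not reproduce how Zipkin or Pinpoint implement sampling; `CountingSampler`, `NewCountingSampler`, and `Sample` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// CountingSampler traces one request out of every `every` requests.
type CountingSampler struct {
	every   uint64
	counter uint64
}

// NewCountingSampler returns a sampler; every == 0 is treated as 1 (trace everything).
func NewCountingSampler(every uint64) *CountingSampler {
	if every == 0 {
		every = 1
	}
	return &CountingSampler{every: every}
}

// Sample reports whether the current request should be traced.
// It is safe for concurrent use because the counter is updated atomically.
func (s *CountingSampler) Sample() bool {
	n := atomic.AddUint64(&s.counter, 1) - 1 // zero-based request index
	return n%s.every == 0
}

func main() {
	sampler := NewCountingSampler(5) // 1 in 5 requests, i.e. 20 %
	kept := 0
	for i := 0; i < 100; i++ {
		if sampler.Sample() {
			kept++
		}
	}
	fmt.Printf("traced %d of 100 requests\n", kept)
}
```

The sampling decision is normally made once at the entry point and propagated downstream with the trace context, so that a trace is either recorded in full or not at all.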
Conclusion
For short‑term needs, Pinpoint excels with non‑intrusive agents, fine‑grained tracing, and powerful UI, but its long‑term maintenance and integration costs are uncertain. Zipkin offers broader language support and a stronger ecosystem, while SkyWalking balances performance impact and feature richness. The final choice should weigh deployment complexity, required tracing granularity, and team expertise.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.