
Which APM Tool Wins? A Deep Comparison of Zipkin, SkyWalking, and Pinpoint

This article analyzes full‑link monitoring in micro‑service architectures, outlines the goals and functional modules of tracing systems, explains core concepts such as Span, Trace, and Annotation, and then compares Zipkin, SkyWalking, and Pinpoint across performance impact, scalability, data analysis depth, developer transparency, and topology visualization.

IT Architects Alliance

Sep 23, 2022


With micro-service architectures becoming mainstream, a single request often traverses many services, which may be written in different languages and deployed across thousands of servers. To locate and resolve failures quickly, teams need full-link monitoring (distributed tracing) tools, a category originally inspired by Google's Dapper paper.

1. Goals and Requirements

The tracing component should have minimal performance overhead, be non‑intrusive to application code, scale horizontally, provide fast data analysis, and support rich dependency metrics.

2. Functional Modules

Typical full‑link monitoring systems consist of four modules:

Instrumentation and log generation (on the client side, the server side, or both).

Log collection and storage (often using a message queue as a buffer).

Analysis and aggregation of trace data (real‑time and offline).

Visualization and decision‑support dashboards.
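The four modules can be illustrated with a small in-process pipeline. This is a minimal sketch, assuming a buffered Go channel stands in for the message queue and a slice stands in for storage; `SpanRecord`, `instrument`, `collect`, and `aggregate` are hypothetical names, not part of any tool compared below.

```go
package main

import (
	"fmt"
	"sort"
)

// SpanRecord is a hypothetical, simplified span log entry emitted by instrumentation.
type SpanRecord struct {
	TraceID    int64
	Service    string
	DurationUs int64 // span duration in microseconds
}

// instrument stands in for module 1: it emits span records onto the buffer.
// A real agent would build these from intercepted calls.
func instrument(out chan<- SpanRecord) {
	records := []SpanRecord{
		{TraceID: 1, Service: "A", DurationUs: 900},
		{TraceID: 1, Service: "B", DurationUs: 120},
		{TraceID: 2, Service: "A", DurationUs: 700},
	}
	for _, r := range records {
		out <- r
	}
	close(out) // signals that no more records will arrive
}

// collect stands in for module 2: it drains the buffer into storage.
func collect(in <-chan SpanRecord) []SpanRecord {
	var store []SpanRecord
	for r := range in {
		store = append(store, r)
	}
	return store
}

// aggregate stands in for module 3: average latency per service.
func aggregate(store []SpanRecord) map[string]int64 {
	sum := make(map[string]int64)
	count := make(map[string]int64)
	for _, r := range store {
		sum[r.Service] += r.DurationUs
		count[r.Service]++
	}
	avg := make(map[string]int64, len(sum))
	for svc, s := range sum {
		avg[svc] = s / count[svc]
	}
	return avg
}

func main() {
	queue := make(chan SpanRecord, 16) // bounded buffer in place of a message queue
	go instrument(queue)
	avg := aggregate(collect(queue))

	// Module 4 (visualization) reduced to sorted text output.
	services := make([]string, 0, len(avg))
	for svc := range avg {
		services = append(services, svc)
	}
	sort.Strings(services)
	for _, svc := range services {
		fmt.Printf("%s avg=%dus\n", svc, avg[svc])
	}
}
```

Running it prints one average-latency line per service (A at 800us, B at 120us). In a real deployment each stage runs as a separate process, with the queue decoupling instrumentation from collection.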

3. Core Concepts from Google Dapper

3.1 Span

A Span represents a single unit of work in a trace and is identified by a 64-bit ID. It contains fields such as TraceID, SpanID, ParentID, name, timestamps, annotations, and an optional debug flag.

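```go
// Span is one unit of work within a trace (Dapper-style model).
type Span struct {
	TraceID    int64        // identifies the whole request
	Name       string       // operation name
	ID         int64        // current span ID
	ParentID   int64        // parent span ID; 0 for the root span, which has no parent
	Annotation []Annotation // timestamps and events
	Debug      bool         // request that this span be recorded regardless of sampling
}
```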

3.2 Trace

A Trace is a tree of Spans that together represent the complete request flow from client request to final response, identified by a unique TraceID.
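Because every span carries its TraceID and ParentID, a backend can rebuild the tree from a flat list of spans in any arrival order. The sketch below assumes a trimmed-down, hypothetical `span` type with ParentID 0 marking the root; it indexes children by parent ID and walks the tree depth-first.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// span is a hypothetical, trimmed-down span: only the fields needed for the tree.
type span struct {
	TraceID  int64
	ID       int64
	ParentID int64 // 0 means "no parent", i.e. the root span
	Name     string
}

// buildTree finds the root of one trace and indexes every other span by its parent ID.
func buildTree(spans []span) (root span, children map[int64][]span, ok bool) {
	children = make(map[int64][]span)
	for _, s := range spans {
		if s.ParentID == 0 {
			root, ok = s, true
			continue
		}
		children[s.ParentID] = append(children[s.ParentID], s)
	}
	// Sort siblings by span ID so the rendering is deterministic.
	for _, kids := range children {
		sort.Slice(kids, func(i, j int) bool { return kids[i].ID < kids[j].ID })
	}
	return root, children, ok
}

// render appends the subtree rooted at s, indented by depth.
func render(s span, children map[int64][]span, depth int, sb *strings.Builder) {
	sb.WriteString(strings.Repeat("  ", depth) + s.Name + "\n")
	for _, c := range children[s.ID] {
		render(c, children, depth+1, sb)
	}
}

func main() {
	// Spans arrive out of order; the tree is rebuilt from ParentID links.
	spans := []span{
		{TraceID: 7, ID: 3, ParentID: 1, Name: "C"},
		{TraceID: 7, ID: 1, ParentID: 0, Name: "A"},
		{TraceID: 7, ID: 2, ParentID: 1, Name: "B"},
	}
	root, children, ok := buildTree(spans)
	if !ok {
		fmt.Println("no root span found")
		return
	}
	var sb strings.Builder
	render(root, children, 0, &sb)
	fmt.Print(sb.String())
}
```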

3.3 Annotation

Annotations record specific events within a Span. The four classic types are cs (Client Send), sr (Server Receive), ss (Server Send), and cr (Client Receive).

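```go
// Endpoint identifies the host that recorded an annotation. This minimal
// definition is an assumption, modeled on Zipkin's Thrift Endpoint.
type Endpoint struct {
	IPv4        int32
	Port        int16
	ServiceName string
}

// Annotation records one timestamped event, e.g. "cs", "sr", "ss", or "cr".
type Annotation struct {
	Timestamp int64    // microseconds since the epoch (Zipkin convention)
	Value     string   // event name, e.g. "cs"
	Host      Endpoint // host that recorded the event
	Duration  int32    // optional duration of the event
}
```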
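With all four annotations on one RPC span, latency can be split into server time and network time. This is a minimal sketch with made-up timestamps in microseconds; the `rpcTimes` type and `breakdown` function are hypothetical. Subtracting (ss − sr) from (cr − cs) is useful because each pair is measured on a single host, so clock skew between client and server cancels out.

```go
package main

import "fmt"

// rpcTimes holds hypothetical annotation timestamps (microseconds) for one RPC.
type rpcTimes struct {
	CS, SR, SS, CR int64 // client send, server receive, server send, client receive
}

// breakdown splits the client-observed latency into server processing time
// and everything else (network transfer and queueing in both directions).
// cr-cs and ss-sr are each measured on one host, so skew does not affect them.
func breakdown(t rpcTimes) (total, server, network int64) {
	total = t.CR - t.CS
	server = t.SS - t.SR
	network = total - server
	return total, server, network
}

func main() {
	t := rpcTimes{CS: 1000, SR: 1150, SS: 1650, CR: 1820}
	total, server, network := breakdown(t)
	fmt.Printf("total=%dus server=%dus network=%dus\n", total, server, network)
}
```

For the sample values this yields 820us in total, of which 500us is spent inside the server and 320us on the network.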

3.4 Call Example

When a user request reaches front‑end service A, it may invoke services B and C via RPC. Service B returns immediately, while C further calls D and E before responding. The entire flow is captured by a global TraceID and a hierarchy of SpanIDs.
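The call example can be modeled by how IDs propagate: the front-end creates the root span, and every RPC creates a child span that inherits the TraceID and records its caller's SpanID as ParentID. In this sketch the `spanContext` type and the `rootSpan`/`childSpan` helpers are hypothetical, and IDs are sequential only to keep the output readable.

```go
package main

import "fmt"

// spanContext is what travels with an RPC from caller to callee (hypothetical type).
type spanContext struct {
	TraceID, SpanID int64
}

// recordedSpan is what each service reports for its part of the request.
type recordedSpan struct {
	Service                   string
	TraceID, SpanID, ParentID int64 // ParentID 0 marks the root
}

var (
	nextID   int64          // sequential IDs for readability; real tracers use random 64-bit IDs
	reported []recordedSpan // stands in for the reporter/collector
)

func newID() int64 {
	nextID++
	return nextID
}

// rootSpan starts a new trace at the front-end service.
func rootSpan(service string) spanContext {
	sc := spanContext{TraceID: newID(), SpanID: newID()}
	reported = append(reported, recordedSpan{Service: service, TraceID: sc.TraceID, SpanID: sc.SpanID})
	return sc
}

// childSpan models an RPC: the callee keeps the caller's TraceID, gets a
// fresh SpanID, and records the caller's SpanID as its ParentID.
func childSpan(service string, parent spanContext) spanContext {
	sc := spanContext{TraceID: parent.TraceID, SpanID: newID()}
	reported = append(reported, recordedSpan{Service: service, TraceID: sc.TraceID, SpanID: sc.SpanID, ParentID: parent.SpanID})
	return sc
}

func main() {
	a := rootSpan("A")
	childSpan("B", a) // B returns immediately and calls nothing else
	c := childSpan("C", a)
	childSpan("D", c)
	childSpan("E", c)
	for _, s := range reported {
		fmt.Printf("%s trace=%d span=%d parent=%d\n", s.Service, s.TraceID, s.SpanID, s.ParentID)
	}
}
```

All five reported spans share trace=1; B and C point to A's span (2), while D and E point to C's span (4).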

4. Deployment Architecture

Agents can be deployed without code changes. Two main agent types exist:

In‑process Java agents that instrument methods via the JVM’s javaagent mechanism.

Cross-service agents that propagate trace context between processes through plugins for popular RPC frameworks; supported plugins include Dubbo, REST, and custom RPC.
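For REST calls, the core job of a cross-service plugin is to carry the trace context in request headers. The sketch below uses the B3 header names from Zipkin's propagation format (`X-B3-TraceId`, `X-B3-SpanId`, `X-B3-ParentSpanId`) with 64-bit lower-hex IDs; the `spanContext` type and the `inject`/`extract` helpers are hypothetical, and a real agent plugin performs the equivalent steps automatically. The sampling header is omitted for brevity.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
)

// spanContext is the minimal state that must cross a process boundary (hypothetical type).
type spanContext struct {
	TraceID, SpanID, ParentID uint64
}

// inject writes the context into outgoing HTTP headers as 16-character lower-hex values.
func inject(sc spanContext, h http.Header) {
	h.Set("X-B3-TraceId", fmt.Sprintf("%016x", sc.TraceID))
	h.Set("X-B3-SpanId", fmt.Sprintf("%016x", sc.SpanID))
	if sc.ParentID != 0 {
		h.Set("X-B3-ParentSpanId", fmt.Sprintf("%016x", sc.ParentID))
	}
}

// extract reads the context on the receiving side. ok is false when the
// caller sent no (or malformed) trace headers, in which case a new trace starts.
func extract(h http.Header) (sc spanContext, ok bool) {
	var err error
	if sc.TraceID, err = strconv.ParseUint(h.Get("X-B3-TraceId"), 16, 64); err != nil {
		return spanContext{}, false
	}
	if sc.SpanID, err = strconv.ParseUint(h.Get("X-B3-SpanId"), 16, 64); err != nil {
		return spanContext{}, false
	}
	if p := h.Get("X-B3-ParentSpanId"); p != "" {
		if sc.ParentID, err = strconv.ParseUint(p, 16, 64); err != nil {
			return spanContext{}, false
		}
	}
	return sc, true
}

func main() {
	h := http.Header{}
	inject(spanContext{TraceID: 0xabc, SpanID: 0x1}, h)
	sc, ok := extract(h)
	fmt.Println(sc.TraceID == 0xabc, sc.SpanID == 1, ok) // true true true
}
```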

5. Benefits of Full‑Link Monitoring

Accurate visibility of production deployments.

Identification and optimization of critical call paths.

Quantifiable performance data for IT operations.

Rapid pinpointing of code‑level performance issues.

Support for white‑box testing and reduced time‑to‑stability.

6. Solution Comparison

The three open‑source APM solutions examined are Zipkin (Twitter), Pinpoint (Naver), and SkyWalking (Apache). The comparison focuses on five dimensions:

Probe performance impact.

Collector scalability.

Depth of call‑chain data analysis.

Developer transparency and ease of enable/disable.

Automatic topology discovery.

6.1 Probe Performance

Using a Spring-based benchmark application (Spring Boot, Spring MVC, Redis, MySQL) driven by JMeter at 500, 750, and 1000 concurrent users, throughput was measured with and without each agent. SkyWalking showed the smallest throughput loss, Zipkin a moderate one, and Pinpoint a significant one (e.g., from 1385 TPS to 774 TPS at 500 users). CPU and memory overhead stayed at around 10% for all three.
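Relative throughput loss is (baseline − with agent) / baseline. Applied to the Pinpoint figures: (1385 − 774) / 1385 ≈ 0.441, so roughly 44% of baseline throughput is lost at 500 users. A tiny helper (hypothetical name) makes the calculation reusable for the other concurrency levels:

```go
package main

import "fmt"

// throughputLossPct is a hypothetical helper: the relative throughput drop,
// in percent, between a baseline run and a run with an agent attached.
func throughputLossPct(baselineTPS, withAgentTPS float64) float64 {
	return (baselineTPS - withAgentTPS) / baselineTPS * 100
}

func main() {
	// Pinpoint at 500 concurrent users: 1385 TPS without the agent, 774 TPS with it.
	fmt.Printf("Pinpoint loss at 500 users: %.1f%%\n", throughputLossPct(1385, 774))
}
```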

6.2 Collector Scalability

All three support horizontal scaling. Zipkin can run multiple server instances that consume span messages from a queue. SkyWalking's collector runs in single-node or cluster mode and receives agent data over gRPC. Pinpoint's collector receives data over Thrift and also supports clustered deployment.
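Scaling collectors out by consuming from a queue follows the competing-consumers pattern: several identical instances read from one queue, and each message is handled by exactly one of them. A minimal sketch, assuming a Go channel stands in for the queue (Kafka, for example, in a Zipkin deployment) and goroutines stand in for collector instances; `collectorPool` is a hypothetical name.

```go
package main

import (
	"fmt"
	"sync"
)

// collectorPool starts n hypothetical collector workers that compete for
// messages on one queue and returns how many messages each one handled.
func collectorPool(n int, queue <-chan string) []int {
	processed := make([]int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for range queue {
				processed[id]++ // each worker writes only its own slot, so no lock is needed
			}
		}(i)
	}
	wg.Wait() // all workers exit once the queue is closed and drained
	return processed
}

func main() {
	queue := make(chan string, 100)
	go func() {
		for i := 0; i < 1000; i++ {
			queue <- fmt.Sprintf("span-%d", i)
		}
		close(queue)
	}()
	counts := collectorPool(3, queue)
	total := 0
	for _, c := range counts {
		total += c
	}
	fmt.Println("per-worker:", counts, "total:", total)
}
```

The per-worker split varies from run to run, but the total is always 1000: adding workers adds capacity without duplicating or losing spans.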

6.3 Data Analysis Depth

Zipkin provides service-level latency but lacks fine-grained method details. SkyWalking supports more than 20 middleware components and frameworks (e.g., Dubbo, OkHttp, databases, message queues) and shows richer call graphs. Pinpoint records the most detailed data, including SQL statements and method-level spans, offering the deepest visibility.

6.4 Developer Transparency

Zipkin requires code changes or integration of an instrumentation library such as Brave. SkyWalking and Pinpoint rely on bytecode instrumentation through a Java agent, so no source modifications are needed; Pinpoint's agent is completely non-intrusive to application code, whereas Zipkin's library-based approach is more invasive.
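To make "code changes" concrete: with a library-based tracer, the developer wires tracing into each entry point, for example by wrapping HTTP handlers. Brave itself is a Java library, so the Go sketch below only illustrates the general shape of explicit instrumentation; `traced`, `finishedSpan`, and the `report` callback are hypothetical. A bytecode agent injects equivalent timing logic when classes are loaded, so no wrapper appears in the source.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// finishedSpan is what the hypothetical reporter receives.
type finishedSpan struct {
	Name     string
	Duration time.Duration
	Status   int
}

// statusRecorder captures the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// traced is explicit instrumentation: every handler must be wrapped by hand.
func traced(name string, next http.Handler, report func(finishedSpan)) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, r)
		report(finishedSpan{Name: name, Duration: time.Since(start), Status: rec.status})
	})
}

// runDemo drives the wrapped handler once in-process via httptest and
// returns the span that was reported.
func runDemo() finishedSpan {
	var got finishedSpan
	hello := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "hello")
	})
	h := traced("GET /hello", hello, func(s finishedSpan) { got = s })
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest("GET", "/hello", nil))
	return got
}

func main() {
	s := runDemo()
	fmt.Printf("span %q status=%d took=%v\n", s.Name, s.Status, s.Duration)
}
```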

6.5 Topology Visualization

All three generate service‑level topology maps. Pinpoint’s UI shows detailed DB and method nodes, Zipkin’s view is limited to service‑to‑service links, and SkyWalking offers a middle ground with extensive middleware support.
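Service-level topology maps can be derived from span data alone: each parent/child span pair whose services differ becomes a caller-to-callee edge. A minimal sketch with a hypothetical `span` type that counts calls per edge; the data reuses the A/B/C/D/E call example with one extra call from C to D.

```go
package main

import (
	"fmt"
	"sort"
)

// span is a hypothetical minimal span: its ID, its parent's ID, and its service.
type span struct {
	ID, ParentID int64 // ParentID 0 marks the root
	Service      string
}

// dependencyEdges turns parent/child span pairs into service-level edges
// ("caller->callee") with call counts. Calls within one service are skipped.
func dependencyEdges(spans []span) map[string]int {
	byID := make(map[int64]span, len(spans))
	for _, s := range spans {
		byID[s.ID] = s
	}
	edges := make(map[string]int)
	for _, s := range spans {
		parent, ok := byID[s.ParentID]
		if !ok || parent.Service == s.Service {
			continue // root span, missing parent, or in-process call
		}
		edges[parent.Service+"->"+s.Service]++
	}
	return edges
}

func main() {
	spans := []span{
		{ID: 1, Service: "A"},
		{ID: 2, ParentID: 1, Service: "B"},
		{ID: 3, ParentID: 1, Service: "C"},
		{ID: 4, ParentID: 3, Service: "D"},
		{ID: 5, ParentID: 3, Service: "E"},
		{ID: 6, ParentID: 3, Service: "D"},
	}
	edges := dependencyEdges(spans)
	keys := make([]string, 0, len(edges))
	for k := range edges {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s calls=%d\n", k, edges[k])
	}
}
```

The output lists A->B calls=1, A->C calls=1, C->D calls=2, and C->E calls=1, which is exactly the information a service-level topology map draws.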

6.6 Pinpoint vs. Zipkin Detailed Comparison

Pinpoint provides a full APM stack (probe, collector, storage, UI), whereas Zipkin focuses on collection and storage with a lighter UI. Pinpoint uses Java agents for bytecode injection, offering deeper data (an additional SpanEvent layer beneath the Span), but developing custom plugins requires more expertise. Zipkin's Brave library offers a simpler API, and the wider Zipkin ecosystem covers more languages, but it needs explicit code integration.
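Pinpoint's extra SpanEvent layer means a span carries method-level records beneath the service-level call. The Go types below are a simplified, hypothetical model of that idea rather than Pinpoint's actual data format: events are stored in call order with a depth, and a method's self time is its elapsed time minus that of its direct children.

```go
package main

import (
	"fmt"
	"strings"
)

// spanEvent is a hypothetical method-level record nested inside a span.
type spanEvent struct {
	Depth   int   // call depth inside the process; 1 is the entry method
	Method  string
	Elapsed int64 // inclusive time in milliseconds
}

// processSpan is one service's span plus its method-level events.
type processSpan struct {
	Service string
	Events  []spanEvent // in call (pre-)order
}

// selfTimes returns each event's own time: its elapsed time minus the
// elapsed time of its direct children (events one level deeper that
// follow it before the call depth returns to its level).
func selfTimes(events []spanEvent) []int64 {
	self := make([]int64, len(events))
	for i, e := range events {
		self[i] = e.Elapsed
		for j := i + 1; j < len(events) && events[j].Depth > e.Depth; j++ {
			if events[j].Depth == e.Depth+1 {
				self[i] -= events[j].Elapsed
			}
		}
	}
	return self
}

func main() {
	s := processSpan{
		Service: "order-service",
		Events: []spanEvent{
			{Depth: 1, Method: "OrderController.create", Elapsed: 42},
			{Depth: 2, Method: "OrderDao.insert (SQL)", Elapsed: 30},
			{Depth: 2, Method: "RedisClient.set", Elapsed: 3},
		},
	}
	self := selfTimes(s.Events)
	fmt.Println(s.Service)
	for i, e := range s.Events {
		fmt.Printf("%s%s total=%dms self=%dms\n", strings.Repeat("  ", e.Depth), e.Method, e.Elapsed, self[i])
	}
}
```

For the sample data, the controller's 42 ms breaks down into 9 ms of its own work, 30 ms in the SQL insert, and 3 ms in Redis, an attribution that a service-level span alone cannot provide.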

7. Tracing vs. Monitoring

Monitoring collects system-level metrics (CPU, memory, network) and application-level metrics (QPS, latency, error rates) to detect anomalies. Tracing builds on monitoring by capturing the full call chain of each request, which enables root-cause analysis and can expose problems before they turn into visible incidents.
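The monitoring side can be sketched as a window aggregation over raw request records. This is a minimal sketch with a hypothetical `request` type and `windowMetrics` function, using the nearest-rank method for the latency percentile; it reports that a service is slow or failing, but unlike a trace it cannot say which hop in the call chain is responsible.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

// request is one observed call, the raw input for application-level metrics.
type request struct {
	Latency time.Duration
	Failed  bool
}

// windowMetrics summarizes a window of requests into QPS, error rate, and
// the nearest-rank latency percentile p, where 0 < p <= 100.
func windowMetrics(reqs []request, window time.Duration, p float64) (qps, errRate float64, pLatency time.Duration) {
	if len(reqs) == 0 {
		return 0, 0, 0
	}
	lat := make([]time.Duration, len(reqs))
	failed := 0
	for i, r := range reqs {
		lat[i] = r.Latency
		if r.Failed {
			failed++
		}
	}
	sort.Slice(lat, func(i, j int) bool { return lat[i] < lat[j] })
	rank := int(math.Ceil(p/100*float64(len(lat)))) - 1 // zero-based nearest rank
	if rank < 0 {
		rank = 0
	}
	if rank >= len(lat) {
		rank = len(lat) - 1
	}
	qps = float64(len(reqs)) / window.Seconds()
	errRate = float64(failed) / float64(len(reqs))
	return qps, errRate, lat[rank]
}

func main() {
	// 100 requests over 10 s with latencies 1..100 ms; every 20th one fails.
	var reqs []request
	for i := 1; i <= 100; i++ {
		reqs = append(reqs, request{Latency: time.Duration(i) * time.Millisecond, Failed: i%20 == 0})
	}
	qps, errRate, p99 := windowMetrics(reqs, 10*time.Second, 99)
	fmt.Printf("qps=%.1f errors=%.1f%% p99=%v\n", qps, errRate*100, p99)
}
```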

8. Conclusion

In the short term, Pinpoint excels with zero‑code deployment, method‑level granularity, and a powerful UI. However, its ecosystem is smaller, its storage relies on HBase, and extending it to new frameworks can be costly. Zipkin benefits from a large community, simple REST/JSON interfaces, and easier integration, though it provides coarser data. SkyWalking offers a balanced solution with moderate overhead, broad middleware support, and good scalability. Teams should choose based on required granularity, existing technology stack, and long‑term maintenance considerations.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


