
Inside Alibaba’s Eagleeye & SkyWalking: Distributed Tracing Architecture Explained

This article explores how Alibaba's Eagleeye and the open‑source SkyWalking implement distributed tracing, covering background challenges, Dapper concepts, design goals, data models (Trace, Segment, Span), unique ID schemes, instrumentation techniques, data collection, storage mechanisms, and transmission strategies.

Alibaba Cloud Developer

Background

Large monolithic systems struggle to meet performance demands as business volume grows; splitting them into multiple inter‑dependent microservices improves throughput but creates complex cross‑service call chains that need to be observed and diagnosed.

Dapper Overview

Google's 2010 Dapper paper set out two fundamental requirements for a reliable production tracing system: wide coverage of all services and continuous, 24/7 monitoring. Its concepts later shaped the OpenTracing specification.

Design Goals

Application‑level transparency – tracing components should be provided as stable libraries so developers need not modify business code.

Low overhead – tracing must add minimal latency and resource consumption, covering both the CPU cost of enhanced methods and the network and storage cost of shipping trace data.

Extensibility and openness – the system should support many middleware and allow custom plugins for special scenarios.

Data Model

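The data model rests on three concepts (Trace and Span come from OpenTracing; Segment is SkyWalking's own addition):

Trace: the whole call chain of one request, spanning processes and threads.

Segment: the collection of spans produced by one thread within a single process (JVM).

Span: an individual operation. Types include Entry Span (where a request enters a service), Local Span (a local method call), and Exit Span (a call that leaves the service, such as an RPC or database access).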

These models enable reconstruction of a user request’s execution graph.
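To make the three levels concrete, here is a minimal sketch of how spans might be grouped into a segment. The class, record, and method names (`TraceModelSketch`, `Segment`, `addSpan`, `parentSpanId`) are hypothetical simplifications for illustration, not SkyWalking's actual classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified data model; not SkyWalking's real classes.
public class TraceModelSketch {
    enum SpanType { ENTRY, LOCAL, EXIT }

    // One operation; parentSpanId == -1 marks the first span of the segment.
    record Span(int spanId, int parentSpanId, SpanType type, String operationName) {}

    // All spans recorded by one thread of one process for a single trace.
    static final class Segment {
        final String traceId;
        final List<Span> spans = new ArrayList<>();
        private int nextSpanId = 0;

        Segment(String traceId) {
            this.traceId = traceId;
        }

        // Span ids are local to the segment and assigned in creation order.
        Span addSpan(int parentSpanId, SpanType type, String operationName) {
            Span span = new Span(nextSpanId++, parentSpanId, type, operationName);
            spans.add(span);
            return span;
        }
    }

    public static void main(String[] args) {
        // A service receives an HTTP request, validates it locally, then writes to a database.
        Segment segment = new Segment("trace-1");
        Span entry = segment.addSpan(-1, SpanType.ENTRY, "POST /orders");
        segment.addSpan(entry.spanId(), SpanType.LOCAL, "OrderService.validate");
        segment.addSpan(entry.spanId(), SpanType.EXIT, "INSERT INTO orders");
        for (Span s : segment.spans) {
            System.out.println(s.spanId() + " (parent " + s.parentSpanId() + ") " + s.type() + " " + s.operationName());
        }
    }
}
```

The segment carries the traceId, so segments recorded by different threads and processes can later be joined into one trace.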

Unique ID

Each request receives a globally unique traceId. Eagleeye encodes timestamp, host IP, process ID, and a 4‑digit atomic counter to avoid collisions, e.g., 2022-10-18 10:10:40|11.15.148.83|14031|e|0001.
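A minimal sketch of such a generator, assuming the human-readable `|`-separated layout of the example above. It omits the single-letter flag (`e` in the example), whose meaning is not explained here, and the class and method names are hypothetical; treat it as an illustration of why the four parts together avoid collisions, not as Eagleeye's actual encoding.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical traceId generator following the example's field layout.
public class TraceIdSketch {
    private static final DateTimeFormatter TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneId.systemDefault());
    // Process-wide counter shared by all threads; cycles through 1..9999.
    private static final AtomicInteger SEQUENCE = new AtomicInteger(0);

    static int nextSequence() {
        return SEQUENCE.updateAndGet(n -> n >= 9999 ? 1 : n + 1);
    }

    // The timestamp separates IDs over time, the IP separates hosts, the pid separates
    // processes on one host, and the counter separates IDs created in the same second.
    static String newTraceId(Instant now, String hostIp, long pid) {
        return TIME.format(now) + "|" + hostIp + "|" + pid + "|" + String.format("%04d", nextSequence());
    }

    public static void main(String[] args) throws UnknownHostException {
        String ip = InetAddress.getLocalHost().getHostAddress();
        long pid = ProcessHandle.current().pid();
        System.out.println(newTraceId(Instant.now(), ip, pid));
        System.out.println(newTraceId(Instant.now(), ip, pid));
    }
}
```

With second-level timestamps and a 4-digit counter, one process can issue at most 9999 distinct IDs per second under this layout; a finer timestamp or a wider counter raises that limit.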

Relationship Description

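Spans form a tree. Eagleeye labels each call with an RpcId such as 0.1.2.3: the number of dots gives the call's depth, and the last component gives its order among siblings. Dropping the last component yields the parent's RpcId (0.1.2 is the parent of 0.1.2.3), so the full call tree can be rebuilt from all RpcIds in a trace.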
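Because dropping the last component of an RpcId yields its parent, the tree can be rebuilt from an unordered collection of RpcIds. A minimal sketch; the helper names and the sample RpcIds in `main` are hypothetical:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helpers for Eagleeye-style RpcIds such as "0.1.2.3".
public class RpcIdTreeSketch {
    // The n-th child (1-based) of a call appends ".n" to the parent's RpcId.
    static String childRpcId(String parentRpcId, int childIndex) {
        return parentRpcId + "." + childIndex;
    }

    // Depth equals the number of dots: "0" is the root (depth 0), "0.1.2.3" has depth 3.
    static int depth(String rpcId) {
        return (int) rpcId.chars().filter(c -> c == '.').count();
    }

    // Dropping the last component gives the parent; the root has no parent.
    static String parentOf(String rpcId) {
        int lastDot = rpcId.lastIndexOf('.');
        return lastDot < 0 ? null : rpcId.substring(0, lastDot);
    }

    // Rebuilds parent -> ordered children from the unordered RpcIds collected for one trace.
    static Map<String, List<String>> buildTree(Collection<String> rpcIds) {
        Map<String, List<String>> children = new TreeMap<>();
        for (String id : rpcIds) {
            String parent = parentOf(id);
            if (parent != null) {
                children.computeIfAbsent(parent, k -> new ArrayList<>()).add(id);
            }
        }
        // Sort siblings numerically by their last component so "0.10" follows "0.2".
        Comparator<String> bySiblingOrder =
                Comparator.comparingInt((String id) -> Integer.parseInt(id.substring(id.lastIndexOf('.') + 1)));
        children.values().forEach(list -> list.sort(bySiblingOrder));
        return children;
    }

    // Prints the subtree rooted at `node`, indented by depth.
    static void print(String node, Map<String, List<String>> tree, StringBuilder out) {
        out.append("  ".repeat(depth(node))).append(node).append('\n');
        for (String child : tree.getOrDefault(node, List.of())) {
            print(child, tree, out);
        }
    }

    public static void main(String[] args) {
        List<String> rpcIds = List.of("0.2", "0", "0.1.1", "0.1", "0.1.2", "0.10");
        StringBuilder out = new StringBuilder();
        print("0", buildTree(rpcIds), out);
        System.out.print(out);
    }
}
```

Running `main` prints the tree indented by depth, with `0.1.1` and `0.1.2` nested under `0.1`, and `0.1`, `0.2`, `0.10` as the root's children in that order.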

Data Collection

During execution, each segment collects its spans locally. To keep impact low, data is first stored in memory and later sent asynchronously.
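The "record in memory, send later" pattern can be sketched as follows: spans accumulate in a thread-local list, and when the segment finishes, the whole list is offered to a bounded queue drained by a background reporter thread. Spans are plain strings here, and the class, methods, and drop-on-full policy are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical collector: spans stay in thread-local memory until the segment ends,
// then the whole segment is handed to a background reporter thread.
public class AsyncCollectorSketch {
    private final ThreadLocal<List<String>> currentSegment = ThreadLocal.withInitial(ArrayList::new);
    private final BlockingQueue<List<String>> finishedSegments;
    private final AtomicLong dropped = new AtomicLong();

    AsyncCollectorSketch(int capacity, Consumer<List<String>> sink) {
        finishedSegments = new ArrayBlockingQueue<>(capacity);
        Thread reporter = new Thread(() -> {
            try {
                while (true) {
                    sink.accept(finishedSegments.take()); // blocks only the reporter thread
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "trace-reporter");
        reporter.setDaemon(true); // never keeps the application JVM alive
        reporter.start();
    }

    // Called on the business thread: a plain list append, no I/O.
    void recordSpan(String span) {
        currentSegment.get().add(span);
    }

    // Called when the segment's outermost span finishes. offer() never blocks, so a full
    // queue drops the segment instead of slowing down the request.
    void finishSegment() {
        List<String> spans = currentSegment.get();
        currentSegment.remove();
        if (!finishedSegments.offer(spans)) {
            dropped.incrementAndGet();
        }
    }

    long droppedCount() {
        return dropped.get();
    }

    public static void main(String[] args) throws InterruptedException {
        AsyncCollectorSketch collector = new AsyncCollectorSketch(1024,
                segment -> System.out.println("report " + segment));
        collector.recordSpan("ENTRY GET /items");
        collector.recordSpan("EXIT SELECT items");
        collector.finishSegment();
        Thread.sleep(200); // give the daemon reporter a moment before main exits
    }
}
```

Dropping on overflow trades completeness of trace data for a guarantee that tracing never blocks request threads, which matches the low-overhead goal.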

Instrumentation Methods

Eagleeye relies on hooks embedded directly in the middleware code, while SkyWalking relies on bytecode enhancement. SkyWalking supports two enhancement approaches:

Attach: loads an agent into an already running JVM through the Attach API; the JVM then calls the agent's agentmain method.

-javaagent: specified on the JVM command line at startup; the agent's premain method runs before the application's main method, so classes can be transformed as they are loaded.
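Both approaches hand the agent a `java.lang.instrument.Instrumentation` instance. A minimal agent skeleton might look like the following; it only logs class names under a hypothetical `com/example/` prefix and returns `null` (no change) where a real agent would return rewritten bytecode. To be loaded, the agent JAR's manifest would need a `Premain-Class` entry (for `-javaagent`) and/or an `Agent-Class` entry (for attach) naming this class; attaching is done from another JVM with `com.sun.tools.attach.VirtualMachine.attach(pid)` followed by `loadAgent(jarPath)`.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

// Hypothetical agent skeleton showing the two entry points.
public class TracingAgentSketch {
    // -javaagent:agent.jar -> the JVM calls premain before the application's main method.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new LoggingTransformer("com/example/"));
    }

    // Attach API -> the target JVM calls agentmain while it is already running.
    // Classes loaded earlier are not seen by a newly added transformer unless they are retransformed.
    public static void agentmain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new LoggingTransformer("com/example/"));
    }

    static final class LoggingTransformer implements ClassFileTransformer {
        private final String packagePrefix; // JVM internal names use '/' separators

        LoggingTransformer(String packagePrefix) {
            this.packagePrefix = packagePrefix;
        }

        @Override
        public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain, byte[] classfileBuffer) {
            if (className != null && className.startsWith(packagePrefix)) {
                // A real agent would rewrite classfileBuffer here (e.g. with Byte Buddy or ASM).
                System.out.println("[agent] candidate for enhancement: " + className);
            }
            return null; // null tells the JVM to keep the original bytes
        }
    }
}
```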

SkyWalking defines plugin interfaces such as InstanceMethodsAroundInterceptor and uses Byte Buddy to weave bytecode. Example interceptor signature:

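```java
package org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance;
import java.lang.reflect.Method;
public interface InstanceMethodsAroundInterceptor {
    // Runs before the original method; `result` can be used to skip it and supply a return value.
    void beforeMethod(EnhancedInstance objInst, Method method, Object[] allArguments, Class<?>[] argumentsTypes, MethodInterceptResult result) throws Throwable;
    // Runs after the original method; the returned object becomes the method's return value.
    Object afterMethod(EnhancedInstance objInst, Method method, Object[] allArguments, Class<?>[] argumentsTypes, Object ret) throws Throwable;
    // Runs when the original method throws.
    void handleMethodException(EnhancedInstance objInst, Method method, Object[] allArguments, Class<?>[] argumentsTypes, Throwable t);
}
```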
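The three callbacks of `InstanceMethodsAroundInterceptor` wrap the original method much like a try/catch/finally. The sketch below reproduces that control flow with a JDK dynamic proxy (`java.lang.reflect.Proxy`) instead of Byte Buddy, purely so it runs without dependencies. `AroundInterceptor` is a hypothetical, trimmed-down stand-in for the real interface, and `OrderService` is a made-up example.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Around-interception sketch using a JDK dynamic proxy instead of Byte Buddy.
public class AroundInterceptorSketch {
    // Hypothetical, trimmed-down stand-in for InstanceMethodsAroundInterceptor.
    interface AroundInterceptor {
        void beforeMethod(Method method, Object[] args);
        Object afterMethod(Method method, Object[] args, Object ret);
        void handleMethodException(Method method, Object[] args, Throwable t);
    }

    // Made-up business interface used for the demonstration.
    interface OrderService {
        String placeOrder(String item);
    }

    // Returns a proxy that routes every call on `iface` through the interceptor hooks.
    static <T> T wrap(T target, Class<T> iface, AroundInterceptor interceptor) {
        InvocationHandler handler = (proxy, method, args) -> {
            interceptor.beforeMethod(method, args);
            Object ret = null;
            try {
                ret = method.invoke(target, args);
            } catch (InvocationTargetException e) {
                // Report the exception thrown by the real method, then rethrow it unchanged.
                interceptor.handleMethodException(method, args, e.getCause());
                throw e.getCause();
            } finally {
                // Runs on success and failure alike, so a span opened in beforeMethod is always closed.
                ret = interceptor.afterMethod(method, args, ret);
            }
            return ret;
        };
        return iface.cast(Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        AroundInterceptor recorder = new AroundInterceptor() {
            public void beforeMethod(Method m, Object[] a) { events.add("before " + m.getName()); }
            public Object afterMethod(Method m, Object[] a, Object ret) { events.add("after " + m.getName()); return ret; }
            public void handleMethodException(Method m, Object[] a, Throwable t) { events.add("error " + t.getMessage()); }
        };
        OrderService service = wrap(item -> "order:" + item, OrderService.class, recorder);
        System.out.println(service.placeOrder("book"));
        System.out.println(events);
    }
}
```

Placing `afterMethod` in the `finally` block means a tracing plugin can open a span in `beforeMethod`, mark it as failed in `handleMethodException`, and close it in `afterMethod` regardless of how the call ends.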

SkyWalking also provides a witness mechanism to ensure plugins activate only when required classes or methods exist.
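Conceptually, a witness check is a presence test run against the application's class loader before a plugin is applied. SkyWalking plugins declare their witnesses through a `witnessClasses()` method; the checking code below is a hypothetical sketch that resolves each name with `Class.forName(name, false, loader)` (no static initialization), which may differ from how the agent actually performs the check. The class `com.example.client.v2.NewApi` is made up.

```java
// Hypothetical witness check: a plugin only activates when every class it depends on
// can be resolved by the application's class loader.
public class WitnessCheckSketch {
    static boolean witnessesPresent(ClassLoader loader, String... witnessClasses) {
        for (String name : witnessClasses) {
            try {
                // initialize = false: resolve the class without running its static initializers.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                return false; // a missing witness means this plugin does not apply
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ClassLoader appLoader = WitnessCheckSketch.class.getClassLoader();
        // A plugin targeting a library version that introduced some marker class
        // would list that class as a witness.
        System.out.println(witnessesPresent(appLoader, "java.util.concurrent.CompletableFuture"));
        System.out.println(witnessesPresent(appLoader, "com.example.client.v2.NewApi"));
    }
}
```

Choosing a class that exists only in one version of a library lets several plugins for different versions coexist, with each activating only where it matches.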

Bytecode Enhancement Libraries

Common Java bytecode libraries include cglib, Javassist, ASM, and Byte Buddy (winner of a 2015 Duke's Choice Award). In published benchmarks, Byte Buddy performs comparably to hand-written ASM and faster than cglib.

Storage

Eagleeye buffers trace data in memory in a concurrent ring buffer; SkyWalking uses a partitioned QueueBuffer drained by multiple consumer threads. Both designs aim to minimize lock contention and bound memory overhead.
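A simplified sketch of the partitioned-buffer idea, loosely modeled on the description of SkyWalking's QueueBuffer rather than on its actual code. The class name, round-robin partition choice, and drop-on-full policy are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical partitioned buffer: N bounded queues, each drained by its own consumer thread.
public class PartitionedBufferSketch<T> {
    private final List<ArrayBlockingQueue<T>> partitions = new ArrayList<>();
    private final AtomicInteger nextPartition = new AtomicInteger();
    private final AtomicLong dropped = new AtomicLong();

    public PartitionedBufferSketch(int partitionCount, int capacityPerPartition) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayBlockingQueue<>(capacityPerPartition));
        }
    }

    // Round-robin partition choice spreads producers over several locks instead of one.
    // offer() never blocks: when the chosen partition is full, the item is dropped and counted.
    public boolean offer(T item) {
        int index = Math.floorMod(nextPartition.getAndIncrement(), partitions.size());
        boolean accepted = partitions.get(index).offer(item);
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    // Starts one daemon consumer per partition; each hands batches of items to the sink.
    public void startConsumers(Consumer<List<T>> sink) {
        for (int i = 0; i < partitions.size(); i++) {
            ArrayBlockingQueue<T> queue = partitions.get(i);
            Thread consumer = new Thread(() -> {
                List<T> batch = new ArrayList<>();
                try {
                    while (true) {
                        batch.add(queue.take()); // wait for at least one item
                        queue.drainTo(batch);    // then grab whatever else is ready
                        sink.accept(new ArrayList<>(batch));
                        batch.clear();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "buffer-consumer-" + i);
            consumer.setDaemon(true);
            consumer.start();
        }
    }

    public long droppedCount() {
        return dropped.get();
    }
}
```

Splitting one queue into several means concurrent producers usually contend on different locks, and bounded capacity with drop-on-full keeps a slow backend from growing application memory or blocking request threads.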

Transmission

SkyWalking supports gRPC and Kafka for sending trace data to the backend, while Eagleeye writes traces to local logs before a separate agent forwards them. SkyWalking also offers a pluggable local storage plugin.
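Eagleeye's log-based transport decouples the application from the network: the application only appends lines to a local file, and a separate agent reads the lines added since it last stopped. A minimal sketch of both halves, assuming a hypothetical `|`-separated line format (traceId, RpcId, operation, cost in ms); it ignores log rotation and assumes the agent keeps its byte offset between reads.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical log-based transport: the app appends span lines, a separate agent tails the file.
public class LocalLogTransportSketch {
    // Application side: one span per line, written to local disk instead of the network.
    static void appendSpan(Path log, String traceId, String rpcId, String operation, long costMillis)
            throws IOException {
        String line = String.join("|", traceId, rpcId, operation, Long.toString(costMillis)) + "\n";
        Files.write(log, line.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Agent side: read complete lines after `offset`, add them to `out`, and return the new offset.
    // A trailing line without '\n' is left for the next call, since it may still be being written.
    static long readNewLines(Path log, long offset, List<String> out) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(log.toFile(), "r")) {
            long length = file.length();
            if (length <= offset) {
                return offset;
            }
            byte[] chunk = new byte[(int) (length - offset)]; // assumes each read is under 2 GB
            file.seek(offset);
            file.readFully(chunk);
            int lineStart = 0;
            for (int i = 0; i < chunk.length; i++) {
                if (chunk[i] == '\n') {
                    out.add(new String(chunk, lineStart, i - lineStart, StandardCharsets.UTF_8));
                    lineStart = i + 1;
                }
            }
            return offset + lineStart;
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("trace-sketch", ".log");
        appendSpan(log, "trace-1", "0", "GET /items", 12);
        appendSpan(log, "trace-1", "0.1", "SELECT items", 5);
        List<String> lines = new ArrayList<>();
        long offset = readNewLines(log, 0, lines);
        System.out.println(lines + " next offset=" + offset);
    }
}
```

If the backend is slow or unreachable, only the agent falls behind; the application keeps appending to local disk.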

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
