Inside Alibaba’s Eagleeye & SkyWalking: Distributed Tracing Architecture Explained
This article explores how Alibaba's Eagleeye and the open‑source SkyWalking implement distributed tracing, covering background challenges, Dapper concepts, design goals, data models (Trace, Segment, Span), unique ID schemes, instrumentation techniques, data collection, storage mechanisms, and transmission strategies.
Background
Large monolithic systems struggle to meet performance demands as business volume grows; splitting them into multiple inter‑dependent microservices improves throughput but creates complex cross‑service call chains that need to be observed and diagnosed.
Dapper Overview
Google's 2010 Dapper paper set out two fundamental requirements for a reliable tracing system: ubiquitous deployment, so that every service is covered, and continuous 24/7 monitoring. Its concepts later shaped the OpenTracing specification.
Design Goals
Application‑level transparency – tracing components should be provided as stable libraries so developers need not modify business code.
Low overhead – tracing must add minimal latency and resource consumption, both in CPU (from method enhancement) and in network and storage use.
Extensibility and openness – the system should support many middleware and allow custom plugins for special scenarios.
Data Model
OpenTracing defines Trace and Span; SkyWalking adds Segment between them, giving three concepts:
Trace : the whole call chain, spanning processes and threads.
Segment : a collection of spans within a single JVM or thread.
Span : an individual operation. Types include Entry Span (chain entry), Local Span (local method), and Exit Span (chain exit).
These models enable reconstruction of a user request’s execution graph.
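How the three concepts nest can be sketched in a few lines of Java. The class and field names below are hypothetical, chosen for illustration; they are not the actual SkyWalking or Eagleeye classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model for illustration only.
enum SpanType { ENTRY, LOCAL, EXIT }

final class Span {
    final int spanId;           // unique within its segment
    final int parentSpanId;     // -1 marks the first span of a segment (an assumption of this sketch)
    final SpanType type;
    final String operationName;

    Span(int spanId, int parentSpanId, SpanType type, String operationName) {
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
        this.type = type;
        this.operationName = operationName;
    }
}

final class Segment {
    final String traceId;       // shared by every segment of the same trace
    final String segmentId;
    private final List<Span> spans = new ArrayList<>();

    Segment(String traceId, String segmentId) {
        this.traceId = traceId;
        this.segmentId = segmentId;
    }

    // Span ids are assigned in creation order within the segment.
    Span newSpan(SpanType type, String operationName, int parentSpanId) {
        Span span = new Span(spans.size(), parentSpanId, type, operationName);
        spans.add(span);
        return span;
    }

    List<Span> spans() {
        return spans;
    }
}
```

A trace is then the set of segments that share one traceId; an Exit Span in one segment and the Entry Span of the next are what link processes together.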
Unique ID
Each request receives a globally unique traceId. Eagleeye encodes timestamp, host IP, process ID, and a 4‑digit atomic counter to avoid collisions, e.g., 2022-10-18 10:10:40|11.15.148.83|14031|e|0001.
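As a rough sketch of the idea, the four described fields can be joined into one ID as follows. This is not Eagleeye's actual generator: the class name, the separator handling, and the wrap-around rule are assumptions, and the extra `e` field visible in the example above is left out because the text does not explain it.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical generator for illustration; not Eagleeye's real code.
final class TraceIdGenerator {
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss", Locale.ROOT);

    private final AtomicInteger counter = new AtomicInteger();
    private final String hostIp;
    private final long pid;

    TraceIdGenerator(String hostIp, long pid) {
        this.hostIp = hostIp;
        this.pid = pid;
    }

    String next(LocalDateTime now) {
        // Atomic 4-digit sequence: 0001..9999, then wraps back to 0001 (assumed rule).
        int seq = counter.updateAndGet(c -> c >= 9999 ? 1 : c + 1);
        return String.format(Locale.ROOT, "%s|%s|%d|%04d", TS.format(now), hostIp, pid, seq);
    }
}
```

The timestamp and counter separate requests on one process, while the host IP and process ID separate processes, so no coordination between machines is needed.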
Relationship Description
Spans form a tree. Eagleeye labels each node with an RpcId such as 0.1.2.3: the number of dots gives the depth, and the last segment gives the order among siblings. Because every RpcId contains its parent's RpcId as a prefix, the full call tree can be rebuilt from the set of RpcIds in a trace.
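The rebuilding step can be sketched as follows, assuming RpcIds are dot-separated integers as in the example. The class and method names are hypothetical, not Eagleeye's API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration.
final class RpcIds {
    // Depth equals the number of dots: "0" is 0, "0.1.2.3" is 3.
    static int depth(String rpcId) {
        int dots = 0;
        for (int i = 0; i < rpcId.length(); i++) {
            if (rpcId.charAt(i) == '.') {
                dots++;
            }
        }
        return dots;
    }

    // The parent is everything before the last dot; the root has no parent.
    static String parentOf(String rpcId) {
        int lastDot = rpcId.lastIndexOf('.');
        return lastDot < 0 ? null : rpcId.substring(0, lastDot);
    }

    // The last segment is the position among siblings.
    static int siblingOrder(String rpcId) {
        return Integer.parseInt(rpcId.substring(rpcId.lastIndexOf('.') + 1));
    }

    // Groups RpcIds under their parent and sorts each group by sibling order.
    static Map<String, List<String>> childrenByParent(Collection<String> rpcIds) {
        Map<String, List<String>> tree = new HashMap<>();
        for (String id : rpcIds) {
            String parent = parentOf(id);
            if (parent != null) {
                tree.computeIfAbsent(parent, k -> new ArrayList<>()).add(id);
            }
        }
        for (List<String> children : tree.values()) {
            children.sort(Comparator.comparingInt(RpcIds::siblingOrder));
        }
        return tree;
    }
}
```

Sorting by the parsed integer rather than by string matters once a node has ten or more children, since "0.1.10" would otherwise sort before "0.1.2".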
Data Collection
During execution, each segment collects its spans locally. To keep impact low, data is first stored in memory and later sent asynchronously.
Instrumentation Methods
Eagleeye employs direct code hooks provided by middleware, while SkyWalking relies on bytecode enhancement. SkyWalking supports two enhancement approaches:
Attach : dynamically attaches an agent to a running JVM via the Attach API.
Javaagent : specified at JVM startup with -javaagent, allowing transformation before the main method runs.
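The two approaches correspond to two entry points defined by the java.lang.instrument package: `premain` for -javaagent and `agentmain` for the Attach API. The skeleton below is a minimal sketch; the class name and the `loadMode` field are hypothetical, and a real agent would register a ClassFileTransformer where the comment indicates.

```java
import java.lang.instrument.Instrumentation;

// Hypothetical agent skeleton. In a real agent jar this class would be public,
// live in its own file, and be named in the manifest
// (Premain-Class for -javaagent, Agent-Class for attach).
final class TracingAgent {
    // Records which entry point ran; exists only to make the sketch observable.
    static volatile String loadMode = "none";

    // Called by the JVM before main() when started with -javaagent:agent.jar
    public static void premain(String agentArgs, Instrumentation inst) {
        install("javaagent", inst);
    }

    // Called when the agent is attached to an already running JVM.
    public static void agentmain(String agentArgs, Instrumentation inst) {
        install("attach", inst);
    }

    private static void install(String mode, Instrumentation inst) {
        loadMode = mode;
        // A real agent would call inst.addTransformer(...) here to rewrite bytecode.
    }
}
```

With -javaagent, classes are transformed as they are first loaded; with attach, classes that are already loaded have to be retransformed, which is subject to more restrictions.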
SkyWalking defines plugin interfaces such as InstanceMethodsAroundInterceptor and uses Byte Buddy to weave bytecode. Example interceptor signature:
    import java.lang.reflect.Method;

    public interface InstanceMethodsAroundInterceptor {
        // Called before the intercepted method runs.
        void beforeMethod(EnhancedInstance objInst, Method method, Object[] allArguments, Class<?>[] argumentsTypes, MethodInterceptResult result) throws Throwable;
        // Called after the method returns; the value returned here is what the caller receives.
        Object afterMethod(EnhancedInstance objInst, Method method, Object[] allArguments, Class<?>[] argumentsTypes, Object ret) throws Throwable;
        // Called when the intercepted method throws.
        void handleMethodException(EnhancedInstance objInst, Method method, Object[] allArguments, Class<?>[] argumentsTypes, Throwable t);
    }
SkyWalking also provides a witness mechanism to ensure plugins activate only when required classes or methods exist.
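The idea behind a witness check can be illustrated with a small sketch: a plugin lists class names that must be present in the target class loader, and it is skipped if any are missing. The names below are hypothetical, and this version resolves classes with Class.forName, which is an assumption made for simplicity; SkyWalking's own check may work differently.

```java
import java.util.List;

// Hypothetical witness check for illustration; not SkyWalking's implementation.
final class WitnessCheck {
    // True if the class can be resolved by the loader, without initializing it.
    static boolean classPresent(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // A plugin activates only if every witness class is present.
    static boolean shouldActivate(List<String> witnessClasses, ClassLoader loader) {
        for (String className : witnessClasses) {
            if (!classPresent(className, loader)) {
                return false;
            }
        }
        return true;
    }
}
```

Choosing a witness class that exists in only one major version of a library lets separate plugins target different versions of that library without interfering with each other.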
Bytecode Enhancement Libraries
Common libraries include cglib, Javassist, ASM, and Byte Buddy (which won a Duke’s Choice Award in 2015). Published benchmarks suggest Byte Buddy performs comparably to raw ASM and faster than cglib.
Storage
Eagleeye uses a concurrent ring buffer to store traces; SkyWalking uses partitioned QueueBuffer with multiple consumer threads. Both aim to minimize contention and memory overhead.
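A lock-free ring buffer of this kind might look like the sketch below. It is not the Eagleeye or SkyWalking implementation: the class name is hypothetical, and dropping an item when its slot is still occupied (rather than blocking the business thread) is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical buffer for illustration.
final class TraceRingBuffer<T> {
    private final AtomicReferenceArray<T> slots;
    private final AtomicLong writeIndex = new AtomicLong();

    TraceRingBuffer(int capacity) {
        this.slots = new AtomicReferenceArray<>(capacity);
    }

    // Producers claim the next slot with one atomic increment and never block.
    // If the consumer has not yet emptied that slot, the item is dropped.
    boolean offer(T item) {
        int slot = (int) Math.floorMod(writeIndex.getAndIncrement(), (long) slots.length());
        return slots.compareAndSet(slot, null, item);
    }

    // The consumer empties every occupied slot and returns the batch.
    // Items come back in slot order, which is not strictly arrival order.
    List<T> drain() {
        List<T> batch = new ArrayList<>();
        for (int i = 0; i < slots.length(); i++) {
            T item = slots.getAndSet(i, null);
            if (item != null) {
                batch.add(item);
            }
        }
        return batch;
    }
}
```

Partitioning, as in SkyWalking's approach, would amount to keeping several such buffers and spreading producers across them so that fewer threads compete for the same index.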
Transmission
SkyWalking supports gRPC and Kafka for sending trace data to the backend, while Eagleeye writes traces to local logs before a separate agent forwards them. SkyWalking also offers a pluggable local storage plugin.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
