How to Build a Low‑Cost Distributed Tracing System for Microservices
This article traces the move from a monolithic architecture to microservices, describes the pain points that follow (fault isolation, locating performance bottlenecks and inefficient call patterns), and presents a practical, low‑cost distributed tracing solution built on a unified framework, unified component wrappers, unified configuration management, lightweight data collection and visualization.
1. Architecture before microservices
In a monolithic deployment, a single site application accesses caches and databases directly, and those are often clustered for high availability. Debugging means adding log statements in the application layer and timing a few steps.
2. Pain points after adopting microservices
Fault isolation : A failure may sit in any of several services, clusters or network layers, so engineers must SSH into many nodes, check the logs on each and coordinate across teams.
Performance bottleneck identification : An HTTP request traverses many services, databases and caches, making it hard to pinpoint the slowest component.
Inefficient call patterns : Remote calls placed inside loops cause massive latency and complicate capacity planning.
3. Desired characteristics of a distributed tracing system
Full‑link visibility : Show the complete call chain from the entry HTTP request through every service, database and cache.
Cross‑process tracking : Propagate identifiers across machines and processes.
Full‑traffic collection : Capture every request, not just a sampled subset.
Additional metadata : Record request IDs, timestamps, call depth, SQL statements and cache keys alongside each call.
4. Core tracing challenges
Cross‑process tracing requires three custom fields in the RPC protocol:
Request ID – a globally unique identifier for the whole trace.
Sequence ID – a logical ordering number that does not depend on synchronized clocks.
Depth ID – indicates the call depth to differentiate parallel branches.
Because the RPC framework is self‑developed, these fields can be added directly to the protocol header.
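As a rough illustration of how the three fields could travel with a call, here is a minimal Python sketch. The article does not describe the wire format or how the Sequence ID is advanced, so the header names, the `TraceContext` class and the "increment per outgoing call" rule are all assumptions made for this example.

```python
import uuid
from dataclasses import dataclass

# Hypothetical header names; the article does not specify the wire format.
H_REQUEST_ID = "x-request-id"
H_SEQUENCE_ID = "x-sequence-id"
H_DEPTH_ID = "x-depth-id"


@dataclass
class TraceContext:
    request_id: str   # globally unique, shared by every hop of one trace
    sequence_id: int  # logical ordering number, independent of wall clocks
    depth_id: int     # nesting level of the current call

    @classmethod
    def new_root(cls) -> "TraceContext":
        """Start a trace at the entry HTTP request."""
        return cls(uuid.uuid4().hex, 0, 0)

    def next_call(self) -> "TraceContext":
        """Build the context to send with the next outgoing call.

        Assumed rule: each outgoing call bumps the local sequence counter
        and is one level deeper than the caller.
        """
        self.sequence_id += 1
        return TraceContext(self.request_id, self.sequence_id, self.depth_id + 1)

    def to_headers(self) -> dict:
        """Serialize the three fields into protocol header entries."""
        return {
            H_REQUEST_ID: self.request_id,
            H_SEQUENCE_ID: str(self.sequence_id),
            H_DEPTH_ID: str(self.depth_id),
        }

    @classmethod
    def from_headers(cls, headers: dict) -> "TraceContext":
        """Rebuild the context on the receiving side of the call."""
        return cls(
            headers[H_REQUEST_ID],
            int(headers[H_SEQUENCE_ID]),
            int(headers[H_DEPTH_ID]),
        )
```

The caller would attach `ctx.next_call().to_headers()` to each outgoing request, and the callee would call `TraceContext.from_headers(...)` before doing any work, so every record it emits carries the same Request ID.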
5. Practical implementation
Unified framework : Instrument the entry and exit points of both the site framework and the service (RPC) framework to record timestamps and parameters.
Unified component wrappers : Wrap database and cache clients (e.g., Redis, Memcached) so that a single modification can emit execution time, SQL statements and cache keys.
Unified configuration management :
Stage 1 – Centralized configuration files (e.g., global.conf) to avoid per‑service duplication.
Stage 2 – A shared configuration market that reduces redundancy.
Stage 3 – A full configuration centre that registers services, notifies dependents and drives dynamic connection management.
Data collection :
UDP SDK – a low‑latency fire‑and‑forget reporter that sends trace data to a UDP collector; the data is later persisted to Elasticsearch.
Asynchronous file logging – write locally first, then batch‑push to the collector, minimizing impact on request latency.
Only about ten instrumentation points are needed: request entry/exit in the site and RPC frameworks, send/receive in cache and DB clients, and RPC client/server boundaries.
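The component wrapper and the UDP reporter described above can be combined in a few lines. The following Python sketch is only one possible shape: the collector address, the record fields and the `traced` / `report` names are hypothetical, and a production SDK would also attach the request, sequence and depth IDs from section 4 to every record.

```python
import json
import socket
import time
from functools import wraps

# Hypothetical collector address; the article names no host or port.
COLLECTOR_ADDR = ("127.0.0.1", 9999)

# One shared non-blocking UDP socket, so reporting never stalls a request.
_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
_sock.setblocking(False)


def report(record: dict, addr=None) -> None:
    """Fire-and-forget: send one JSON datagram and never raise into the caller."""
    try:
        _sock.sendto(json.dumps(record).encode("utf-8"), addr or COLLECTOR_ADDR)
    except OSError:
        pass  # losing a trace record is preferable to failing the request


def traced(component: str, addr=None):
    """Wrap a client call so its timing and arguments are reported once."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            started_at = time.time()
            t0 = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                report({
                    "component": component,          # e.g. "cache" or "db"
                    "operation": func.__name__,
                    "detail": repr(args),            # cache key or SQL text
                    "start_ms": int(started_at * 1000),
                    "elapsed_ms": round((time.perf_counter() - t0) * 1000, 3),
                }, addr)
        return wrapper
    return decorator
```

Decorating a cache or database client method, for example `@traced("cache")` on a `get(key)` function, is then the single modification needed for that component. Because UDP gives no delivery guarantee, this path trades completeness for latency; the asynchronous file logging option is the fallback when records must not be lost.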
6. Visualization
The backend renders a timeline view that shows total request time, per‑service breakdown, parameters, SQL statements and cache keys. Heat‑maps highlight the longest‑running nodes, enabling rapid diagnosis of failures, performance hot‑spots and unreasonable call patterns.
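To make the idea concrete, the Python sketch below renders a plain‑text timeline from collected records and flags calls that repeat suspiciously often. The record layout (`sequence_id`, `depth_id`, `name`, `elapsed_ms`), the function names and the repeat threshold are assumptions for illustration; the article does not describe the backend's data model.

```python
from collections import Counter


def render_timeline(records):
    """Return one indented text line per call, ordered by sequence ID."""
    ordered = sorted(records, key=lambda r: r["sequence_id"])
    # The entry request always spans the whole trace, so look for the
    # slowest call among nested calls when there are any.
    nested = [r for r in ordered if r["depth_id"] > 0] or ordered
    slowest = max(nested, key=lambda r: r["elapsed_ms"])
    lines = []
    for r in ordered:
        marker = "  <-- slowest" if r is slowest else ""
        indent = "  " * r["depth_id"]
        lines.append(f'{indent}{r["name"]} {r["elapsed_ms"]}ms{marker}')
    return lines


def repeated_calls(records, threshold=3):
    """Names called at least `threshold` times in one trace (possible loop)."""
    counts = Counter(r["name"] for r in records)
    return {name: n for name, n in counts.items() if n >= threshold}


# Hypothetical records for a single request.
records = [
    {"sequence_id": 0, "depth_id": 0, "name": "GET /order", "elapsed_ms": 120},
    {"sequence_id": 1, "depth_id": 1, "name": "user-service.get", "elapsed_ms": 15},
    {"sequence_id": 2, "depth_id": 1, "name": "item-service.get", "elapsed_ms": 30},
    {"sequence_id": 3, "depth_id": 1, "name": "item-service.get", "elapsed_ms": 28},
    {"sequence_id": 4, "depth_id": 1, "name": "item-service.get", "elapsed_ms": 31},
]

print("\n".join(render_timeline(records)))
print(repeated_calls(records))
```

Here the three `item-service.get` calls inside one request are the kind of pattern that suggests a remote call placed in a loop and a candidate for a batch interface.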
7. Benefits
Fast discovery of online issues.
Quick pinpointing of performance bottlenecks.
Immediate identification of unreasonable service calls (e.g., calls hidden inside loops).
Low‑cost implementation using a unified framework, component wrappers and lightweight data collection, suitable for small teams or startups.
dbaplus Community