Why Distributed Tracing Matters: OpenTracing, OpenTelemetry, and Tencent’s TSW
Tracing has evolved from early log and stack‑trace techniques to modern distributed observability standards such as OpenTracing and OpenTelemetry. Tencent Service Watcher (TSW) shows how a cloud provider can integrate these protocols to simplify microservice monitoring, performance metrics, and root‑cause analysis.
Background and the Need for Distributed Tracing
OpenTracing is an open standard for distributed tracing in applications and OSS packages. Developers with large‑scale microservice experience know that each process can emit its own logs and metrics, but these alone cannot reconstruct the path a single transaction takes across a distributed system.
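To make the gap concrete, here is a minimal sketch of what tracing adds on top of per-process logs: each unit of work is recorded as a span carrying a shared trace ID and a pointer to its parent span, so the call tree can be rebuilt even when spans arrive out of order from different services. The `Span` fields, service names, and operations are hypothetical and chosen only for illustration; they are not taken from any particular SDK.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span of a trace
    service: str
    operation: str


def build_call_tree(spans, trace_id):
    """Return indented lines describing the call tree of one trace."""
    children = defaultdict(list)
    roots = []
    for span in spans:
        if span.trace_id != trace_id:
            continue  # span belongs to another transaction
        if span.parent_id is None:
            roots.append(span)
        else:
            children[span.parent_id].append(span)

    lines = []

    def walk(span, depth):
        lines.append("  " * depth + f"{span.service}: {span.operation}")
        for child in children[span.span_id]:
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


# Spans as three hypothetical services might report them, in arbitrary order.
SPANS = [
    Span("t1", "c", "b", "inventory", "SELECT stock"),
    Span("t1", "a", None, "gateway", "GET /checkout"),
    Span("t1", "b", "a", "orders", "create_order"),
    Span("t2", "x", None, "gateway", "GET /health"),
]

for line in build_call_tree(SPANS, "t1"):
    print(line)
```

The three services never see each other's logs, yet the shared `trace_id` and the `parent_id` links are enough to recover the order gateway, orders, inventory for transaction `t1`.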
Historical Evolution of Tracing
Tracing is not new. Early software, built on client‑server (CS) architectures, relied on logs, debug output, and language‑specific stack traces such as Java’s printStackTrace(). The rise of Service‑Oriented Architecture (SOA) introduced standardized inter‑service communication, but monolithic applications still dominated, limiting tracing benefits.
With the explosion of the Internet and exponential traffic growth, monoliths gave way to microservices, fundamentally changing software architecture. Decoupled services, independent modules, and team‑aligned ownership created a need for new observability tools.
Birth of Distributed Tracing
Distributed tracing emerged as a critical capability for monitoring and fault isolation in modern microservice environments. It became a hot topic, leading to the development of projects like OpenTelemetry.
OpenTelemetry’s Origins
OpenTelemetry’s story begins with Google’s 2010 paper “Dapper, a Large‑Scale Distributed Systems Tracing Infrastructure,” which described Google’s internal tracing system, Dapper, and influenced later open‑source tracers such as Zipkin and Jaeger. In December 2016, OpenTracing, by then a CNCF incubating project, released version 1.0, providing a vendor‑neutral API for tracing across languages and frameworks.
In January 2018, Google introduced OpenCensus (joined by Microsoft in June 2018) as an independent, vendor‑agnostic platform for performance collection and tracing. OpenCensus offered both tracing standards and built‑in performance collection, sparking debates about choosing between OpenTracing and OpenCensus.
On May 21, 2019, Google announced the merger of OpenTracing and OpenCensus into a new CNCF project named OpenTelemetry. OpenTelemetry expands beyond tracing to include metrics and logging, aiming to provide a comprehensive, end‑to‑end observability specification.
Current Status of OpenTelemetry
At the time of writing, OpenTelemetry is a CNCF Sandbox project, alongside other sandbox projects such as Chaos Mesh, cert‑manager, and K3s. It provides compatibility with the OpenTracing and OpenCensus APIs and contributes specifications to broader standards bodies such as the W3C.
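One W3C specification closely tied to this work is Trace Context, which defines the `traceparent` HTTP header used to carry trace identity between services. The sketch below parses and regenerates that header for version `00` only, using the standard library. The helper names `parse_traceparent` and `child_traceparent` are hypothetical, and starting a new sampled trace when the incoming header is invalid is an assumption made for the example, not a requirement of any SDK.

```python
import re
import secrets

# Version "00": 32 hex trace-id, 16 hex parent-id, 2 hex trace-flags.
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def parse_traceparent(header):
    """Return the header's fields as a dict, or None if it is not valid."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero identifiers are invalid.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "flags": flags,
        "sampled": bool(int(flags, 16) & 0x01),
    }


def child_traceparent(header):
    """Build the header for an outgoing call made while handling `header`."""
    ctx = parse_traceparent(header)
    if ctx is None:
        # Assumption for this sketch: start a fresh, sampled trace.
        return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"
    # Same trace-id and flags, new span id for the current operation.
    return f"00-{ctx['trace_id']}-{secrets.token_hex(8)}-{ctx['flags']}"


INCOMING = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
print(child_traceparent(INCOMING))
```

The outgoing header keeps the incoming trace ID, which is what lets a backend stitch spans from different services into one trace.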
Major tracing ecosystems, including Apache SkyWalking, Zipkin, and Jaeger, work with OpenTracing and are gradually adopting OpenTelemetry. These open‑source systems typically provide a client SDK, a data collector, and a data aggregator, and forward data to storage backends such as Elasticsearch or Cassandra.
Challenges of Self‑Hosted Tracing Solutions
Building a complete tracing stack requires assembling many components (Kafka, Flink, databases, Jaeger, etc.), which can be cumbersome for most customers. Consequently, many organizations prefer a one‑stop cloud service rather than piecing together an entire solution.
Tencent Service Watcher (TSW) – A Cloud‑Native Tracing Solution
TSW (Tencent Service Watcher) is Tencent Cloud’s distributed tracing solution. Its design embraces open source and aims to provide full‑stack tracing capabilities.
TSW currently implements the OpenTracing protocol, ensuring full compatibility with Apache SkyWalking, Zipkin, and Jaeger clients.
The roadmap includes full OpenTelemetry compatibility, which is intended to allow seamless migration without client changes.
The backend draws inspiration from Jaeger and SkyWalking, employing a compute‑storage separation architecture and multi‑layer query mechanisms for high performance.
TSW also offers flexible topology visualizations, call‑chain analysis, and deep aggregation of upstream/downstream service relationships, enabling intuitive success‑rate and latency comparisons across different traces.
Integration with Tencent Cloud Monitor and Log Service provides a unified troubleshooting experience, reducing the need to switch between multiple dashboards.
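The upstream/downstream aggregation described above can be illustrated with a small sketch. It groups cross-service calls by (caller, callee) pair and computes a success rate and average latency per edge, which is the kind of data a topology view needs. This is not TSW’s implementation; the `Span` fields, the `aggregate_edges` helper, and the sample data are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None for the entry span of a trace
    service: str
    duration_ms: float
    ok: bool


def aggregate_edges(spans):
    """Summarize cross-service calls per (caller, callee) pair."""
    by_id = {span.span_id: span for span in spans}
    stats = defaultdict(lambda: {"calls": 0, "errors": 0, "total_ms": 0.0})
    for span in spans:
        parent = by_id.get(span.parent_id)
        # Skip entry spans and calls that stay inside one service.
        if parent is None or parent.service == span.service:
            continue
        edge = stats[(parent.service, span.service)]
        edge["calls"] += 1
        edge["errors"] += 0 if span.ok else 1
        edge["total_ms"] += span.duration_ms
    return {
        pair: {
            "calls": e["calls"],
            "success_rate": (e["calls"] - e["errors"]) / e["calls"],
            "avg_ms": e["total_ms"] / e["calls"],
        }
        for pair, e in stats.items()
    }


# Two hypothetical traces through the same three services; the second fails
# at the orders service.
SPANS = [
    Span("1", None, "gateway", 120.0, True),
    Span("2", "1", "orders", 80.0, True),
    Span("3", "2", "inventory", 30.0, True),
    Span("4", None, "gateway", 200.0, False),
    Span("5", "4", "orders", 150.0, False),
    Span("6", "5", "inventory", 10.0, True),
]

for (caller, callee), summary in aggregate_edges(SPANS).items():
    print(f"{caller} -> {callee}: {summary}")
```

Comparing edges this way points at where a failure originates: here the gateway to orders edge shows a 50% success rate while orders to inventory stays at 100%.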
Visual Overview
Figure 1: Example architecture of a Jaeger‑based tracing deployment.
Figure 2: TSW’s topology view and call‑chain query interface.
Conclusion
Distributed tracing has matured from simple stack traces to a comprehensive observability stack encompassing tracing, metrics, and logging. OpenTelemetry unifies previous efforts, and cloud providers like Tencent are delivering turnkey solutions such as TSW that simplify adoption for microservice architectures.
References
Google Dapper: https://research.google.com/pubs/pub36356.html
Google Borg: https://research.google/pubs/pub43438/
OpenTracing specification: https://opentracing.io/specification/changelog/
Microsoft joins OpenCensus: https://cloudblogs.microsoft.com/opensource/2018/06/13/microsoft-joins-the-opencensus-project/
OpenCensus merged into OpenTelemetry: https://opensource.googleblog.com/2019/05/opentelemetry-merger-of-opencensus-and.html
Jaeger architecture (v1.21): https://www.jaegertracing.io/docs/1.21/architecture/
Tencent Cloud Middleware
Official account of Tencent Cloud Middleware. Focuses on microservices, messaging middleware and other cloud‑native technology trends, publishing product updates, case studies, and technical insights. Regularly hosts tech salons to share effective solutions.