
How Visualized Full‑Link Log Tracing Boosts Business Debugging Efficiency

This article introduces a visualized full‑link log tracing solution that organizes and dynamically links business logs by leveraging DSL definitions, distributed parameter propagation, and a tree‑structured storage model, enabling fast, end‑to‑end issue localization in complex microservice systems such as the Dazhong Dianping content platform.

Sanyou's Java Diary

Background

Observability is essential for high‑availability systems. As business logic grows more complex, however, the two traditional approaches, ELK‑based log search and distributed session tracing, become too time‑consuming and too coarse to trace business behavior efficiently.

Challenges of Existing Solutions

ELK requires developers to log extensively and then manually filter and correlate logs, which is labor‑intensive. Distributed session tracing (e.g., Dapper, Zipkin) focuses on call chains and cannot accurately represent business logic, especially when multiple parallel calls or conditional branches are involved.

Visualized Full‑Link Log Tracing

Design Idea

The new approach organizes logs around business execution, turning each business run into a visualized execution scene. It answers two key questions: how to efficiently organize business logs and how to dynamically link them.

How to Organize Business Logs

Business logic is abstracted into logic nodes (local methods or RPC calls) and logic links that combine nodes in serial, parallel, or conditional flows. A logic link execution represents a single run of a business scenario.

Dynamic Log Linking

During execution, a unique link identifier (business ID + scenario ID + execution ID) is propagated through threads and network calls, “coloring” logs so they can be dynamically stitched together. When multiple RPC calls share a common identifier (e.g., a task ID), their logs are merged into a single visual link.
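A minimal sketch of how the identifier might follow a request across a thread boundary, assuming a plain `ThreadLocal` holder and a wrapper for tasks handed to a thread pool. `LinkContext`, `wrap`, and `observeOnPool` are hypothetical names for illustration, not the platform's API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical holder for the current link identifier (not the platform's real API). */
final class LinkContext {
    private static final ThreadLocal<String> LINK_ID = new ThreadLocal<>();

    private LinkContext() {}

    static void set(String linkId) { LINK_ID.set(linkId); }
    static String get() { return LINK_ID.get(); }
    static void clear() { LINK_ID.remove(); }

    /** Captures the submitting thread's identifier and installs it in the worker thread. */
    static <T> Callable<T> wrap(Callable<T> task) {
        final String captured = get(); // read on the submitting thread
        return () -> {
            String previous = get();
            set(captured);
            try {
                return task.call();
            } finally {
                // Restore the pooled thread's previous state so identifiers do not leak.
                if (previous == null) {
                    clear();
                } else {
                    set(previous);
                }
            }
        };
    }

    /** Sets the identifier, runs one task on a pool, and returns the identifier the task saw. */
    static String observeOnPool(String linkId) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        set(linkId);
        try {
            Callable<String> task = wrap(LinkContext::get);
            return pool.submit(task).get();
        } catch (Exception e) {
            throw new IllegalStateException("task failed", e);
        } finally {
            clear();
            pool.shutdown();
        }
    }
}
```

Without `wrap`, the worker thread would see no identifier at all, because a `ThreadLocal` value is visible only to the thread that set it. Propagation over the network needs a separate carrier such as a request header.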

General Solution

The solution consists of four steps:

Link Definition – using a DSL (JSON) to describe nodes, their types, and execution rules.

Link Coloring – passing the unique identifier to tag logs at each node.

Link Reporting – reporting node logs and business logs to a central system.

Link Storage – persisting logs in a tree‑structured model (link log, node log, business log) in a storage backend such as HBase.

<code>[
  {
    "nodeName": "A",
    "nodeType": "rpc"
  },
  {
    "nodeName": "Fork",
    "nodeType": "fork",
    "forkNodes": [
      [
        {"nodeName": "B", "nodeType": "rpc"}
      ],
      [
        {"nodeName": "C", "nodeType": "local"}
      ]
    ]
  },
  {
    "nodeName": "Join",
    "nodeType": "join",
    "joinOnList": ["B", "C"]
  },
  {
    "nodeName": "D",
    "nodeType": "decision",
    "decisionCases": {
      "true": [{"nodeName": "E", "nodeType": "rpc"}],
      "defaultCase": [{"nodeName": "F", "nodeType": "rpc"}]
    }
  }
]
</code>
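The JSON definition above can be mirrored by a small in-memory model. The following is a sketch under stated assumptions: `LinkNode` and its factory methods are hypothetical, the field names simply copy the JSON keys, and the link is built by hand rather than parsed so that the example needs no JSON library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory form of one DSL entry; field names mirror the JSON keys. */
final class LinkNode {
    final String nodeName;
    final String nodeType;                            // rpc, local, fork, join, decision
    final List<List<LinkNode>> forkNodes;             // parallel branches of a fork node
    final List<String> joinOnList;                    // nodes a join node waits for
    final Map<String, List<LinkNode>> decisionCases;  // branches of a decision node

    private LinkNode(String nodeName, String nodeType, List<List<LinkNode>> forkNodes,
                     List<String> joinOnList, Map<String, List<LinkNode>> decisionCases) {
        this.nodeName = nodeName;
        this.nodeType = nodeType;
        this.forkNodes = forkNodes;
        this.joinOnList = joinOnList;
        this.decisionCases = decisionCases;
    }

    static LinkNode simple(String name, String type) {
        return new LinkNode(name, type, Collections.emptyList(), Collections.emptyList(), Collections.emptyMap());
    }

    static LinkNode fork(String name, List<List<LinkNode>> branches) {
        return new LinkNode(name, "fork", branches, Collections.emptyList(), Collections.emptyMap());
    }

    static LinkNode join(String name, List<String> joinOn) {
        return new LinkNode(name, "join", Collections.emptyList(), joinOn, Collections.emptyMap());
    }

    static LinkNode decision(String name, Map<String, List<LinkNode>> cases) {
        return new LinkNode(name, "decision", Collections.emptyList(), Collections.emptyList(), cases);
    }

    /** Depth-first listing of every node name in the link, including nested branches. */
    static List<String> names(List<LinkNode> link) {
        List<String> out = new ArrayList<>();
        for (LinkNode node : link) {
            out.add(node.nodeName);
            for (List<LinkNode> branch : node.forkNodes) {
                out.addAll(names(branch));
            }
            for (List<LinkNode> branch : node.decisionCases.values()) {
                out.addAll(names(branch));
            }
        }
        return out;
    }

    /** The same link as the JSON definition, built by hand. */
    static List<LinkNode> sampleLink() {
        Map<String, List<LinkNode>> cases = new LinkedHashMap<>();
        cases.put("true", Arrays.asList(simple("E", "rpc")));
        cases.put("defaultCase", Arrays.asList(simple("F", "rpc")));
        return Arrays.asList(
                simple("A", "rpc"),
                fork("Fork", Arrays.asList(
                        Arrays.asList(simple("B", "rpc")),
                        Arrays.asList(simple("C", "local")))),
                join("Join", Arrays.asList("B", "C")),
                decision("D", cases));
    }
}
```

Calling `LinkNode.names(LinkNode.sampleLink())` lists the nodes in definition order: A, Fork, B, C, Join, D, E, F. A visualization layer could walk the same structure to lay out the serial, parallel, and conditional parts of a link.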

Link Coloring Details

Link identifier = business ID + scenario ID + execution ID. Node identifier = link identifier + node name. The identifier is passed through thread‑local storage and network headers, allowing each log entry to be associated with its node.
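The composition rule can be sketched in a few lines. Everything specific here is an assumption for illustration: the class name `LinkIds`, the underscore separator, and the `X-Link-Id` header name are not taken from the platform.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for composing identifiers and carrying them in request headers. */
final class LinkIds {
    static final String SEPARATOR = "_";       // assumed separator
    static final String HEADER = "X-Link-Id";  // assumed header name

    private LinkIds() {}

    /** Link identifier = business ID + scenario ID + execution ID. */
    static String linkId(String businessId, String scenarioId, String executionId) {
        return String.join(SEPARATOR, businessId, scenarioId, executionId);
    }

    /** Node identifier = link identifier + node name. */
    static String nodeId(String linkId, String nodeName) {
        return linkId + SEPARATOR + nodeName;
    }

    /** Outbound side: returns a copy of the headers with the link identifier added. */
    static Map<String, String> inject(String linkId, Map<String, String> headers) {
        Map<String, String> copy = new HashMap<>(headers);
        copy.put(HEADER, linkId);
        return copy;
    }

    /** Inbound side: reads the identifier back, or null if the caller sent none. */
    static String extract(Map<String, String> headers) {
        return headers.get(HEADER);
    }
}
```

On the receiving service, the extracted value would be placed back into thread‑local storage before any business code runs, so that logs written there carry the same link identifier.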

Link Reporting

Two log types are reported:

Node logs: start/end timestamps, status, input/output.

Business logs: log level, timestamp, and data relevant to business logic.

Link Storage

Logs are stored as a tree rooted at the business ID, with link logs, node logs, and business logs on the levels beneath it, so that all execution traces of a given content item can be queried efficiently.
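The shape of the tree can be illustrated with nested maps. This is only an in‑memory sketch: `LinkLogTree` is a hypothetical name, business logs are reduced to plain strings, and nothing here reflects the actual HBase schema.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for the tree model; a real backend would persist it. */
final class LinkLogTree {
    // businessId -> linkId -> nodeName -> business log messages
    private final Map<String, Map<String, Map<String, List<String>>>> root = new LinkedHashMap<>();

    /** Appends one business log under its node, creating missing tree levels on the way. */
    void addBusinessLog(String businessId, String linkId, String nodeName, String message) {
        root.computeIfAbsent(businessId, k -> new LinkedHashMap<>())
            .computeIfAbsent(linkId, k -> new LinkedHashMap<>())
            .computeIfAbsent(nodeName, k -> new ArrayList<>())
            .add(message);
    }

    /** All links recorded for one business ID, in insertion order. */
    List<String> linksOf(String businessId) {
        List<String> out = new ArrayList<>();
        Map<String, Map<String, List<String>>> links = root.get(businessId);
        if (links != null) {
            out.addAll(links.keySet());
        }
        return out;
    }

    /** Business logs of one node within one link; empty if any level is missing. */
    List<String> logsOf(String businessId, String linkId, String nodeName) {
        List<String> out = new ArrayList<>();
        Map<String, Map<String, List<String>>> links = root.get(businessId);
        if (links == null) {
            return out;
        }
        Map<String, List<String>> nodes = links.get(linkId);
        if (nodes == null) {
            return out;
        }
        List<String> logs = nodes.get(nodeName);
        if (logs != null) {
            out.addAll(logs);
        }
        return out;
    }
}
```

Because the business ID sits at the root, a single lookup returns every link a content item has passed through, which is the query the visualization needs most often.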

Dazhong Dianping Content Platform Practice

Business Characteristics and Challenges

The platform handles millions of content items daily, with diverse production, governance, and consumption flows, leading to massive, heterogeneous log volumes.

Implementation Highlights

Log collection via log_agent → Kafka → Flink for parsing.

Unified storage in HBase using the tree model.

Custom TraceLogger library (API‑compatible with SLF4J) that abstracts log reporting and reduces integration effort.

<code>// Before replacement: original log reporting
LOGGER.error("update struct failed, param:{}", GsonUtils.toJson(structRequest), e);
// After replacement: full‑link log reporting
TraceLogger.error("update struct failed, param:{}", GsonUtils.toJson(structRequest), e);
</code>
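To make the idea behind such a facade concrete, here is a rough sketch that keeps the SLF4J‑style `{}` placeholders and tags each message with the current node identifier. It is not the platform's TraceLogger: `SketchTraceLogger`, `enterNode`, `exitNode`, and the `REPORTED` list (a stand‑in for the real reporting channel) are all hypothetical, and SLF4J's special handling of a trailing `Throwable` argument is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of an SLF4J-style facade; the real TraceLogger is not shown in the article. */
final class SketchTraceLogger {
    private static final ThreadLocal<String> NODE_ID = new ThreadLocal<>();
    // Stand-in for the reporting channel that ships logs to the central system.
    static final List<String> REPORTED = new ArrayList<>();

    private SketchTraceLogger() {}

    static void enterNode(String nodeId) { NODE_ID.set(nodeId); }
    static void exitNode() { NODE_ID.remove(); }

    static void warn(String format, Object... args) { report("WARN", format, args); }
    static void error(String format, Object... args) { report("ERROR", format, args); }

    private static void report(String level, String format, Object... args) {
        String nodeId = NODE_ID.get();
        String line = level + " [" + (nodeId == null ? "-" : nodeId) + "] " + format(format, args);
        synchronized (REPORTED) {
            REPORTED.add(line);
        }
    }

    /** Replaces each "{}" with the next argument, left to right; surplus arguments are ignored. */
    static String format(String pattern, Object... args) {
        StringBuilder sb = new StringBuilder();
        int from = 0;
        int argIndex = 0;
        while (argIndex < args.length) {
            int at = pattern.indexOf("{}", from);
            if (at < 0) {
                break;
            }
            sb.append(pattern, from, at).append(args[argIndex++]);
            from = at + 2;
        }
        return sb.append(pattern.substring(from)).toString();
    }
}
```

Keeping the call shape identical to the existing logger is what makes the migration a near mechanical rename: the message and arguments stay untouched, and only the class in front of `.error(...)` or `.warn(...)` changes.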
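<code>public Response realTimeInputLink(long contentId) {
    // Start the link: pass the link identifier (business ID + scenario ID + execution ID)
    TraceUtils.passLinkMark("contentId_type_uuid");
    // Local call: node logs are reported explicitly through the API
    TraceUtils.reportNode("contentStore", contentId, StatusEnums.RUNNING);
    // Assumption: contentStore returns the result that is reported as structResp
    Response structResp = contentStore(contentId);
    TraceUtils.reportNode("contentStore", structResp, StatusEnums.COMPLETED);
    // Remote call: the node is declared with the @TraceNode annotation below
    Response processResp = picProcess(contentId);
    return processResp;
}

@TraceNode(nodeName = "picProcess")
public Response picProcess(long contentId) {
    // Business log, attached to the picProcess node
    TraceLogger.warn("picProcess failed, contentId:{}", contentId);
    // ... (processing logic and return value omitted)
}
</code>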
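The article does not say how the annotation is processed. One possible mechanism, sketched here purely as an assumption, is a JDK dynamic proxy that reports a node log before and after each annotated call. `NodeTrace` (standing in for `@TraceNode`), `PicService`, `NodeTraceProxy`, and the `FAILED` status are hypothetical names, and `NODE_LOGS` replaces the real reporting channel.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the article's @TraceNode annotation. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface NodeTrace {
    String nodeName();
}

/** Hypothetical service interface with one traced method. */
interface PicService {
    @NodeTrace(nodeName = "picProcess")
    String picProcess(long contentId);
}

final class NodeTraceProxy {
    // Stand-in for link reporting; entries look like "picProcess:RUNNING".
    static final List<String> NODE_LOGS = new ArrayList<>();

    private NodeTraceProxy() {}

    /** Wraps target so that calls to annotated interface methods report node status. */
    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> api, T target) {
        InvocationHandler handler = (proxy, method, args) -> {
            NodeTrace trace = method.getAnnotation(NodeTrace.class);
            if (trace == null) {
                return invoke(method, target, args); // untraced method: pass through
            }
            NODE_LOGS.add(trace.nodeName() + ":RUNNING");
            try {
                Object result = invoke(method, target, args);
                NODE_LOGS.add(trace.nodeName() + ":COMPLETED");
                return result;
            } catch (Throwable t) {
                NODE_LOGS.add(trace.nodeName() + ":FAILED"); // assumed status name
                throw t;
            }
        };
        return (T) Proxy.newProxyInstance(api.getClassLoader(), new Class<?>[] {api}, handler);
    }

    private static Object invoke(Method method, Object target, Object[] args) throws Throwable {
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // unwrap so callers see the original exception
        }
    }
}
```

A proxy only intercepts calls made through the interface reference, so a method calling another method on `this` would bypass it. Frameworks that weave aspects into the class itself avoid that limitation, at the cost of extra build or runtime machinery.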

Results

The visualized tracing system reduced issue‑locating time from hours to under five minutes, provided one‑click trace of any content across all scenarios, and served as both a debugging and testing aid.

Conclusion and Outlook

Observability, especially visualized full‑link log tracing, is crucial for complex distributed systems. Future work will extend the platform to cover alerts, overviews, fault isolation, and deeper analysis, offering a comprehensive observability suite for large‑scale microservice environments.

Figures (images not reproduced): ELK case; Distributed tracing case; Business log tracing case; DSL definition; General solution breakdown; Link storage model; Log reporting architecture; TraceLogger tool; Link query; Link visualization; Node details.