Visualized Full‑Chain Log Tracing for Complex Business Systems

The article analyzes the shortcomings of traditional ELK and distributed tracing for complex business systems, proposes a visualized full‑chain log tracing solution that organizes and dynamically links logs by business chain, and demonstrates its implementation and performance gains at Meituan’s content platform.

Meituan Technology Team

Background

Observability is essential for high‑availability systems. As business logic grows more complex, a traditional ELK stack forces developers to print exhaustive logs, filter them manually in Elasticsearch, and spend considerable time reconstructing what happened during execution. Distributed session tracing generates one trace per request, keyed by a global traceId. It cannot follow several related calls at once, it does not fully describe the business logic, and it pulls in many downstream services that obscure how the current system executed.

Visualized Full‑Chain Log Tracing

Design Idea

The scheme treats the business chain as the carrier of logs. During execution, logs are organized according to business logic, forming a visual reconstruction of the execution scene.

General Solution

How to efficiently organize business logs? Abstract the business logic into logic nodes and logic chains. A logic node represents an independent unit of work (a local method or an RPC call); a logic chain composes nodes to describe a complete business scenario.

How to dynamically link logs? Propagate a unique identifier (business ID + scenario ID + execution ID) through threads and network calls, enabling log coloring and dynamic stitching of logs to the currently executing node.
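The thread half of this propagation can be sketched in plain Java. This is a minimal illustration, not the platform's implementation: the class names `LinkContext` and `LinkPropagationDemo`, the underscore separator, and the use of a random UUID as the execution ID are all assumptions. The identifier lives in a `ThreadLocal`, and a task handed to a thread pool is wrapped so that the worker thread sees the submitter's identifier.

```java
import java.util.UUID;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical per-thread holder for the link identifier. */
final class LinkContext {
    private static final ThreadLocal<String> LINK_ID = new ThreadLocal<>();

    private LinkContext() {}

    /** Joins business ID, scenario ID and a fresh execution ID (separator is an assumption). */
    static String newLinkId(String businessId, String scenarioId) {
        return businessId + "_" + scenarioId + "_" + UUID.randomUUID();
    }

    static void set(String linkId) { LINK_ID.set(linkId); }

    static String get() { return LINK_ID.get(); }

    static void clear() { LINK_ID.remove(); }

    /** Captures the submitting thread's identifier and installs it while the task runs. */
    static Runnable wrap(Runnable task) {
        final String captured = get();
        return () -> {
            String previous = get();
            set(captured);
            try {
                task.run();
            } finally {
                // Restore whatever the worker thread held before this task.
                if (previous == null) clear(); else set(previous);
            }
        };
    }
}

final class LinkPropagationDemo {
    private LinkPropagationDemo() {}

    /** Returns the identifier a pool thread observes, with or without wrapping the task. */
    static String linkIdSeenByWorker(String linkId, boolean wrapTask) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        AtomicReference<String> seen = new AtomicReference<>();
        LinkContext.set(linkId);
        try {
            Runnable task = () -> seen.set(LinkContext.get());
            pool.submit(wrapTask ? LinkContext.wrap(task) : task).get();
            return seen.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            LinkContext.clear();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String linkId = LinkContext.newLinkId("content42", "realTimeInput");
        System.out.println("wrapped:   " + linkIdSeenByWorker(linkId, true));
        System.out.println("unwrapped: " + linkIdSeenByWorker(linkId, false));
    }
}
```

Without the wrapper the worker thread observes no identifier, which is why thread hand-offs need explicit handling. For network calls the same identifier would presumably travel in the request metadata of the RPC framework; that part is not shown here.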

Implementation Details

Link Definition

A domain‑specific language (DSL) expressed in JSON describes the static structure of a logic chain, including node types (rpc, local, fork, join, decision) and business rules (serial, parallel, conditional).

[
  { "nodeName": "A", "nodeType": "rpc" },
  {
    "nodeName": "Fork",
    "nodeType": "fork",
    "forkNodes": [
      [ { "nodeName": "B", "nodeType": "rpc" } ],
      [ { "nodeName": "C", "nodeType": "local" } ]
    ]
  },
  { "nodeName": "Join", "nodeType": "join", "joinOnList": ["B", "C"] },
  {
    "nodeName": "D",
    "nodeType": "decision",
    "decisionCases": { "true": [ { "nodeName": "E", "nodeType": "rpc" } ] },
    "defaultCase": [ { "nodeName": "F", "nodeType": "rpc" } ]
  }
]
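To make the nesting in this definition easier to see, the following sketch models it as a small tree in Java and lists every node it contains. The classes `ChainNode` and `ChainWalker` are hypothetical and only illustrate the structure; the chain is built by hand rather than parsed from JSON, and `joinOnList` is left out for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical in-memory form of one DSL node. */
final class ChainNode {
    final String nodeName;
    final String nodeType; // rpc, local, fork, join or decision
    /** forkNodes for a fork; decision cases followed by the default case for a decision. */
    final List<List<ChainNode>> branches;

    private ChainNode(String nodeName, String nodeType, List<List<ChainNode>> branches) {
        this.nodeName = nodeName;
        this.nodeType = nodeType;
        this.branches = branches;
    }

    static ChainNode leaf(String nodeName, String nodeType) {
        return new ChainNode(nodeName, nodeType, Collections.<List<ChainNode>>emptyList());
    }

    @SafeVarargs
    static ChainNode branching(String nodeName, String nodeType, List<ChainNode>... branches) {
        return new ChainNode(nodeName, nodeType, Arrays.asList(branches));
    }
}

final class ChainWalker {
    private ChainWalker() {}

    /** The sample chain from the JSON definition, built by hand rather than parsed. */
    static List<ChainNode> sampleChain() {
        return Arrays.asList(
            ChainNode.leaf("A", "rpc"),
            ChainNode.branching("Fork", "fork",
                Arrays.asList(ChainNode.leaf("B", "rpc")),
                Arrays.asList(ChainNode.leaf("C", "local"))),
            ChainNode.leaf("Join", "join"),
            ChainNode.branching("D", "decision",
                Arrays.asList(ChainNode.leaf("E", "rpc")),    // case "true"
                Arrays.asList(ChainNode.leaf("F", "rpc"))));  // default case
    }

    /** Lists every node name in definition order, descending into each branch. */
    static List<String> nodeNames(List<ChainNode> chain) {
        List<String> names = new ArrayList<>();
        collect(chain, names);
        return names;
    }

    private static void collect(List<ChainNode> chain, List<String> out) {
        for (ChainNode node : chain) {
            out.add(node.nodeName);
            for (List<ChainNode> branch : node.branches) {
                collect(branch, out);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(nodeNames(sampleChain())); // [A, Fork, B, C, Join, D, E, F]
    }
}
```

A static walk like this yields the full set of nodes a visual link graph would need to draw, before any execution has taken place.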

Link Coloring

When a logic chain starts, a unique link identifier is generated. Each node inherits this identifier; as the chain progresses the identifier is passed along, allowing logs to be “colored” and attached to the correct node.
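A minimal sketch of coloring, under simplifying assumptions: one execution runs on a single thread, and the tracer keeps a stack of the nodes currently running. `ColoredLog` and `ChainTracer` are hypothetical names, and the `(chain)` label for logs emitted outside any node is an illustrative choice.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** A log line after coloring: tagged with the link identifier and the emitting node. */
final class ColoredLog {
    final String linkId;
    final String nodeName;
    final String message;

    ColoredLog(String linkId, String nodeName, String message) {
        this.linkId = linkId;
        this.nodeName = nodeName;
        this.message = message;
    }

    @Override
    public String toString() {
        return "[" + linkId + "][" + nodeName + "] " + message;
    }
}

/** Hypothetical single-threaded tracer for one execution of a logic chain. */
final class ChainTracer {
    private final String linkId;
    private final Deque<String> runningNodes = new ArrayDeque<>();
    private final List<ColoredLog> logs = new ArrayList<>();

    ChainTracer(String linkId) {
        this.linkId = linkId;
    }

    void enterNode(String nodeName) { runningNodes.push(nodeName); }

    void exitNode() { runningNodes.pop(); }

    /** Attaches the message to the innermost running node, or to the chain itself if none. */
    void log(String message) {
        String node = runningNodes.isEmpty() ? "(chain)" : runningNodes.peek();
        logs.add(new ColoredLog(linkId, node, message));
    }

    List<ColoredLog> logs() { return logs; }

    public static void main(String[] args) {
        ChainTracer tracer = new ChainTracer("content42_realTimeInput_e1");
        tracer.enterNode("contentStore");
        tracer.log("content stored");
        tracer.exitNode();
        tracer.enterNode("picProcess");
        tracer.log("picProcess failed");
        tracer.exitNode();
        for (ColoredLog log : tracer.logs()) {
            System.out.println(log);
        }
    }
}
```

Because every record carries both the link identifier and a node name, a later query can group logs by chain execution first and by node second.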

Link Reporting

Two kinds of logs are reported:

Node logs: record node start/end times, status, input, and output.

Business logs: record domain‑specific data such as request parameters, intermediate variables, and exceptions.

Link Storage

Reported logs are stored in a tree‑structured model (link log, node log, business log, metadata) in a persistent store. The implementation chooses HBase for its high‑throughput, low‑latency characteristics.
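The tree shape can be illustrated with an in-memory stand-in for the store. This sketch does not use HBase and says nothing about the real row-key or column layout; `LinkLogStore`, `LinkLog`, and `NodeLog` are hypothetical names. It shows how flat reported records, which may arrive in any order, end up under link → node → business log.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Node level of the tree: one node's status plus the business logs attached to it. */
final class NodeLog {
    final String nodeName;
    String status = "UNKNOWN"; // overwritten when the node log arrives
    final List<String> businessLogs = new ArrayList<>();

    NodeLog(String nodeName) {
        this.nodeName = nodeName;
    }
}

/** Link level of the tree: all node logs of one chain execution, in arrival order. */
final class LinkLog {
    final String linkId;
    final Map<String, NodeLog> nodes = new LinkedHashMap<>();

    LinkLog(String linkId) {
        this.linkId = linkId;
    }

    private NodeLog node(String nodeName) {
        return nodes.computeIfAbsent(nodeName, NodeLog::new);
    }

    void addNodeLog(String nodeName, String status) {
        node(nodeName).status = status;
    }

    void addBusinessLog(String nodeName, String message) {
        node(nodeName).businessLogs.add(message);
    }
}

/** Hypothetical in-memory stand-in for the persistent store, keyed by link identifier. */
final class LinkLogStore {
    private final Map<String, LinkLog> byLinkId = new LinkedHashMap<>();

    LinkLog link(String linkId) {
        return byLinkId.computeIfAbsent(linkId, LinkLog::new);
    }

    public static void main(String[] args) {
        LinkLogStore store = new LinkLogStore();
        // The business log arrives before the node log; the tree still assembles.
        store.link("content42_realTimeInput_e1").addBusinessLog("picProcess", "picProcess failed");
        store.link("content42_realTimeInput_e1").addNodeLog("picProcess", "FAILED");
        NodeLog node = store.link("content42_realTimeInput_e1").nodes.get("picProcess");
        System.out.println(node.nodeName + " " + node.status + " " + node.businessLogs);
    }
}
```

Creating tree levels on first sight of a record means the store tolerates out-of-order delivery, which a distributed reporting pipeline would likely produce.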

Case Study: Meituan Dianping Content Platform

Business Characteristics and Challenges

The platform processes millions of content items daily, supports dozens of business scenarios, and executes billions of logic nodes. Logs are scattered across services, making collection and scene reconstruction extremely difficult.

Practices and Results

Large‑Scale Log Ingestion

Architecture: log_agent → Kafka → Flink → HBase. This pipeline supports massive log volumes and distributed services.

Low‑Cost Service Refactoring

A custom TraceLogger library, compatible with SLF4J, abstracts log‑reporting details. It provides APIs for business‑log and node‑log reporting, automatic identifier passing, and error handling, minimizing code changes.

// Before replacement: plain SLF4J logger
LOGGER.error("update struct failed, param:{}", GsonUtils.toJson(structRequest), e);
// After replacement: same call shape, now reported as a business log
TraceLogger.error("update struct failed, param:{}", GsonUtils.toJson(structRequest), e);

public Response realTimeInputLink(long contentId) {
    // Start of the link: pass the link identifier
    TraceUtils.passLinkMark("contentId_type_uuid");
    // Local call: node log reported through the API
    TraceUtils.reportNode("contentStore", contentId, StatusEnums.RUNNING);
    contentStore(contentId);
    // Remote call: node log reported through the @TraceNode annotation
    Response resp = picProcess(contentId);
    // ...
}

@TraceNode(nodeName="picProcess")
public Response picProcess(long contentId) {
    // Business log: carries the link identifier
    TraceLogger.warn("picProcess failed, contentId:{}", contentId);
    // ...
}
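
The source does not show how @TraceNode reports node logs. One plausible mechanism is an interceptor that wraps each annotated method and reports the node's status before and after the call. The sketch below uses a JDK dynamic proxy as a stand-in for whatever interception the real library uses; `NodeSketch`, `PicService`, and `NodeReportingProxy` are hypothetical names, and the status strings are illustrative.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for @TraceNode; must be retained at runtime to be visible to the proxy. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface NodeSketch {
    String nodeName();
}

/** Hypothetical service interface with one annotated node. */
interface PicService {
    @NodeSketch(nodeName = "picProcess")
    String picProcess(long contentId);
}

final class NodeReportingProxy {
    private NodeReportingProxy() {}

    /** Wraps target so that calls to annotated methods append node status reports. */
    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> api, T target, List<String> reports) {
        return (T) Proxy.newProxyInstance(
            api.getClassLoader(),
            new Class<?>[] {api},
            (proxy, method, args) -> {
                NodeSketch node = method.getAnnotation(NodeSketch.class);
                if (node != null) reports.add(node.nodeName() + ":RUNNING");
                try {
                    Object result = method.invoke(target, args);
                    if (node != null) reports.add(node.nodeName() + ":SUCCESS");
                    return result;
                } catch (InvocationTargetException e) {
                    if (node != null) reports.add(node.nodeName() + ":FAILED");
                    throw e.getCause(); // rethrow the target's own exception
                }
            });
    }

    public static void main(String[] args) {
        List<String> reports = new ArrayList<>();
        PicService traced = wrap(PicService.class, id -> "processed-" + id, reports);
        System.out.println(traced.picProcess(7L));
        System.out.println(reports);
    }
}
```

With this shape the business method stays free of reporting code, which matches the stated goal of minimizing code changes.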

Outcomes

One‑click tracing of all logic chains for any content item.

Visual link graphs showing full business‑logic panoramas and node execution details.

Node‑detail view exposing inputs, outputs, and associated business logs.

Problem‑resolution time reduced from hours to under five minutes.

Improved efficiency for developer self‑testing and QA.

Summary and Outlook

The visualized full‑chain log tracing solution combines logging and tracing to dynamically organize logs during business execution, replacing manual, delayed log stitching. It has been deployed at the Dianping content platform, delivering low entry cost, wide coverage, and high operational efficiency. Future work will extend the observability stack with alerting, overview dashboards, and deeper analysis for complex distributed systems.

