Visualizing Full‑Link Log Tracing: From Design to Meituan Content Platform
This article presents a visual full‑link log tracing solution that organizes business logs by execution chain, enabling efficient log collection, dynamic linking, and real‑time visualization to pinpoint issues in complex distributed systems, with a detailed case study from Meituan's content platform.
Observability is essential for high‑availability systems, but as business logic grows, traditional ELK logging and distributed session tracing become inefficient for business‑level tracing. This article introduces a visual full‑link log tracing approach that uses business chains as the carrier, organizing logs per execution to reconstruct the runtime scene and locate problems quickly.
Background
Business systems are increasingly complex, spanning many micro‑services and diverse scenarios. ELK‑based logging requires developers to add log statements everywhere and stitch them together by hand, while distributed session tracing records the call chain between services but not the business logic executed inside them. Both approaches make log collection, filtering, and analysis difficult.
Visual Full‑Link Log Tracing Design
The design addresses two core questions: how to efficiently organize business logs and how to dynamically link them.
Efficient Log Organization: Define a logic node (a local method or an RPC call) and a logic chain that composes nodes according to business rules (sequential, parallel, or conditional). One execution of a chain represents one business‑level trace.
Dynamic Log Linking: Use distributed parameter propagation (similar to a traceId) to pass a unique chain identifier across threads and network calls. Every log entry is colored with this identifier, so it is attached at runtime to the node that produced it.
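Propagation is what lets a single chain identifier survive thread hops and network calls. The platform's own mechanism is not described in detail, so the following is a minimal Python sketch under stated assumptions: `contextvars` plays the role of thread‑local storage, and a hypothetical `x-chain-id` header stands in for whatever field an RPC or MQ message would carry. The function names `inject`, `extract`, and `submit_with_context` are illustrative only.

```python
import contextvars
import uuid
from concurrent.futures import ThreadPoolExecutor

# Hypothetical context variable carrying the chain identifier.
chain_id_var = contextvars.ContextVar("chain_id", default=None)

# Hypothetical header name used when crossing an RPC or MQ boundary.
CHAIN_HEADER = "x-chain-id"


def inject(headers: dict) -> dict:
    """Sender side: copy the current chain ID into outgoing headers."""
    chain_id = chain_id_var.get()
    if chain_id is not None:
        headers[CHAIN_HEADER] = chain_id
    return headers


def extract(headers: dict) -> None:
    """Receiver side: restore the chain ID from incoming headers."""
    if CHAIN_HEADER in headers:
        chain_id_var.set(headers[CHAIN_HEADER])


def submit_with_context(pool, fn, *args):
    """Run fn on a pool thread inside a snapshot of the caller's context."""
    return pool.submit(contextvars.copy_context().run, fn, *args)


def handle_message(headers: dict):
    """Simulated remote consumer: starts with no chain ID, then extracts it."""
    extract(headers)
    return chain_id_var.get()


# Start a chain, then carry its ID across a thread hop and a simulated network hop.
chain_id_var.set(uuid.uuid4().hex)
with ThreadPoolExecutor(max_workers=2) as pool:
    seen_in_thread = submit_with_context(pool, chain_id_var.get).result()
headers = inject({})
# A fresh, empty Context models a process that has never seen this chain.
seen_remotely = contextvars.Context().run(handle_message, headers)
```

Copying the context explicitly at submit time matters because pool threads are reused across unrelated tasks; a snapshot ties each task to the chain that submitted it.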
Chain Definition
A DSL (JSON or XML) describes the nodes and their relationships, supporting serial, parallel (fork/join), and decision branches. In the example below, node A runs first; B and C then run in parallel and are joined; finally, decision node D routes to E when its result is true and to F otherwise.
[
{"nodeName":"A","nodeType":"rpc"},
{"nodeName":"Fork","nodeType":"fork","forkNodes":[[{"nodeName":"B","nodeType":"rpc"}],[{"nodeName":"C","nodeType":"local"}]]},
{"nodeName":"Join","nodeType":"join","joinOnList":["B","C"]},
{"nodeName":"D","nodeType":"decision","decisionCases":{"true":[{"nodeName":"E","nodeType":"rpc"}]},"defaultCase":[{"nodeName":"F","nodeType":"rpc"}]}
]
Chain Coloring
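To make the DSL semantics concrete, here is a minimal Python interpreter for the example chain. It is a sketch, not the platform's engine, and it rests on assumptions not stated in the source: a hypothetical `handlers` map from node name to callable, `join` treated as a check after the fork has already waited for its branches, and a decision handler whose result, lowercased as a string (for example `true`), selects a key in `decisionCases`.

```python
import json
from concurrent.futures import ThreadPoolExecutor

DSL = """[
 {"nodeName":"A","nodeType":"rpc"},
 {"nodeName":"Fork","nodeType":"fork","forkNodes":[[{"nodeName":"B","nodeType":"rpc"}],[{"nodeName":"C","nodeType":"local"}]]},
 {"nodeName":"Join","nodeType":"join","joinOnList":["B","C"]},
 {"nodeName":"D","nodeType":"decision","decisionCases":{"true":[{"nodeName":"E","nodeType":"rpc"}]},"defaultCase":[{"nodeName":"F","nodeType":"rpc"}]}
]"""


def run_nodes(nodes, handlers, results):
    """Execute a list of DSL nodes in order, recording each node's result."""
    for node in nodes:
        name, kind = node["nodeName"], node["nodeType"]
        if kind in ("rpc", "local"):
            results[name] = handlers[name](results)
        elif kind == "fork":
            # Each branch runs on its own thread; branches write disjoint keys.
            with ThreadPoolExecutor() as pool:
                futures = [pool.submit(run_nodes, branch, handlers, results)
                           for branch in node["forkNodes"]]
                for future in futures:
                    future.result()  # wait and surface branch errors
        elif kind == "join":
            # The fork already waited; only verify the joined nodes ran.
            missing = [n for n in node["joinOnList"] if n not in results]
            if missing:
                raise RuntimeError(f"join {name}: missing {missing}")
        elif kind == "decision":
            # Hypothetical rule: the handler's result, lowercased, selects the case.
            key = str(handlers[name](results)).lower()
            results[name] = key
            branch = node["decisionCases"].get(key, node.get("defaultCase", []))
            run_nodes(branch, handlers, results)
        else:
            raise ValueError(f"unknown nodeType {kind!r}")
    return results


# Hypothetical handlers; D's condition looks at B's output.
handlers = {
    "A": lambda r: "a",
    "B": lambda r: "b",
    "C": lambda r: "c",
    "D": lambda r: r["B"] == "b",
    "E": lambda r: "e",
    "F": lambda r: "f",
}
results = run_nodes(json.loads(DSL), handlers, {})

# Same chain with D forced false, which falls through to defaultCase.
results2 = run_nodes(json.loads(DSL), dict(handlers, D=lambda r: False), {})
```

Keeping the chain shape in data rather than code is what lets the same definition drive both execution and the later visualization of which branch each run actually took.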
When a chain starts, a unique identifier (business ID + scenario ID + execution ID) is generated. This identifier is propagated through MQ, RPC, or thread‑local storage, allowing each log entry to be colored with the chain and node IDs, thus dynamically stitching logs into the chain.
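Coloring can be pictured as a filter that stamps every log record with the current chain and node IDs before it is written. The sketch below is a hedged Python illustration, not the platform's implementation: the `businessId_scenarioId_executionId` layout, the 12‑character execution ID, and the class name `ChainColoringFilter` are all assumptions. It uses the standard `logging.Filter` hook so ordinary log calls need no changes.

```python
import contextvars
import io
import logging
import uuid

chain_id_var = contextvars.ContextVar("chain_id", default="-")
node_id_var = contextvars.ContextVar("node_id", default="-")


def new_chain_id(business_id: str, scenario_id: str) -> str:
    """Hypothetical format: businessId_scenarioId_executionId."""
    return f"{business_id}_{scenario_id}_{uuid.uuid4().hex[:12]}"


class ChainColoringFilter(logging.Filter):
    """Stamp every record with the chain and node IDs from the current context."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.chain_id = chain_id_var.get()
        record.node_id = node_id_var.get()
        return True  # never drop records, only enrich them


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(ChainColoringFilter())
handler.setFormatter(logging.Formatter("%(chain_id)s %(node_id)s %(levelname)s %(message)s"))
logger = logging.getLogger("chain-demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

# A chain starts, enters node A, and writes an ordinary log line.
chain_id_var.set(new_chain_id("content123", "audit"))
node_id_var.set("A")
logger.info("input title=%s", "hello")
```

Because the IDs ride on every record, a downstream parser can attach each line to its chain and node without any manual stitching.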
Chain Reporting and Storage
During execution, three log types are reported:
Chain logs: basic metadata (type, start/end time).
Node logs: node name, status, timestamps.
Business logs: detailed data such as inputs/outputs, intermediate variables, and exceptions.
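The three log types can be modeled as simple records sharing a chain ID. In this minimal Python sketch the field names (`chain_id`, `log_type`, `start_ms`, and so on), the status values, and the JSON‑lines encoding are assumptions chosen for illustration; the source only lists what each type contains.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ChainLog:
    """Chain log: metadata for one chain execution."""
    chain_id: str
    chain_type: str
    start_ms: int
    end_ms: Optional[int] = None
    log_type: str = "chain"


@dataclass
class NodeLog:
    """Node log: one node's name, status, and timestamps."""
    chain_id: str
    node_name: str
    status: str  # hypothetical values such as "SUCCESS" or "FAILED"
    start_ms: int
    end_ms: int
    log_type: str = "node"


@dataclass
class BusinessLog:
    """Business log: free-form detail attached to a node."""
    chain_id: str
    node_name: str
    level: str
    message: str
    data: dict = field(default_factory=dict)  # inputs, outputs, variables, errors
    log_type: str = "business"


def to_line(record) -> str:
    """Serialize one record as a JSON line for the log agent to ship."""
    return json.dumps(asdict(record), ensure_ascii=False, sort_keys=True)


now = int(time.time() * 1000)
lines = [
    to_line(ChainLog("c1", "audit", now)),
    to_line(NodeLog("c1", "A", "SUCCESS", now, now + 5)),
    to_line(BusinessLog("c1", "A", "INFO", "rpc call", {"input": {"id": 1}})),
]
```

A shared `log_type` discriminator lets one Kafka topic carry all three kinds while the parser still tells them apart.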
Logs are collected by a log_agent, sent to Kafka, parsed by Flink, and stored in a tree‑structured model in HBase, enabling later reconstruction of the execution scene.
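The step that turns flat records into a tree‑structured model is essentially a grouping pass. This hedged Python sketch assumes JSON‑line records with hypothetical `log_type`, `chain_id`, and `node_name` fields and builds one tree per chain execution (chain → nodes → business logs). It models only the grouping logic, not Kafka, Flink, or HBase themselves.

```python
import json
from collections import defaultdict


def build_trees(lines):
    """Group flat JSON log lines into one tree per chain execution.

    Records may arrive in any order (e.g. a business log before its node log),
    so each level is created on first sight and filled in later.
    """
    trees = defaultdict(lambda: {"chain": None, "nodes": {}})
    for line in lines:
        rec = json.loads(line)
        tree = trees[rec["chain_id"]]
        if rec["log_type"] == "chain":
            tree["chain"] = rec
            continue
        node = tree["nodes"].setdefault(rec["node_name"], {"node": None, "logs": []})
        if rec["log_type"] == "node":
            node["node"] = rec
        else:  # business log
            node["logs"].append(rec)
    return dict(trees)


# Deliberately out of order: the business log arrives first.
lines = [
    json.dumps({"log_type": "business", "chain_id": "c1", "node_name": "A",
                "level": "INFO", "message": "calling"}),
    json.dumps({"log_type": "chain", "chain_id": "c1", "chain_type": "audit", "start_ms": 0}),
    json.dumps({"log_type": "node", "chain_id": "c1", "node_name": "A",
                "status": "SUCCESS", "start_ms": 0, "end_ms": 5}),
]
trees = build_trees(lines)
```

Tolerating out‑of‑order arrival matters here because records from different services reach the stream independently.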
Meituan Content Platform Practice
The platform processes millions of content items daily, involving dozens of business scenarios and billions of node executions. Traditional tracing could not keep up. The new solution was applied as follows:
Log ingestion pipeline: log_agent → Kafka → Flink → HBase.
TraceLogger toolkit: a wrapper over SLF4J that hides tracing details, providing low‑cost instrumentation for both business and node logs.
In the accompanying code example, a standard logger call is replaced with TraceLogger.error(...), and node logs are reported through AOP by annotating methods with @TraceNode.
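The real TraceLogger toolkit is Java on top of SLF4J. As a hedged illustration of the same idea, this Python sketch uses a hypothetical `TraceLogger` class that wraps the standard `logging` module and also reports a colored business log, plus a hypothetical `trace_node` decorator standing in for the `@TraceNode` AOP annotation: it sets the current node, reports a node log with status and duration, and restores the previous node on exit. `REPORTED` is a hypothetical list standing in for the log agent.

```python
import contextvars
import functools
import logging
import time

chain_id_var = contextvars.ContextVar("chain_id", default=None)
node_id_var = contextvars.ContextVar("node_id", default=None)

REPORTED = []  # stand-in for the log agent; a real system would ship these out


class TraceLogger:
    """Hypothetical wrapper: logs normally and also reports a colored business log."""
    _log = logging.getLogger("trace")

    @classmethod
    def error(cls, msg, *args):
        cls._log.error(msg, *args)
        REPORTED.append({"log_type": "business", "chain_id": chain_id_var.get(),
                         "node_name": node_id_var.get(), "level": "ERROR",
                         "message": msg % args if args else msg})


def trace_node(name):
    """Decorator playing the role of @TraceNode: reports a node log around the call."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            token = node_id_var.set(name)
            start = time.time()
            status = "SUCCESS"
            try:
                return fn(*args, **kwargs)
            except Exception:
                status = "FAILED"
                raise
            finally:
                REPORTED.append({"log_type": "node", "chain_id": chain_id_var.get(),
                                 "node_name": name, "status": status,
                                 "cost_ms": int((time.time() - start) * 1000)})
                node_id_var.reset(token)  # restore the enclosing node
        return inner
    return wrap


@trace_node("B")
def fetch_author(content_id):
    if content_id < 0:
        TraceLogger.error("bad content id %s", content_id)
        raise ValueError(content_id)
    return "author-%d" % content_id


chain_id_var.set("c1")
fetch_author(7)
try:
    fetch_author(-1)
except ValueError:
    pass
```

Business code only swaps its logger and adds one decorator; the chain and node IDs are filled in from context, which is the low instrumentation cost the toolkit aims for.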
Resulting features include:
Real‑time chain query by content ID.
Visual chain diagram showing node execution status.
Node‑level detail view with inputs, outputs, and associated business logs.
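Real‑time query by content ID depends on how records are keyed in storage. The source does not describe the platform's HBase row‑key design, so this Python sketch assumes one common pattern for sorted key‑value stores: a content‑ID prefix followed by a reversed, zero‑padded timestamp, so a prefix scan returns the newest executions first. `SortedStore`, `make_row_key`, and `query_chains` are hypothetical; `SortedStore` is only an in‑memory stand‑in for HBase.

```python
import bisect

MAX_TS = 2**63 - 1  # largest signed 64-bit value; fits in 19 decimal digits


def make_row_key(content_id: str, start_ms: int, execution_id: str) -> str:
    """Hypothetical row key: content ID, reversed time (newest sorts first), execution ID."""
    return f"{content_id}|{MAX_TS - start_ms:019d}|{execution_id}"


class SortedStore:
    """In-memory stand-in for a sorted key-value table such as HBase."""

    def __init__(self):
        self._keys = []   # kept in lexicographic order
        self._rows = {}

    def put(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def prefix_scan(self, prefix, limit=10):
        i = bisect.bisect_left(self._keys, prefix)
        out = []
        while i < len(self._keys) and self._keys[i].startswith(prefix) and len(out) < limit:
            out.append(self._rows[self._keys[i]])
            i += 1
        return out


def query_chains(store, content_id, limit=10):
    """Latest chain executions for one content item."""
    return store.prefix_scan(f"{content_id}|", limit)


store = SortedStore()
store.put(make_row_key("42", 1000, "e1"), {"execution": "e1"})
store.put(make_row_key("42", 2000, "e2"), {"execution": "e2"})
store.put(make_row_key("421", 3000, "e3"), {"execution": "e3"})
latest = query_chains(store, "42")
```

The trailing `|` in the scan prefix keeps a query for content `42` from also matching content `421`.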
These capabilities reduced issue‑resolution time from hours to under five minutes and improved testing efficiency.
Summary and Outlook
Visual full‑link log tracing combines logging and tracing to provide a business‑centric observability solution. It offers low integration cost (DSL + TraceLogger), broad coverage across all content flows, and high operational efficiency through visual query and analysis tools. Future work will extend the observability stack with alerting, dashboards, and deeper diagnostic features for complex distributed systems.
This article has been distilled and summarized from source material, then republished for learning and reference.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.