
Visualizing Full‑Link Log Tracing: From Design to Meituan Content Platform

This article presents a visual full‑link log tracing solution that organizes business logs by execution chain, enabling efficient log collection, dynamic linking, and real‑time visualization to pinpoint issues in complex distributed systems, with a detailed case study from Meituan's content platform.

dbaplus Community

Jul 27, 2022

Observability is essential for high‑availability systems, but as business logic grows, traditional ELK logging and distributed session tracing become inefficient for business‑level tracing. This article introduces a visual full‑link log tracing approach that uses business chains as the carrier, organizing logs per execution to reconstruct the runtime scene and locate problems quickly.

Background

Business systems are increasingly complex, involving many micro‑services and diverse scenarios. Traditional ELK requires exhaustive log statements and manual stitching, while session tracing focuses on call chains and cannot fully capture business logic, leading to challenges in log collection, filtering, and analysis.

Visual Full‑Link Log Tracing Design

The design addresses two core questions: how to efficiently organize business logs and how to dynamically link them.

Efficient Log Organization: Define a logic node (a local method or an RPC call) and a logic chain that composes nodes according to business rules (sequential, parallel, conditional). One execution of a chain represents one business‑level trace.

Dynamic Log Linking: Use distributed parameter propagation (traceId‑like identifiers) to pass a unique chain identifier across threads and network calls. Each log is colored with that identifier, so it is attached dynamically to the node that is executing when the log is written.

Chain Definition

A DSL (JSON or XML) describes nodes and their relationships, supporting serial, parallel, and decision branches. The following JSON example defines nodes A through F: A runs first, B and C run in parallel and are joined, and decision node D routes to E when its condition is true and to F otherwise.

[
  {"nodeName":"A","nodeType":"rpc"},
  {"nodeName":"Fork","nodeType":"fork","forkNodes":[[{"nodeName":"B","nodeType":"rpc"}],[{"nodeName":"C","nodeType":"local"}]]},
  {"nodeName":"Join","nodeType":"join","joinOnList":["B","C"]},
  {"nodeName":"D","nodeType":"decision","decisionCases":{"true":[{"nodeName":"E","nodeType":"rpc"}]},"defaultCase":[{"nodeName":"F","nodeType":"rpc"}]}
]
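To make the structure of the DSL concrete, the following is a minimal sketch of how a chain definition like the one above could be loaded and traversed. The helper names `iter_nodes` and `validate_joins` are hypothetical and not part of the original system; the sketch only assumes the field names that appear in the example (`forkNodes`, `joinOnList`, `decisionCases`, `defaultCase`).

```python
import json

# The chain definition from the example above, embedded as a string.
CHAIN_DSL = """
[
  {"nodeName":"A","nodeType":"rpc"},
  {"nodeName":"Fork","nodeType":"fork","forkNodes":[[{"nodeName":"B","nodeType":"rpc"}],[{"nodeName":"C","nodeType":"local"}]]},
  {"nodeName":"Join","nodeType":"join","joinOnList":["B","C"]},
  {"nodeName":"D","nodeType":"decision","decisionCases":{"true":[{"nodeName":"E","nodeType":"rpc"}]},"defaultCase":[{"nodeName":"F","nodeType":"rpc"}]}
]
"""


def iter_nodes(nodes):
    """Yield every node dict in definition order, descending into branches."""
    for node in nodes:
        yield node
        # Parallel branches: a list of node lists.
        for branch in node.get("forkNodes", []):
            yield from iter_nodes(branch)
        # Conditional branches: a mapping from case label to node list.
        for branch in node.get("decisionCases", {}).values():
            yield from iter_nodes(branch)
        # Fallback branch taken when no case matches.
        yield from iter_nodes(node.get("defaultCase", []))


def validate_joins(nodes):
    """Return join targets that are not defined anywhere in the chain."""
    all_nodes = list(iter_nodes(nodes))
    names = {n["nodeName"] for n in all_nodes}
    missing = []
    for n in all_nodes:
        if n["nodeType"] == "join":
            missing += [t for t in n.get("joinOnList", []) if t not in names]
    return missing


chain = json.loads(CHAIN_DSL)
node_names = [n["nodeName"] for n in iter_nodes(chain)]
# Only rpc and local nodes execute business logic; fork, join, and
# decision nodes describe control flow.
business_nodes = [
    n["nodeName"] for n in iter_nodes(chain) if n["nodeType"] in ("rpc", "local")
]
```

A check such as `validate_joins` could run when a chain is registered, so that a join referring to an undefined node is rejected before any logs are collected.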

Chain Coloring

When a chain starts, a unique identifier (business ID + scenario ID + execution ID) is generated. This identifier is propagated through MQ, RPC, or thread‑local storage, allowing each log entry to be colored with the chain and node IDs, thus dynamically stitching logs into the chain.

Chain Reporting and Storage

During execution, three log types are reported:

Chain logs: basic metadata (type, start/end time).

Node logs: node name, status, timestamps.

Business logs: detailed data such as input/output, intermediate variables, and exceptions.

Logs are collected by a log_agent, sent to Kafka, parsed by Flink, and stored in a tree‑structured model in HBase, enabling later reconstruction of the execution scene.
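As a rough illustration of the tree-structured model, the sketch below regroups flat log records into a chain → node → business log hierarchy. The record fields (`type`, `chain_id`, `node`, `message`) and the function `build_trees` are assumptions made for this example; the article does not specify the actual record schema, the Flink job, or the HBase row layout.

```python
# Hypothetical flat records, deliberately out of order to mimic
# arrival from a message queue.
records = [
    {"type": "business", "chain_id": "c1", "node": "A",
     "message": "input title=hello"},
    {"type": "chain", "chain_id": "c1", "chain_type": "publish",
     "start": 100, "end": 180},
    {"type": "node", "chain_id": "c1", "node": "A",
     "status": "SUCCESS", "start": 100, "end": 120},
    {"type": "node", "chain_id": "c1", "node": "B",
     "status": "FAILED", "start": 120, "end": 180},
    {"type": "business", "chain_id": "c1", "node": "B",
     "message": "timeout calling review service"},
]


def build_trees(records):
    """Group flat records into {chain_id: {meta, nodes: {name: {meta, logs}}}}."""
    trees = {}

    def chain_of(chain_id):
        return trees.setdefault(chain_id, {"meta": None, "nodes": {}})

    def node_of(chain_id, name):
        nodes = chain_of(chain_id)["nodes"]
        return nodes.setdefault(name, {"meta": None, "logs": []})

    for rec in records:
        kind = rec["type"]
        if kind == "chain":
            chain_of(rec["chain_id"])["meta"] = rec
        elif kind == "node":
            node_of(rec["chain_id"], rec["node"])["meta"] = rec
        elif kind == "business":
            node_of(rec["chain_id"], rec["node"])["logs"].append(rec["message"])
    return trees


trees = build_trees(records)
failed_nodes = [
    name
    for name, node in trees["c1"]["nodes"].items()
    if node["meta"] and node["meta"]["status"] == "FAILED"
]
```

Creating placeholder entries with `setdefault` lets a business log that arrives before its node log still land under the correct node, which matters when records are consumed from a stream.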

Meituan Content Platform Practice

The platform processes millions of content items daily, involving dozens of business scenarios and billions of node executions. Traditional tracing could not keep up. The new solution was applied as follows:

Log ingestion pipeline: log_agent → Kafka → Flink → HBase.

TraceLogger toolkit: a wrapper over SLF4J that hides tracing details, providing low‑cost instrumentation for both business and node logs.

Integration consists of replacing a standard logger call with TraceLogger.error(...) and marking methods with annotations such as @TraceNode, which report node logs through AOP.
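The original toolkit is Java-based (SLF4J plus annotations), and its code is not reproduced in this summary. The following Python sketch only mirrors the idea: a `TraceLogger` wrapper that forwards to a normal logger while also reporting a business log, and a `trace_node` decorator playing the role of the `@TraceNode` annotation. The `REPORTED` list, the record fields, and the single-threaded `_current_node` attribute are simplifying assumptions for illustration.

```python
import functools
import logging
import time

# Stand-in for the reporting channel (for example, a local log file
# picked up by an agent). Hypothetical, for illustration only.
REPORTED = []


class TraceLogger:
    """Forward to a standard logger and also report a business log."""

    _logger = logging.getLogger("business")
    _current_node = None  # simplified: not safe for concurrent use

    @classmethod
    def error(cls, message, *args):
        cls._logger.error(message, *args)
        REPORTED.append({
            "type": "business",
            "node": cls._current_node,
            "level": "ERROR",
            "message": message % args if args else message,
        })


def trace_node(name):
    """Decorator analogue of an AOP annotation: report one node log per call."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            previous = TraceLogger._current_node
            TraceLogger._current_node = name
            start = time.time()
            status = "SUCCESS"
            try:
                return func(*args, **kwargs)
            except Exception:
                status = "FAILED"
                raise
            finally:
                REPORTED.append({
                    "type": "node",
                    "node": name,
                    "status": status,
                    "start": start,
                    "end": time.time(),
                })
                TraceLogger._current_node = previous

        return wrapper

    return decorator


@trace_node("review")
def review(content_id):
    if not content_id:
        TraceLogger.error("missing content id")
        raise ValueError("content_id is required")
    return "approved"


result = review("c-1")
try:
    review("")
except ValueError:
    pass
```

The business method contains no tracing logic beyond the decorator and the logger call, which is the low-cost integration property the toolkit aims for.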

Resulting features include:

Real‑time chain query by content ID.

Visual chain diagram showing node execution status.

Node‑level detail view with inputs, outputs, and associated business logs.

These capabilities reduced issue‑resolution time from hours to under five minutes and improved testing efficiency.

Summary and Outlook

Visual full‑link log tracing combines logging and tracing to provide a business‑centric observability solution. It offers low integration cost (DSL + TraceLogger), broad coverage across all content flows, and high operational efficiency through visual query and analysis tools. Future work will extend the observability stack with alerting, dashboards, and deeper diagnostic features for complex distributed systems.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
