Design and Implementation of Meituan Hotel Full-Chain Log and Trace System
To cope with the exploding complexity of Meituan Hotel's microservices, the infrastructure team built the Satellite System. It combines MTrace with a selective, zero-intrusion logging pipeline based on Log4j2. The pipeline streams enriched logs through Kafka and Storm into Redis and Elasticsearch. The result is trace-linked log queries within seconds, six-month retention, and much faster debugging.
As Meituan's hotel business grew rapidly, service complexity and the number of microservice nodes increased dramatically, which made problems hard to locate. The traditional approach of searching logs by hand across dozens of nodes is time-consuming.
To address this, the infrastructure team introduced MTrace and built a full-chain log system named the "Satellite System". It integrates trace and log data, supports near-real-time (second-level) queries, and retains logs for half a year in external storage.
Goals: (1) quickly find the Trace ID from a user's actions and retrieve the logs from every node the request touched; (2) provide near-real-time query performance and keep logs for at least six months.
The solution includes selective logging (only high‑value logs), full‑node trace propagation, zero‑intrusion log interception via a Log4j2 global filter, automatic log enrichment (AppKey, hostname, IP, timestamp, etc.), and transmission to Kafka through the logcenter ScribeAppender.
Architecture: The system consists of a trace layer (MTrace with Hystrix‑Trace plugin) and a log layer (interception → formatting → Kafka → Storm processing → storage). Storage uses Squirrel (Redis cluster) for fast, short‑term access and Elasticsearch for persistent, searchable logs.
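The two layers are only useful together once logs can be attached to the hops of a trace. The following sketch is a hypothetical illustration, not MTrace's actual data model. It assumes that each span carries a trace ID and a span ID, and that each enriched log line carries the same two IDs. `Span`, `LogLine` and `FullChainView` are names invented here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical trace-layer record: one hop (span) of a request.
record Span(String traceId, String spanId, long startMillis) {}

// Hypothetical log-layer record: one enriched log line tagged with trace and span IDs.
record LogLine(String traceId, String spanId, long timestampMillis, String message) {}

final class FullChainView {
    /**
     * Joins spans and log lines that share a trace ID. The result maps span ID to that
     * span's messages. Spans are ordered by start time and messages by timestamp.
     * Log lines whose span ID is not among the trace's spans are dropped.
     */
    static Map<String, List<String>> build(String traceId, List<Span> spans, List<LogLine> logs) {
        Map<String, List<String>> view = new LinkedHashMap<>();
        spans.stream()
                .filter(s -> s.traceId().equals(traceId))
                .sorted(Comparator.comparingLong(Span::startMillis))
                .forEach(s -> view.put(s.spanId(), new ArrayList<>()));
        logs.stream()
                .filter(l -> l.traceId().equals(traceId) && view.containsKey(l.spanId()))
                .sorted(Comparator.comparingLong(LogLine::timestampMillis))
                .forEach(l -> view.get(l.spanId()).add(l.message()));
        return view;
    }
}
```

This sketch drops log lines whose span is unknown. A real view might instead show them under a catch-all entry.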
Log Sampling: Only logs from requests made by internal staff (e.g., hotel division employees) are collected. This drastically reduces volume while keeping the useful data.
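The source does not say how internal staff are recognized. Suppose the entry service can see a user ID and holds a whitelist of employee IDs. In that case the sampling decision reduces to a set lookup. `StaffSamplingPolicy` and its whitelist are hypothetical.

```java
import java.util.Set;

/**
 * Hypothetical sampling policy. It collects full-chain logs only for requests made by
 * internal staff, who are identified here by an assumed whitelist of user IDs.
 */
final class StaffSamplingPolicy {
    private final Set<String> internalUserIds;

    StaffSamplingPolicy(Set<String> internalUserIds) {
        this.internalUserIds = Set.copyOf(internalUserIds);
    }

    /**
     * Meant to be evaluated once, at the entry node. The null check matters: an
     * immutable Set.copyOf set throws NullPointerException on contains(null).
     */
    boolean shouldCollect(String userId) {
        return userId != null && internalUserIds.contains(userId);
    }
}
```

One design choice is worth noting. Suppose the decision is made once at the entry node and then carried along with the trace context. Every downstream node then agrees on it, so a sampled request yields a complete chain of logs instead of fragments.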
Log Interception: A global Log4j2 filter is registered so that all application logs are captured without modifying business code.
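Log4j2 supports context-wide filters, which see each event before it is passed to any logger. Every filter answers ACCEPT, DENY or NEUTRAL. To stay dependency-free, the plain-Java model below only mimics that mechanism. The hypothetical `CollectingFilter` copies sampled events to a side channel and always returns NEUTRAL, so the application's existing logging behaves exactly as before. `AppLogEvent`, its `sampled` flag and `ToyLoggerContext` are stand-ins, not Log4j2 types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Mirrors the three outcomes a Log4j2 filter can return.
enum FilterResult { ACCEPT, DENY, NEUTRAL }

// Hypothetical minimal event; in practice 'sampled' would come from the trace context.
record AppLogEvent(String loggerName, String message, boolean sampled) {}

// Hypothetical global filter: it observes every event, forwards sampled ones, and never blocks.
final class CollectingFilter {
    private final Consumer<AppLogEvent> sink;

    CollectingFilter(Consumer<AppLogEvent> sink) { this.sink = sink; }

    FilterResult filter(AppLogEvent e) {
        if (e.sampled()) {
            sink.accept(e); // copy to the collection pipeline (a Kafka producer, say)
        }
        return FilterResult.NEUTRAL; // leave the decision to the normal logging config
    }
}

// Toy context that runs the global filter before its "appender", like a context-wide filter.
final class ToyLoggerContext {
    private final CollectingFilter globalFilter;
    final List<String> appended = new ArrayList<>();

    ToyLoggerContext(CollectingFilter globalFilter) { this.globalFilter = globalFilter; }

    void log(AppLogEvent e) {
        if (globalFilter.filter(e) != FilterResult.DENY) {
            appended.add(e.message()); // stands in for the application's existing appenders
        }
    }
}
```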
Log Formatting: A unified schema is enforced via Log4j2 plugins, which automatically fill in missing fields.
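Here is a minimal sketch of the enrichment step. It assumes a log record can be treated as a flat string map. The field names and the `LogEnricher` class are hypothetical. Host details and the clock are injected so the example stays deterministic, and fields the application already set are left untouched.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/**
 * Hypothetical enricher that fills the unified schema's missing fields.
 * Values already present on the record are never overwritten.
 */
final class LogEnricher {
    private final String appKey;
    private final String hostname;
    private final String ip;
    private final LongSupplier clock;

    LogEnricher(String appKey, String hostname, String ip, LongSupplier clock) {
        this.appKey = appKey;
        this.hostname = hostname;
        this.ip = ip;
        this.clock = clock;
    }

    /** Returns a new map; the input record is not modified. */
    Map<String, String> enrich(Map<String, String> raw) {
        Map<String, String> out = new LinkedHashMap<>(raw);
        out.putIfAbsent("appKey", appKey);
        out.putIfAbsent("hostname", hostname);
        out.putIfAbsent("ip", ip);
        out.putIfAbsent("timestamp", Long.toString(clock.getAsLong()));
        return out;
    }
}
```

In a real service, the constructor arguments would typically be resolved once at startup rather than per record. For example, host details could come from `InetAddress.getLocalHost()` and the clock from `System::currentTimeMillis`.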
Processing: Storm processes the log stream in real time. Critical logs are also cached in Squirrel for sub-minute latency, while Elasticsearch provides long-term storage.
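The division of labor between the two stores can be modeled with two in-memory maps. One stands in for Squirrel: short-lived, critical lines only, fast to read. The other stands in for Elasticsearch: every line, kept long-term. The TTL, the keying by trace ID and the `TieredLogStore` API are assumptions made for illustration. The source only states which store serves which purpose.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical two-tier log store keyed by trace ID. */
final class TieredLogStore {
    private record Cached(List<String> lines, long expiresAt) {}

    private final Map<String, Cached> hot = new HashMap<>();         // stands in for Squirrel (Redis)
    private final Map<String, List<String>> cold = new HashMap<>();  // stands in for Elasticsearch
    private final long hotTtlMillis;
    private final LongSupplier clock;

    TieredLogStore(long hotTtlMillis, LongSupplier clock) {
        this.hotTtlMillis = hotTtlMillis;
        this.clock = clock;
    }

    /** Called by the stream processor. Every line goes to cold storage; critical ones are also cached. */
    void write(String traceId, String line, boolean critical) {
        cold.computeIfAbsent(traceId, k -> new ArrayList<>()).add(line);
        if (critical) {
            long now = clock.getAsLong();
            Cached c = hot.get(traceId);
            List<String> lines = (c != null && c.expiresAt() > now) ? c.lines() : new ArrayList<>();
            lines.add(line);
            hot.put(traceId, new Cached(lines, now + hotTtlMillis)); // each write refreshes the TTL
        }
    }

    /** Fast path: critical lines still inside the TTL window; empty once expired. */
    List<String> recent(String traceId) {
        Cached c = hot.get(traceId);
        if (c == null || c.expiresAt() <= clock.getAsLong()) {
            hot.remove(traceId); // lazy eviction of the expired entry
            return List.of();
        }
        return List.copyOf(c.lines());
    }

    /** Long-term path: every stored line for the trace. */
    List<String> history(String traceId) {
        return List.copyOf(cold.getOrDefault(traceId, List.of()));
    }
}
```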
Trace Integrity: ThreadLocal-based trace propagation is reinforced with transmittable-thread-local wrappers and a custom Hystrix-Trace plugin, so the trace context survives asynchronous execution.
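A trace ID set in a plain `ThreadLocal` on the request thread is invisible to a pooled worker thread, which is how trace IDs get lost in asynchronous code. The hand-rolled wrapper below shows the capture-and-restore idea that transmittable-thread-local packages up. `TraceContext` and `TracePropagationDemo` are hypothetical names, not the library's API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical trace context held in a plain ThreadLocal. */
final class TraceContext {
    private static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    static void set(String traceId) { TRACE_ID.set(traceId); }
    static String get() { return TRACE_ID.get(); }
    static void clear() { TRACE_ID.remove(); }

    /**
     * Captures the caller's trace ID when the task is wrapped and restores it on whichever
     * thread runs the task. Afterwards it puts back what that thread held before.
     */
    static <T> Callable<T> wrap(Callable<T> task) {
        final String captured = get();
        return () -> {
            String previous = get();
            set(captured);
            try {
                return task.call();
            } finally {
                if (previous == null) clear(); else set(previous);
            }
        };
    }
}

final class TracePropagationDemo {
    /**
     * On one pooled thread, reads the trace ID three times: unwrapped, wrapped, then
     * unwrapped again. The wrapper propagates the ID and cleans up after itself.
     */
    static List<String> run(String traceId) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Callable<String> read = TraceContext::get;
        try {
            TraceContext.set(traceId);
            return Arrays.asList(
                    pool.submit(read).get(),
                    pool.submit(TraceContext.wrap(read)).get(),
                    pool.submit(read).get());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            TraceContext.clear();
            pool.shutdown();
        }
    }
}
```

The real library offers equivalent calls: `TtlCallable.get(...)` and `TtlRunnable.get(...)`, or `TtlExecutors.getTtlExecutorService(...)` to wrap a whole pool. Hystrix runs commands on its own thread pools and exposes `HystrixConcurrencyStrategy.wrapCallable` as a hook for this kind of wrapping. The source does not say whether the Hystrix-Trace plugin uses that hook.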
In practice, the system aggregates full-chain trace data with the corresponding logs, which greatly speeds up debugging. A sample trace of a POI detail page request illustrates this.
Future plans include multi‑trace correlation search and automated business correctness checks based on log analysis.
Meituan Technology Team