Large-Scale Service Governance Design and Practice Using MTrace

MTrace, Meituan‑Dianping’s internal distributed tracing platform, assigns a globally unique 64‑bit traceId to each request, instruments RPC, HTTP, database, and messaging calls, sends the collected data through Kafka into HBase and Hive, and visualizes the full call chain. This makes it possible to pinpoint cross‑datacenter traffic, latency bottlenecks, redundant calls, and correlated exceptions, which in turn enables systematic service‑level optimization.

Meituan Technology Team

Introduction

MTrace is Meituan-Dianping's internal distributed session tracing system. Its core idea is the call chain: a global ID links all service nodes involved in a single request, which makes it possible to reconstruct call relationships, trace problems, and analyze metrics. The design draws on Google’s Dapper paper, Twitter’s Zipkin, and Alibaba’s Eagle Eye.

Network Optimization

Visualization of request flow across service nodes and IPs helps identify cross‑datacenter calls and optimize network topology.

Bottleneck Identification

By highlighting latency‑heavy nodes in the call chain, MTrace allows rapid pinpointing of downstream service bottlenecks, reducing time spent on inter‑team coordination.

Link Optimization

Repeated calls to the same interface can be aggregated (e.g., batch calls) or parallelized to improve efficiency.
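
As a minimal sketch of how such candidates might be spotted, the snippet below counts how often each interface appears within one trace and keeps those called at least a given number of times. The class name, method name, and the interface names in the usage note are hypothetical; the source does not describe how MTrace detects repeated calls.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of MTrace: finds interfaces called repeatedly in one trace.
final class RepeatedCallSketch {

    // Returns each interface invoked at least minCalls times, with its call count,
    // in order of first appearance within the trace.
    static Map<String, Integer> repeated(List<String> calledInterfaces, int minCalls) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : calledInterfaces) {
            counts.merge(name, 1, Integer::sum);
        }
        // Drop interfaces that were called fewer than minCalls times.
        counts.values().removeIf(count -> count < minCalls);
        return counts;
    }
}
```

For example, a trace that calls a made-up `UserService.getById` three times and `PoiService.get` once would report only `UserService.getById`, marking it as a candidate for a batch interface or for parallel execution.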

Exception Log Binding

MTrace binds exception logs to the corresponding trace ID, allowing correlation of errors across upstream and downstream services.

Transparent Data Transmission

Two APIs enable custom data propagation through a request:

put(Map<String, String> data)
putOnce(Map<String, String> data)

put propagates data for the entire request lifecycle; putOnce limits propagation to a single hop.
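
The difference between the two modes can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the MTrace SDK: only the names put and putOnce come from the article, while the class `PropagationSketch`, the `get` and `callDownstream` methods, and the idea of modelling each hop as a new context object are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of transparent data transmission; not the real MTrace API.
final class PropagationSketch {
    // Forwarded on every hop for the rest of the request.
    private final Map<String, String> lifetime = new HashMap<>();
    // Set locally via putOnce; forwarded to the next hop only.
    private final Map<String, String> nextHopOnly = new HashMap<>();
    // Arrived from the direct caller via putOnce; readable here, never forwarded.
    private final Map<String, String> receivedOnce = new HashMap<>();

    void put(Map<String, String> data) {
        lifetime.putAll(data);
    }

    void putOnce(Map<String, String> data) {
        nextHopOnly.putAll(data);
    }

    // Reads a value that was propagated to this node, or null if none arrived.
    String get(String key) {
        String value = lifetime.get(key);
        return value != null ? value : receivedOnce.get(key);
    }

    // Simulates one hop: builds the context the downstream service would see.
    PropagationSketch callDownstream() {
        PropagationSketch child = new PropagationSketch();
        child.lifetime.putAll(lifetime);
        child.receivedOnce.putAll(nextHopOnly);
        return child;
    }
}
```

In this model, if service A calls put with a user ID and putOnce with a debug flag, service B (one hop away) can read both, while service C (two hops away) sees only the user ID.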

System Architecture

The system consists of three layers: instrumentation and reporting, data collection and computation, and front‑end visualization.

Basic Concepts

traceId: a globally unique 64‑bit identifier for a request.

spanId: a hierarchical identifier (e.g., 0, 0.1, 0.1.1) indicating a node’s position in the call graph.

annotation: user‑defined data attached to a trace, such as a user ID.
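
The hierarchical spanId scheme can be sketched as follows: the root span is 0, and each outgoing call appends the next child index to the caller's ID. The class `SpanIdSketch` and its methods are hypothetical names chosen for illustration; the source gives only the ID format, not how MTrace generates it.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical generator for dotted span IDs such as 0, 0.1, 0.1.1.
final class SpanIdSketch {
    private final String id;
    // Number of child spans handed out so far by this span.
    private final AtomicInteger childCounter = new AtomicInteger(0);

    private SpanIdSketch(String id) {
        this.id = id;
    }

    // The entry point of a request gets the root ID "0".
    static SpanIdSketch root() {
        return new SpanIdSketch("0");
    }

    // Each downstream call gets the parent's ID plus the next 1-based child index.
    SpanIdSketch nextChild() {
        return new SpanIdSketch(id + "." + childCounter.incrementAndGet());
    }

    String id() {
        return id;
    }

    // Depth in the call graph: 0 for the root, 1 for its direct callees, and so on.
    int depth() {
        return id.split("\\.").length - 1;
    }
}
```

Because the parent's ID is a prefix of every descendant's ID, the call tree can be rebuilt from the IDs alone, without a separate parent pointer.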

Instrumentation

An SDK injects trace context into various middleware (RPC, HTTP, MySQL, Tair, MQ). Context is stored in ThreadLocal for synchronous calls and passed explicitly for asynchronous execution.
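
The reason asynchronous execution needs explicit passing is that a ThreadLocal value belongs to the thread that set it, so a task running on a pool thread does not see the caller's context. A minimal sketch of one common way to handle this is shown below: capture the context on the submitting thread and restore it around the task. The class `TraceContextHolderSketch` and its methods are hypothetical and stand in for whatever the MTrace SDK actually does, which the source does not detail.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical holder for the current trace ID; not the real MTrace SDK.
final class TraceContextHolderSketch {
    private static final ThreadLocal<String> CURRENT_TRACE_ID = new ThreadLocal<>();

    static void set(String traceId) { CURRENT_TRACE_ID.set(traceId); }
    static String get() { return CURRENT_TRACE_ID.get(); }
    static void clear() { CURRENT_TRACE_ID.remove(); }

    // Captures the caller's trace ID now and restores it on whichever thread runs the task.
    static <T> Callable<T> wrap(Callable<T> task) {
        final String captured = get();
        return () -> {
            String previous = get();
            set(captured);
            try {
                return task.call();
            } finally {
                // Put the worker thread back the way it was so pooled threads do not leak context.
                if (previous == null) {
                    clear();
                } else {
                    set(previous);
                }
            }
        };
    }

    // Demo helper: reads the trace ID from a worker thread, with or without propagation.
    static String readOnWorker(boolean propagate) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<String> task = TraceContextHolderSketch::get;
            Callable<String> toRun = propagate ? wrap(task) : task;
            return pool.submit(toRun).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

After `set("trace-1")` on the calling thread, `readOnWorker(false)` returns null because the worker thread has no context of its own, while `readOnWorker(true)` returns the caller's trace ID.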

Four instrumentation stages:

Client Send: Span span = Tracer.clientSend(param);
Server Receive: Tracer.serverRecv(param);
Server Send: Tracer.serverSend();
Client Receive: Tracer.clientRecv();
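
Each of the four stages corresponds to one point in time for a call. As a rough sketch of what recording those four points would allow, the class below derives the client-observed latency, the server processing time, and the remainder spent on the network and in queues. This follows the general Dapper-style model rather than anything the source states about MTrace; the class `SpanTimingSketch` and its fields are hypothetical.

```java
// Hypothetical record of the four stage timestamps for one call, in milliseconds.
final class SpanTimingSketch {
    final long clientSend;  // taken on the client's clock
    final long serverRecv;  // taken on the server's clock
    final long serverSend;  // taken on the server's clock
    final long clientRecv;  // taken on the client's clock

    SpanTimingSketch(long clientSend, long serverRecv, long serverSend, long clientRecv) {
        this.clientSend = clientSend;
        this.serverRecv = serverRecv;
        this.serverSend = serverSend;
        this.clientRecv = clientRecv;
    }

    // Total latency as seen by the caller; uses only client-side timestamps.
    long clientDuration() {
        return clientRecv - clientSend;
    }

    // Time spent inside the callee; uses only server-side timestamps.
    long serverDuration() {
        return serverSend - serverRecv;
    }

    // Whatever the caller waited for beyond server processing: network transfer and queueing.
    long networkAndQueueing() {
        return clientDuration() - serverDuration();
    }
}
```

Each duration is computed from two timestamps taken on the same machine, so the result does not depend on the client and server clocks agreeing with each other.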

Data Storage

Data is first written to Kafka to decouple reporting from storage, then persisted to HBase for real‑time queries (with traceId as the row key) and to Hive for offline analysis (service‑level metrics).

Front‑End Presentation

Because timestamps from different machines may drift, the UI orders spans primarily by spanId, using timestamps as a secondary key.
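
A minimal sketch of such an ordering is shown below. It compares dotted span IDs segment by segment as numbers, so that 0.10 sorts after 0.2 and a parent sorts before its descendants, and falls back to the timestamp only when two records carry the same spanId. The `Span` class, its fields, and the comparator name are hypothetical; the source states only the ordering rule.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical display ordering for spans: spanId first, timestamp second.
final class SpanOrderSketch {

    static final class Span {
        final String spanId;
        final long timestampMillis;

        Span(String spanId, long timestampMillis) {
            this.spanId = spanId;
            this.timestampMillis = timestampMillis;
        }
    }

    // Compares IDs such as "0.1.1" and "0.2" numerically, one dotted segment at a time.
    static int compareSpanIds(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        int common = Math.min(x.length, y.length);
        for (int i = 0; i < common; i++) {
            int c = Integer.compare(Integer.parseInt(x[i]), Integer.parseInt(y[i]));
            if (c != 0) {
                return c;
            }
        }
        // One ID is a prefix of the other: the shorter one (the ancestor) comes first.
        return Integer.compare(x.length, y.length);
    }

    // Primary key: spanId. Secondary key: timestamp, used only to break ties.
    static final Comparator<Span> DISPLAY_ORDER = (a, b) -> {
        int c = compareSpanIds(a.spanId, b.spanId);
        return c != 0 ? c : Long.compare(a.timestampMillis, b.timestampMillis);
    };

    // Returns a sorted copy, leaving the input list untouched.
    static List<Span> sorted(List<Span> spans) {
        List<Span> copy = new ArrayList<>(spans);
        copy.sort(DISPLAY_ORDER);
        return copy;
    }
}
```

Sorting by plain string comparison would place 0.10 before 0.2, which is why the sketch parses each segment as an integer.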

Conclusion

Key concepts: call chain, bottleneck localization, metric collection, and three‑tier architecture (instrumentation, aggregation, visualization). MTrace enables comprehensive service relationship analysis, facilitating system optimization.
