How Tencent’s Multi‑Dimensional Monitoring Turns Big Data Into Real‑Time Business Insights
This article explains how Tencent’s ZhiYun multi‑dimensional monitoring system evolved from the Mobile Monitor platform, outlines its design principles, Data Factory capabilities, storage choices, and intelligent features, and shows how it supports real‑time, multi‑dimensional analysis and alerting for large‑scale business operations.
Background
In recent years, big data technologies have matured, enabling reliable collection, processing, and storage. Common big‑data applications include recommendation, BI reporting, profiling, log search, and machine learning. Real‑time monitoring is another critical scenario, one that depends on fast statistics and timely anomaly alerts.
From Mobile Monitor to Multi‑Dimensional Monitoring
The original Mobile Monitor (MM) system collected dimensions such as region, carrier, version, command, SET, and APN, along with metrics such as request count, success rate, and latency, and used Storm for real‑time analysis and alerting. MM's limitations (a single data source, fixed processing logic, an outdated stack, and poor scalability) prompted its reconstruction as the Hubble platform, now called ZhiYun Multi‑Dimensional Monitoring.
Design Principles
Backward compatibility with existing functions.
Component‑based, reusable real‑time processing.
Low‑code configuration: users can build Storm topologies through the UI.
Optimized architecture for accuracy and low latency.
Improved user experience with unified UI and helpful error messages.
Data Factory
Common big‑data operations are classified and exposed as UI configuration items:
Filtering: ensure data completeness, remove redundant records.
Formatting: time conversion, type conversion, URL encode/decode.
Translation: IP lookup, dictionary mapping, delimiter split, arithmetic, UDF.
Forwarding: send to SNG DC, CDB, Kafka.
Grouping: define window, time field, group fields.
Aggregation (with optional filtering): count, distinct count, min, max, first, last, sum, average.
These functions generate a Storm topology that performs initial aggregation (e.g., a 1‑minute sliding window) and stores results in an OLAP engine.
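To make the operator chain concrete, here is a minimal Python sketch of filtering, formatting, translation, grouping, and aggregation applied to a few events. It is only an illustration under stated assumptions: the field names, the `IP_TO_REGION` lookup table, and every function name are hypothetical, the forwarding step is omitted, and fixed one‑minute buckets stand in for whatever window the real Storm topology uses.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical raw events; field names are illustrative, not ZhiYun's schema.
RAW = [
    {"ts": "2024-01-01 00:00:05", "ip": "10.0.0.1", "cmd": "login", "code": "0", "latency_ms": "120"},
    {"ts": "2024-01-01 00:00:40", "ip": "10.0.0.2", "cmd": "login", "code": "500", "latency_ms": "300"},
    {"ts": "2024-01-01 00:01:10", "ip": "10.0.0.1", "cmd": "login", "code": "0", "latency_ms": "80"},
    {"ts": "2024-01-01 00:01:20", "ip": "10.0.0.1", "cmd": "", "code": "0", "latency_ms": "90"},  # incomplete
]

# Stand-in for an IP lookup dictionary (assumption).
IP_TO_REGION = {"10.0.0.1": "south", "10.0.0.2": "north"}
REQUIRED = ("ts", "ip", "cmd", "code", "latency_ms")


def is_complete(rec):
    """Filtering: keep only records where every required field is non-empty."""
    return all(rec.get(f) for f in REQUIRED)


def fmt(rec):
    """Formatting: convert the time string to epoch seconds and cast types."""
    ts = datetime.strptime(rec["ts"], "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return {**rec, "epoch": int(ts.timestamp()),
            "code": int(rec["code"]), "latency_ms": float(rec["latency_ms"])}


def translate(rec):
    """Translation: map the client IP to a region through a dictionary."""
    return {**rec, "region": IP_TO_REGION.get(rec["ip"], "unknown")}


def aggregate(records, window_s=60, group_fields=("region", "cmd")):
    """Grouping and aggregation: count, success count, max and average latency
    per (window start, group fields) bucket."""
    buckets = defaultdict(lambda: {"count": 0, "success": 0,
                                   "latency_sum": 0.0, "latency_max": 0.0})
    for rec in records:
        window_start = rec["epoch"] - rec["epoch"] % window_s
        key = (window_start, *[rec[f] for f in group_fields])
        b = buckets[key]
        b["count"] += 1
        b["success"] += 1 if rec["code"] == 0 else 0  # assumes code 0 means success
        b["latency_sum"] += rec["latency_ms"]
        b["latency_max"] = max(b["latency_max"], rec["latency_ms"])
    return {k: {**v, "latency_avg": v["latency_sum"] / v["count"]}
            for k, v in buckets.items()}


def run_pipeline(raw):
    cleaned = (translate(fmt(r)) for r in raw if is_complete(r))
    return aggregate(cleaned)
```

Calling `run_pipeline(RAW)` drops the incomplete record and returns one aggregate row per minute, region, and command, which is the shape of data an OLAP engine would then store.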
Storage Engine
ZhiYun primarily uses Druid, a time‑series database optimized for multi‑dimensional analysis, while smaller datasets may be stored in PostgreSQL/MySQL and full‑text search data in Elasticsearch.
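As an illustration of why Druid suits this workload, the sketch below builds a Druid native `groupBy` query as a Python dictionary. The query structure follows Druid's native JSON query format, but the data source name, the column names (`request_count`, `success_count`), and the helper function are assumptions made for the example, not ZhiYun's actual schema.

```python
import json


def build_groupby_query(datasource, interval, dimensions, filters):
    """Build a Druid native groupBy query: per-minute request totals and
    success rate, grouped by the given dimensions."""
    query = {
        "queryType": "groupBy",
        "dataSource": datasource,
        "granularity": "minute",
        "intervals": [interval],
        "dimensions": list(dimensions),
        "aggregations": [
            # Hypothetical pre-aggregated columns written by the ingestion pipeline.
            {"type": "longSum", "name": "requests", "fieldName": "request_count"},
            {"type": "longSum", "name": "successes", "fieldName": "success_count"},
        ],
        "postAggregations": [
            {
                "type": "arithmetic",
                "name": "success_rate",
                "fn": "/",
                "fields": [
                    {"type": "fieldAccess", "fieldName": "successes"},
                    {"type": "fieldAccess", "fieldName": "requests"},
                ],
            }
        ],
    }
    if filters:  # only attach a filter when there is something to filter on
        query["filter"] = {
            "type": "and",
            "fields": [
                {"type": "selector", "dimension": dim, "value": val}
                for dim, val in filters.items()
            ],
        }
    return query


if __name__ == "__main__":
    q = build_groupby_query(
        "zhiyun_requests",  # hypothetical data source name
        "2024-01-01T00:00:00Z/2024-01-01T01:00:00Z",
        ["app_id", "cmd"],
        {"region": "south"},
    )
    print(json.dumps(q, indent=2))
```

The resulting JSON would typically be posted to a Druid broker. Changing the `dimensions` list is all it takes to slice the same metrics along a different axis.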
Application Ecosystem
Processed data flows from the Data Factory into the monitoring system and downstream applications, enabling multi‑dimensional drill‑down analysis and alerting.
Multi‑Dimensional Drill‑Down Analysis
The analysis UI consists of a business tree, dimension filters, metric trend chart, and data panel. Users can drill down by selecting abnormal time points, sorting by request count, and isolating problematic dimension combinations (e.g., specific AppID, command, return code) to pinpoint faults.
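The drill‑down workflow can be approximated in a few lines of Python. This is a sketch over hypothetical pre‑aggregated rows for one abnormal time point. The row layout, the `drill_down` helper, and the assumption that return code 0 means success are all illustrative.

```python
from collections import defaultdict

# Hypothetical aggregated rows for a single abnormal minute.
ROWS = [
    {"app_id": "A", "cmd": "login", "return_code": 0, "requests": 900},
    {"app_id": "A", "cmd": "login", "return_code": 500, "requests": 20},
    {"app_id": "B", "cmd": "login", "return_code": 0, "requests": 300},
    {"app_id": "B", "cmd": "pay", "return_code": 0, "requests": 100},
    {"app_id": "B", "cmd": "pay", "return_code": 504, "requests": 400},
]


def drill_down(rows, dims, where=None):
    """Group rows by `dims`, optionally restricted by `where` filters,
    and return the groups sorted by request count, largest first."""
    where = where or {}
    totals = defaultdict(lambda: {"requests": 0, "failures": 0})
    for row in rows:
        if any(row[k] != v for k, v in where.items()):
            continue
        key = tuple(row[d] for d in dims)
        totals[key]["requests"] += row["requests"]
        if row["return_code"] != 0:  # assumption: 0 is the only success code
            totals[key]["failures"] += row["requests"]
    table = []
    for key, t in totals.items():
        rate = 1 - t["failures"] / t["requests"] if t["requests"] else None
        table.append({"key": key, **t, "success_rate": rate})
    return sorted(table, key=lambda r: r["requests"], reverse=True)


if __name__ == "__main__":
    # Step 1: compare applications at the abnormal time point.
    for line in drill_down(ROWS, ["app_id"]):
        print(line)
    # Step 2: narrow to the suspicious app and split by command and return code.
    for line in drill_down(ROWS, ["cmd", "return_code"], where={"app_id": "B"}):
        print(line)
```

In this sample data, the first pass shows that app B has a much lower success rate than app A, and the second pass isolates the `pay` command with return code 504 as the largest contributor.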
Multi‑Dimensional Alerting
Beyond visual analysis, the system supports configurable alert rules that trigger notifications when complex multi‑dimensional conditions are met. Users can also set subscription, convergence, and suppression rules to streamline incident response.
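One possible reading of these rules is sketched below. The article does not define how convergence and suppression behave, so the code assumes that convergence means at most one notification per rule and dimension combination within a time window, and that suppression means muting any data point that matches a given dimension filter. All class, field, and rule names are hypothetical.

```python
from dataclasses import dataclass, field


def matches(dims, pattern):
    """True when every key/value pair in `pattern` is present in `dims`."""
    return all(dims.get(k) == v for k, v in pattern.items())


@dataclass
class AlertRule:
    name: str
    match: dict             # dimension filter, e.g. {"cmd": "pay"}
    metric: str             # metric to check, e.g. "success_rate"
    below: float            # fire when the metric drops below this value
    min_requests: int = 0   # ignore low-traffic points
    converge_s: int = 300   # convergence window in seconds (assumed semantics)


@dataclass
class AlertEngine:
    rules: list
    suppressed: list = field(default_factory=list)  # dimension filters to mute
    _last_sent: dict = field(default_factory=dict)

    def evaluate(self, point, now):
        """Return notifications for one aggregated data point at time `now`."""
        notifications = []
        for rule in self.rules:
            if not matches(point["dims"], rule.match):
                continue
            if point["requests"] < rule.min_requests:
                continue
            if point[rule.metric] >= rule.below:
                continue
            if any(matches(point["dims"], s) for s in self.suppressed):
                continue  # suppression: matching points never notify
            key = (rule.name, tuple(sorted(point["dims"].items())))
            last = self._last_sent.get(key)
            if last is not None and now - last < rule.converge_s:
                continue  # convergence: already notified inside the window
            self._last_sent[key] = now
            notifications.append(
                f"[{rule.name}] {rule.metric}={point[rule.metric]} for {point['dims']}"
            )
        return notifications
```

For example, a rule such as `AlertRule("pay-success", {"cmd": "pay"}, "success_rate", 0.95, min_requests=100)` fires once for a failing data point, stays quiet for repeats inside the five‑minute window, and fires again afterwards unless a suppression filter matches.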
Intelligent Features
Machine‑learning models provide root‑cause recommendation and threshold‑free anomaly detection by learning from historical data.
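The article does not describe the models themselves. As a rough intuition for what "threshold‑free" can mean, the sketch below uses a simple statistical stand‑in rather than a machine‑learning model: it scores a new value against the median and median absolute deviation (MAD) of recent history, so no per‑metric threshold has to be configured by hand. The function names and the cutoff of 3 are assumptions for illustration.

```python
from statistics import median


def robust_score(history, value):
    """Distance of `value` from the historical median, in robust
    standard-deviation units (1.4826 * MAD approximates sigma for normal data)."""
    med = median(history)
    mad = median([abs(x - med) for x in history])
    scale = 1.4826 * mad
    if scale == 0:
        # Perfectly flat history: any deviation at all is unusual.
        return 0.0 if value == med else float("inf")
    return abs(value - med) / scale


def is_anomaly(history, value, cutoff=3.0):
    """Flag values that sit far outside the learned baseline."""
    return robust_score(history, value) > cutoff


if __name__ == "__main__":
    requests_per_minute = [100, 102, 98, 101, 99, 103, 97]  # hypothetical history
    print(is_anomaly(requests_per_minute, 101))  # within normal variation
    print(is_anomaly(requests_per_minute, 60))   # sharp drop
```

Because the baseline is derived from each metric's own history, the same code applies unchanged to metrics with very different scales, which is the practical appeal of learning from historical data instead of fixing thresholds.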
Current Status
More than 200 internal Tencent services and over a thousand servers are already using ZhiYun Multi‑Dimensional Monitoring.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.