How Alibaba Achieves Full‑Link Business Monitoring: A Practical Guide
Alibaba’s infrastructure team introduces a full‑link business monitoring approach that visualizes end‑to‑end health from a business perspective. It unifies metrics, automates data collection, and uses intelligent baseline alerts, which enables rapid issue detection, precise root‑cause analysis, and fine‑grained monitoring by business dimension across services.
Background
Rapid growth of new businesses and technologies at Alibaba has exposed the limitations of traditional monitoring dashboards: they lack a global business view, standardized metrics, and a business‑oriented perspective, and they are costly to configure.
Full‑View Monitoring
Business‑centric full‑link monitoring visualizes the health of the entire business process without switching systems, providing a clear global and upstream‑downstream view for fast problem discovery and localization.
Business Monitoring Model
Business Domain: a complete business or product, e.g., the “transaction domain”, “marketing domain”, or “payment domain”.
Business Activity: a core use case within a domain, such as “order confirmation” or “order creation”. Each activity has standard “golden metrics” and forms the business link when connected to other activities.
System Service: a key method that supports a business activity, e.g., member query, product query, or discount query. Each service is also represented by golden metrics.
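The three levels above can be pictured as a small data model. The following is only an illustrative sketch, not the platform's actual schema: the class names, the `downstream` field, and the assumption that “order confirmation” feeds into “order creation” are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SystemService:
    name: str  # e.g. "member query"


@dataclass
class BusinessActivity:
    name: str
    # System services this activity depends on.
    services: List[SystemService] = field(default_factory=list)
    # Names of the activities that follow this one (hypothetical field).
    downstream: List[str] = field(default_factory=list)


@dataclass
class BusinessDomain:
    name: str
    activities: List[BusinessActivity] = field(default_factory=list)

    def link(self) -> List[Tuple[str, str]]:
        """Return (upstream, downstream) activity pairs that form the business link."""
        return [(a.name, d) for a in self.activities for d in a.downstream]


# Assumed ordering for illustration: confirmation happens before creation.
transaction = BusinessDomain(
    name="transaction",
    activities=[
        BusinessActivity(
            name="order confirmation",
            services=[
                SystemService("member query"),
                SystemService("product query"),
                SystemService("discount query"),
            ],
            downstream=["order creation"],
        ),
        BusinessActivity(name="order creation"),
    ],
)
```

Calling `transaction.link()` yields the edges between activities, which is the information a link view would need to draw.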
Monitoring Process
Identify key business activities and their dependent system services.
Configure the non‑intrusive monitoring SDK to instrument data points automatically.
The system generates business links and calculates traffic, latency, and success‑rate metrics for each node.
Intelligent anomaly detection combines “baseline alerts” and “expert rule alerts” to highlight abnormal nodes without manual rule configuration.
Golden Metrics
Traffic: call volume per unit time (e.g., QPS, orders per second).
Latency: processing time, tracked separately for successful and failed calls.
Error: error count, success rate, and error codes.
Saturation: resource usage ratio (mainly reflects the application layer).
In business monitoring, traffic, latency, and error metrics are sufficient to answer whether a business is healthy; saturation is more relevant to application‑level monitoring.
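As a rough illustration of how the three business‑level golden metrics could be derived from raw call records, here is a minimal sketch. The `CallRecord` type, the field names, and the `golden_metrics` function are assumptions made for this example; the article does not describe the platform's real data format.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class CallRecord:
    latency_ms: float
    success: bool
    error_code: str = ""  # empty for successful calls


def golden_metrics(records: List[CallRecord], window_seconds: float) -> Dict[str, object]:
    """Aggregate traffic, latency, and error metrics for one node over one time window."""
    ok = [r for r in records if r.success]
    failed = [r for r in records if not r.success]

    # Count failures per error code.
    error_codes: Dict[str, int] = {}
    for r in failed:
        error_codes[r.error_code] = error_codes.get(r.error_code, 0) + 1

    return {
        "traffic_per_s": len(records) / window_seconds,
        # Latency is kept separate for successful and failed calls.
        "latency_ok_ms": mean([r.latency_ms for r in ok]) if ok else None,
        "latency_failed_ms": mean([r.latency_ms for r in failed]) if failed else None,
        "error_count": len(failed),
        "success_rate": len(ok) / len(records) if records else None,
        "error_codes": error_codes,
    }
```

Saturation is left out on purpose, in line with the point above that it belongs to application‑level monitoring.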
Business Dimensions
Extensible dimensions such as business identity, merchant, and store enable fine‑grained monitoring. For example, the transaction domain can be filtered by “Hema” to view only Hema‑related calls.
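One way to picture dimension‑based monitoring is to tag each call with dimension values and group metrics by them. The sketch below assumes a hypothetical record shape (a dict with a `dimensions` mapping and a `success` flag) and a hypothetical dimension key `business_identity`; neither comes from the article.

```python
from collections import defaultdict
from typing import Dict, Iterable


def success_rate_by_dimension(calls: Iterable[dict], dimension: str) -> Dict[str, float]:
    """Group calls by one dimension value and compute the success rate per group."""
    totals: Dict[str, int] = defaultdict(int)
    successes: Dict[str, int] = defaultdict(int)
    for call in calls:
        # Calls that lack the dimension are grouped under "unknown".
        key = call.get("dimensions", {}).get(dimension, "unknown")
        totals[key] += 1
        if call["success"]:
            successes[key] += 1
    return {key: successes[key] / totals[key] for key in totals}


# Illustrative data only.
calls = [
    {"dimensions": {"business_identity": "Hema"}, "success": True},
    {"dimensions": {"business_identity": "Hema"}, "success": False},
    {"dimensions": {"business_identity": "Da Run Fa"}, "success": True},
]
rates = success_rate_by_dimension(calls, "business_identity")
hema_only = rates["Hema"]  # the "filter by Hema" view from the example above
```

The same grouping would apply to merchant or store dimensions by passing a different key.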
Configurable Instrumentation
The monitoring SDK uses AOP to provide configuration‑based instrumentation; a simple configuration file enables automatic data interception, calculation, and reporting, fully decoupled from business code.
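The idea of configuration‑driven interception can be approximated in Python with function wrapping. This is a stand‑in for the AOP mechanism the article mentions, not the SDK itself: `intercept`, `apply_config`, the `REPORTED` list, and the `OrderService` class are all hypothetical names used for illustration.

```python
import functools
import time
from typing import Callable, Dict, List

# Stand-in for a metrics reporter; a real SDK would send data to a backend.
REPORTED: List[dict] = []


def intercept(func: Callable, activity: str) -> Callable:
    """Wrap func so every call records latency and success under the given activity."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        success = False
        try:
            result = func(*args, **kwargs)
            success = True
            return result
        finally:
            # Runs on both normal return and exception.
            REPORTED.append({
                "activity": activity,
                "latency_ms": (time.perf_counter() - start) * 1000.0,
                "success": success,
            })
    return wrapper


def apply_config(target: type, config: Dict[str, str]) -> None:
    """Wrap each configured method on target; the business class itself is not edited."""
    for method_name, activity in config.items():
        setattr(target, method_name, intercept(getattr(target, method_name), activity))


class OrderService:
    """Business code with no reference to monitoring."""

    def create_order(self, item_id: str, quantity: int) -> dict:
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        return {"item_id": item_id, "quantity": quantity}


# The "configuration file": method name -> business activity.
CONFIG = {"create_order": "order creation"}
apply_config(OrderService, CONFIG)
```

After `apply_config` runs, every `OrderService().create_order(...)` call appends a record to `REPORTED`, including calls that raise, while the business method stays unchanged.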
Automatic Link Generation
The platform automatically generates core business links, golden metrics, and dimension dashboards without user configuration; users can adjust links via a visual editor.
Intelligent Baseline Alerts
Machine‑learning models predict a reasonable range for each metric; when a metric falls outside that range, an alert is triggered automatically, which eliminates manual threshold configuration. Over 1,200 metrics have been integrated, with high precision and recall. The same approach is now applied to business‑wide full‑link monitoring for fully automated anomaly detection.
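The article does not say which model produces the baseline. To make the idea of a predicted range concrete, the sketch below swaps in a much simpler statistical rule (mean plus or minus k standard deviations over recent history). It is not the machine‑learning approach described above, and the function names and the default `k = 3.0` are assumptions.

```python
from statistics import mean, stdev
from typing import List, Tuple


def baseline_bounds(history: List[float], k: float = 3.0) -> Tuple[float, float]:
    """Return (low, high) bounds as mean +/- k sample standard deviations.

    Needs at least two history points, because stdev is undefined for fewer.
    """
    center = mean(history)
    spread = stdev(history)
    return center - k * spread, center + k * spread


def is_anomalous(value: float, history: List[float], k: float = 3.0) -> bool:
    """Flag a value that falls outside the baseline range derived from history."""
    low, high = baseline_bounds(history, k)
    return value < low or value > high


# Illustrative history of a traffic metric (e.g. orders per minute).
history = [100, 102, 98, 101, 99, 100, 103, 97]
```

With this history, a reading close to 100 stays inside the range, while a sudden drop to 60 is flagged. A production baseline would also need to handle seasonality and trends, which this rule ignores.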
Practical Cases
Global Transaction Link
The global transaction link lists key business activities without detailed system services, suitable for full‑link stress testing and large‑scale promotional events; it was used in the 6.18 promotion.
Core Transaction Link
This automatically generated core link highlights business activities (green nodes) and their dependent system services (yellow nodes), allowing quick insight into transaction health and downstream dependencies.
POS Service Link
The POS link monitors the offline payment scenario for new‑retail businesses, adding merchant and store dimensions to provide real‑time, fine‑grained monitoring for each store (e.g., Hema, Da Run Fa).
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.