
How Alibaba Achieves Full‑Link Business Monitoring: A Practical Guide

Alibaba’s infrastructure team introduces a full‑link business monitoring approach that visualizes end‑to‑end health from a business perspective, unifies metrics, automates data collection, and leverages intelligent baseline alerts, enabling rapid issue detection, precise root‑cause analysis, and fine‑grained dimension monitoring across services.

Alibaba Cloud Developer

Background

Rapid growth of new businesses and technologies at Alibaba has exposed the limitations of traditional monitoring dashboards: they lack a global business view, standardized metrics, and a business‑oriented perspective, and they are costly to configure.

Full‑View Monitoring

Business‑centric full‑link monitoring visualizes the health of the entire business process without switching systems, providing a clear global and upstream‑downstream view for fast problem discovery and localization.

Business Monitoring Model

Business Domain : a complete business or product, e.g., the “transaction domain”, “marketing domain”, or “payment domain”.

Business Activity : a core use case within a domain, such as “order confirmation” or “order creation”. Each activity has standard “golden metrics”, and activities connected to one another form the business link.

System Service : a key method that supports a business activity, such as member query, product query, or discount query. Each service is also measured by golden metrics.
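The three‑level model above can be sketched as a small data structure. This is only an illustrative sketch: the class names, the fields, and the assignment of the three query services to “order confirmation” are assumptions made for the example, not the platform's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GoldenMetrics:
    """Counters for one node over one time window (hypothetical fields)."""
    calls: int = 0
    errors: int = 0

    @property
    def success_rate(self) -> float:
        # Treat a node with no traffic as healthy.
        return 1.0 if self.calls == 0 else (self.calls - self.errors) / self.calls


@dataclass
class SystemService:
    name: str
    metrics: GoldenMetrics = field(default_factory=GoldenMetrics)


@dataclass
class BusinessActivity:
    name: str
    services: list[SystemService] = field(default_factory=list)
    downstream: list[str] = field(default_factory=list)  # names of the next activities
    metrics: GoldenMetrics = field(default_factory=GoldenMetrics)


@dataclass
class BusinessDomain:
    name: str
    activities: dict[str, BusinessActivity] = field(default_factory=dict)

    def link(self, start: str) -> list[str]:
        """Follow downstream edges from `start`, returning each activity once."""
        order, seen, stack = [], set(), [start]
        while stack:
            name = stack.pop()
            if name in seen or name not in self.activities:
                continue
            seen.add(name)
            order.append(name)
            stack.extend(reversed(self.activities[name].downstream))
        return order


confirm = BusinessActivity(
    "order confirmation",
    services=[SystemService(n) for n in ("member query", "product query", "discount query")],
    downstream=["order creation"],
)
create = BusinessActivity("order creation")
transaction = BusinessDomain("transaction domain", {a.name: a for a in (confirm, create)})
```

Calling `transaction.link("order confirmation")` walks the activities in link order, which is the shape a link view would render.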

Monitoring Process

Identify key business activities and their dependent system services.

Configure the non‑intrusive monitoring SDK to instrument data points automatically.

The system generates the business links and calculates traffic, latency, and success‑rate metrics for each node.

Intelligent anomaly detection combines “baseline alerts” and “expert rule alerts” to highlight abnormal nodes without manual rule configuration.

Golden Metrics

Traffic : call volume per unit time (e.g., QPS, orders per second).

Latency : processing time, tracked separately for successful and failed calls.

Error : error count, success rate, error codes.

Saturation : resource usage ratio (mainly reflects the application layer).

In business monitoring, traffic, latency, and error metrics are sufficient to answer whether a business is healthy; saturation is more relevant to application‑level monitoring.
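As a rough illustration of how the three business‑level golden metrics could be derived from raw call records, consider the sketch below. The `Call` record and the `golden_metrics` function are hypothetical names chosen for this example; the article does not describe the platform's real aggregation logic.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class Call:
    """One observed invocation of a node (hypothetical record shape)."""
    latency_ms: float
    success: bool
    error_code: Optional[str] = None


def golden_metrics(calls, window_s):
    """Aggregate traffic, latency, and error metrics for one time window."""
    ok_latencies = [c.latency_ms for c in calls if c.success]
    failed = [c for c in calls if not c.success]
    total = len(calls)
    return {
        # Traffic: call volume per unit time.
        "qps": total / window_s,
        # Latency: kept separate for successful and failed calls.
        "avg_latency_ok_ms": mean(ok_latencies) if ok_latencies else None,
        "avg_latency_failed_ms": mean([c.latency_ms for c in failed]) if failed else None,
        # Error: count, success rate, and error-code breakdown.
        "error_count": len(failed),
        "success_rate": len(ok_latencies) / total if total else 1.0,
        "error_codes": dict(Counter(c.error_code for c in failed)),
    }


sample = [
    Call(10.0, True),
    Call(20.0, True),
    Call(30.0, True),
    Call(100.0, False, "TIMEOUT"),
]
metrics = golden_metrics(sample, window_s=2.0)
```

Keeping the failed‑call latency separate matters because timeouts and fast failures would otherwise distort the average seen by healthy requests.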

Business Dimensions

Extensible dimensions such as business identity, merchant, and store enable fine‑grained monitoring. For example, the transaction domain can be filtered by “Hema” to view only Hema‑related calls.
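Dimension filtering of this kind can be pictured as tagging each call record with key‑value pairs and then slicing on them. The record layout, the dimension keys (`business_identity`, `store`), and the store IDs below are assumptions for illustration only.

```python
from collections import defaultdict


def filter_by(records, **dims):
    """Keep only the records whose dimensions match every given key/value."""
    return [r for r in records if all(r["dims"].get(k) == v for k, v in dims.items())]


def success_rate_by(records, dimension):
    """Success rate per distinct value of one dimension."""
    totals = defaultdict(lambda: [0, 0])  # value -> [calls, successes]
    for r in records:
        value = r["dims"].get(dimension, "unknown")
        totals[value][0] += 1
        totals[value][1] += 1 if r["success"] else 0
    return {value: ok / n for value, (n, ok) in totals.items()}


# Hypothetical call records from the transaction domain.
records = [
    {"activity": "order creation", "success": True,
     "dims": {"business_identity": "Hema", "store": "store-001"}},
    {"activity": "order creation", "success": False,
     "dims": {"business_identity": "Hema", "store": "store-002"}},
    {"activity": "order creation", "success": True,
     "dims": {"business_identity": "other", "store": "store-101"}},
]

hema_calls = filter_by(records, business_identity="Hema")
per_store = success_rate_by(hema_calls, "store")
```

Because dimensions are plain key‑value pairs, adding a new one (for example a merchant ID) would not require changing the aggregation code.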

Configurable Instrumentation

The monitoring SDK uses AOP to provide configuration‑based instrumentation; a simple configuration file enables automatic data interception, calculation, and reporting, fully decoupled from business code.
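The idea of configuration‑driven instrumentation can be approximated in a few lines. The sketch below is not the SDK described in the article; it uses Python method wrapping as a stand‑in for AOP, and the config format, `instrument` function, and `OrderService` class are all hypothetical.

```python
import functools
import time

# Hypothetical config: which methods to intercept and which activity they map to.
CONFIG = {
    "OrderService.create_order": {"activity": "order creation"},
}

REPORTED = []  # stand-in for sending data points to a monitoring backend


def _wrap(func, activity, report):
    """Return a wrapper that records success and latency around `func`."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        success = False
        try:
            result = func(*args, **kwargs)
            success = True
            return result
        finally:
            # Runs on both normal return and exception; exceptions still propagate.
            report({
                "activity": activity,
                "success": success,
                "latency_ms": (time.perf_counter() - start) * 1000.0,
            })
    return wrapper


def instrument(classes, config, report):
    """Patch every configured method in place, leaving its source untouched."""
    for target, options in config.items():
        class_name, method_name = target.split(".")
        cls = classes[class_name]
        original = getattr(cls, method_name)
        setattr(cls, method_name, _wrap(original, options["activity"], report))


class OrderService:
    """Business code with no knowledge of monitoring."""

    def create_order(self, item_id, quantity):
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        return {"item_id": item_id, "quantity": quantity}


instrument({"OrderService": OrderService}, CONFIG, REPORTED.append)
```

After `instrument` runs, every call to `OrderService().create_order(...)` appends one data point to `REPORTED`, whether the call returns or raises, while the business class itself contains no monitoring code.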

Automatic Link Generation

The platform automatically generates core business links, golden metrics, and dimension dashboards without user configuration; users can adjust links via a visual editor.
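The article does not say how links are derived. One plausible approach, shown here purely as an assumption, is to group reported events by a trace ID and count how often one activity is followed by another. The event fields (`trace_id`, `ts`, `activity`) and the “payment” activity name are hypothetical.

```python
from collections import Counter, defaultdict


def build_link(events):
    """Infer directed edges between activities from per-trace event order.

    Returns a Counter mapping (upstream, downstream) pairs to the number of
    traces in which that transition was observed.
    """
    by_trace = defaultdict(list)
    for event in events:
        by_trace[event["trace_id"]].append(event)

    edges = Counter()
    for trace_events in by_trace.values():
        ordered = sorted(trace_events, key=lambda e: e["ts"])
        for upstream, downstream in zip(ordered, ordered[1:]):
            edges[(upstream["activity"], downstream["activity"])] += 1
    return edges


# Hypothetical events: two traces, one of which continues to payment.
events = [
    {"trace_id": "t1", "ts": 1, "activity": "order confirmation"},
    {"trace_id": "t1", "ts": 2, "activity": "order creation"},
    {"trace_id": "t1", "ts": 3, "activity": "payment"},
    {"trace_id": "t2", "ts": 1, "activity": "order confirmation"},
    {"trace_id": "t2", "ts": 2, "activity": "order creation"},
]
edges = build_link(events)
```

The edge counts double as a traffic signal per hop, and a visual editor could let users prune or rename the inferred edges.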

Intelligent Baseline Alerts

A machine‑learning model predicts a reasonable range for each metric, and any value outside that range triggers an alert automatically, which eliminates manual threshold configuration. More than 1,200 metrics have been integrated with high precision and recall, and the approach is now applied to business‑wide full‑link monitoring for fully automated anomaly detection.
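To make the idea of a learned range concrete, the sketch below swaps the machine‑learning model for a much simpler statistical baseline: the mean of recent values plus or minus k standard deviations. This is a deliberate simplification, not the algorithm the article refers to, and the function names and the default `k` are assumptions.

```python
from statistics import mean, stdev


def baseline_range(history, k=3.0):
    """Return (low, high) bounds as mean +/- k sample standard deviations."""
    center = mean(history)
    spread = stdev(history) if len(history) > 1 else 0.0
    return center - k * spread, center + k * spread


def is_anomalous(value, history, k=3.0):
    """True when `value` falls outside the baseline derived from `history`."""
    low, high = baseline_range(history, k)
    return value < low or value > high


# Hypothetical orders-per-second readings for the same time slot on previous days.
history = [100, 102, 98, 101, 99]
low, high = baseline_range(history)
```

With this history, a reading of 103 stays inside the bounds while a drop to 60 falls outside them. A production model would also need to account for trend and seasonality, which a fixed mean and deviation cannot capture.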

Practical Cases

Global Transaction Link

The global transaction link lists key business activities without detailed system services, suitable for full‑link stress testing and large‑scale promotional events; it was used in the 6.18 promotion.

Core Transaction Link

This automatically generated core link highlights business activities (green nodes) and their dependent system services (yellow nodes), allowing quick insight into transaction health and downstream dependencies.

POS Service Link

The POS link monitors the offline payment scenario for new‑retail businesses, adding merchant and store dimensions to provide real‑time, fine‑grained monitoring for each store (e.g., Hema, Da Run Fa).

