Business Monitoring: Importance, Metric System Design, and Practical Implementation
This article explains the significance of business monitoring, distinguishes technical and business metrics, outlines a step‑by‑step process for building a business metric system, and shares practical experiences, tools, and common pitfalls to help teams improve operational reliability and decision‑making.
1. Significance of Business Monitoring
Metrics are defined numerical values used to quantify and abstract facts. Technical staff must consider both technical and business metrics.
Technical Metrics
Technical metrics such as service availability, TP99 latency, and call volume help developers understand system health and detect potential issues early. However, healthy technical metrics do not rule out business anomalies caused by process errors, user‑driven changes, external dependencies, or configuration mistakes.
Business Metrics
Business metrics focus on data correctness and completeness, playing a key role in system stability management and data‑driven decision making.
1) Early detection of online problems – By monitoring business metrics, teams can uncover technical or data issues, shorten mean time to repair (MTTR), and resolve problems faster.
2) Understanding business operation patterns – Monitoring indicators such as order volume or delivery time helps plan capacity and adjust strategies.
3) Driving business operations – Proactive monitoring can trigger actions, e.g., optimizing delivery routes when delivery time in a region exceeds expectations.
Relationship Between Technical and Business Metrics
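Technical correctness and business data correctness are often interrelated. If a technical metric shows the service is unavailable, the business metrics that depend on it are unavailable as well. The reverse does not always hold: business metrics can be abnormal while every technical metric looks healthy.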
One technical metric may map to one or many business metrics, and vice‑versa.
2. Basic Process for Building a Business Metric System
1) Determine the Value of Business Metrics
R&D must understand the business logic to create meaningful metrics. Consider four questions:
Is the metric valuable? Does it reflect the core value of the service?
Is it measurable? Can it detect data accuracy or configuration issues?
Is it actionable? Can the team act when the metric degrades?
Is it understandable? Can the whole team grasp its meaning?
The metric system should be dynamic, evolving with business needs while avoiding unnecessary complexity.
2) Business Metric Design
2.1) Metric Classification
Metrics can be classified as:
Basic metrics – atomic, indivisible business attributes.
Composite metrics – derived from basic metrics through calculations.
Derived metrics – combine basic/composite metrics with dimensions or statistical attributes (e.g., cumulative values, year‑over‑year).
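The three classes can be illustrated with a minimal sketch. The order counts, field names, and the transfer rate used here are assumptions chosen for illustration, not the metrics of any particular system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DailyOrders:
    """Basic metrics: atomic counts taken directly from the business (hypothetical fields)."""
    day: str
    created: int       # orders created that day
    transferred: int   # orders successfully transmitted downstream


def transfer_rate(d: DailyOrders) -> float:
    """Composite metric: calculated from two basic metrics."""
    return d.transferred / d.created if d.created else 0.0


def day_over_day_change(today: DailyOrders, yesterday: DailyOrders) -> float:
    """Derived metric: a composite metric combined with a time dimension."""
    return transfer_rate(today) - transfer_rate(yesterday)


def cumulative_created(days: list[DailyOrders]) -> int:
    """Derived metric: a basic metric with a cumulative statistical attribute."""
    return sum(d.created for d in days)


if __name__ == "__main__":
    mon = DailyOrders("2025-01-06", created=1000, transferred=990)
    tue = DailyOrders("2025-01-07", created=1200, transferred=1140)
    print(transfer_rate(tue))
    print(day_over_day_change(tue, mon))
    print(cumulative_created([mon, tue]))
```

Keeping basic metrics as plain stored counts and computing the other two classes from them means a definition change only has to be made in one place.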
2.2) What Makes a Good Metric?
Clarity – clear definition and calculation method.
Actionability – drives concrete actions or decisions.
Comparability – enables comparison across time periods or groups.
Simplicity – expressed as a simple number.
Monitorability – exhibits clear patterns for alerting.
3) Methods and Tools for Business Metric Monitoring
Common methods include:
Year‑over‑year / month‑over‑month analysis.
Standard deviation based alerts.
Intelligent threshold alerts based on historical experience.
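The first two methods are simple enough to sketch directly. The function names and the default thresholds (a 30% change, 3 standard deviations) are assumptions for illustration and would need tuning against real traffic.

```python
from statistics import mean, stdev


def period_over_period(current: float, previous: float) -> float:
    """Relative change versus a comparison period (same day last year, last month, ...)."""
    if previous == 0:
        raise ValueError("previous period value must be non-zero")
    return (current - previous) / previous


def change_alert(current: float, previous: float, max_change: float = 0.3) -> bool:
    """Alert when the relative change exceeds max_change in either direction."""
    return abs(period_over_period(current, previous)) > max_change


def stddev_alert(history: list[float], current: float, k: float = 3.0) -> bool:
    """Alert when current is more than k sample standard deviations from the historical mean."""
    if len(history) < 2:
        return False  # not enough data to estimate the spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current != mu  # history is constant, so any deviation is unusual
    return abs(current - mu) > k * sigma
```

Period-over-period comparison suits metrics with strong daily or seasonal cycles, while the standard deviation check suits metrics that fluctuate around a stable level.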
4) Follow‑up After Business Metric Alerts
When an alert fires, analyze the cause: normal logic, code bug, upstream parameter issue, or configuration problem.
3. User (Business) Perspective Concerns
From the user side, monitoring delivery calendar validity and wave (delivery time slot) availability is essential for keeping external delivery promises.
From the logistics side, monitoring order production pace prevents over‑stocking or idle workers.
4. Practical Implementation – Iterative Improvement
1) Small Steps, Fast Execution
Start with coverage of P3/P4 incidents, then iterate and refine.
1.1) From Nothing to Something
The initial business metric, the data sync success rate, was noisy and had little monitoring value.
1.2) From Something to Accurate
After filtering out irrelevant noise, the metric stabilized and became useful.
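One way such filtering could work is to exclude failures that are expected business outcomes before computing the rate. The record shape and the reason codes below are hypothetical. The source does not say which noise was removed.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical failure reasons treated as expected outcomes rather than faults.
EXPECTED_REASONS = {"ORDER_CANCELLED", "DUPLICATE_MESSAGE"}


@dataclass(frozen=True)
class SyncRecord:
    order_id: str
    success: bool
    reason: Optional[str] = None  # set only when success is False


def sync_success_rate(records: list[SyncRecord], filter_noise: bool = True) -> float:
    """Share of successful syncs, optionally ignoring expected failures."""
    if filter_noise:
        records = [r for r in records if r.success or r.reason not in EXPECTED_REASONS]
    if not records:
        return 1.0  # nothing relevant to sync is treated as healthy
    return sum(r.success for r in records) / len(records)
```

Comparing the rate with and without `filter_noise` on the same data shows how much of the alert volume came from expected failures.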
2) Model Refinement
2.1) Pre‑order – Settlement Calendar
Early calendar monitoring used the pfinder tool.
2.2) Post‑order – Order Transmission Rate
Scenario: inaccurate business configuration data.
Sample business monitoring log:
|xx-service>tid=xxxx>orderId=xxxxxxxx|transferService|-1|Tue Jan 07 00:00:00 CST 2025
[Figure: monitoring configuration example]
2.3) After‑sale – Reverse Pickup Calendar
Monitor calendar availability, length, and wave options to quickly detect risks.
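A check along these lines can be sketched as follows. The `CalendarDay` structure, the `min_days` threshold, and the message wording are assumptions. The real calendar interface is not described in this article.

```python
from dataclasses import dataclass, field


@dataclass
class CalendarDay:
    date: str
    waves: list[str] = field(default_factory=list)  # selectable slots, e.g. "09:00-12:00"


def check_calendar(days: list[CalendarDay], min_days: int = 3) -> list[str]:
    """Return a list of problems; an empty list means the calendar looks healthy."""
    if not days:
        return ["calendar unavailable: no selectable dates returned"]
    problems = []
    if len(days) < min_days:
        problems.append(f"calendar too short: {len(days)} days, expected at least {min_days}")
    empty = [d.date for d in days if not d.waves]
    if empty:
        problems.append("no waves available on: " + ", ".join(empty))
    return problems
```

Returning every problem found, rather than stopping at the first, lets a single alert describe the full state of the calendar.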
3) Direct Line Business – JDME
When order volume exceeds 120% of preset capacity, an alert is sent to operations for adjustment.
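The 120% rule reduces to a single comparison. The sketch below uses integer arithmetic so that a volume of exactly 120% does not alert because of floating-point rounding. The function and parameter names are hypothetical.

```python
def capacity_alert(order_volume: int, preset_capacity: int, threshold_percent: int = 120) -> bool:
    """Return True when order volume exceeds threshold_percent of the preset capacity."""
    if preset_capacity <= 0:
        raise ValueError("preset_capacity must be positive")
    # Integer comparison avoids rounding issues with a float ratio such as 1.2.
    return order_volume * 100 > preset_capacity * threshold_percent


def alert_message(order_volume: int, preset_capacity: int) -> str:
    """Human-readable text for the operations team."""
    percent = order_volume * 100 // preset_capacity
    return f"order volume {order_volume} is {percent}% of preset capacity {preset_capacity}"
```

With a preset capacity of 1000, a volume of 1200 does not trigger the alert but 1201 does, which matches the wording "exceeds".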
5. Common Pitfalls
1. Too Many Metrics
Monitoring aims to find problems, not to collect excessive metrics.
2. Unclear Metric Definitions
Every metric needs a clear definition that the whole team understands in the same way. Where a definition is ambiguous, document it explicitly.
3. Prefer Fewer Over Redundant Metrics
When possible, use a single metric to explain an issue.
4. Metrics Must Enable Rapid Problem Localization
Design alerts so that logs contain order IDs, trace IDs, addresses, error codes, etc., to quickly pinpoint the root cause.
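As a minimal sketch, a monitoring log line can carry these identifiers as key=value pairs. The field names and the `|` separator are assumptions for illustration, not the format of any specific logging or monitoring tool.

```python
import logging

logger = logging.getLogger("business_monitor")


def format_monitor_event(metric: str, order_id: str, trace_id: str,
                         error_code: str, detail: str = "") -> str:
    """Build one log line containing everything needed to locate the failing order."""
    fields = {
        "metric": metric,
        "orderId": order_id,
        "traceId": trace_id,
        "errorCode": error_code,
        "detail": detail,
    }
    # Skip empty fields so the line stays compact.
    return "|".join(f"{key}={value}" for key, value in fields.items() if value)


def report_anomaly(metric: str, order_id: str, trace_id: str,
                   error_code: str, detail: str = "") -> str:
    """Log the event at WARNING level and return the line that was written."""
    line = format_monitor_event(metric, order_id, trace_id, error_code, detail)
    logger.warning(line)
    return line
```

Because every field is labeled, the person handling the alert can search by order ID or trace ID without first working out which position in the line holds which value.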
5. Monitoring Code Should Not Impact Technical Availability
Business‑monitoring code must catch exceptions and never affect the primary service logic.
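One common way to enforce this is to wrap every monitoring call so that its exceptions are logged and swallowed. The decorator and the two functions below are a hypothetical sketch of that pattern, not code from the system described here.

```python
import functools
import logging

logger = logging.getLogger("business_monitor")


def never_raise(func):
    """Decorator: log and swallow any exception raised by monitoring code."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            logger.exception("business monitoring failed in %s", func.__name__)
            return None
    return wrapper


@never_raise
def record_transfer_result(order: dict) -> None:
    # Hypothetical monitoring hook; raises KeyError if "orderId" is missing.
    logger.info("transfer result orderId=%s code=%s", order["orderId"], order.get("code"))


def transfer_order(order: dict) -> str:
    """Primary business logic: must complete regardless of monitoring failures."""
    result = "transferred"
    record_transfer_result(order)  # a failure here is logged, never propagated
    return result
```

Calling `transfer_order({})` still returns normally even though the monitoring hook hits a `KeyError`, which is the property this pitfall asks for.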
6. Future Plans
1) Refine existing business metrics for faster, more accurate alerts.
2) Build a comprehensive metric system for external order chains.
JD Tech Talk
Official JD Tech public account delivering best practices and technology innovation.