Monitoring Practices for Low‑Frequency Financial Services: Lessons from E‑commerce and Reliable Alerting Techniques
This article shares practical monitoring strategies for low‑frequency financial services. It contrasts them with e‑commerce monitoring methods, outlines the challenges specific to financial monitoring, and presents reliable solutions: success‑rate alerts, aspect‑oriented exception handling with whitelists, and circuit‑breaker degradation using Sentinel.
Introduction
The author, after transitioning from e‑commerce to finance, describes the monitoring pain points encountered in the financial domain and explains why the e‑commerce monitoring approaches cannot be directly applied.
Common Monitoring Methods in E‑commerce
E‑commerce monitoring typically relies on two dimensions: traffic monitoring (API requests) and key‑node monitoring (e.g., registration, order placement). The usual technique is to instrument a tracking point (a counter incremented per event) for each request or key event and compare today's point count with yesterday's. A significant drop (e.g., more than 50%) triggers an alarm.
These methods work well in e‑commerce because of high traffic volume and a limited set of critical nodes, which makes percentage‑based alerts reliable and the investigation path short.
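As a rough illustration, the day‑over‑day rule amounts to a single relative‑drop comparison. The Java sketch below is hypothetical (the class and method names are not from the article) and assumes both counts cover the same period; collecting the points themselves is out of scope.

```java
/**
 * Hypothetical sketch of the day-over-day tracking-point alert.
 * Assumption: both counts cover the same period today and yesterday.
 */
final class DayOverDayCheck {

    /**
     * Returns true when today's count is lower than yesterday's by more than
     * maxDropRatio (0.5 means "more than 50% lower").
     */
    static boolean shouldAlert(long yesterday, long today, double maxDropRatio) {
        if (yesterday <= 0) {
            // No baseline: a relative drop is undefined, so this rule cannot fire.
            return false;
        }
        double drop = (double) (yesterday - today) / yesterday;
        return drop > maxDropRatio;
    }

    public static void main(String[] args) {
        // High-volume order placement: a ~4% dip is noise, a ~67% dip is an incident.
        System.out.println(shouldAlert(120_000, 115_000, 0.5)); // false
        System.out.println(shouldAlert(120_000, 40_000, 0.5));  // true
    }
}
```

With six‑figure counts, a 50% threshold cleanly separates ordinary fluctuation from an outage. The same comparison also fires on a move from 4 to 1, which is exactly the weakness that surfaces when counts are small.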
Challenges of Monitoring in Financial Scenarios
Financial services are low‑frequency; daily active users are orders of magnitude lower than in e‑commerce, so key‑node point counts can be zero for long periods, making percentage‑based alerts ineffective. The author outlines the cash‑loan business flow (pre‑loan, loan, post‑loan) and highlights the many critical nodes involved, each with sparse data.
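To put a number on why sparse counts defeat percentage rules, assume (for illustration only; the article does not model traffic this way) that successes at a key node arrive as a Poisson process with a stable hourly mean λ. The probability that a perfectly healthy hour still lands below half of its mean, and so looks like a "more than 50% drop", is then P(X < λ/2). The hypothetical helper below computes it, summing in log space so large means do not underflow.

```java
/** Illustration only: assumes Poisson-distributed success counts. */
final class SparseCountNoise {

    /** P(X < mean / 2) for X ~ Poisson(mean). */
    static double probBelowHalf(double mean) {
        if (mean <= 0) {
            throw new IllegalArgumentException("mean must be positive");
        }
        double logMean = Math.log(mean);
        double logFactorial = 0.0; // running ln(k!)
        double total = 0.0;
        for (int k = 0; k < mean / 2.0; k++) {
            if (k > 0) {
                logFactorial += Math.log(k);
            }
            // ln P(X = k) = -mean + k ln(mean) - ln(k!)
            total += Math.exp(-mean + k * logMean - logFactorial);
        }
        return total;
    }

    public static void main(String[] args) {
        for (double mean : new double[] {2, 4, 20, 200, 1000}) {
            System.out.printf("mean %6.0f/hour: P(healthy hour looks like a >50%% drop) = %.3g%n",
                    mean, probBelowHalf(mean));
        }
    }
}
```

Under this assumption, a node that normally sees about two successes an hour shows zero in roughly one healthy hour out of seven (e⁻² ≈ 0.135), while at a mean of 20 the false‑alarm chance falls to about half a percent. This is consistent with the idea, described below, of widening the comparison window until enough events have accumulated.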
Reliable Monitoring Techniques for Financial Scenarios
1. Monitoring Success Counts and Success Rates per Funding Source
Instead of monitoring every key node, the team tracks the success count or success rate of each funding source for each stage (pre‑loan, loan, post‑loan). Hourly success totals are compared against the average of the same time slot over the past week. If the current total falls below half of the historical average, an alarm is raised. The comparison window expands (2‑hour, 3‑hour, …) until the accumulated count reaches a configurable threshold (e.g., 20) to reduce false positives.
For funding sources with very low volume, the window is extended to eight hours; a zero count in that window also triggers an alert.
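The expanding‑window check can be sketched as follows. This is a hedged reconstruction, not the team's code: the class name, hourly arrays aligned by clock hour, and the reading that the window grows until the historical baseline (rather than today's count) reaches the threshold are all assumptions. The threshold of 20 and the eight‑hour cap come from the text.

```java
/**
 * Hypothetical expanding-window success check for one funding source and stage.
 * Assumption: the window grows until the past-week average for that window
 * reaches minBaseline, capped at maxWindowHours.
 */
final class SuccessWindowCheck {
    private final int minBaseline;    // e.g. 20
    private final int maxWindowHours; // e.g. 8 for low-volume sources

    SuccessWindowCheck(int minBaseline, int maxWindowHours) {
        this.minBaseline = minBaseline;
        this.maxWindowHours = maxWindowHours;
    }

    /**
     * @param today    success counts for today's hours so far; last element is the latest hour
     * @param pastWeek one array per previous day, covering the same clock hours as {@code today}
     * @return true if an alert should be raised
     */
    boolean shouldAlert(int[] today, int[][] pastWeek) {
        for (int[] day : pastWeek) {
            if (day.length != today.length) {
                throw new IllegalArgumentException("past days must cover the same hours as today");
            }
        }
        int limit = Math.min(maxWindowHours, today.length);
        for (int w = 1; w <= limit; w++) {
            double baseline = averageOfLast(pastWeek, w);
            if (baseline >= minBaseline) {
                // Enough history in this window: alert if today is below half of it.
                return sumOfLast(today, w) < baseline / 2.0;
            }
        }
        // Baseline never reached minBaseline: treat as low-volume and alert
        // only if the whole (capped) window saw zero successes.
        return limit > 0 && sumOfLast(today, limit) == 0;
    }

    private static long sumOfLast(int[] counts, int w) {
        long sum = 0;
        for (int i = counts.length - w; i < counts.length; i++) {
            sum += counts[i];
        }
        return sum;
    }

    private static double averageOfLast(int[][] days, int w) {
        if (days.length == 0) {
            return 0.0;
        }
        double total = 0;
        for (int[] day : days) {
            total += sumOfLast(day, w);
        }
        return total / days.length;
    }
}
```

For example, with a baseline of five successes per hour the window widens to four hours (baseline 20) before comparing: a single hour with one success gives 16 against a cutoff of 10 and stays quiet, while three empty hours give 5 and raise an alert.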
2. Aspect‑Oriented Exception Handling with a Whitelist
Each funding source has its own API contracts, resulting in hundreds of interface calls. To avoid scattering alert code throughout the codebase, the team uses an AOP aspect to capture all exceptions thrown by funding‑source requests and send a DingTalk alert.
Normal business failures (e.g., "account permanently frozen") are filtered out via a dynamic whitelist stored in QConf (a Zookeeper‑based configuration service). New normal failures are added to the whitelist as they are discovered.
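A minimal sketch of the whitelist filter, assuming a failure counts as expected when its message contains a whitelisted fragment. The class names and the `update` hook are hypothetical: the article stores the list in QConf, and no QConf client API is shown here, so `update` simply stands in for whatever listener receives new values. The `sender` callback stands in for the DingTalk call.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical whitelist of expected business failures. */
final class FailureWhitelist {
    // Swapped as a whole on each update so concurrent readers never see a partial set.
    private volatile Set<String> fragments = Set.of();

    /** Hypothetical hook: called by whatever config listener pushes the QConf value. */
    void update(List<String> newFragments) {
        fragments = Set.copyOf(newFragments);
    }

    /** Assumption: "expected" means the message contains any whitelisted fragment. */
    boolean isExpected(String message) {
        if (message == null) {
            return false;
        }
        for (String fragment : fragments) {
            if (message.contains(fragment)) {
                return true;
            }
        }
        return false;
    }
}

/** Hypothetical alert gate in front of the DingTalk sender. */
final class FundingSourceAlertGate {
    private final FailureWhitelist whitelist;
    private final Consumer<String> sender;

    FundingSourceAlertGate(FailureWhitelist whitelist, Consumer<String> sender) {
        this.whitelist = whitelist;
        this.sender = sender;
    }

    /** Sends an alert unless the failure is whitelisted; returns whether an alert was sent. */
    boolean onFailure(String message) {
        if (whitelist.isExpected(message)) {
            return false;
        }
        sender.accept(message);
        return true;
    }
}
```

Replacing the whole set on each update, rather than mutating it, lets alerting threads read the whitelist without locks while new expected failures are added at runtime.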
Example of the low‑level request code:
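```java
// Call the funding source's API (httpPost() is the project's own HTTP helper)
String result = httpPost();
// JSON is presumably fastjson (com.alibaba.fastjson.JSON)
Response response = JSON.parseObject(result, Response.class);
// If the returned status code is a failure code...
if (!SUCCESS_CODE.equals(response.getCode())) {
    // ...throw a custom exception carrying the failure message returned by the funding source.
    // java.lang.Exception has no (enum, String) constructor; CustomException is what the aspect catches.
    throw new CustomException(ErrorCodeEnum.ERROR_REQUEST_EXCEPTION, response.getMessage());
}
```
Aspect implementation: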
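```java
import org.aspectj.lang.annotation.AfterThrowing;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.stereotype.Component;

@Aspect
@Component // assumes Spring AOP: the aspect must be a bean to be applied
public class LoggingAspect {
    // Only intercept methods of classes in the package holding all funding-source request code
    @AfterThrowing(pointcut = "execution(* com.howtodoinjava.app.service.impl.*.*(..))", throwing = "ex")
    public void logAfterThrowingAllMethods(CustomException ex) {
        // Send the thrown error message to DingTalk (sendDingWarning is the project's own helper;
        // the whitelist check is not shown here)
        sendDingWarning(ex.getMessage());
    }
}
```
3. Circuit‑Breaker Degradation for Funding Sources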
Because a single funding‑source outage can cascade and make the whole business unavailable, the team introduced a circuit breaker using Alibaba Sentinel. When the error count for a source exceeds a threshold within one minute, Sentinel degrades that source: calls to it fail immediately instead of waiting on a broken upstream, so the rest of the system stays responsive.
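The degrade behaviour can be illustrated without Sentinel on the classpath. The sketch below is a deliberately simplified, hand‑written circuit breaker (hypothetical class name; assumed parameters: an error threshold, a one‑minute counting window and a fixed open period) that mimics what the text describes: once a funding source accumulates too many errors within a minute, further calls fail fast through a fallback. It is not Sentinel's API; in Sentinel itself this is configured declaratively as a degrade rule (`DegradeRule` with an exception‑count grade), and the real implementation is considerably more elaborate.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Simplified stand-in for an exception-count circuit breaker; NOT Sentinel's API. */
final class FundingSourceBreaker {
    private final int tripAfterErrors; // errors within one window that trip the breaker
    private final long windowMillis;   // counting window, e.g. 60_000 for one minute
    private final long openMillis;     // how long to fail fast after tripping
    private final LongSupplier clock;  // injectable; System::currentTimeMillis in production

    private long windowStart;
    private int errorsInWindow;
    private long openUntil = Long.MIN_VALUE;

    FundingSourceBreaker(int tripAfterErrors, long windowMillis, long openMillis, LongSupplier clock) {
        this.tripAfterErrors = tripAfterErrors;
        this.windowMillis = windowMillis;
        this.openMillis = openMillis;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    /** False while the breaker is open. */
    synchronized boolean allowRequest() {
        return clock.getAsLong() >= openUntil;
    }

    synchronized void recordFailure() {
        long now = clock.getAsLong();
        if (now - windowStart >= windowMillis) {
            // Fixed window expired: start counting afresh.
            windowStart = now;
            errorsInWindow = 0;
        }
        errorsInWindow++;
        if (errorsInWindow >= tripAfterErrors) {
            openUntil = now + openMillis; // trip: fail fast until this instant
            windowStart = now;
            errorsInWindow = 0;
        }
    }

    /** Runs the request unless the breaker is open; failures are counted and rethrown. */
    <T> T call(Supplier<T> request, Supplier<T> fallback) {
        if (!allowRequest()) {
            return fallback.get(); // immediate failure path; the funding source is not called
        }
        try {
            return request.get();
        } catch (RuntimeException e) {
            recordFailure();
            throw e;
        }
    }
}
```

Keeping one breaker instance per funding source means a failing source only short‑circuits its own traffic; the clock is injected so the window logic can be tested without sleeping.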
Conclusion
By focusing on success‑rate monitoring, aspect‑oriented alerting with a whitelist, and Sentinel‑based circuit breaking, the team reports near‑100% detection of issues and a stable platform despite the low‑frequency nature of the financial business. Future work may explore machine‑learning‑based behavior prediction to make alerts more timely.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Full-Stack Internet Architecture
Introducing full-stack Internet architecture technologies centered on Java
