Design and Implementation of an Automated Payment Channel Management System
This article describes the design, technology choices, architecture, and implementation details of an automated payment channel management system that uses Redis‑based time‑series storage, custom circuit‑breaker logic, and monitoring to achieve fast fault detection, accurate alerting, and future automated failover.
Background
To meet growing business demand, Zhuanzhuan integrated many payment channels. The stability of these third‑party channels varied, so channel failures were frequent. Anomalies were detected late because detection relied on alerts or user feedback. Since the payment system is the core system handling all of the company's payments, it needed a fully automated payment channel management system.
Design Goals
Multi‑channel, multi‑entity monitoring capability.
Rapid fault detection and root‑cause localization.
Minimize false positives and false negatives.
Automated channel failover capability.
Technical Selection
For circuit‑breaker functionality, existing solutions like Hystrix were insufficient because they operate at the interface level and cannot handle channel‑ or merchant‑level degradation, nor allow custom traffic probing during failover.
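To make the granularity argument concrete, here is a minimal sketch of a breaker keyed by channel, business type, and merchant rather than by interface. It is an illustration under assumptions, not the team's implementation: the class names, the `probe_ratio` parameter, and its default of 10% are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class DimensionBreaker:
    """Breaker state for one (channel, business, merchant) dimension."""
    probe_ratio: float = 0.1   # hypothetical: share of traffic let through while open
    is_open: bool = False      # True means the dimension is degraded

    def allow(self, rng=random.random) -> bool:
        # Closed: all traffic passes. Open: only a sampled fraction passes as probes.
        return (not self.is_open) or rng() < self.probe_ratio


class BreakerRegistry:
    """Holds one breaker per dimension, so one merchant can be degraded alone."""

    def __init__(self):
        self._breakers = {}

    def get(self, channel: str, business: str, merchant_id: str) -> DimensionBreaker:
        key = (channel, business, merchant_id)
        if key not in self._breakers:
            self._breakers[key] = DimensionBreaker()
        return self._breakers[key]
```

Tripping the breaker for one merchant leaves other merchants on the same channel untouched, and the probe fraction keeps a trickle of real traffic flowing so that recovery can be observed.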
For time‑series storage, after evaluating popular options, the team narrowed the choice to Prometheus or a custom Redis‑based solution. Prometheus trades some accuracy for reliability and simplicity, which makes it unsuitable for high‑sensitivity channel switching. Redis was already familiar to the Java backend developers, so it carried lower learning and maintenance costs.
Architecture Design
Payment requests flow through channel routing to select an available channel, then invoke the gateway for order placement or payout. Results are reported via MQ to the monitoring system, which stores data in Redis. A computation module aggregates failure rates and triggers alerts based on configured rules. Data is periodically backed up to MySQL, and visualized in Grafana via Prometheus.
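The request path described above can be sketched roughly as follows. This is a simplified illustration: the `PayResult` fields, the `route` and `pay` functions, and the use of an in-process `queue.Queue` in place of the real MQ are all assumptions made for the example.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class PayResult:
    """Hypothetical shape of the result message reported to monitoring."""
    channel: str
    business: str
    merchant_id: str
    outcome: str   # "success" or a failure reason such as "balance_not_enough"
    ts: int        # epoch seconds


def route(candidates, disabled):
    """Return the first candidate channel that is not currently marked down."""
    for channel in candidates:
        if channel not in disabled:
            return channel
    raise RuntimeError("no available channel")


def pay(candidates, disabled, gateway, mq, business, merchant_id, now):
    channel = route(candidates, disabled)
    outcome = gateway(channel)   # gateway call: order placement or payout
    # Report the result asynchronously; the monitoring side consumes it later.
    mq.put(PayResult(channel, business, merchant_id, outcome, now))
    return outcome


# Usage: "wechat" is marked down, so the request is routed to "alipay".
mq = queue.Queue()
pay(["wechat", "alipay"], {"wechat"}, lambda ch: "success",
    mq, "payout", "100000111", 1657164225)
```

Reporting through a queue keeps the monitoring write off the payment path, so a slow monitoring store does not slow down payments.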
Implementation Details
Data Structure
The Redis storage mimics time‑series concepts, comparable to those of InfluxDB, using three structures: a set, a sorted set, and a hash.
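1. set
Stores the dimensions that are already being tracked, down to the merchant ID (微信 = WeChat, 打款 = payout)
key: routeAlarm:alarmitems
value: 微信-打款-100000111
微信-打款-100000112
微信-打款-100000113
.......
2. zset
Stores the timestamps (in seconds) of requests for a given merchant ID; entries for the same second overwrite each other
key: routeAlarm:alarmitem:timeStore:微信-打款-100000111
score: 1657164225 value: 1657164225
score: 1657164226 value: 1657164226
score: 1657164227 value: 1657164227
.......
3. hash
Stores the request results for a given merchant ID within one second; one aggregated hash per second, with one field per result type
key: routeAlarm:alarmitem:fieldStore:微信-打款-100000111:1657164225
field: success value: 10 (count)
field: fail value: 5
field: balance_not_enough value: 3
field: thrid_error value: 2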
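The write path for these three structures might look like the sketch below. To stay runnable without a server, it uses a tiny in-memory stand-in (`FakeRedis`, hypothetical) whose three methods mirror the Redis commands SADD, ZADD, and HINCRBY. The `record` function is an assumption, as is the rule that a failure increments both `fail` and the specific reason field; that rule is inferred from the sample above, where fail (5) equals the sum of the two reason fields (3 + 2).

```python
from collections import defaultdict

ITEMS_KEY = "routeAlarm:alarmitems"
TIME_KEY = "routeAlarm:alarmitem:timeStore:{item}"
FIELD_KEY = "routeAlarm:alarmitem:fieldStore:{item}:{ts}"


class FakeRedis:
    """In-memory stand-in covering only the three commands used here."""

    def __init__(self):
        self.sets = defaultdict(set)
        self.zsets = defaultdict(dict)   # key -> {member: score}
        self.hashes = defaultdict(lambda: defaultdict(int))

    def sadd(self, key, member):
        self.sets[key].add(member)

    def zadd(self, key, mapping):
        # Same member again just updates its score, as ZADD does.
        self.zsets[key].update(mapping)

    def hincrby(self, key, field, amount=1):
        self.hashes[key][field] += amount
        return self.hashes[key][field]


def record(r, item, ts, outcome):
    """Record one request result for dimension `item` at epoch second `ts`."""
    r.sadd(ITEMS_KEY, item)                             # 1. register the dimension
    r.zadd(TIME_KEY.format(item=item), {str(ts): ts})   # 2. mark the second as active
    hash_key = FIELD_KEY.format(item=item, ts=ts)       # 3. per-second counters
    if outcome == "success":
        r.hincrby(hash_key, "success")
    else:
        r.hincrby(hash_key, "fail")     # assumed: total failures
        r.hincrby(hash_key, outcome)    # assumed: breakdown by failure reason
```

Because the sorted-set member is the second itself, many requests in the same second collapse into one entry, and the per-second hash carries the actual counts.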
Core Algorithm
The algorithm combines a local counting method with a sliding window to avoid missing short spikes. Each second records success and failure counts; the system computes failure rates over the configured window (e.g., 1 minute with a 10‑second sampling interval). Monitoring frequency and window size are tuned based on channel traffic characteristics to balance sample adequacy and detection latency.
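A minimal sketch of the sliding-window calculation follows, using the 1-minute window and 10-second sampling interval mentioned above. The function names, the plain-dict bucket format, the alert threshold, and the minimum sample count are assumptions for illustration; the article does not state the actual rule values.

```python
def failure_rate(buckets, now, window=60):
    """Aggregate per-second buckets over the last `window` seconds ending at `now`.

    buckets: {epoch_second: {"success": n, "fail": m}}
    Returns (failure_rate, total_requests).
    """
    success = fail = 0
    for ts in range(now - window + 1, now + 1):
        bucket = buckets.get(ts)
        if bucket:
            success += bucket.get("success", 0)
            fail += bucket.get("fail", 0)
    total = success + fail
    return (fail / total if total else 0.0), total


def scan(buckets, start, end, window=60, step=10, threshold=0.5, min_samples=10):
    """Re-evaluate the window every `step` seconds; return the breaching times."""
    alerts = []
    for now in range(start, end + 1, step):
        rate, total = failure_rate(buckets, now, window)
        # threshold and min_samples are hypothetical rule parameters
        if total >= min_samples and rate >= threshold:
            alerts.append(now)
    return alerts
```

Because the window slides forward in 10-second steps instead of jumping a full minute, a burst of failures that straddles a minute boundary is still seen whole by at least one evaluation.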
Handling Low‑Traffic Channels
For channels with very low volume, if a single failure occurs within the current window, the window is expanded incrementally (up to 10×) before triggering an alert, ensuring that rare but critical issues are not ignored.
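One way to read this rule is sketched below: widen the window one multiple at a time until it holds enough samples to judge, stopping at 10 times the base window. The function names, `min_samples`, `threshold`, and the fallback of deciding on whatever data exists at the maximum window are assumptions; the article only states the incremental expansion and the 10× cap.

```python
def count(buckets, now, window):
    """Sum success and fail counts over the last `window` seconds ending at `now`."""
    success = fail = 0
    for ts in range(now - window + 1, now + 1):
        bucket = buckets.get(ts, {})
        success += bucket.get("success", 0)
        fail += bucket.get("fail", 0)
    return success, fail


def evaluate_low_traffic(buckets, now, base_window=60, max_factor=10,
                         min_samples=5, threshold=0.5):
    """Return (should_alert, factor_used) for a low-volume dimension."""
    for factor in range(1, max_factor + 1):
        success, fail = count(buckets, now, base_window * factor)
        total = success + fail
        if total >= min_samples:
            # Enough samples at this width: judge on the failure rate.
            return fail / total >= threshold, factor
    # Assumed fallback: at the widest window, decide on the data available,
    # so that a lone failure on a nearly idle channel is not silently dropped.
    success, fail = count(buckets, now, base_window * max_factor)
    total = success + fail
    return (total > 0 and fail / total >= threshold), max_factor
```

With this shape, a single failure among a few older successes is diluted once the window grows, while a single failure with no other traffic still raises an alert at the 10× limit.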
Final Effects
The system now delivers fast channel‑exception alerts with precise localization and merges duplicate alerts. Channels are currently taken offline and brought back online manually; full automation awaits further refinement of the algorithm.
Future Plans
Continuously optimize monitoring algorithms to achieve >99% alert accuracy.
Integrate with the monitoring system to enable automatic channel shutdown on failures.
Implement automatic channel recovery detection and re‑enablement.
Author
Zhang Dan, R&D engineer in the Zhuanzhuan Payment Settlement Technology Department, focusing on settlement system development.
Zhuanzhuan Tech
A platform for Zhuanzhuan R&D and industry peers to learn and exchange technology, regularly sharing frontline experience and cutting‑edge topics. We welcome practical discussions and sharing; contact waterystone with any questions.
