Design and Implementation of an Automated Payment Channel Management System

This article describes the design, technology choices, architecture, and implementation details of an automated payment channel management system that uses Redis‑based time‑series storage, custom circuit‑breaker logic, and monitoring to achieve fast fault detection, accurate alerting, and future automated failover.

Zhuanzhuan Tech

Background

To meet growing business demand, Zhuanzhuan integrated many payment channels. The stability of these third‑party channels varied, so channel failures were frequent. Anomalies were detected late, because detection relied on existing alerts or on user feedback. Since the payment system is the core system handling all company payments, it needed a fully automated payment channel management system.

Design Goals

Multi‑channel, multi‑entity monitoring capability.

Rapid fault detection and root‑cause localization.

Minimize false positives and false negatives.

Automated channel failover capability.

Technical Selection

For circuit breaking, existing solutions such as Hystrix were insufficient: they operate at the interface level, cannot degrade an individual channel or merchant, and do not allow custom traffic probing during failover.
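
To make the difference from interface‑level breaking concrete, here is a minimal Python sketch of a breaker keyed by channel, business type, and merchant. The class name DimensionBreaker, the probe_ratio parameter, and the key format are illustrative assumptions; the article does not describe the actual implementation at this level.

```python
import random


class DimensionBreaker:
    """Hypothetical breaker keyed per (channel, business type, merchant)."""

    def __init__(self, probe_ratio=0.05, rng=random.random):
        self.probe_ratio = probe_ratio  # assumed share of traffic let through while open
        self.rng = rng                  # injectable so tests can be deterministic
        self.open_keys = set()          # dimensions currently degraded

    @staticmethod
    def key(channel, biz_type, merchant_id):
        return f"{channel}-{biz_type}-{merchant_id}"

    def trip(self, channel, biz_type, merchant_id):
        """Degrade a single dimension without touching the others."""
        self.open_keys.add(self.key(channel, biz_type, merchant_id))

    def reset(self, channel, biz_type, merchant_id):
        """Bring a dimension back into normal rotation."""
        self.open_keys.discard(self.key(channel, biz_type, merchant_id))

    def allow(self, channel, biz_type, merchant_id):
        """Return True if a request may use this dimension."""
        if self.key(channel, biz_type, merchant_id) not in self.open_keys:
            return True
        # While degraded, let a small random share through as probe traffic.
        return self.rng() < self.probe_ratio
```

In this sketch, tripping one merchant leaves every other merchant on the same channel untouched, and the probe share is a plain parameter rather than something fixed by the library.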

For time‑series storage, the team evaluated popular options and narrowed the choice to Prometheus and a custom Redis‑based solution. Prometheus trades some accuracy for reliability and simplicity, which makes it unsuitable for high‑sensitivity channel switching. Redis is familiar to Java backend developers, so it carried lower learning and maintenance costs, and the team built the store on Redis.

Architecture Design

A payment request first passes through channel routing, which selects an available channel, and then invokes the gateway to place the order or make the payout. Results are reported via MQ to the monitoring system, which stores the data in Redis. A computation module aggregates failure rates and triggers alerts based on configured rules. Data is periodically backed up to MySQL and visualized in Grafana via Prometheus.
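
The reporting and rule‑evaluation steps can be sketched in Python as follows. This is an in‑memory illustration only: the message fields in PaymentResult, the AlertRule parameters, and the Monitor class are assumptions made for the example, and plain dictionaries stand in for MQ and Redis.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PaymentResult:
    """Hypothetical message reported after each gateway call."""
    channel: str
    biz_type: str
    merchant_id: str
    outcome: str  # "success", or a failure reason such as "balance_not_enough"


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical rule: alert when the failure rate reaches a threshold."""
    max_failure_rate: float
    min_requests: int


class Monitor:
    def __init__(self):
        # dimension -> Counter({"success": n, "fail": m})
        self.counts = defaultdict(Counter)

    def consume(self, event):
        """Aggregate one reported result under its dimension."""
        dimension = f"{event.channel}-{event.biz_type}-{event.merchant_id}"
        bucket = "success" if event.outcome == "success" else "fail"
        self.counts[dimension][bucket] += 1

    def alerts(self, rule):
        """Return the dimensions whose failure rate violates the rule."""
        fired = []
        for dimension, c in sorted(self.counts.items()):
            total = c["success"] + c["fail"]
            if total >= rule.min_requests and c["fail"] / total >= rule.max_failure_rate:
                fired.append(dimension)
        return fired
```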

Implementation Details

Data Structure

The Redis storage mimics time‑series database concepts, such as those of InfluxDB, using three structures: a set, a sorted set, and a hash.

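1. set
Stores the dimensions being tracked, down to the merchant ID (in the sample values, 微信 = WeChat and 打款 = payout)
key: routeAlarm:alarmitems
value: 微信-打款-100000111
       微信-打款-100000112
       微信-打款-100000113
       .......

2. zset
Stores the timestamps (in seconds) of requests for a given merchant ID; entries within the same second overwrite each other
key: routeAlarm:alarmitem:timeStore:微信-打款-100000111
      score: 1657164225 value: 1657164225
      score: 1657164226 value: 1657164226
      score: 1657164227 value: 1657164227
      .......

3. hash
Stores the request results for a given merchant ID within one second; one summary is kept per second
key: routeAlarm:alarmitem:fieldStore:微信-打款-100000111:1657164225
      key: success          value: 10 (count)
      key: fail             value: 5
      key: balance_not_enough value: 3
      key: thrid_error      value: 2
      .......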
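
As a rough illustration of how one reported result could be written into these three structures, consider the Python sketch below. The key formats come from the listing above; the function record_result and the InMemoryRedis stand‑in are hypothetical. The stand‑in implements only sadd, zadd, and hincrby, with the same argument shapes as the redis‑py client, so the example runs without a server. Incrementing both fail and the specific reason on a failure is an assumption inferred from the sample hash, where fail equals the sum of the reason counts.

```python
from collections import defaultdict

ITEMS_KEY = "routeAlarm:alarmitems"
TIME_KEY = "routeAlarm:alarmitem:timeStore:{item}"
FIELD_KEY = "routeAlarm:alarmitem:fieldStore:{item}:{ts}"


class InMemoryRedis:
    """Tiny stand-in exposing only the three commands used below."""

    def __init__(self):
        self.sets = defaultdict(set)
        self.zsets = defaultdict(dict)   # key -> {member: score}
        self.hashes = defaultdict(dict)  # key -> {field: count}

    def sadd(self, name, *values):
        self.sets[name].update(values)

    def zadd(self, name, mapping):
        # Re-adding an existing member just overwrites its score.
        self.zsets[name].update(mapping)

    def hincrby(self, name, key, amount=1):
        self.hashes[name][key] = self.hashes[name].get(key, 0) + amount
        return self.hashes[name][key]


def record_result(client, item, ts, success, reason=None):
    """Record one request result for a dimension such as '微信-打款-100000111'."""
    # 1. Register the dimension so the computation module knows to scan it.
    client.sadd(ITEMS_KEY, item)
    # 2. Mark this second as having data; repeats within a second collapse.
    client.zadd(TIME_KEY.format(item=item), {str(ts): ts})
    # 3. Bump the per-second counters.
    bucket = FIELD_KEY.format(item=item, ts=ts)
    if success:
        client.hincrby(bucket, "success", 1)
    else:
        client.hincrby(bucket, "fail", 1)
        if reason is not None:
            client.hincrby(bucket, reason, 1)
```

Because the sorted set holds at most one member per second, a range query by score yields exactly the per‑second hashes that need to be read for a given window.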

Core Algorithm

The algorithm combines per‑second local counting with a sliding window so that short spikes are not missed. Success and failure counts are recorded for each second, and the system computes the failure rate over a configured window at a configured sampling interval (for example, a 1‑minute window evaluated every 10 seconds). Monitoring frequency and window size are tuned to each channel's traffic characteristics to balance sample adequacy against detection latency.
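
A minimal Python sketch of this calculation is shown below. It assumes the per‑second counts are available as a dictionary mapping a timestamp to a (success, fail) pair; the function names and default parameters are illustrative, not taken from the original system.

```python
def failure_rate(buckets, window_end, window_seconds):
    """Failure rate over the window_seconds seconds ending at window_end.

    buckets maps a timestamp in seconds to a (success, fail) tuple.
    Returns None when the window contains no requests.
    """
    success = fail = 0
    for ts in range(window_end - window_seconds + 1, window_end + 1):
        s, f = buckets.get(ts, (0, 0))
        success += s
        fail += f
    total = success + fail
    return fail / total if total else None


def sliding_rates(buckets, start, end, window_seconds=60, step_seconds=10):
    """Evaluate the window every step_seconds so consecutive windows overlap."""
    return [
        (t, failure_rate(buckets, t, window_seconds))
        for t in range(start, end + 1, step_seconds)
    ]
```

To see why the overlap matters, take one request per second with a 20‑second outage from second 50 to second 69. Two back‑to‑back 60‑second windows ending at seconds 60 and 120 split the outage and report failure rates of about 18% and 15%, while the sliding window ending at second 70 contains the whole outage and reports about 33%.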

Handling Low‑Traffic Channels

For channels with very low volume, a single failure inside the current window does not trigger an alert immediately. Instead, the window is expanded incrementally (up to 10× its original size) to collect more samples before an alert is raised, so that rare but critical issues are not ignored.
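
One way to sketch the window expansion in Python is shown below. Only the incremental expansion and the 10× cap come from the text; the stopping criterion min_requests, the threshold parameter, and the decision to judge on whatever samples exist at the widest window are assumptions made for this example.

```python
def count_window(buckets, window_end, window_seconds):
    """Sum (success, fail) over the window_seconds seconds ending at window_end."""
    success = fail = 0
    for ts in range(window_end - window_seconds + 1, window_end + 1):
        s, f = buckets.get(ts, (0, 0))
        success += s
        fail += f
    return success, fail


def should_alert(buckets, now, base_window, threshold, min_requests, max_factor=10):
    """Widen the window step by step until it holds enough samples to judge."""
    success = fail = 0
    for factor in range(1, max_factor + 1):
        success, fail = count_window(buckets, now, base_window * factor)
        total = success + fail
        if fail == 0:
            return False  # nothing failed, nothing to alert on
        if total >= min_requests:
            return fail / total >= threshold
    # Assumption: at the widest window, decide on the few samples available
    # rather than staying silent, so a rare failure is not ignored.
    return fail / (success + fail) >= threshold
```

With a 60‑second base window, a lone failure preceded by four successes spread over the previous several minutes ends up judged at 1 failure in 5 requests rather than 1 in 1, which avoids a false alarm while still alerting when every recent request has failed.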

Final Effects

The system now raises channel‑exception alerts quickly, localizes them precisely, and merges duplicate alerts. Taking channels offline and bringing them back online is still a manual operation; full automation is pending further refinement of the algorithm.

Future Plans

Continuously optimize monitoring algorithms to achieve >99% alert accuracy.

Integrate with the monitoring system to enable automatic channel shutdown on failures.

Implement automatic channel recovery detection and re‑enablement.

Author

Zhang Dan, R&D Engineer in the Zhuanzhuan Payment Settlement Technology Department, focusing on settlement system development.
