Backend Development 32 min read

Designing a Daily Million-Transaction Payment Reconciliation System

This article explains how to architect a payment reconciliation system that can reliably process tens of millions of transactions per day, covering the underlying logic, scalability challenges, data collection methods, big‑data integration, and step‑by‑step processing flows to ensure accurate financial matching.

IT Architects Alliance

Feb 8, 2022

Designing a Daily Million-Transaction Payment Reconciliation System

What Is Payment Reconciliation?

Payment reconciliation is the process of matching transaction records generated by a merchant’s system with those provided by a third‑party payment channel to ensure both sides record the same amount and status for each order.

Why Reconciliation Is Critical

In normal operation the two sides are consistent, but network glitches, timing differences, or system bugs can cause mismatches. Reconciliation acts as the final safety net, allowing early detection of discrepancies before they accumulate into large accounting errors.

Overall System Architecture

The system consists of two main modules:

Reconciliation Module – pulls files, parses data, performs matching, and aggregates results.

Error Module – receives records that could not be matched and generates error orders for manual handling.

Reconciliation system architecture diagram

Challenges at Ten‑Million‑Level Scale

Query Efficiency – Full‑table scans become slow as daily data grows.

Memory Consumption (OOM) – Loading both local and remote datasets into memory can exhaust JVM heap.

Performance & Scalability – Single‑machine processing cannot keep up with the volume, leading to resource waste.

Moving to a Big‑Data Solution

Because the workload is batch‑oriented and read‑heavy, the traditional OLTP‑optimized MySQL is unsuitable. The design switches to a big‑data platform (Hive/Spark) for the heavy‑lifting while keeping MySQL for transactional storage.

Data‑Flow Overview

The process is driven by a state‑machine‑controlled series of scheduled tasks:

Initialize daily reconciliation tasks.

Collect data from both sides.

Handle suspicious (unmatched) records.

Perform matching in the data platform.

Aggregate results.

Push error records to the error‑handling system.

Task Initialization

Each night a job creates a reconciliation task record for every active channel‑business rule combination. Important fields include channel_code, biz_type, bill_date, batch_no, and phase (e.g., INIT, DATA_COLLECTION, SUSPICIOUS_HANDLING, MATCHING, SECOND_SUSPICIOUS, AGGREGATION, ERROR_PUSH).

Data Collection

Local Data – Generated by the business, captured via binlog listeners and inserted into a dedicated table. Key columns: channel_code, biz_order_no, bill_date, status, is_check, trade_amount, channel_order_no, merchant_no, sub_merchant_no.

Remote Data – Downloaded from each payment channel using configurable adapters (FTP, SFTP, HTTP). After download, files are parsed, normalized, and stored in a similar table with an extra channel_fee column.

Suspicious Data Handling

Suspicious data (存疑) appears when a record cannot be matched in the current day's batch. Types include:

Local‑only (possible day‑cut issue).

Remote‑only (e.g., test data leaking into production).

Amount mismatch (same order number but different amounts).

The system stores these in a suspicious_record table with fields such as batch_no, biz_id, biz_amount, status, biz_date, buffer_type (0 = local, 1 = remote), and delayed_times. Daily processing attempts to resolve them; if they remain unresolved beyond a channel‑specific threshold they are moved to the error table.

Integration with the Data Platform (DP)

After the local suspicious‑handling step, the collected MySQL data is imported into Hive tables. The DP scheduler detects when the reconciliation task reaches the “ready for DP” phase and triggers the import.

DP Matching Jobs

Success Matching – Finds records where biz_order_no, trade_amount, and channel_code are identical on both sides.

CREATE TABLE IF NOT EXISTS dp.pay_check_success (
  batch_no BIGINT COMMENT 'batch number',
  merchant_no STRING COMMENT 'third‑party merchant',
  sub_merchant_no STRING COMMENT 'sub‑merchant',
  biz_id STRING COMMENT 'order id',
  biz_amount BIGINT COMMENT 'amount',
  biz_date STRING COMMENT 'business date',
  biz_type INT COMMENT 'business type',
  status INT COMMENT 'status flag',
  remark STRING COMMENT 'remarks',
  create_time STRING COMMENT 'creation time',
  update_time STRING COMMENT 'update time',
  trade_date INT COMMENT 'trade date',
  channel_code INT COMMENT 'channel identifier'
);

INSERT OVERWRITE TABLE dp.pay_check_success
SELECT tb1.batch_no, tb1.merchant_no, tb1.sub_merchant_no, tb1.biz_order_no AS biz_id,
       tb1.trader_amount AS biz_amount, '${DP_1_DAYS_AGO_Ymd}' AS biz_date,
       0 AS status, '' AS remark, '${DP_1_DAYS_AGO_Ymd}' AS trade_date,
       tb1.channel_code
FROM dp.pay_check_record tb1
JOIN dp.pay_check_channel_record tb2
  ON tb1.biz_order_no = tb2.biz_order_no
 AND tb1.trader_amount = tb2.trader_amount
 AND tb1.channel_code = tb2.channel_code
WHERE tb1.is_check = 0 AND tb2.is_check = 0
  AND tb1.bill_date = '${DP_1_DAYS_AGO_Ymd}'
  AND tb2.bill_date = '${DP_1_DAYS_AGO_Ymd}'
  AND tb1.is_filter = 0;

Suspicious Matching – Detects records that are present on only one side or have amount mismatches, writing them to check_dp_buffer_record for further processing.

CREATE TABLE IF NOT EXISTS dp.check_dp_buffer_record (
  biz_id STRING COMMENT 'order id',
  order_type INT COMMENT '0=local,1=remote',
  bill_date INT COMMENT 'reconciliation date',
  biz_type INT COMMENT 'business type',
  channel_code INT COMMENT 'channel id',
  amount STRING COMMENT 'amount',
  merchant_no STRING COMMENT 'merchant',
  sub_merchant_no STRING COMMENT 'sub‑merchant',
  trade_date INT COMMENT 'trade date',
  create_time STRING,
  update_time STRING
);
INSERT OVERWRITE TABLE dp.check_dp_buffer_record
SELECT ... -- omitted for brevity, follows the same left‑join logic described in the article

Second‑Pass Suspicious Processing

After DP returns the buffer records, the system iterates over them to find opposite‑direction entries (e.g., a local‑only record paired with a remote‑only record having the same order number). If amounts differ, the record is classified as an error; otherwise it is marked as resolved.

Aggregation and Error Push

Aggregated statistics (successful matches, remaining suspicious, error counts) are computed using Spark jobs and displayed on an operational dashboard. Finally, error records are sent to the separate error‑handling system via NSQ messages.

Conclusion

The core difficulty of a ten‑million‑daily reconciliation pipeline lies not in the matching logic itself but in building a robust big‑data infrastructure and integrating it with the existing transactional system. By offloading heavy batch work to Hive/Spark and keeping a lightweight state machine in MySQL, the design achieves both scalability and operational simplicity.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

backend-architecture Big Data data pipeline Hive Spark payment reconciliation

Written by

IT Architects Alliance

Discussion and exchange on system, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and architecture adjustments with internet technologies. Includes real‑world large‑scale architecture case studies. Open to architects who have ideas and enjoy sharing.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.