How to Build a Scalable Payment Reconciliation System for Tens of Millions of Daily Transactions
This article explains the design of a high‑throughput payment reconciliation system that handles daily transaction volumes in the tens of millions, covering architecture, data collection, big‑data processing with Hive and Spark, handling of suspicious records, and integration with a data platform.
Payment Reconciliation Overview
Payment reconciliation is the process of matching payment records generated by a merchant with those provided by a third‑party payment channel to ensure consistency and detect anomalies.
Why Reconciliation Is Needed
It serves as the final safety net for payment systems, allowing early detection of mismatched or missing transactions caused by network issues, system bugs, or day‑cut discrepancies.
System Architecture
The system consists of two main modules: the Reconciliation Module (responsible for pulling files, parsing data, matching records, and aggregating results) and the Error Module (handling unmatched records and generating error orders for manual review).
Challenges at the Tens-of-Millions Scale
Query efficiency degrades as daily data grows, leading to long reconciliation times.
Memory consumption can cause OOM errors when loading both local and remote data into memory.
Single‑machine processing limits scalability and wastes resources.
Solution Using Big Data Technologies
Traditional OLTP databases like MySQL are unsuitable for massive, batch‑oriented reconciliation tasks. Instead, the system leverages OLAP platforms such as Hive and Spark SQL to process large datasets efficiently.
Data Platform (DP) Integration
The DP provides offline task scheduling, data synchronization between MySQL and Hive, and supports SQL‑based data cleaning, transformation, and aggregation.
System Workflow Overview
The workflow is driven by a state machine, with each stage triggered by scheduled tasks. Key stages include initialization, data collection, suspicious data handling, DP import, DP verification, second‑level suspicious processing, data summarization, and error pushing.
1. Initialization
Each night a reconciliation task record is created based on active rules (channel_code, biz_type, status). The batch number is generated as bill_date + channel_code + "001". The task progresses through phases such as "Initialization", "Data Collection", "Suspicious Handling", "Data Verification", etc.
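The batch-number convention above can be sketched in a few lines. This is an illustrative assumption, not the system's actual code: the function name, the date format, and reading "001" as a zero-padded sequence number are all guesses from the description.

```python
from datetime import date

# Phases the task record moves through, per the workflow described above.
PHASES = ["Initialization", "Data Collection", "Suspicious Handling", "Data Verification"]

def make_batch_no(bill_date: date, channel_code: str, seq: int = 1) -> str:
    """Build a batch number as bill_date + channel_code + a 3-digit suffix.

    The "001" suffix comes from the article; treating it as a sequence
    number is an assumption for illustration.
    """
    return f"{bill_date:%Y%m%d}{channel_code}{seq:03d}"

# Example: the nightly task for a hypothetical ALIPAY channel rule.
print(make_batch_no(date(2024, 3, 1), "ALIPAY"))  # -> 20240301ALIPAY001
```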
2. Data Collection
Data is gathered from two sources:
Local data: payment records from the merchant's own databases, captured via binlog listeners and stored in a table with fields like channel_code, biz_order_no, bill_date, status, is_check, trade_amount, channel_order_no, merchant_no, sub_merchant_no.
Remote data: third‑party channel files downloaded via FTP/SFTP/HTTP, parsed, and stored in a similar table with an additional channel_fee field.
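Parsing a downloaded channel file into rows matching the channel record table might look like the sketch below. The CSV column order is a hypothetical example; real channels each define their own file format, and amounts are assumed to be integers in cents.

```python
import csv
import io

# Hypothetical column order for one channel's bill file (assumption).
FIELDS = ["channel_order_no", "biz_order_no", "bill_date",
          "trade_amount", "channel_fee", "status"]

def parse_channel_bill(text: str) -> list[dict]:
    """Parse a bill file into rows shaped like the channel record table.

    Amounts stay as integer cents to avoid float rounding; is_check starts
    at 0 so the verification jobs will pick the rows up.
    """
    rows = []
    for rec in csv.DictReader(io.StringIO(text), fieldnames=FIELDS):
        rec["trade_amount"] = int(rec["trade_amount"])
        rec["channel_fee"] = int(rec["channel_fee"])
        rec["is_check"] = 0
        rows.append(rec)
    return rows

sample = "CH1001,ORD2001,20240301,129900,390,SUCCESS\n"
print(parse_channel_bill(sample)[0]["trade_amount"])  # -> 129900
```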
3. Suspicious Data Handling
Records that cannot be matched (e.g., day‑cut mismatches, amount differences, missing counterpart) are stored in a "suspicious" table with fields such as batch_no, biz_id, biz_amount, status, biz_date, biz_type, channel_code, delayed_times, buffer_type (0 for local, 1 for channel).
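Mapping an unmatched payment record into the suspicious table could be sketched as follows; the field mapping is inferred from the schemas listed above and is an assumption, not the system's real code.

```python
def to_suspicious(record: dict, batch_no: str, is_local: bool) -> dict:
    """Build a suspicious-table row for an unmatched payment record.

    buffer_type: 0 = local (merchant) side, 1 = channel side, per the
    schema above. delayed_times counts how many rounds the row has waited
    for its counterpart (e.g. across a day-cut boundary).
    """
    return {
        "batch_no": batch_no,
        "biz_id": record["biz_order_no"],
        "biz_amount": record["trade_amount"],
        "status": record["status"],
        "biz_date": record["bill_date"],
        "biz_type": record.get("biz_type"),
        "channel_code": record["channel_code"],
        "delayed_times": 0,
        "buffer_type": 0 if is_local else 1,
    }
```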
4. DP Import
After local and suspicious data are ready, a scheduled DP job imports the MySQL tables into Hive for batch verification.
5. DP Verification
Two Hive jobs run:
Successful match: inner join on biz_order_no, trade_amount, and channel_code to create dp.pay_check_success.
CREATE TABLE IF NOT EXISTS dp.pay_check_success (...);
INSERT OVERWRITE TABLE dp.pay_check_success
SELECT ... FROM dp.pay_check_record tb1
INNER JOIN dp.pay_check_channel_record tb2 ON ...
WHERE tb1.is_check = 0 AND tb2.is_check = 0 ...;
Suspicious match: left joins find the records present on only one side and insert them into dp.check_dp_buffer_record with an order_type flag (0 for local, 1 for channel).
CREATE TABLE IF NOT EXISTS dp.check_dp_buffer_record (...);
INSERT OVERWRITE TABLE dp.check_dp_buffer_record
SELECT ... FROM ods.pay_check_record tb1
LEFT JOIN ods.pay_check_channel_record tb2 ON ...
WHERE tb2.biz_order_no IS NULL
UNION
SELECT ... FROM ods.pay_check_channel_record tb1
LEFT JOIN ods.pay_check_record tb2 ON ...
WHERE tb2.biz_order_no IS NULL;
6. Second‑Level Suspicious Processing
The buffer table is scanned; for each entry the system looks for a counterpart of opposite order_type. If a matching record with a different amount is found, it is moved to the error table; otherwise it remains as genuine suspicious data.
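The re-matching logic above can be expressed as plain-Python pseudologic, a minimal sketch rather than the real scanning job: rows that pair up across sides with equal amounts are matched, pairs with different amounts become error orders, and leftovers stay suspicious.

```python
from collections import defaultdict

def second_level_check(buffer_rows: list[dict]):
    """Re-match buffer rows of opposite order_type (0 = local, 1 = channel).

    Returns (matched, errors, suspicious): equal-amount pairs are matched,
    different-amount pairs go to the error table, and one-sided rows remain
    as genuine suspicious data.
    """
    by_id = defaultdict(list)
    for row in buffer_rows:
        by_id[row["biz_id"]].append(row)

    matched, errors, suspicious = [], [], []
    for rows in by_id.values():
        sides = {r["order_type"] for r in rows}
        if sides == {0, 1}:                       # counterpart found
            amounts = {r["biz_amount"] for r in rows}
            (matched if len(amounts) == 1 else errors).extend(rows)
        else:                                     # still one-sided
            suspicious.extend(rows)
    return matched, errors, suspicious
```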
7. Data Summarization
Spark jobs aggregate counts of successful, suspicious, and error records for daily dashboards.
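The aggregation is essentially a group-by count; the real system runs it as a Spark job, but the same shape in plain Python is:

```python
from collections import Counter

def summarize(results: list[dict]) -> Counter:
    """Count records per (bill_date, outcome) for the daily dashboard.

    'outcome' is assumed to be one of success / suspicious / error; in
    production this would be a Spark aggregation over the Hive tables.
    """
    return Counter((r["bill_date"], r["outcome"]) for r in results)
```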
8. Error Data Push
Unresolved error records are sent via NSQ messages to a dedicated error‑handling system.
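One way to publish such messages is nsqd's HTTP /pub endpoint; the sketch below assumes a local nsqd on its default HTTP port and a hypothetical topic name (pay_check_error), neither of which is specified in the article.

```python
import json
import urllib.request

NSQD_HTTP = "http://127.0.0.1:4151"   # assumed nsqd address (default HTTP port)
TOPIC = "pay_check_error"             # hypothetical topic name

def build_error_message(row: dict) -> bytes:
    """Serialize an unresolved error record as an NSQ message body."""
    return json.dumps({
        "batch_no": row["batch_no"],
        "biz_id": row["biz_id"],
        "biz_amount": row["biz_amount"],
        "channel_code": row["channel_code"],
    }).encode("utf-8")

def push_error(row: dict) -> None:
    """Publish one record via nsqd's HTTP /pub endpoint."""
    req = urllib.request.Request(
        f"{NSQD_HTTP}/pub?topic={TOPIC}", data=build_error_message(row))
    urllib.request.urlopen(req)
```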
Conclusion
Building a payment reconciliation system for tens of millions of daily transactions requires a robust big‑data platform, careful separation of batch and real‑time processing, and systematic handling of mismatches. Integrating the reconciliation service with a data platform like DP enables scalable, reliable, and maintainable operations.