How Baidu’s Transaction Accounting System Handles Real‑Time Reconciliation
This article explains the design of Baidu's transaction accounting platform, covering business scenarios, the flow of transaction records, system architecture, real‑time data synchronization via Canal, Elasticsearch storage strategies, consistency guarantees, and aggregation techniques for accurate merchant financial reconciliation.
System Overview
The accounting subsystem, built on top of Baidu's transaction platform, aggregates revenue and expense streams from merchants, platforms, and hosts, providing daily, monthly, and yearly financial statements for each merchant.
Business Scenarios
Key scenarios include live‑stream commerce, mini‑program host sales, and platform‑level revenue sharing for services such as map‑based rides. Each order can generate multiple settlement records, which are classified into three categories: income (settlement shares), other items (technical service fees, mini‑program and host commissions, refunds), and expenditure (bank payouts).
Example Calculation
For a ¥100 mini‑program sales order, the funds are split as follows: 10% to the traffic host (¥10), a 5% platform share (¥5), and a 0.6% technical service fee (¥0.6). The merchant therefore receives ¥100 − ¥10 − ¥5 − ¥0.6 = ¥84.4, which matches the merchant balance computed as income plus other items minus expenditure.
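The arithmetic above can be checked with a short Python sketch. It uses `Decimal` to avoid binary floating-point error in money amounts. The rate table, the rounding to 0.01 with `ROUND_HALF_UP`, and the function names are illustrative assumptions, not the platform's actual settlement rules.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumed rates taken from the example: host 10%, platform 5%, technical fee 0.6%.
RATES = {
    "host_commission": Decimal("0.10"),
    "platform_share": Decimal("0.05"),
    "tech_service_fee": Decimal("0.006"),
}
CENT = Decimal("0.01")


def split_order(amount: Decimal) -> dict:
    """Return each deduction (rounded to 0.01) and the merchant's net amount."""
    deductions = {
        name: (amount * rate).quantize(CENT, rounding=ROUND_HALF_UP)
        for name, rate in RATES.items()
    }
    net = amount - sum(deductions.values())
    return {"deductions": deductions, "merchant_net": net}


def balance(income, other_items, expenditure) -> Decimal:
    """Balance = income + other items - expenditure (other items are signed amounts)."""
    return (
        sum(income, Decimal(0))
        + sum(other_items, Decimal(0))
        - sum(expenditure, Decimal(0))
    )


result = split_order(Decimal("100"))
print(result["merchant_net"])  # 84.40

# Hypothetical booking: deductions recorded as negative "other items", no payout yet.
other = [-v for v in result["deductions"].values()]
print(balance([Decimal("100")], other, []))  # 84.40
```

In this sketch the deductions are passed to `balance` as negative other items; how the real system signs each record type is not specified in the source.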
System Architecture
Data originates from the upstream fund pool stored in a DDBS database. Canal captures binlog changes and publishes them to a bigpipe message queue. The accounting service consumes these messages, validates and enriches them concurrently using Akka, and then writes the complete records to Elasticsearch. For offline analysis, Spark jobs pull data from ES into AFS.
Functional Breakdown
4.1 Real‑Time Data Sync via Canal
Canal monitors the DDBS binlog, parses the row changes, and pushes them to bigpipe. The accounting service pulls messages at its own pace and processes them concurrently with Akka; this keeps end‑to‑end latency on the order of seconds while the queue absorbs traffic spikes.
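A minimal Python sketch of the consumer side follows. It assumes messages arrive as JSON shaped loosely like Canal's flat-message format (`type`, `isDdl`, `data` fields); the real bigpipe payload, the database and field names, and the `enrich` rule are hypothetical. A thread pool stands in for Akka actors only to show concurrent per-row processing; absorbing spikes is left to the queue itself.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical row-change message, modeled loosely on Canal's flat-message JSON.
SAMPLE = json.dumps({
    "database": "fund_pool",
    "table": "settlement",
    "type": "INSERT",
    "isDdl": False,
    "data": [
        {"record_id": "r1", "order_id": "o1", "merchant_id": "m1", "amount": "84.40"},
        {"record_id": "r2", "order_id": "o1", "merchant_id": "m1", "amount": "-0.60"},
    ],
})


def parse(raw: str) -> list[dict]:
    """Turn one message into settlement rows; skip DDL and non-insert/update events."""
    msg = json.loads(raw)
    if msg.get("isDdl") or msg.get("type") not in ("INSERT", "UPDATE"):
        return []
    return msg.get("data") or []


def enrich(row: dict) -> dict:
    """Validate required fields and attach a derived field (hypothetical rule)."""
    for field in ("record_id", "merchant_id", "amount"):
        if not row.get(field):
            raise ValueError(f"missing {field}")
    direction = "out" if row["amount"].startswith("-") else "in"
    return {**row, "direction": direction}


def process_batch(messages: list[str], workers: int = 4) -> list[dict]:
    """Process one pulled batch concurrently; pool.map preserves input order."""
    rows = [row for raw in messages for row in parse(raw)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(enrich, rows))


docs = process_batch([SAMPLE])
print([d["direction"] for d in docs])  # ['in', 'out']
```

The enriched documents would then be bulk-written to Elasticsearch; that step is omitted here.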
4.2 Elasticsearch Storage
Because settlement records far outnumber orders (each order produces 2‑6 records), a traditional sharded relational store was considered unsuitable; Elasticsearch was chosen because it supports multi‑dimensional, near‑real‑time queries. The index was initially routed by merchant ID, which concentrated data from large merchants on a few shards. The team then used Logstash to migrate the data into an index without custom routing, which balanced the shards.
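The skew problem can be illustrated with a small simulation. Roughly speaking, Elasticsearch picks a shard by hashing the routing value (the document `_id` by default) modulo the number of primary shards. Here `zlib.crc32` stands in for the Murmur3 hash Elasticsearch actually uses, and the five shards and merchant volumes are made up.

```python
import zlib
from collections import Counter

NUM_SHARDS = 5  # hypothetical index with 5 primary shards


def shard_for(routing_value: str) -> int:
    # Stand-in for Elasticsearch's Murmur3-based routing; shard numbers will differ.
    return zlib.crc32(routing_value.encode("utf-8")) % NUM_SHARDS


def distribution(records: list[dict], use_merchant_routing: bool) -> Counter:
    """Count records per shard when routing by merchant ID versus by record ID."""
    field = "merchant_id" if use_merchant_routing else "record_id"
    return Counter(shard_for(r[field]) for r in records)


# Skewed workload: one large merchant plus ten small ones (illustrative numbers).
records = [{"merchant_id": "big", "record_id": f"big-{i}"} for i in range(900)]
records += [
    {"merchant_id": f"m{j}", "record_id": f"m{j}-{i}"}
    for j in range(10)
    for i in range(10)
]

by_merchant = distribution(records, use_merchant_routing=True)
by_doc_id = distribution(records, use_merchant_routing=False)
print("largest shard, merchant routing:", max(by_merchant.values()))
print("largest shard, record-id routing:", max(by_doc_id.values()))
```

Because every record of the large merchant shares one routing value, merchant routing places all 900 of them on a single shard, so the first number is at least 900. Routing on record IDs spreads the same records across the shards far more evenly.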
4.3 Data Consistency Assurance
A consistency service records both Canal‑originated messages and successful ES writes in MySQL, compares upstream and downstream data daily, and invokes repair APIs for mismatches. The service retains seven days of messages and runs monthly Spark jobs for offline verification.
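The daily comparison can be sketched as a diff of two maps keyed by record ID. This assumes each side can be reduced to a per-record fingerprint (for example, amount plus status); the `Mismatch` type, the fingerprint format, and the `repair_api` callback are hypothetical stand-ins for the MySQL tables and repair APIs described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mismatch:
    record_id: str
    kind: str  # "missing" (upstream only), "extra" (downstream only), "diff" (values differ)


def reconcile(upstream: dict[str, str], downstream: dict[str, str]) -> list[Mismatch]:
    """Compare record_id -> fingerprint maps captured from Canal and from ES writes."""
    issues = []
    for rid, fingerprint in upstream.items():
        if rid not in downstream:
            issues.append(Mismatch(rid, "missing"))
        elif downstream[rid] != fingerprint:
            issues.append(Mismatch(rid, "diff"))
    for rid in downstream.keys() - upstream.keys():
        issues.append(Mismatch(rid, "extra"))
    return sorted(issues, key=lambda m: m.record_id)


def repair(issues: list[Mismatch], repair_api) -> None:
    """Invoke a (hypothetical) repair API for records missing or stale in ES."""
    for m in issues:
        if m.kind in ("missing", "diff"):
            repair_api(m.record_id)


upstream = {"r1": "84.40|SETTLED", "r2": "-0.60|SETTLED", "r3": "10.00|SETTLED"}
downstream = {"r1": "84.40|SETTLED", "r2": "-0.60|PENDING", "r4": "1.00|SETTLED"}
issues = reconcile(upstream, downstream)
print([(m.record_id, m.kind) for m in issues])
```

Records present only downstream are reported rather than repaired, since the source does not say how such cases are handled.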
4.4 Data Aggregation
Merchant reconciliation pages query ES on keyword‑type fields, which support fast aggregation. Placing conditions in filter context rather than must context improves performance, because filter clauses skip relevance scoring and can be cached (reported as 2‑4× faster). Routing also affects shard distribution: early routing on merchant ID caused hot spots, later resolved by routing on the default document ID. For pagination, from/size suits page‑by‑page UI display, scroll suits bulk export, and search_after suits batch retrieval through APIs.
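The query guidance above can be sketched as Python dicts in the Elasticsearch Query DSL. The field names (`merchant_id`, `settle_date`, `category`, `amount`, `record_id`) and their mappings (keyword for IDs and category, a numeric type for `amount`) are assumptions; only the clause types (`bool.filter`, `term`, `range`, `terms` and `sum` aggregations, `sort`, `search_after`) are standard Elasticsearch features.

```python
from typing import Optional, Sequence


def _merchant_filter(merchant_id: str, start: str, end: str) -> dict:
    # Filter context: these clauses are not scored and can be cached by Elasticsearch.
    return {
        "bool": {
            "filter": [
                {"term": {"merchant_id": merchant_id}},
                {"range": {"settle_date": {"gte": start, "lt": end}}},
            ]
        }
    }


def summary_query(merchant_id: str, start: str, end: str) -> dict:
    """Per-category totals for a statement period; size 0 returns only aggregations."""
    return {
        "size": 0,
        "query": _merchant_filter(merchant_id, start, end),
        "aggs": {
            "by_category": {
                "terms": {"field": "category"},
                "aggs": {"total": {"sum": {"field": "amount"}}},
            }
        },
    }


def page_query(
    merchant_id: str,
    start: str,
    end: str,
    page_size: int = 50,
    search_after: Optional[Sequence] = None,
) -> dict:
    """One page of detail records, paged with search_after."""
    body = {
        "size": page_size,
        "query": _merchant_filter(merchant_id, start, end),
        # search_after needs a deterministic sort; record_id is a unique tiebreaker.
        "sort": [{"settle_date": "asc"}, {"record_id": "asc"}],
    }
    if search_after is not None:
        body["search_after"] = list(search_after)
    return body
```

To fetch the next page, pass the `sort` array of the last hit in the previous response as `search_after`. The unique `record_id` tiebreaker keeps paging deterministic when several records share a `settle_date`.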
Conclusion
The accounting system continuously evolves to support Baidu’s expanding transaction ecosystem, enhancing merchant reconciliation experience through robust real‑time pipelines, scalable storage, and rigorous consistency checks.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
