How Baidu’s Transaction Accounting System Handles Real‑Time Reconciliation

This article explains the design of Baidu's transaction accounting platform, covering business scenarios, the flow of transaction records, system architecture, real‑time data synchronization via Canal, Elasticsearch storage strategies, consistency guarantees, and aggregation techniques for accurate merchant financial reconciliation.

Baidu Geek Talk

Mar 16, 2022

System Overview

The accounting subsystem, built on top of Baidu's transaction platform, aggregates revenue and expense streams from merchants, platforms, and hosts, providing daily, monthly, and yearly financial statements for each merchant.

Business Scenarios

Key scenarios include live‑stream commerce, mini‑program host sales, and platform‑level revenue sharing for services such as map‑based rides. Each order can generate multiple settlement records, which are classified into three categories: income (settlement shares), other items (technical service fees, mini‑program and host commissions, refunds), and expenditure (bank payouts).

Example Calculation

For a mini‑program sales order of ¥100, three deductions apply: 10% to the traffic host (¥10), a 5% platform share (¥5), and a 0.6% technical service fee (¥0.6). The merchant receives the remainder, ¥100 − ¥15.6 = ¥84.4, which matches the statement total of income plus other items minus expenditure.
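The arithmetic above can be checked with a short sketch. The function name and the dictionary keys are illustrative assumptions, not names from Baidu's system; Decimal is used so that the 0.6% fee does not pick up binary floating‑point error.

```python
from decimal import Decimal


def split_order(amount, host_rate, platform_rate, fee_rate):
    """Split an order amount into its deductions and the merchant's net receipt.

    All arguments are strings so they convert to Decimal exactly.
    The key names below are hypothetical labels for this example only.
    """
    amount = Decimal(amount)
    deductions = {
        "host_commission": amount * Decimal(host_rate),
        "platform_share": amount * Decimal(platform_rate),
        "technical_service_fee": amount * Decimal(fee_rate),
    }
    # Whatever is left after all deductions goes to the merchant.
    net = amount - sum(deductions.values())
    return deductions, net


# The ¥100 order from the example: 10% host, 5% platform, 0.6% service fee.
deductions, net = split_order("100", "0.10", "0.05", "0.006")
```

Here `net` compares equal to `Decimal("84.4")`, and the three deductions add up to ¥15.6.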

System Architecture

Data originates from the upstream fund pool, which is stored in DDBS. Canal captures binlog changes and publishes them to the bigpipe message queue. The accounting service consumes these messages, validates and enriches them concurrently using Akka, then writes the completed records to Elasticsearch (ES). For offline analytics, Spark exports data from ES to AFS.

Functional Breakdown

4.1 Real‑Time Data Sync via Canal

Canal monitors the DDBS binlog, parses changes, and pushes them to bigpipe. The accounting service pulls messages and processes them concurrently with Akka, achieving second‑level latency while the message queue absorbs traffic spikes.
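The consume, validate, and enrich step can be sketched as follows. This is only an illustration of the pattern: the production service uses Akka on the JVM, whereas this sketch substitutes a Python thread pool, and the event schema (`order_id`, `merchant_id`, `amount_cents`) and the enrichment rule are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical schema for a change event pulled from the message queue.
REQUIRED_FIELDS = ("order_id", "merchant_id", "amount_cents")


def validate_and_enrich(event):
    """Check one event and, if valid, return the document to be indexed."""
    missing = [field for field in REQUIRED_FIELDS if field not in event]
    if missing:
        # Invalid events are returned with an error instead of raising,
        # so one bad message does not abort the whole batch.
        return {"ok": False, "event": event, "error": f"missing fields: {missing}"}
    doc = dict(event)
    doc["amount_yuan"] = event["amount_cents"] / 100  # derived display field
    return {"ok": True, "doc": doc}


def process_batch(events, workers=4):
    """Process a batch of events concurrently; map() keeps the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(validate_and_enrich, events))
    docs = [r["doc"] for r in results if r["ok"]]
    rejected = [r for r in results if not r["ok"]]
    return docs, rejected
```

Returning rejected events alongside the accepted documents keeps failures visible, which matters for a pipeline whose output is later checked against its input.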

4.2 Elasticsearch Storage

Settlement records far outnumber orders (2‑6 records per order), so a traditional sharded relational store is unsuitable. Elasticsearch provides multi‑dimensional, near‑real‑time queries. The index initially used the merchant ID as a custom routing value, which skewed shards toward large merchants; a Logstash‑based migration removed the custom routing to rebalance them.
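The skew is easy to reproduce in a small simulation. Elasticsearch selects a shard from a hash of the routing value, which defaults to the document `_id`. The sketch below uses MD5 as a stand‑in for Elasticsearch's own hash function, and the shard count and merchant sizes are synthetic assumptions, so it shows the shape of the problem rather than real numbers.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 8  # assumed shard count for the simulation


def shard_for(routing_value, num_shards=NUM_SHARDS):
    """Map a routing value to a shard with a stable hash (MD5 as a stand-in)."""
    digest = hashlib.md5(routing_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_loads(records, route_by):
    """Count how many records land on each shard when routing by one field."""
    counts = Counter(shard_for(str(record[route_by])) for record in records)
    return [counts.get(shard, 0) for shard in range(NUM_SHARDS)]


# Synthetic data: one large merchant with 9,000 records, 100 small ones with 10 each.
records = [{"doc_id": f"big-{i}", "merchant_id": "m-big"} for i in range(9000)]
records += [
    {"doc_id": f"small-{m}-{i}", "merchant_id": f"m-{m}"}
    for m in range(100)
    for i in range(10)
]

by_merchant = shard_loads(records, "merchant_id")  # custom routing
by_doc_id = shard_loads(records, "doc_id")         # default-style routing
```

With merchant routing, every record of the large merchant hashes to the same shard, so that shard holds at least 9,000 of the 10,000 records. Routing by document ID spreads the same records across all shards.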

4.3 Data Consistency Assurance

A consistency service records both Canal‑originated messages and successful ES writes in MySQL, compares upstream and downstream data daily, and invokes repair APIs for mismatches. The service retains seven days of messages and runs monthly Spark jobs for offline verification.
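The daily comparison can be sketched as a keyed diff between the two record sets. The record shape and the three result buckets are assumptions for illustration; the repair call is left as a comment because the article does not describe the repair API.

```python
def reconcile(upstream, downstream):
    """Compare two {record_id: record} mappings.

    Returns the IDs missing downstream, the IDs present on both sides with
    different content, and the IDs that exist only downstream.
    """
    missing = sorted(k for k in upstream if k not in downstream)
    mismatched = sorted(
        k for k in upstream if k in downstream and upstream[k] != downstream[k]
    )
    unexpected = sorted(k for k in downstream if k not in upstream)
    return {"missing": missing, "mismatched": mismatched, "unexpected": unexpected}


# Hypothetical snapshot: what Canal emitted versus what was written to ES.
upstream = {
    "r1": {"amount_cents": 8440},
    "r2": {"amount_cents": 1000},
    "r3": {"amount_cents": 60},
}
downstream = {
    "r1": {"amount_cents": 8440},
    "r2": {"amount_cents": 999},  # content differs
    "r4": {"amount_cents": 500},  # never seen upstream
}

report = reconcile(upstream, downstream)
# A real service would now call its repair API for each ID in the report.
```

Keeping the three buckets separate lets a repair step choose the right action: re‑index a missing record, overwrite a mismatched one, or flag an unexpected one for investigation.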

4.4 Data Aggregation

Merchant reconciliation pages query ES, using keyword‑type fields for fast aggregation. Putting exact‑match conditions in filter context rather than must improves performance, because filter clauses skip relevance scoring and can be cached (filter is reported as 2‑4× faster). Routing decisions affect shard distribution: early routing on merchant ID caused hot spots, which were resolved by reverting to default routing on the document ID. Three pagination strategies are used: from/size for UI display, scroll for bulk export, and search_after for API batch retrieval.
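A minimal sketch of the request bodies these techniques produce is shown below. It only builds Python dictionaries in the Elasticsearch query DSL and does not call a cluster. The field names (`merchant_id`, `settled_at`, `category`, `amount_cents`, `record_id`) are assumptions, not the real index mapping.

```python
def build_daily_summary_query(merchant_id, day):
    """Aggregate one merchant's records for one day, grouped by category.

    All conditions sit in filter context, so no relevance score is computed.
    """
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"merchant_id": merchant_id}},
                    # Date math: from the given day up to, but excluding, the next day.
                    {"range": {"settled_at": {"gte": day, "lt": f"{day}||+1d"}}},
                ]
            }
        },
        "aggs": {
            "by_category": {
                "terms": {"field": "category"},  # assumed to be a keyword field
                "aggs": {"total": {"sum": {"field": "amount_cents"}}},
            }
        },
    }


def build_page_query(merchant_id, page_size, last_sort_values=None):
    """Build one search_after page; pass the previous page's last sort values."""
    body = {
        "size": page_size,
        "query": {"bool": {"filter": [{"term": {"merchant_id": merchant_id}}]}},
        # A unique tiebreaker field keeps the sort order total across pages.
        "sort": [{"settled_at": "asc"}, {"record_id": "asc"}],
    }
    if last_sort_values is not None:
        body["search_after"] = list(last_sort_values)
    return body
```

For batch retrieval, the caller would request the first page without `last_sort_values`, then pass the `sort` values of the last hit of each response into the next call until a page comes back empty.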

Conclusion

The accounting system continuously evolves to support Baidu’s expanding transaction ecosystem, enhancing merchant reconciliation experience through robust real‑time pipelines, scalable storage, and rigorous consistency checks.
