Building a Low‑Latency, High‑Capacity Real‑Time Data Platform for Finance
Facing growing data demands in finance, we replaced two legacy synchronization pipelines with a unified, low‑latency architecture built on BabelX Real‑Time, Flink CDC, Iceberg v2, and Paimon, achieving minute‑level data freshness, 10–30× query speedups, lower storage costs, and streamlined schema management across multiple business units.
Background and Challenges
As financial services develop rapidly, user operations, risk control, and product design increasingly rely on data. Meeting the demands of real‑time analysis, quantitative operations, risk modeling, and machine learning requires a data storage and analysis system that is high‑capacity, low‑latency, and easy to maintain.
Limitations of Existing Solutions
Synchronizing data via DBIO into a MySQL cluster on the TokuDB engine provided compression and archival capabilities, but suffered from performance bottlenecks (complex queries took tens of minutes), data silos across teams, and high storage costs (thousands of CNY per TB per month).
Data‑lake synchronization with BabelX on a T‑1 (day‑minus‑one) schedule introduced a full day of latency, handled schema changes poorly, and made back‑filling historical data difficult; maintaining the two parallel pipelines further increased maintenance overhead.
Solution Exploration: RCP SQL + Iceberg v2
In early 2024, we trialed a Flink SQL + Iceberg v2 solution on the Realtime Compute Platform (RCP) to handle order‑state changes in settlement warehouses. Full‑snapshot sync ensured data completeness but was slow, while end‑state sync reduced data volume but lost intermediate order states. Iceberg v2’s row‑level change feature (delete files) solved these issues, allowing the data lake to act as a read‑only replica with stable operation for over a year.
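As a hedged sketch of how that row‑level change capability is enabled: declaring the Iceberg table with format version 2 and upsert writes lets Flink apply the CDC changelog as row‑level deletes plus inserts. The catalog, table, and column names below are illustrative assumptions, not from the original.

```sql
-- Illustrative only: catalog/table/column names are assumptions.
CREATE TABLE order_states (
  order_id   BIGINT,
  state      STRING,
  updated_at TIMESTAMP(3),
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'connector' = 'iceberg',
  'catalog-name' = 'hive_prod',
  'format-version' = '2',           -- v2 enables row-level deletes via delete files
  'write.upsert.enabled' = 'true'   -- changelog applied as equality-delete + insert
);
```

With this in place the lake table tracks every upstream state transition instead of only end states, which is what allowed it to serve as a read‑only replica.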
Research: Is OceanBase HTAP the Silver Bullet?
In Q3 2024 we evaluated OceanBase HTAP for transactional workloads, attracted by its horizontal scalability, high read/write performance, and cost‑effective storage. However, challenges remained: data silos persisted for stream‑batch hybrid workloads, and its overall storage cost was higher than the combined online‑DB + data‑lake approach.
Final Selection: BabelX Real‑Time + Paimon
In October 2024 we adopted BabelX Real‑Time (a Flink CDC‑based data sync tool) together with Paimon. With the help of the big‑data team we resolved Paimon’s small‑file issue and added schema‑evolution support. Six tasks now synchronize over 20 business tables, running stably for more than six months.
The new solution simplifies field mapping, uses regex for sharding, automates target‑table creation and schema evolution, and provides a visual monitoring UI, dramatically reducing human effort and operational complexity.
Figure 1: Simplified overall pipeline after the change
Figure 2: CPU utilization remains low when a single task syncs multiple tables
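Since BabelX Real‑Time is built on Flink CDC, the setup above corresponds roughly to a Flink CDC 3.x pipeline definition. A minimal sketch, in which all hostnames, credentials, database/table names, and paths are hypothetical:

```yaml
# Hedged sketch of a Flink CDC 3.x pipeline from sharded MySQL to Paimon.
# Hostnames, credentials, and names below are illustrative, not from the article.
source:
  type: mysql
  hostname: mysql.internal.example
  port: 3306
  username: cdc_user
  password: ${CDC_PASSWORD}
  # Regex captures all shards of the same logical table, as described above.
  tables: settle_db.orders_\.*
  server-id: 5400-5404

sink:
  type: paimon
  catalog.properties.metastore: filesystem
  catalog.properties.warehouse: hdfs:///warehouse/paimon

pipeline:
  name: mysql-shards-to-paimon
  parallelism: 4
```

One such pipeline definition can fan in many sharded source tables, which is why a single task syncing multiple tables stays cheap on CPU.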
Solution Comparison
We evaluated the candidates across five key dimensions—query performance, data freshness, data interoperability, storage cost, and maintenance cost. The final selection outperformed alternatives on all metrics.
Value Realization
Cost Reduction & Efficiency: Streamlined sync tasks cut operational overhead.
Performance Gains: Data latency improved from T+1 to minute‑level; complex analytical queries dropped from tens of minutes to 1–3 minutes (a 10–30× speedup); more than 10 concurrent queries run without degradation.
Data Governance: Eliminated data silos, optimized cold‑data archiving, and improved data quality by 90% through automated schema management.
Business Enablement: Real‑time data fuels risk‑control, anti‑fraud, and machine‑learning pipelines; quantitative operations benefit from unified user‑behavior data, driving growth.
Future Plans
Real‑time user‑tag generation to enhance financial marketing.
Full‑stack monitoring for end‑to‑end observability.
Report migration to accelerate data delivery.
Self‑service data platform with chat‑BI to lower business‑side barriers.
Usage Reference
For further details on BabelX Real‑Time, refer to the Flink CDC official documentation.
Engine Configuration
Increase checkpoint timeout for large tables: execution.checkpointing.timeout=30min.
Adjust checkpoint interval to accept minute‑level latency and reduce small files.
Scale compute resources during full sync, then downsize after incremental sync based on CPU utilization.
Assign dedicated queues for BabelX and RCP tasks to ensure resource isolation.
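The engine settings above map onto standard Flink configuration keys; a minimal sketch, with illustrative values that should be tuned per table:

```yaml
# Hedged sketch; timeout and interval values are illustrative.
execution.checkpointing.timeout: 30min   # tolerate long checkpoints during full sync of large tables
execution.checkpointing.interval: 2min   # accept minute-level freshness, fewer small files per commit
```

A longer interval trades a little freshness for fewer, larger commits, which directly reduces small‑file pressure on the Paimon side.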
MySQL Source Configuration
Enable dynamic table addition: scan.newly-added-table.enabled=true.
For regex‑based partitioned tables, auto‑sync new tables: scan.binlog.newly-added-table.enabled=true (mutually exclusive with the above).
Enable schema‑evolution for pt‑osc DDL: scan.parse.online.schema.changes.enabled=true (experimental).
Skip specific upstream operations: debezium.skipped.operations=[c,u,d,t] (e.g., ignore delete and truncate).
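In YAML pipeline form, the source options above sit under the `source` block; a hedged fragment (only the options discussed here, other connection settings elided):

```yaml
# Hedged sketch of the MySQL source options listed above.
source:
  type: mysql
  # Pick up newly added tables when restarting from a savepoint:
  scan.newly-added-table.enabled: true
  # OR (mutually exclusive) sync new regex-matched tables from the binlog
  # without a restart:
  # scan.binlog.newly-added-table.enabled: true
  # Experimental: parse pt-osc online schema changes as DDL:
  scan.parse.online.schema.changes.enabled: true
  # Skip deletes and truncates from the upstream changelog:
  debezium.skipped.operations: d,t
```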
Paimon Sink Configuration
Change commit.user when restarting a task so that the new run’s commits are not filtered as duplicates of the previous run’s.
Set table.properties.bucket based on table size and partition count to avoid small files.
Target tables are created automatically from the upstream schema; explicit table creation is not required.
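The sink options above can be sketched as follows; the commit user and bucket count are illustrative values, not recommendations from the article:

```yaml
# Hedged sketch of the Paimon sink options listed above; values are illustrative.
sink:
  type: paimon
  commit.user: babelx-run-20250101   # rotate on restart so commits are not deduplicated
  # Passed through as a Paimon table option; size by data volume and partition count:
  table.properties.bucket: 8
```

Too few buckets concentrates writes and hurts parallelism; too many multiplies small files per partition, so the bucket count is the main knob for the small‑file issue mentioned earlier.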