Real-Time Financial Data Lake: Architecture, Practices, and Applications at Zhongyuan Bank
This talk by Ba Xueyu, a senior big data platform engineer at Zhongyuan Bank, covers the background, architecture, and engineering practice of the bank's real‑time financial data lake: its open, timely, and integrated design, its streaming platform, and use cases such as anti‑fraud and real‑time BI.
Zhongyuan Bank, the only provincial‑level commercial bank in Henan, aims to become a technology‑driven, data‑driven bank, and the real‑time financial data lake was built to support that goal.
The data lake addresses the shift from traditional finance‑centric, offline analysis to customer‑centric, real‑time decision making, driven by the need for richer data, AI‑enabled insights, and rapid risk response.
Background: Traditional data warehouses excel at structured, batch‑oriented financial reporting but struggle with high‑velocity, multi‑type data, making them unsuitable for modern banking scenarios.
Architecture:
Functional layers: data sources, unified ingestion, storage, development, services, and applications.
Logical layers: storage (MPP warehouse + OSS/HDFS lake), compute (metadata services), service (federated queries, APIs), and product (RPA, identity, language analysis, profiling, recommendation).
Two real‑time scenarios: "direct" (Kafka → Flink → business) for T+0 decisions, and "landing" (Kafka → Flink → lake → downstream engines) for more complex processing.
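The two real‑time paths above can be sketched in plain Python. This is only an illustration under stated assumptions: the in‑memory list stands in for the Kafka topic, `LakeTable` for a lake table, and the names `direct_path`, `landing_path`, `total_by_account`, and the 10,000 review limit are hypothetical, not taken from the talk.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


@dataclass
class Event:
    account: str
    amount: float


@dataclass
class LakeTable:
    """Hypothetical stand-in for a lake table: an append-only list of rows."""
    rows: List[dict] = field(default_factory=list)

    def append(self, row: dict) -> None:
        self.rows.append(row)


def direct_path(events: Iterable[Event],
                decide: Callable[[Event], str]) -> Iterator[Tuple[str, str]]:
    """'Direct' path: decide on each event as it arrives (T+0)."""
    for e in events:
        yield e.account, decide(e)


def landing_path(events: Iterable[Event], table: LakeTable) -> None:
    """'Landing' path: lightly clean each event and land it in the lake."""
    for e in events:
        table.append({"account": e.account, "amount": round(e.amount, 2)})


def total_by_account(table: LakeTable) -> Dict[str, float]:
    """Stand-in for a downstream engine: aggregate over the landed rows."""
    totals: Dict[str, float] = {}
    for row in table.rows:
        totals[row["account"]] = totals.get(row["account"], 0.0) + row["amount"]
    return totals


# The list plays the role of the Kafka topic (assumption for illustration).
events = [Event("A", 50.0), Event("B", 12000.0), Event("A", 25.5)]

# Direct path: an immediate per-event decision with a hypothetical limit.
decisions = list(direct_path(events, lambda e: "review" if e.amount > 10_000 else "pass"))

# Landing path: persist first, then let a downstream query do the heavier work.
table = LakeTable()
landing_path(events, table)
totals = total_by_account(table)
```

The design point is the split itself: the direct path keeps no state beyond the event in hand, while the landing path defers aggregation to whatever engine later reads the table.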
Engineering Practice: Built on open‑source components (Kafka, Flink, Iceberg, HDFS/S3) with a one‑stop streaming development platform that offers visual development, task scheduling, multi‑tenant management, and unified operations.
Key Features:
Openness – supports structured, semi‑structured, and unstructured data, as well as AI workloads.
Timeliness – provides both real‑time (T+0) and near‑real‑time analytics.
Integration – unifies data warehouse and lake views, enabling seamless data sharing.
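The "Integration" point, one query spanning warehouse and lake data, can be illustrated with a small analogy. The sketch below uses Python's built‑in `sqlite3` module with two attached in‑memory databases. This is an assumption made purely for illustration: the talk does not say which federated query engine the bank uses, and the `customers` and `clicks` tables are hypothetical.

```python
import sqlite3

# "main" plays the role of the warehouse; the attached database plays the lake.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS lake")

# Hypothetical tables: curated customer data vs. raw behavioural events.
conn.execute("CREATE TABLE main.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE lake.clicks (customer_id INTEGER, page TEXT)")
conn.executemany("INSERT INTO main.customers VALUES (?, ?)",
                 [(1, "Li"), (2, "Wang")])
conn.executemany("INSERT INTO lake.clicks VALUES (?, ?)",
                 [(1, "loans"), (1, "cards"), (2, "loans")])

# One statement joins across both stores, which is the idea behind a unified view.
rows = conn.execute(
    """
    SELECT c.name, COUNT(*) AS n_clicks
    FROM main.customers AS c
    JOIN lake.clicks AS k ON k.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
```

Here the caller never copies data between the two stores. The query names both schemas and the engine resolves them.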
Results: Achieves T+0 latency, supports 20+ financial products, reduces storage cost fivefold, processes 1.4 million risk events daily, and blocks 110 fraud incidents per day.
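To put the daily risk‑event volume in perspective, the figure can be converted to an average per‑second rate. This is simple arithmetic on the number quoted above. It assumes an even spread over 24 hours, which real traffic will not follow, so peak rates are likely higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400

risk_events_per_day = 1_400_000          # figure quoted in the results
avg_events_per_second = risk_events_per_day / SECONDS_PER_DAY

print(f"{avg_events_per_second:.1f} risk events per second on average")
# prints: 16.2 risk events per second on average
```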
Use Cases:
Intelligent real‑time anti‑fraud – combines streaming, knowledge graph, and ML services to detect and prevent fraudulent transactions.
Real‑time BI – the "ZhiQiu" platform delivers interactive queries, customer insights, and visual analytics, serving over 10 000 monthly active users and handling 30 000+ daily BI requests.
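For the anti‑fraud use case, the combination of streaming features, knowledge‑graph lookups, and a model score might look roughly like the sketch below. Everything in it is a hypothetical stand‑in: the weights, thresholds, the toy `GRAPH` dictionary, and the `model_score` placeholder are assumptions for illustration, not the bank's actual rules or services.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60        # hypothetical sliding-window length
MAX_TXNS_IN_WINDOW = 3     # hypothetical velocity limit
BLOCK_THRESHOLD = 0.6      # hypothetical cut-off on the combined score

# Toy stand-in for a knowledge graph: account -> directly linked accounts.
GRAPH = {"acct_1": {"acct_9"}, "acct_2": set()}
KNOWN_FRAUD = {"acct_9"}

# Streaming state: recent transaction timestamps per account.
recent = defaultdict(deque)


def velocity(account: str, ts: int) -> int:
    """Count this account's transactions in the last WINDOW_SECONDS."""
    q = recent[account]
    q.append(ts)
    while q and ts - q[0] > WINDOW_SECONDS:
        q.popleft()
    return len(q)


def graph_risk(account: str) -> float:
    """1.0 if the account is linked to a known fraudulent account, else 0.0."""
    return 1.0 if GRAPH.get(account, set()) & KNOWN_FRAUD else 0.0


def model_score(amount: float) -> float:
    """Placeholder for an ML service call; here just a capped amount ratio."""
    return min(amount / 50_000, 1.0)


def decide(account: str, amount: float, ts: int) -> str:
    """Blend the three signals with hypothetical weights and apply the cut-off."""
    too_fast = 1.0 if velocity(account, ts) > MAX_TXNS_IN_WINDOW else 0.0
    score = 0.5 * model_score(amount) + 0.3 * graph_risk(account) + 0.2 * too_fast
    return "block" if score >= BLOCK_THRESHOLD else "allow"


# Four large transfers within a minute trip the velocity rule on the fourth;
# a later transfer, after the window has emptied, is allowed again.
decisions = [decide("acct_3", 45_000, ts) for ts in (0, 10, 20, 30, 200)]

# An account linked to a known fraudulent account is blocked on a large transfer.
linked = decide("acct_1", 50_000, 0)
```

In a production pipeline each signal would come from a separate service, and the fixed weighted sum would probably be replaced by a trained model or a rule engine.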
The presentation concludes with a thank‑you and a call for community engagement.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.