
Real-Time Financial Data Lake: Architecture, Practices, and Applications at Zhongyuan Bank

This talk by Ba Xueyu, a senior big data platform engineer at Zhongyuan Bank, outlines the background, architecture, and engineering practices of a real‑time financial data lake, highlighting its open, timely, and integrated design, streaming platform implementation, and use cases such as anti‑fraud and real‑time BI.

DataFunTalk

Zhongyuan Bank, the only provincial‑level commercial bank in Henan, aims to become a technology‑driven, data‑driven bank. That goal prompted it to build a real‑time financial data lake.

The data lake addresses the shift from traditional finance‑centric, offline analysis to customer‑centric, real‑time decision making, driven by the need for richer data, AI‑enabled insights, and rapid risk response.

Background: Traditional data warehouses excel at structured, batch‑oriented financial reporting but struggle with high‑velocity, multi‑type data, making them unsuitable for modern banking scenarios.

Architecture:

Functional layers: data sources, unified ingestion, storage, development, services, and applications.

Logical layers: storage (MPP warehouse + OSS/HDFS lake), compute (metadata services), service (federated queries, APIs), and product (RPA, identity, language analysis, profiling, recommendation).

Two real‑time scenarios: "direct" (Kafka → Flink → business) for T+0 decisions, and "landing" (Kafka → Flink → lake → downstream engines) for more complex processing.
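The difference between the two scenarios can be sketched in a few lines of plain Python. This is a toy model, not the bank's implementation: an in-memory `deque` stands in for a Kafka topic, the `enrich` function stands in for a Flink job, a Python list stands in for a lake table, and the 10,000 threshold is a hypothetical value chosen only for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    account: str
    amount: float


def enrich(event: Event) -> dict:
    """Stand-in for the stream-processing step (Flink in the talk)."""
    return {
        "account": event.account,
        "amount": event.amount,
        "large": event.amount >= 10_000,  # hypothetical threshold
    }


def direct_path(topic: deque, on_decision) -> None:
    """Kafka -> Flink -> business: each result is handed to the caller at once."""
    while topic:
        on_decision(enrich(topic.popleft()))


def landing_path(topic: deque, lake_table: list) -> None:
    """Kafka -> Flink -> lake: results are appended to a table for later queries."""
    while topic:
        lake_table.append(enrich(topic.popleft()))


def total_by_account(lake_table: list) -> dict:
    """Stand-in for a downstream engine aggregating over the landed rows."""
    totals: dict = {}
    for row in lake_table:
        totals[row["account"]] = totals.get(row["account"], 0.0) + row["amount"]
    return totals


events = [Event("A", 500.0), Event("B", 12_000.0), Event("A", 250.0)]

# Direct path: react to each event as it arrives.
alerts = []
direct_path(deque(events), lambda row: alerts.append(row) if row["large"] else None)

# Landing path: persist first, analyse afterwards.
lake = []
landing_path(deque(events), lake)
totals = total_by_account(lake)
```

In the direct path the business logic sees every record the moment it is processed, which suits per-transaction decisions. In the landing path nothing reacts immediately, but the stored rows can be re-read by any number of downstream queries.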

Engineering Practice: Built on open‑source components (Kafka, Flink, Iceberg, HDFS/S3) with a one‑stop streaming development platform that offers visual development, task scheduling, multi‑tenant management, and unified operations.

Key features of the data lake:

Openness – supports structured, semi‑structured, and unstructured data, as well as AI workloads.

Timeliness – provides both real‑time (T+0) and near‑real‑time analytics.

Integration – unifies data warehouse and lake views, enabling seamless data sharing.

Results: The platform achieves T+0 latency, supports 20+ financial products, reduces storage cost fivefold, processes 1.4 million risk events daily, and blocks 110 fraud incidents per day.
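To put the daily figures in perspective, the short calculation below converts them into an average per-second rate and a ratio. It assumes the events are spread evenly over 24 hours and that both figures describe the same event stream; the summary states neither, and real traffic will peak well above the average.

```python
SECONDS_PER_DAY = 24 * 60 * 60

risk_events_per_day = 1_400_000  # figure quoted in the talk
blocked_per_day = 110            # figure quoted in the talk

# Average throughput if events were spread evenly over the day (an assumption).
avg_events_per_second = risk_events_per_day / SECONDS_PER_DAY

# Ratio of the two daily figures, assuming they cover the same stream.
events_per_block = risk_events_per_day / blocked_per_day

print(f"average risk events per second: {avg_events_per_second:.1f}")   # 16.2
print(f"risk events per blocked incident: {events_per_block:.0f}")      # 12727
```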

Use Cases:

Intelligent real‑time anti‑fraud – combines streaming, knowledge graph, and ML services to detect and prevent fraudulent transactions.

Real‑time BI – the "ZhiQiu" platform delivers interactive queries, customer insights, and visual analytics, serving over 10,000 monthly active users and handling 30,000+ daily BI requests.
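For the anti‑fraud use case above, the sketch below shows one way a streaming feature, a graph lookup, and a model score could be combined into a block/allow decision. Everything in it is hypothetical: the account names, the window length, the weights and thresholds, and the hand-written `model_score` function, which merely stands in for a real ML service. The talk summary does not describe the bank's actual rules or models.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60      # hypothetical sliding-window length
MAX_TXNS_IN_WINDOW = 3   # hypothetical velocity limit
BLOCK_THRESHOLD = 0.8    # hypothetical score cut-off

# Hypothetical knowledge-graph extract: account -> directly linked accounts.
GRAPH = {"acct-1": {"acct-9"}, "acct-2": set()}
KNOWN_FRAUD = {"acct-9"}

_recent = defaultdict(deque)  # account -> timestamps still inside the window


def velocity(account: str, ts: float) -> int:
    """Streaming feature: transactions by this account within the window."""
    window = _recent[account]
    window.append(ts)
    while window and ts - window[0] > WINDOW_SECONDS:
        window.popleft()
    return len(window)


def linked_to_fraud(account: str) -> bool:
    """Graph feature: is the account one hop from a known fraudulent account?"""
    return bool(GRAPH.get(account, set()) & KNOWN_FRAUD)


def model_score(amount: float, txn_count: int, risky_link: bool) -> float:
    """Stand-in for an ML service: a fixed weighted sum, not a trained model."""
    score = 0.0
    score += 0.4 if amount >= 10_000 else 0.0
    score += 0.3 if txn_count > MAX_TXNS_IN_WINDOW else 0.0
    score += 0.5 if risky_link else 0.0
    return min(score, 1.0)


def decide(account: str, amount: float, ts: float) -> str:
    """Combine the three signals into a single block/allow decision."""
    count = velocity(account, ts)
    score = model_score(amount, count, linked_to_fraud(account))
    return "block" if score >= BLOCK_THRESHOLD else "allow"


# A large transfer from an account linked to known fraud is blocked;
# the same transfer from an unlinked account is allowed.
first = decide("acct-1", 20_000, ts=0.0)
second = decide("acct-2", 20_000, ts=0.0)
```

Keeping the three signals in separate functions mirrors the separation of services named in the use case: each could be replaced by a call to a real streaming job, graph store, or model endpoint without changing `decide`.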

The presentation concludes with a thank‑you and a call for community engagement.

Tags: Big Data, Flink, stream processing, anti-fraud, financial analytics, real-time data lake