
How a Bank Built a Real‑Time OLAP Pipeline with Flink, Kafka and StarRocks

This article details a Chinese bank’s journey to real‑time OLAP, covering the challenges of full‑link real‑time, the evolution from Kafka‑based ETL to Flink‑driven ELT, architecture components, practical deployment statistics, and future directions with Flink Table Store.

dbaplus Community

Background and Challenges

Real‑time OLAP is required for BI dashboards, AI feature calculation and fraud detection in banking. Key constraints include immutable accounting models, strict security, heterogeneous data sources (Oracle, MySQL, OceanBase, behavior‑tracking logs, network logs) and heavy reliance on dimension tables for reporting.

Evolution of the Real‑time Architecture

Stage 1 – Start: Kafka‑based real‑time ETL (capture → transform → load) supports fact‑table analytics, but it cannot query dimension tables, offers no data reuse, and provides no long‑term persistence.

Stage 2 – Exploration: Adopt ELT (load first, compute later). The team initially used micro‑batch full loads and later switched to view‑based processing on MPP databases (StarRocks, ClickHouse). This improves latency, but the computation now runs at query time and still consumes heavy query resources.

Stage 3 – Optimization: Introduce Flink and Flink Table Store (now Apache Paimon) to move part of the computation upstream, enabling combined batch‑stream processing for dimension‑table analytics.

Stage 4 – Future: Aim for a fully stream‑native data warehouse where all processing runs in Flink, achieving compute‑storage separation.

Current Architecture

Data sources (Oracle, MySQL, OceanBase, behavior‑tracking, network logs) → Kafka topics.

Real‑time computation: Flink SQL jobs running on YARN or Kubernetes, using Flink Table Store for mutable storage and Elasticsearch for dimension tables.

Outputs: online services (Oracle/MySQL) and analytical stores (StarRocks, Elasticsearch, Flink Table Store).

CDC tools: Attunity Replicate for Oracle, Flink CDC for MySQL, OceanBase OMS for future migration.

Operational Scale

The platform has been in operation since 2018 and runs more than 380 real‑time jobs, processing more than 50 billion rows and more than 20 TB of data per day. Jobs range from simple log aggregation to complex CEP‑based user‑behavior analysis.
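To put the daily figures in perspective, the short calculation below converts them into rough sustained averages. It is only back-of-the-envelope arithmetic: it treats the quoted figures as lower bounds, assumes decimal terabytes, and ignores peaks, which in practice are likely to be well above the average.

```python
SECONDS_PER_DAY = 24 * 60 * 60

rows_per_day = 50_000_000_000   # "more than 50 billion rows", taken as a lower bound
bytes_per_day = 20 * 10**12     # "more than 20 TB", assuming decimal terabytes

avg_rows_per_sec = rows_per_day / SECONDS_PER_DAY
avg_mb_per_sec = bytes_per_day / SECONDS_PER_DAY / 10**6
avg_bytes_per_row = bytes_per_day / rows_per_day

print(f"{avg_rows_per_sec:,.0f} rows/s")      # 578,704 rows/s
print(f"{avg_mb_per_sec:,.1f} MB/s")          # 231.5 MB/s
print(f"{avg_bytes_per_row:.0f} bytes/row")   # 400 bytes/row
```

Averaged over a day, that is roughly 0.58 million rows and 230 MB per second across all jobs.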

Typical ELT Flow

Oracle → CDC (Attunity Replicate) → Kafka → Flink → StarRocks (ODS)
Offline batch → StarRocks (ADS)
StarRocks views combine ODS + ADS for ad‑hoc queries
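The "load first, compute later" idea in this flow can be illustrated with a small, self-contained sketch. It uses Python's built-in SQLite as a stand-in for StarRocks, and the table names, columns, and sample rows are hypothetical; the bank's actual schemas and view definitions are not given in the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for StarRocks here
conn.executescript("""
    -- ODS: detail rows loaded in real time (hypothetical schema)
    CREATE TABLE ods_txn (account_id TEXT, amount REAL, txn_date TEXT);
    -- ADS: balances precomputed by the offline batch (hypothetical schema)
    CREATE TABLE ads_balance (account_id TEXT, balance REAL, as_of_date TEXT);

    -- Compute later: the view is re-evaluated on every query
    CREATE VIEW v_current_balance AS
    SELECT b.account_id,
           b.balance + COALESCE(SUM(t.amount), 0) AS current_balance
    FROM ads_balance AS b
    LEFT JOIN ods_txn AS t
           ON t.account_id = b.account_id
          AND t.txn_date > b.as_of_date
    GROUP BY b.account_id, b.balance;
""")

conn.executemany(
    "INSERT INTO ads_balance VALUES (?, ?, ?)",
    [("A1", 1000.0, "2023-01-01"), ("A2", 500.0, "2023-01-01")],
)
conn.executemany(
    "INSERT INTO ods_txn VALUES (?, ?, ?)",
    [("A1", 200.0, "2023-01-02"), ("A1", -50.0, "2023-01-02")],
)

def current_balances():
    query = ("SELECT account_id, current_balance "
             "FROM v_current_balance ORDER BY account_id")
    return conn.execute(query).fetchall()

rows = current_balances()
print(rows)  # [('A1', 1150.0), ('A2', 500.0)]

# A newly loaded ODS row is visible on the next query, with no reload step.
conn.execute("INSERT INTO ods_txn VALUES ('A1', 25.0, '2023-01-02')")
rows_after = current_balances()
print(rows_after)  # [('A1', 1175.0), ('A2', 500.0)]
```

The join and aggregation run each time the view is queried, which is where the query-time resource cost of this approach comes from.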

Real‑time Deposit‑Loan ELT Example

Transactional (insert‑heavy) and customer attribute (update‑heavy) data are ingested via Kafka, processed by Flink, and stored in StarRocks. Dashboards provide instant asset‑liability views for branch managers.
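A toy model of this example is sketched below in plain Python rather than Flink. The event shapes, field names, and branch names are invented for illustration. It shows an insert-heavy transaction stream being enriched from an update-heavy customer table and rolled up per branch.

```python
from collections import defaultdict

# Hypothetical events in arrival order, as they might come off Kafka topics.
events = [
    ("customer", {"customer_id": "C1", "branch": "North"}),
    ("customer", {"customer_id": "C2", "branch": "South"}),
    ("txn", {"customer_id": "C1", "kind": "deposit", "amount": 1000}),
    ("txn", {"customer_id": "C2", "kind": "loan", "amount": 400}),
    ("customer", {"customer_id": "C2", "branch": "North"}),  # attribute update
    ("txn", {"customer_id": "C2", "kind": "deposit", "amount": 250}),
]

customers = {}  # dimension table: latest attributes per customer
totals = defaultdict(lambda: {"deposit": 0, "loan": 0})  # per-branch rollup

for topic, payload in events:
    if topic == "customer":
        # Update-heavy side: overwrite the previous attributes (upsert).
        customers[payload["customer_id"]] = payload
    else:
        # Insert-heavy side: look up the customer's current branch and add.
        branch = customers[payload["customer_id"]]["branch"]
        totals[branch][payload["kind"]] += payload["amount"]

print(dict(totals))
# {'North': {'deposit': 1250, 'loan': 0}, 'South': {'deposit': 0, 'loan': 400}}
```

Note that C2's earlier loan stays attributed to South even after the customer moves to North. A lookup at arrival time only sees the attributes current at that moment, so correcting past results needs the full data and the change history of the dimension table.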

Requirements for Real‑time Dimension‑Table Support

Store full data with fast updates (e.g., daily balance adjustments).

Support stream‑batch reads, especially low‑latency stream reads.

Maintain a complete changelog for correctness.
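Why a complete changelog matters for correctness can be shown with a small sketch. The class below is a hypothetical in-memory model, not Flink Table Store's API. It borrows Flink's row-kind notation (+I insert, -U update-before, +U update-after, -D delete) and shows that a downstream sum stays correct only if the retraction records are present.

```python
class ChangelogTable:
    """Toy keyed table that records every change as (op, key, value)."""

    def __init__(self):
        self.rows = {}        # current value per key
        self.changelog = []   # full history of changes

    def upsert(self, key, value):
        if key in self.rows:
            # An update emits the old value first so consumers can retract it.
            self.changelog.append(("-U", key, self.rows[key]))
            self.changelog.append(("+U", key, value))
        else:
            self.changelog.append(("+I", key, value))
        self.rows[key] = value


def total_from_changelog(changelog):
    """Maintain a running sum by adding +I/+U and retracting -U/-D."""
    total = 0
    for op, _key, value in changelog:
        if op in ("+I", "+U"):
            total += value
        else:
            total -= value
    return total


balances = ChangelogTable()
balances.upsert("A1", 100)
balances.upsert("A2", 50)
balances.upsert("A1", 130)  # e.g. a daily balance adjustment

total = total_from_changelog(balances.changelog)
# Same stream with the -U records dropped: the stale 100 is never retracted.
naive = sum(v for op, _k, v in balances.changelog if op != "-U")

print(total)  # 180, matches 130 + 50
print(naive)  # 280, overcounts
```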

Flink Table Store (Apache Paimon) Integration

Flink Table Store provides unified lakehouse storage that supports:

Dual writes to data files and a log system.

Stream‑batch write and read.

Fast UPDATE operations.

Compatibility with OLAP engines such as Hive and, in the future, StarRocks.
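The dual-write idea in the list above can be pictured with a minimal in-memory model. This is a conceptual sketch only: the class and method names are made up, and the real storage layout and log integration are considerably more involved. Each write goes to a keyed store (standing in for data files) and to an append-only list (standing in for the log system), so batch readers get the latest state while stream readers follow changes from an offset.

```python
class DualWriteTable:
    """Toy table: every write lands in a keyed store and an ordered log."""

    def __init__(self):
        self._files = {}  # stands in for data files: latest value per key
        self._log = []    # stands in for the log system: ordered changes

    def write(self, key, value):
        self._files[key] = value         # fast update in place
        self._log.append((key, value))   # same change, appended to the log

    def batch_read(self):
        """Return a snapshot of the current state."""
        return dict(self._files)

    def stream_read(self, from_offset=0):
        """Yield (offset, change) pairs starting at from_offset."""
        for offset in range(from_offset, len(self._log)):
            yield offset, self._log[offset]


table = DualWriteTable()
table.write("C1", "North")
table.write("C2", "South")
table.write("C2", "North")  # update to an existing key

snapshot = table.batch_read()
tail = list(table.stream_read(from_offset=2))

print(snapshot)  # {'C1': 'North', 'C2': 'North'}
print(tail)      # [(2, ('C2', 'North'))]
```

A consumer could bootstrap from the snapshot and then follow the log from the matching offset, which is the kind of combined batch and stream read the dimension-table requirements call for.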

Re‑architected ELT pipeline:

Raw data → Flink Table Store (full history + updates)
Flink SQL computes aggregates
Aggregates → StarRocks (ADS)
Raw detail data remains in Table Store for ad‑hoc queries

This reduces the load on StarRocks and enables minute‑level end‑to‑end latency for dimension‑table calculations.

Future Direction

The goal is a streaming data warehouse where both historical and real‑time data reside in Flink Table Store, providing a single SQL interface for batch and streaming queries. Remaining challenges include high resource consumption for complex aggregations and the need for sophisticated resource tuning.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
