How a Bank Built a Real‑Time OLAP Pipeline with Flink, Kafka and StarRocks
This article details a Chinese bank’s journey to real‑time OLAP, covering the challenges of end‑to‑end real‑time processing, the evolution from Kafka‑based ETL to Flink‑driven ELT, the architecture components, deployment statistics, and future directions with Flink Table Store.
Background and Challenges
Real‑time OLAP is required for BI dashboards, AI feature calculation and fraud detection in banking. Key constraints include immutable accounting models, strict security, heterogeneous data sources (Oracle, MySQL, OceanBase, behavior‑tracking logs, network logs) and heavy reliance on dimension tables for reporting.
Evolution of the Real‑time Architecture
Stage 1 – Start: Kafka‑based real‑time ETL (capture → transform → load) supports fact‑table analytics, but it cannot query dimension tables and offers neither data reuse nor long‑term persistence.
Stage 2 – Exploration: Adopt ELT (load first, compute later). The team initially used micro‑batch full loads, then switched to view‑based processing on MPP databases (StarRocks, ClickHouse). This improves latency but still consumes heavy query resources.
Stage 3 – Optimization: Introduce Flink and Flink Table Store (now Apache Paimon) to move part of the computation upstream, enabling combined batch‑stream processing for dimension‑table analytics.
Stage 4 – Future: Aim for a fully stream‑native data warehouse where all processing runs in Flink, achieving compute‑storage separation.
Current Architecture
Data sources (Oracle, MySQL, OceanBase, behavior‑tracking, network logs) → Kafka topics.
Real‑time computation: Flink SQL jobs running on YARN or Kubernetes, using Flink Table Store for mutable storage and Elasticsearch for dimension tables.
Outputs: online services (Oracle/MySQL) and analytical stores (StarRocks, Elasticsearch, Flink Table Store).
CDC tools: Attunity Replicate for Oracle, Flink CDC for MySQL, OceanBase OMS for future migration.
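To make the role of the dimension tables concrete, here is a minimal Python sketch of enriching a fact stream with dimension attributes. It is only an illustration: the record types, the lookup_join function, and the in-memory dictionary standing in for the dimension store are all hypothetical, and the article does not say which join strategy the bank's Flink SQL jobs actually use.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional


@dataclass(frozen=True)
class Transaction:
    """Hypothetical fact record, e.g. one message from a Kafka topic."""
    account_id: str
    amount: float


@dataclass(frozen=True)
class EnrichedTransaction:
    account_id: str
    amount: float
    branch: Optional[str]  # None when no dimension row exists for the account


def lookup_join(
    facts: Iterable[Transaction],
    dimension: Dict[str, str],
) -> Iterator[EnrichedTransaction]:
    """Attach the branch currently stored for each account to its fact row."""
    for tx in facts:
        yield EnrichedTransaction(tx.account_id, tx.amount, dimension.get(tx.account_id))


# In-memory stand-in for the dimension store (assumption for illustration).
account_branch = {"A1": "Shanghai", "A2": "Beijing"}
stream = [Transaction("A1", 100.0), Transaction("A3", 25.0)]

enriched = list(lookup_join(stream, account_branch))
for row in enriched:
    print(row)
```

The second transaction has no matching dimension row, so its branch is None. A real job has to decide whether to drop, delay, or pass through such rows.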
Operational Scale
In operation since 2018, the platform runs more than 380 real‑time jobs and processes more than 50 billion rows and over 20 TB of data per day. Jobs range from simple log aggregation to complex CEP‑based user‑behavior analysis.
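As a rough sense of scale, the daily figures can be converted into average per‑second rates. The sketch below uses the quoted lower bounds and assumes decimal terabytes and a perfectly even load over 24 hours. Real traffic will peak well above these averages, and the bytes‑per‑row figure is only indicative because both inputs are lower bounds.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

rows_per_day = 50e9    # lower bound quoted in the text (>50 billion rows)
bytes_per_day = 20e12  # lower bound quoted in the text (>20 TB, decimal TB assumed)

avg_rows_per_second = rows_per_day / SECONDS_PER_DAY
avg_mb_per_second = bytes_per_day / SECONDS_PER_DAY / 1e6
avg_bytes_per_row = bytes_per_day / rows_per_day

print(f"{avg_rows_per_second:,.0f} rows/s")   # 578,704 rows/s
print(f"{avg_mb_per_second:,.1f} MB/s")       # 231.5 MB/s
print(f"{avg_bytes_per_row:,.0f} bytes/row")  # 400 bytes/row
```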
Typical ELT Flow
Oracle → Kafka → Flink CDC → StarRocks (ODS)
Offline batch → StarRocks (ADS)
StarRocks views combine ODS + ADS for ad‑hoc queries
Real‑time Deposit‑Loan ELT Example
Transactional (insert‑heavy) and customer attribute (update‑heavy) data are ingested via Kafka, processed by Flink, and stored in StarRocks. Dashboards provide instant asset‑liability views for branch managers.
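The view‑based ELT idea, in which raw real‑time rows are loaded first and combined with offline results at query time, can be sketched with Python's built‑in sqlite3 module. SQLite is only a stand‑in for the MPP database here, and the table names, column names, and sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for the MPP database
conn.executescript(
    """
    -- ADS: balances as of the last offline batch run
    CREATE TABLE ads_branch_balance (branch TEXT PRIMARY KEY, balance REAL);
    -- ODS: transactions loaded in real time since that batch run
    CREATE TABLE ods_transactions (branch TEXT, amount REAL);

    -- The "transform" step lives in a view and runs at query time
    CREATE VIEW v_branch_balance AS
    SELECT a.branch,
           a.balance + COALESCE(SUM(o.amount), 0) AS current_balance
    FROM ads_branch_balance AS a
    LEFT JOIN ods_transactions AS o ON o.branch = a.branch
    GROUP BY a.branch, a.balance;
    """
)
conn.executemany(
    "INSERT INTO ads_branch_balance VALUES (?, ?)",
    [("north", 1000.0), ("south", 500.0)],
)
conn.executemany(
    "INSERT INTO ods_transactions VALUES (?, ?)",
    [("north", 200.0), ("north", -50.0)],
)

rows = conn.execute(
    "SELECT branch, current_balance FROM v_branch_balance ORDER BY branch"
).fetchall()
print(rows)  # [('north', 1150.0), ('south', 500.0)]
```

Because the view is evaluated on every read, each dashboard refresh re‑aggregates the ODS rows. This is the query‑time cost that the view‑based approach trades for freshness.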
Requirements for Real‑time Dimension‑Table Support
Store full data with fast updates (e.g., daily balance adjustments).
Support stream‑batch reads, especially low‑latency stream reads.
Maintain a complete changelog for correctness.
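Why a complete changelog matters for correctness can be shown with a small Python sketch. It folds a stream of change records into per‑branch totals, using the row‑kind notation from Flink's changelog model (+I insert, -U update‑before, +U update‑after, -D delete). The record layout and the branch_totals function are hypothetical, chosen only to illustrate the effect of a missing retraction.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (row kind, customer_id, branch, balance)
Change = Tuple[str, str, str, float]


def branch_totals(changelog: Iterable[Change]) -> Dict[str, float]:
    """Fold a changelog into per-branch balance totals."""
    totals: Dict[str, float] = defaultdict(float)
    for kind, _customer, branch, balance in changelog:
        if kind in ("+I", "+U"):
            totals[branch] += balance  # new value enters the aggregate
        elif kind in ("-U", "-D"):
            totals[branch] -= balance  # old value is retracted
        else:
            raise ValueError(f"unknown row kind: {kind}")
    return dict(totals)


complete = [
    ("+I", "c1", "north", 100.0),
    ("-U", "c1", "north", 100.0),  # customer moves branch: retract the old row
    ("+U", "c1", "south", 120.0),
]
# The same stream with the update-before record dropped.
incomplete = [c for c in complete if c[0] != "-U"]

print(branch_totals(complete))    # {'north': 0.0, 'south': 120.0}
print(branch_totals(incomplete))  # {'north': 100.0, 'south': 120.0}
```

Without the update‑before record, the customer's old balance stays in the north total and the bank‑wide sum is overstated by 100.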
Flink Table Store (Apache Paimon) Integration
Flink Table Store provides a unified lakehouse storage that supports:
Dual‑write to data files and log system.
Stream‑batch write and read.
Fast UPDATE operations.
Compatibility with OLAP engines such as Hive and, in the future, StarRocks.
Re‑architected ELT pipeline:
Raw data → Flink Table Store (full history + updates)
Flink SQL computes aggregates
Aggregates → StarRocks (ADS)
Raw detail data remains in Table Store for ad‑hoc queries
This reduces the load on StarRocks and enables minute‑level end‑to‑end latency for dimension‑table calculations.
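The effect of moving aggregation upstream can be sketched in a few lines of Python. The DetailStore and ServingStore classes are hypothetical in‑memory stand‑ins for the lake storage layer and the ADS layer, and run_pipeline stands in for the stream job. None of this reflects the actual Flink Table Store or StarRocks APIs.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, float]  # (branch, amount); hypothetical detail record


class DetailStore:
    """Stand-in for the lake storage layer: keeps every raw record."""

    def __init__(self) -> None:
        self.rows: List[Event] = []

    def append(self, event: Event) -> None:
        self.rows.append(event)


class ServingStore:
    """Stand-in for the ADS layer: one upserted row per key."""

    def __init__(self) -> None:
        self.rows: Dict[str, float] = {}

    def upsert(self, key: str, value: float) -> None:
        self.rows[key] = value


def run_pipeline(events: Iterable[Event], detail: DetailStore, serving: ServingStore) -> None:
    """Persist raw detail, aggregate incrementally, and ship only aggregates."""
    running: Dict[str, float] = defaultdict(float)
    for branch, amount in events:
        detail.append((branch, amount))          # full history stays upstream
        running[branch] += amount                # aggregation happens in the stream job
        serving.upsert(branch, running[branch])  # serving layer holds the latest total


detail, serving = DetailStore(), ServingStore()
run_pipeline(
    [("north", 100.0), ("south", 40.0), ("north", 50.0), ("south", 10.0)],
    detail,
    serving,
)
print(len(detail.rows), serving.rows)  # 4 {'north': 150.0, 'south': 50.0}
```

The serving layer ends up with one row per branch regardless of how many events arrived, while the detail layer still holds every record for ad‑hoc queries.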
Future Direction
The goal is a streaming data warehouse where both historical and real‑time data reside in Flink Table Store, providing a single SQL interface for batch and streaming queries. Remaining challenges include high resource consumption for complex aggregations and the need for sophisticated resource tuning.
dbaplus Community
