Big Data 13 min read

How a Traditional Finance Firm Tackles Real‑Time Analytics with Flink

This article details a financial company's exploration of Apache Flink for real‑time processing, covering its unique business constraints, end‑to‑end data pipeline, single‑table and multi‑table use cases, implementation challenges, code snippets, data initialization, testing strategies, and lessons learned.

dbaplus Community

Apr 17, 2021

How a Traditional Finance Firm Tackles Real‑Time Analytics with Flink

Business Background

Traditional financial institutions process form‑based business records that change frequently, have long update cycles (days to months), require recomputation of historical statistics when dimension tables change, and demand high monetary accuracy.

Real‑time Data Processing Flow

Data are captured via Canal (MySQL binlog) and Flume, written to Kafka. Flink SQL defines source and sink tables, processes streams, and writes results to MySQL, Elasticsearch, HBase, or back to Kafka for downstream jobs.

Two data sources: business data (MySQL → Canal → Kafka) and log data (Flume → Kafka).

Define Flink SQL source/sink tables linking Kafka topics to target stores.

Optionally upload compiled Flink JARs to the platform.

Create and run Flink jobs on YARN.

Sink final results to MySQL/ES/HBase or Kafka.

Use Case 1: Single‑Table Real‑time Calculation

Goal: From a repayment‑plan table compute per‑order metrics such as first‑month payment, first repayment date, first overdue date, current period, paid amount and remaining amount.

Implementation uses Flink DataStream API. Key steps:

public static void main(String[] args) throws Exception {
    // Execution environment
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // Checkpoint every 10 min
    env.enableCheckpointing(600_000);
    // RocksDB state backend
    String chkpointPath = "hdfs://ha/xxx/checkpoint";
    StateBackend backend = new RocksDBStateBackend(chkpointPath);
    env.setStateBackend(backend);
    // Event‑time
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

    // Kafka consumer
    Properties props = new Properties();
    props.setProperty("bootstrap.servers", kafkaIPs);
    props.setProperty("group.id", "group.cyb.1");
    FlinkKafkaConsumer<KafkaData<List<CustPlanSource>>> consumer =
        new FlinkKafkaConsumer<>(topic, new KafkaDeserializationScheme(), props);
    consumer.assignTimestampsAndWatermarks(new PeriodicTsAndWmarks());

    // Start from a given timestamp
    long startTs = Long.parseLong(args[0]);
    consumer.setStartFromTimestamp(startTs);

    // Pipeline
    SingleOutputStreamOperator<ResultInfo> ds = env
        .addSource(consumer)
        .filter(f -> f.getTable().equalsIgnoreCase("plan_cust") && !f.getIsDdl())
        .flatMap(new DataFlatMap())
        .keyBy(RepayPlanCustOut::getApply_no)
        .window(TumblingEventTimeWindows.of(Time.seconds(20)))
        .process(new CustProcessWindowFun())
        .returns(new TypeHint<ResultInfo>(){})
        .setParallelism(15);

    ds.writeUsingOutputFormat(new SinkOutFormat()).setParallelism(15);
    env.execute("planMain-1");
}

The pipeline keeps the latest binlog for each order in Flink state, merges it with HBase detail tables when windows close, and writes aggregated results back to an HBase result table.

Data Initialization and Testing

Historical MySQL data are first synchronized to HBase using Hive external tables. The same Hive scripts generate expected statistical results, which are then compared with the Flink output via a Python verification script.

Use Case 2: Multi‑Table Join Real‑time Calculation

Four solution attempts were evaluated to join a main table with several dimension tables (contract, customer, product, etc.) in real time.

Solution 1 (Flink 1.5 + HBase dimension tables) : Sync each dimension table to HBase and join in Flink. Simple but introduces latency because dimension updates must propagate to HBase first.

Solution 2 (Flink 1.10 + MySQL dimension tables) : Use MySQL directly as dimension tables, eliminating HBase lag. However, updates to dimension tables do not trigger recomputation of the join.

Solution 3 (All tables in Kafka) : Stream every table into Kafka, let Flink read the main table and join with dimension tables stored in MySQL. Any table update triggers recomputation, but the result table is rewritten entirely each time, causing low efficiency.

Solution 4 (Result table as incremental dimension) : Keep the result table as a dimension table and update only the fields that changed per task. This reduces write overhead while still supporting real‑time updates from any source.

The experiments illustrate trade‑offs among simplicity, latency, and write efficiency when applying Flink to traditional finance workloads.

Conclusion

Flink provides a fast, event‑time, stateful stream‑processing platform that integrates well with Kafka, MySQL, HBase and Elasticsearch. In financial scenarios with long update cycles and strict accuracy requirements, practical challenges such as state management, checkpointing, and triggering on dimension‑table updates require deep Flink expertise and careful architectural design.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Real-time Flink Kafka MySQL HBase Financial

Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.

Business Background

Real‑time Data Processing Flow

Use Case 1: Single‑Table Real‑time Calculation

Data Initialization and Testing

Use Case 2: Multi‑Table Join Real‑time Calculation

Conclusion

dbaplus Community

How this landed with the community

Was this worth your time?

0 Comments

Use Case 1: Single‑Table Real‑time Calculation

Use Case 2: Multi‑Table Join Real‑time Calculation