Big Data 11 min read

Dynamic Data Partitioning in Apache Flink: A Fraud Detection Demo

This article explains how to implement dynamic data partitioning in Apache Flink using a fraud‑detection demo, covering the system architecture, rule‑driven runtime reconfiguration, custom ProcessFunction code, and the underlying key‑by logic that enables flexible, real‑time stream processing.

Sohu Tech Products
Sohu Tech Products
Sohu Tech Products
Dynamic Data Partitioning in Apache Flink: A Fraud Detection Demo

This blog series introduces three powerful Flink use‑cases, focusing on dynamic application logic updates, dynamic data partitioning (shuffle) at runtime, and low‑latency alerts using custom window logic without the standard Window API. The first case study demonstrates a real‑world fraud‑detection engine.

The demo consists of three components: a React front‑end, a SpringBoot back‑end, and the Flink fraud‑detection application. The back‑end exposes REST APIs for rule management, sends control commands to the Control Kafka topic, streams simulated transactions to the Transactions topic, and forwards generated alerts to the Alerts topic, which are then pushed to the UI via WebSockets.

Dynamic data partitioning replaces the static keyBy selector with a runtime‑configurable approach. Instead of hard‑coding a KeySelector, the system extracts grouping keys from rule definitions, allowing rules such as “if user A pays user B more than $1,000,000 within a week, trigger an alert” to be evaluated dynamically.

Rule parameters are expressed in JSON, for example:

{
  "ruleId": 1,
  "ruleState": "ACTIVE",
  "groupingKeyNames": ["payerId", "beneficiaryId"],
  "aggregateFieldName": "paymentAmount",
  "aggregatorFunctionType": "SUM",
  "limitOperatorType": "GREATER",
  "limit": 1000000,
  "windowMinutes": 10080
}

The DynamicKeyFunction iterates over all active rules, extracts the required grouping keys via reflection, and emits a Keyed wrapper containing the original event, the computed key string (e.g., "{payerId=25;beneficiaryId=12}"), and the rule ID.

public class DynamicKeyFunction extends ProcessFunction<Transaction, Keyed<Transaction, String, Integer>> {
    // Simplified example
    List<Rule> rules = /* initialized elsewhere */;
    @Override
    public void processElement(Transaction event, Context ctx, Collector<Keyed<Transaction, String, Integer>> out) {
        for (Rule rule : rules) {
            out.collect(new Keyed<>(event, KeysExtractor.getKey(rule.getGroupingKeyNames(), event), rule.getRuleId()));
        }
    }
}

The Keyed POJO holds three fields: wrapped (original transaction), key (the dynamic grouping key), and id (the rule identifier). Downstream, a simple lambda can be used to keyBy the dynamic key:

DataStream<Alert> alerts =
    transactions
        .process(new DynamicKeyFunction())
        .keyBy((keyed) -> keyed.getKey())
        .process(new DynamicAlertFunction());

This approach implicitly duplicates events so that each rule can be evaluated in parallel, providing horizontal scalability. Adding more Flink task slots increases rule‑processing capacity, while the trade‑off is increased network and memory usage due to event duplication.

The article concludes by summarizing the dynamic partitioning technique and hints at upcoming posts that will cover dynamic rule broadcasting and the implementation of the DynamicAlertFunction.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

stream processingfraud detectionReal-time analyticsApache FlinkDynamic PartitioningKeyBy
Sohu Tech Products
Written by

Sohu Tech Products

A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.