Using FlinkX for Data Synchronization in Sharded MySQL Environments
This article explains how to use FlinkX and the Flink Stream API to build a single, unified data-sync task that extracts data from sharded MySQL tables, splits the workload, and pushes it to an MQ cluster, while detailing the underlying InputFormat and Reader architecture.
1. Scenario Description
The example shows an order system that has been partitioned into multiple databases and tables (four databases, eight tables). The requirement is to create a single task that synchronizes data to an MQ cluster instead of creating separate tasks for each database instance, since the table structures and mapping rules are identical.
2. FlinkX Solution Details
2.1 Flink Stream API Development Process
The general steps for programming with Flink Stream API are illustrated in the diagram below.
Note: Detailed Stream API content will be covered in future articles; this article focuses on InputFormatSourceFunction and data source splitting.
2.2 FlinkX Reader (Data Source) Core Class Diagram
The core class hierarchy of FlinkX Readers is shown below, with BaseDataReader as the base class.
Key classes include:
InputFormat: Flink's core API for splitting and reading input data. Key methods: configure, getStatistics, createInputSplits, and getInputSplitAssigner.
void open(T split): opens a data channel for the given split; the article shows JDBC and Elasticsearch implementations as examples.
boolean reachedEnd(): indicates whether a bounded data source has been exhausted.
OT nextRecord(OT reuse): retrieves the next record from the channel.
void close(): closes the source.
InputSplit: the root interface for data partitions, providing int getSplitNumber().
Implementation example: GenericInputSplit, whose fields partitionNumber and totalNumberOfPartitions make it useful for modulo-based splitting of large tables.
Other related types: SourceFunction, RichFunction, ParallelSourceFunction, RichParallelSourceFunction, InputFormatSourceFunction, and FlinkX's BaseDataReader.
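To make the GenericInputSplit idea concrete, here is a minimal, self-contained sketch of modulo-based split ownership. The class and method names (SimpleSplit, owns) are stand-ins invented for illustration; the real class is Flink's org.apache.flink.core.io.GenericInputSplit, and this mimic only demonstrates the partitioning arithmetic, not Flink's implementation.

```java
// Self-contained sketch: a GenericInputSplit-style split that claims the
// rows whose id satisfies id % totalNumberOfPartitions == partitionNumber.
public class ModuloSplitSketch {

    /** Simplified stand-in for Flink's GenericInputSplit (hypothetical name). */
    static final class SimpleSplit {
        final int partitionNumber;         // index of this split
        final int totalNumberOfPartitions; // total number of splits

        SimpleSplit(int partitionNumber, int totalNumberOfPartitions) {
            this.partitionNumber = partitionNumber;
            this.totalNumberOfPartitions = totalNumberOfPartitions;
        }

        /** A row belongs to this split iff id % total == partitionNumber. */
        boolean owns(long id) {
            return id % totalNumberOfPartitions == partitionNumber;
        }
    }

    public static void main(String[] args) {
        SimpleSplit s0 = new SimpleSplit(0, 4);
        SimpleSplit s1 = new SimpleSplit(1, 4);
        // ids 0,4,8,... land in partition 0; ids 1,5,9,... in partition 1, etc.
        System.out.println(s0.owns(8)); // true  (8 % 4 == 0)
        System.out.println(s1.owns(8)); // false
        System.out.println(s1.owns(9)); // true  (9 % 4 == 1)
    }
}
```

Because every id maps to exactly one partition, the splits are disjoint and together cover the whole table.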
2.3 Building a DataStream with a FlinkX Reader
With the class diagram in mind, the article walks through the read flow of DistributedJdbcDataReader (a subclass of BaseDataReader): it creates an InputFormat, wraps it in a corresponding SourceFunction, and finally adds that source to the StreamExecutionEnvironment to obtain a DataStreamSource.
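The per-split loop that the source function drives can be sketched without any Flink dependency. The interface and class names below (MiniInputFormat, ListFormat, runSplit) are hypothetical stand-ins: the sketch only illustrates the open → nextRecord-until-reachedEnd → close lifecycle described above, with an in-memory list playing the role of a JDBC result set.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the loop a SourceFunction runs for each assigned split:
// open(split), then nextRecord() until reachedEnd(), then close().
public class ReadLoopSketch {

    /** Trimmed-down stand-in for Flink's InputFormat contract. */
    interface MiniInputFormat<OT, T> {
        void open(T split);
        boolean reachedEnd();
        OT nextRecord(OT reuse);
        void close();
    }

    /** Reads a fixed in-memory slice, standing in for a JDBC result set. */
    static final class ListFormat implements MiniInputFormat<String, List<String>> {
        private List<String> rows;
        private int cursor;

        @Override public void open(List<String> split) { rows = split; cursor = 0; }
        @Override public boolean reachedEnd() { return cursor >= rows.size(); }
        @Override public String nextRecord(String reuse) { return rows.get(cursor++); }
        @Override public void close() { rows = null; }
    }

    /** The driving loop executed per split. */
    static List<String> runSplit(MiniInputFormat<String, List<String>> format,
                                 List<String> split) {
        List<String> out = new ArrayList<>();
        format.open(split);
        while (!format.reachedEnd()) {
            out.add(format.nextRecord(null));
        }
        format.close();
        return out;
    }

    public static void main(String[] args) {
        System.out.println(runSplit(new ListFormat(), List.of("r1", "r2", "r3")));
        // Prints: [r1, r2, r3]
    }
}
```

In real Flink, this loop lives inside InputFormatSourceFunction, which requests splits from the input split assigner one at a time.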
2.4 FlinkX Solution for Sharded Database Task Splitting
Given the scenario of a four‑database, eight‑table order system, performance can be improved by:
Splitting by database and table, resulting in eight independent tasks.
Further splitting each table's data, e.g., using id % totalNumberOfPartitions = partitionNumber for modulo‑based distribution.
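The modulo strategy above amounts to generating one WHERE predicate per partition. The helper below is a hypothetical sketch (not FlinkX source code; buildPartitionFilters is an invented name) showing how such disjoint, covering predicates can be produced from a split key and a partition count.

```java
// Hypothetical helper: partition p reads the rows where
// splitKey % totalNumberOfPartitions == p, so the resulting
// queries are disjoint and together cover the whole table.
public class SplitSqlSketch {

    static String[] buildPartitionFilters(String splitKey, int totalNumberOfPartitions) {
        String[] filters = new String[totalNumberOfPartitions];
        for (int p = 0; p < totalNumberOfPartitions; p++) {
            // %% escapes the literal '%' (SQL modulo) in String.format
            filters[p] = String.format("%s %% %d = %d", splitKey, totalNumberOfPartitions, p);
        }
        return filters;
    }

    public static void main(String[] args) {
        for (String f : buildPartitionFilters("id", 4)) {
            System.out.println("SELECT * FROM t_order WHERE " + f);
        }
        // Prints four disjoint queries: ... WHERE id % 4 = 0, ... WHERE id % 4 = 1, etc.
    }
}
```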
FlinkX follows this strategy. The workflow is illustrated below.
Step 1: Split by database instance and table, forming a DataSource list.
Step 2: Implement the actual split logic inside DistributedJdbcInputFormat#createInputSplitsInternal .
Step 3: If a splitKey is specified, generate SQL WHERE clauses such as splitKey % totalNumberOfPartitions = partitionNumber to achieve parallel extraction.
Step 4: If no table‑level split key is provided, the algorithm splits the sourceList itself, distributing tables among partitions.
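The Step 4 fallback can be sketched as a round-robin assignment of whole tables to subtasks. This is an illustrative sketch under that assumption, not FlinkX's actual implementation; the names distribute and sourceList entries are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the no-splitKey fallback: the list of (database, table)
// sources is itself divided among partitions, round-robin, so each
// subtask reads a disjoint subset of whole tables.
public class SourceListSplitSketch {

    static List<List<String>> distribute(List<String> sourceList, int numPartitions) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int i = 0; i < sourceList.size(); i++) {
            partitions.get(i % numPartitions).add(sourceList.get(i)); // round-robin
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Illustrative naming for the article's scenario: four databases,
        // eight order tables in total, spread over three parallel subtasks.
        List<String> sources = List.of(
            "db0.t_order_0", "db0.t_order_1", "db1.t_order_0", "db1.t_order_1",
            "db2.t_order_0", "db2.t_order_1", "db3.t_order_0", "db3.t_order_1");
        System.out.println(distribute(sources, 3));
    }
}
```

With more partitions than tables, some partitions simply end up empty, which matches the intuition that table-level splitting cannot exceed the number of tables in parallelism.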
This concludes the discussion of task splitting in FlinkX.
3. Conclusion
This article introduced how to use FlinkX to split data-extraction tasks in MySQL sharding scenarios, covering basic Flink programming patterns, the InputFormat and SourceFunction class hierarchy, and practical splitting strategies.
Note: Detailed Flink API analysis will be covered in future articles; the current series does not follow a strict sequential order.
Thank you for reading – likes, comments, and shares are greatly appreciated.