Setting Up and Using the MySQL CDC Connector with Apache Flink
This article provides a step‑by‑step guide on configuring the MySQL CDC connector for Flink, covering Maven and SQL client dependencies, MySQL user setup, connector options, table creation via SQL and Stream API, key features, common issues, and practical troubleshooting tips.
The MySQL CDC connector lets Flink read both snapshot data and incremental (binlog) data from a MySQL database. This guide translates the official documentation and explains how to configure the connector, prepare MySQL permissions, and run queries.
Dependencies
For Maven projects, add:
<dependency>
  <groupId>com.alibaba.ververica</groupId>
  <artifactId>flink-connector-mysql-cdc</artifactId>
  <version>1.1.0</version>
</dependency>
For the SQL client, download flink-sql-connector-mysql-cdc-1.1.0.jar and place it in <FLINK_HOME>/lib/.
Setting Up MySQL Server
Create a MySQL user with the required privileges:
mysql> CREATE USER 'user'@'localhost' IDENTIFIED BY 'password';
mysql> GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'user' IDENTIFIED BY 'password';
mysql> FLUSH PRIVILEGES;
Important Notes
The connector obtains a global read lock (FLUSH TABLES WITH READ LOCK) during snapshot, then releases it.
If the MySQL user lacks RELOAD, the connector falls back to table‑level locks, which can block writes longer.
Global read lock may affect online traffic; you can skip it by setting debezium.snapshot.locking.mode='none' if at‑least‑once semantics are acceptable.
Each Flink job should use a distinct MySQL server‑id to avoid binlog position conflicts.
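Checkpoints cannot be taken while tables are being scanned for the snapshot, so checkpoints attempted during that phase time out and count as failed. For large tables, consider a longer checkpoint interval, a higher number of tolerable failed checkpoints, and a restart strategy so the job does not fail over: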
execution.checkpointing.interval: 10min
execution.checkpointing.tolerable-failed-checkpoints: 100
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 2147483647
Adjust MySQL interactive_timeout and wait_timeout to prevent session timeouts while reading large tables.
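The debezium.snapshot.locking.mode option mentioned in the notes is a Debezium property (snapshot.locking.mode) written with a debezium. prefix. The following is a minimal sketch of that naming convention only, using plain Java collections; the class DebeziumOptionSketch and its method are hypothetical and are not the connector's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper, not part of the connector: shows how table options
// carrying a "debezium." prefix can be mapped to plain Debezium properties.
class DebeziumOptionSketch {
    static final String PREFIX = "debezium.";

    // Copy every option whose key starts with "debezium." into a Properties
    // object with the prefix removed; all other options are ignored.
    static Properties toDebeziumProperties(Map<String, String> tableOptions) {
        Properties props = new Properties();
        for (Map.Entry<String, String> e : tableOptions.entrySet()) {
            if (e.getKey().startsWith(PREFIX)) {
                props.setProperty(e.getKey().substring(PREFIX.length()), e.getValue());
            }
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("connector", "mysql-cdc");
        options.put("hostname", "localhost");
        options.put("debezium.snapshot.locking.mode", "none");

        Properties props = toDebeziumProperties(options);
        System.out.println(props.getProperty("snapshot.locking.mode")); // prints "none"
    }
}
```

Only the prefixed key ends up in the result, which is why connector options such as hostname and Debezium options can sit side by side in the same WITH clause.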
Creating a MySQL CDC Table
SQL DDL example:
-- register a MySQL table 'orders' in Flink SQL
CREATE TABLE orders (
  order_id INT,
  order_date TIMESTAMP(0),
  customer_name STRING,
  price DECIMAL(10,5),
  product_id INT,
  order_status BOOLEAN
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = 'localhost',
  'port' = '3306',
  'username' = 'root',
  'password' = '123456',
  'database-name' = 'mydb',
  'table-name' = 'orders'
);
SELECT * FROM orders;
Stream API example:
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import com.alibaba.ververica.cdc.debezium.StringDebeziumDeserializationSchema;
import com.alibaba.ververica.cdc.connectors.mysql.MySQLSource;

public class MySqlBinlogSourceExample {
    public static void main(String[] args) throws Exception {
        SourceFunction<String> sourceFunction = MySQLSource.<String>builder()
            .hostname("localhost")
            .port(3306)
            .databaseList("inventory") // monitor all tables under inventory database
            .username("flinkuser")
            .password("flinkpw")
            .deserializer(new StringDebeziumDeserializationSchema()) // converts SourceRecord to String
            .build();

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.addSource(sourceFunction).print().setParallelism(1);
        env.execute();
    }
}
Features
Exactly‑once processing: the connector first reads a snapshot and then continues reading the binlog, and keeps exactly‑once semantics even when failures occur.
Single‑thread reading: only one task receives binlog events.
FAQ
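Skip snapshot: set debezium.snapshot.mode='never' to read the binlog from its beginning without a snapshot, or 'schema_only' to capture only the changes made after the connector starts.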
Monitor multiple tables: use regex in table-name, e.g., user.*.
Binlog format errors: ensure binlog_format is set to ROW, and that no existing session has changed it to another format for itself.
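To preview which tables a table-name pattern such as user.* would select, the regex can be tried out in plain Java first. This is a rough sketch under the assumption that the pattern must match the whole table name; the class TableNameRegexSketch and the sample table names are hypothetical, and the connector's own matching is done by Debezium and may also involve the database name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for trying out a table-name regex before using it
// in the connector options. Not part of the connector.
class TableNameRegexSketch {
    // Return the table names that the regex matches in full
    // (Matcher.matches() requires the whole name to match).
    static List<String> matchingTables(String tableNameRegex, List<String> tables) {
        Pattern pattern = Pattern.compile(tableNameRegex);
        List<String> matched = new ArrayList<>();
        for (String table : tables) {
            if (pattern.matcher(table).matches()) {
                matched.add(table);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> tables = Arrays.asList("user", "users", "user_log", "orders", "app_user");
        // prints [user, users, user_log]
        System.out.println(matchingTables("user.*", tables));
    }
}
```

Note that user.* also matches names such as users and user_log, so a tighter pattern like user_[0-9]+ may be preferable when only sharded tables are wanted.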
Practical Issues Encountered
Dependency conflicts between different Kafka versions can cause CDC errors.
Timeouts during large snapshot reads can be mitigated by increasing wait_timeout and interactive_timeout in MySQL configuration.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
