Flink 1.19 New Features: SQL Optimizations, Runtime Enhancements, and Checkpointing Improvements
The article reviews Flink 1.19’s new features, highlighting SQL capability enhancements such as custom source parallelism, TTL hints, and MiniBatch support for regular joins, as well as runtime dynamic parallelism for batch jobs and flexible checkpointing intervals for different data sources.
SQL Capability Optimization
SQL capability optimizations focus on three features: source table custom parallelism, SQL hint TTL configuration, and Regular Join MiniBatch optimization.
Source Table Custom Parallelism
Flink 1.19 introduces the scan.parallelism table option, which sets the parallelism of a source table; in this release it is supported by the DataGen connector. Tuning source parallelism can improve consumption speed and join efficiency for sources such as RocketMQ, MySQL, and Redis. Kafka is a special case: its source parallelism should match the number of topic partitions; otherwise it defaults to the job’s maximum parallelism.
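As a minimal sketch of the option described above, the table definition below sets scan.parallelism on a DataGen source. The table name, columns, and the values 100 and 4 are illustrative assumptions, not taken from the original article.

```sql
-- Hypothetical DataGen source table; the schema and values are for illustration only.
CREATE TABLE Orders (
  o_orderkey   BIGINT,
  o_custkey    BIGINT,
  o_totalprice DOUBLE
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '100',
  -- Run this source with 4 parallel subtasks instead of the job default.
  'scan.parallelism' = '4'
);
```

Operators downstream of the source still use the job's default parallelism unless configured otherwise.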
SQL Hint TTL Configuration
The new version simplifies state TTL configuration through the STATE_TTL SQL hint. The statements below set TTL for a join and for an aggregation, which makes it easier to keep state size and resource consumption under control in production environments.
-- set state ttl for join
SELECT /*+ STATE_TTL('Orders'='1d', 'Customers'='20d') */ *
FROM Orders LEFT OUTER JOIN Customers ON Orders.o_custkey = Customers.c_custkey;
-- set state ttl for aggregation
SELECT /*+ STATE_TTL('o'='1d') */ o_orderkey, SUM(o_totalprice) AS revenue
FROM Orders AS o
GROUP BY o_orderkey;
Regular Join MiniBatch Optimization
Regular joins suffer from performance bottlenecks caused by frequent state access. The MiniBatch optimization buffers input records into small batches and deduplicates them before accessing state, which improves throughput when state is large or frequently accessed.
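MiniBatch is controlled through the table.exec.mini-batch.* options. The sketch below shows one way to enable it in a SQL session before running a regular join; the latency and size values are illustrative assumptions and should be tuned for the workload.

```sql
-- Enable MiniBatch processing for the session.
SET 'table.exec.mini-batch.enabled' = 'true';
-- Buffer input records for at most 5 seconds (illustrative value).
SET 'table.exec.mini-batch.allow-latency' = '5 s';
-- Flush a batch once it holds 5000 records (illustrative value).
SET 'table.exec.mini-batch.size' = '5000';

-- A regular join that can now benefit from batched state access.
SELECT *
FROM Orders LEFT OUTER JOIN Customers ON Orders.o_custkey = Customers.c_custkey;
```

A larger allowed latency usually means fewer state accesses but slower result updates, so the trade-off depends on how fresh the join output needs to be.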
Runtime Optimization
Flink 1.19 adds dynamic parallelism inference for source tables in batch jobs, allowing connectors to determine parallelism based on actual data volume. Currently, this feature is supported by the FileSource connector.
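Dynamic inference works together with the adaptive batch scheduler's auto-parallelism options. The sketch below assumes these options can be set per session in the SQL client (in some deployments they may belong in the cluster configuration instead), and the value 8 is an illustrative assumption.

```sql
-- Run the session in batch mode, where the adaptive batch scheduler applies.
SET 'execution.runtime-mode' = 'batch';
-- Let the scheduler decide operator parallelism automatically.
SET 'execution.batch.adaptive.auto-parallelism.enabled' = 'true';
-- Upper bound for the parallelism that a source may infer dynamically (illustrative value).
SET 'execution.batch.adaptive.auto-parallelism.default-source-parallelism' = '8';
```

With these settings, a source that supports dynamic inference can choose a parallelism up to the configured bound based on the data it actually has to read.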
Checkpoint
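The new release makes it possible to use different checkpointing intervals for different phases of a job, such as reading a Hive table and then consuming Kafka data. Two separate interval parameters are configured: the regular interval, and a second interval that applies while the source reports that it is processing backlog.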
execution.checkpointing.interval: 30sec
execution.checkpointing.interval-during-backlog: 30min
These are the key capabilities introduced in Flink 1.19 that users should keep an eye on and try out in production environments.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
