What’s New in Apache Flink 1.17? Key Features, Performance Gains, and Streaming Warehouse Advances

Apache Flink 1.17 brings a set of batch and streaming enhancements: a new Streaming Warehouse API, significant TPC‑DS performance gains, adaptive batch scheduling, improved checkpointing, expanded SQL capabilities, Hive connector upgrades, and broader filesystem support. It also upgrades FRocksDB, Calcite, and the delegation token framework, strengthening Flink's position as a unified data‑processing engine.

ITPUB

Mar 24, 2023

Apache Flink 1.17.0 Release

The Flink Project Management Committee has released Flink 1.17.0, with contributions from 172 developers covering 7 FLIPs and more than 600 issues. The release focuses on the streaming‑warehouse model, adding row‑level updates in batch mode, performance‑oriented optimizer changes, and new runtime features.

Batch Processing Enhancements

Streaming Warehouse API (FLIP‑282): Introduces DELETE and UPDATE statements in batch mode, enabling row‑level modifications in external stores (e.g., Flink Table Store). The ALTER TABLE syntax is extended to support adding, modifying and dropping columns, primary keys and watermarks.

Performance Optimizations: A new join‑reorder algorithm, adaptive local hash aggregation, Hive aggregation improvements and hybrid shuffle mode together deliver up to a 26% TPC‑DS speedup on a 10 TB dataset compared with Flink 1.16. The adaptive batch scheduler is now enabled by default and automatically derives the parallelism of each job vertex from the volume of data it processes.

Hybrid Shuffle Mode: Reuses intermediate data, works with the adaptive batch scheduler and speculative execution, and is more stable for large‑scale production workloads.
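To make the idea behind adaptive batch scheduling more concrete, the following is a minimal sketch of deriving a per-vertex parallelism from data volume. It is not Flink's actual algorithm: the function name, the per-task byte target, the bounds and the stage sizes are all assumptions chosen for illustration.

```python
import math

GIB = 1024 ** 3


def derive_parallelism(input_bytes, bytes_per_task, min_parallelism=1, max_parallelism=128):
    """Pick a parallelism so that each subtask handles roughly bytes_per_task.

    Hypothetical helper: the real scheduler uses its own heuristics.
    """
    if bytes_per_task <= 0:
        raise ValueError("bytes_per_task must be positive")
    wanted = math.ceil(input_bytes / bytes_per_task)
    # Clamp to the configured bounds so tiny or huge inputs stay reasonable.
    return max(min_parallelism, min(max_parallelism, wanted))


# Hypothetical input sizes for the vertices of a three-stage batch job.
stage_inputs = {"scan": 40 * GIB, "join": 6 * GIB, "aggregate": 200 * 1024 ** 2}
plan = {name: derive_parallelism(size, bytes_per_task=GIB) for name, size in stage_inputs.items()}
print(plan)
```

The point of the sketch is that downstream vertices, which usually see less data after filtering and aggregation, end up with a smaller parallelism than the scan without any manual tuning.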

Streaming Processing Enhancements

Streaming SQL Semantic Fixes: Resolves issues caused by nondeterministic operations and adds the experimental PLAN_ADVICE feature, which warns about potential correctness risks and suggests optimizations. Example output appears in the "Example: PLAN_ADVICE Output" section.

Checkpoint Improvements: Generic Incremental Checkpoint (GIC) reduces checkpoint duration by ~79.5% and incremental checkpoint size by ~95%. Checkpoints can also be triggered manually through the REST API. Unaligned Checkpoint (UC) is production‑ready, lowering checkpoint latency under back‑pressure.

Watermark Alignment (FLIP‑217): Aligns watermark emission across source splits, reducing downstream buffering and improving overall stream efficiency.

State Backend Upgrade: FRocksDB upgraded to version 6.20.3‑ververica‑2.0, adding Apple Silicon support, shared memory between TaskManager slots, a new periodic_compaction_seconds option, and performance gains by avoiding expensive toString() calls in compaction filters.
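The watermark alignment behaviour described above (FLIP‑217) can be pictured with a small sketch: splits whose watermark runs too far ahead of the slowest split are paused until the others catch up. This is only an illustration of the idea, not Flink's source-reader code; the function name, split identifiers and drift value are assumptions.

```python
def splits_to_pause(split_watermarks, max_allowed_drift_ms):
    """Return the splits whose watermark is too far ahead of the slowest split.

    split_watermarks maps a (hypothetical) split id to its current watermark in ms.
    """
    if not split_watermarks:
        return set()
    slowest = min(split_watermarks.values())
    limit = slowest + max_allowed_drift_ms
    # Anything beyond the limit would only pile up in downstream buffers.
    return {split for split, wm in split_watermarks.items() if wm > limit}


watermarks = {"split-0": 10_000, "split-1": 10_400, "split-2": 17_500}
paused = splits_to_pause(watermarks, max_allowed_drift_ms=5_000)
print(sorted(paused))
```

Here only the fastest split is held back, so downstream operators no longer need to buffer several seconds of its data while waiting for the slower splits.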

Speculative Execution for Sinks

Sink operators now support speculative execution. Built‑in sinks (DiscardingSink, PrintSink, FileSink, HiveTableSink, etc.) can obtain the attempt number of the current sub‑task and isolate output data from concurrent attempts. The slow‑task detector also takes input data volume into account, so a task that is slow only because it received more data is less likely to be flagged.
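As a rough sketch of why accounting for input volume helps under data skew, the example below normalises each task's running time by the bytes it has to process before comparing it with a baseline. The function, the multiplier and the sample numbers are assumptions made for illustration and do not reproduce Flink's detector.

```python
from statistics import median


def find_slow_tasks(finished, running, multiplier=1.5):
    """Flag running tasks that are slow relative to their input size.

    Both arguments map a task id to (elapsed_seconds, input_bytes).
    """
    def per_byte(elapsed, input_bytes):
        return elapsed / max(input_bytes, 1)

    if not finished:
        return []
    # Baseline: median time-per-byte of the tasks that already finished.
    baseline = median(per_byte(*stats) for stats in finished.values())
    return sorted(
        task for task, stats in running.items()
        if per_byte(*stats) > multiplier * baseline
    )


finished = {"t0": (10.0, 1000), "t1": (12.0, 1000), "t2": (11.0, 1000)}
running = {
    "t3": (60.0, 6000),  # long-running, but it has six times the input
    "t4": (30.0, 1000),  # normal input size, yet far slower than its peers
}
slow = find_slow_tasks(finished, running)
print(slow)
```

A detector based on elapsed time alone would flag both running tasks; with the normalisation, only the task that is genuinely slow for its input size is a candidate for a speculative attempt.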

SQL Client / Gateway

A new gateway mode allows users to submit SQL statements to a remote SQL Gateway and manage job lifecycles (list, stop) via SQL, providing functionality comparable to the Flink CLI.

Hive Connector Improvements

Automatic file merging is now available in batch mode, reducing the number of small files.

Native Hive aggregation functions (SUM, COUNT, AVG, MIN, MAX) are executed on hash‑based aggregation operators for better performance.
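The small-file merging mentioned above can be illustrated with a simple greedy grouping: files are packed together until a target size would be exceeded. This is a sketch of the general technique rather than the Hive connector's implementation, and the file names, sizes and target are hypothetical.

```python
MIB = 1024 ** 2


def plan_merges(file_sizes, target_size):
    """Greedily group files so each group stays at or below target_size.

    A single file that is already larger than target_size forms its own group.
    """
    groups, current, current_size = [], [], 0
    # Visit the smallest files first so they are packed together.
    for name, size in sorted(file_sizes.items(), key=lambda kv: kv[1]):
        if current and current_size + size > target_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        groups.append(current)
    return groups


files = {"part-0": 4 * MIB, "part-1": 6 * MIB, "part-2": 3 * MIB, "part-3": 120 * MIB}
merge_plan = plan_merges(files, target_size=16 * MIB)
print(merge_plan)
```

In this example the three small files collapse into one output while the large file is left alone, which is the effect that reduces pressure on the filesystem's metadata service.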

Streaming FileSink Extension

The FileSink now supports five filesystems: HDFS, S3, OSS, ABFS and local, broadening storage options for streaming jobs.

Calcite Upgrade

Calcite has been upgraded to 1.29.0, fixing bugs (CALCITE‑4325, CALCITE‑4352) and improving SQL optimizer performance.

Other Notable Changes

PyFlink now runs on Python 3.10 and Apple Silicon, with improved cross‑process communication and UDF type handling.

Task‑level flame graphs provide detailed performance visualisation per sub‑task.

Generalized delegation token framework (FLIP‑272) and Kerberos token improvements (FLIP‑211) extend authentication support beyond Hadoop.

Upgrade Guidance

When migrating to Flink 1.17, adjust configuration parameters such as state.backend.rocksdb.memory.fixed-per-tm to control shared memory allocation. Refer to the official release notes for a complete list of required changes.
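To show what a shared per-TaskManager budget changes, here is a small arithmetic sketch comparing it with a per-slot budget (the per-slot counterpart is state.backend.rocksdb.memory.fixed-per-slot). The helper function and the numbers are hypothetical, and the sketch deliberately ignores how Flink decides which option takes effect; consult the release notes for those rules.

```python
def rocksdb_budget_mb(slots_per_tm, fixed_per_slot_mb=None, fixed_per_tm_mb=None):
    """Upper bound on RocksDB memory for one TaskManager, in MB (simplified)."""
    if (fixed_per_slot_mb is None) == (fixed_per_tm_mb is None):
        raise ValueError("set exactly one of fixed_per_slot_mb or fixed_per_tm_mb")
    if fixed_per_tm_mb is not None:
        # One pool shared by every slot: the total does not grow with slot count.
        return fixed_per_tm_mb
    # One pool per slot: the total scales with the number of slots.
    return slots_per_tm * fixed_per_slot_mb


per_slot_total = rocksdb_budget_mb(slots_per_tm=4, fixed_per_slot_mb=512)
per_tm_total = rocksdb_budget_mb(slots_per_tm=4, fixed_per_tm_mb=1024)
print(per_slot_total, per_tm_total)
```

With a per-TaskManager budget, slots that hold little state leave more of the shared pool available to the slots that need it.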

Example: PLAN_ADVICE Output

== Optimized Physical Plan With Advice ==
...advice[1]: [WARNING] The column(s): day(generated by non-deterministic function: CURRENT_TIMESTAMP) cannot satisfy the determinism requirement for correctly processing update message('UB'/'UA'/'D' in changelogMode, not 'I' only)...

References

FLIP‑282: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=235838061

FLIP‑217: https://cwiki.apache.org/confluence/display/FLINK/FLIP-217%3A+Support+watermark+alignment+of+source+splits

FRocksDB repository: https://github.com/ververica/frocksdb

Sink interface (new API): https://github.com/apache/flink/blob/release-1.17/flink-core/src/main/java/org/apache/flink/api/connector/sink2/Sink.java

OutputFormat sink example: https://github.com/apache/flink/blob/release-1.17/flink-core/src/main/java/org/apache/flink/api/common/io/OutputFormat.java

Hybrid Shuffle documentation: https://nightlies.apache.org/flink/flink-docs-release-1.17/zh/docs/ops/batch/batch_shuffle/#hybrid-shuffle

HiveModule functions: https://nightlies.apache.org/flink/flink-docs-release-1.17/zh/docs/connectors/table/hive/hive_functions/

PLAN_ADVICE documentation: https://nightlies.apache.org/flink/flink-docs-release-1.17/zh/docs/dev/table/sql/explain/#explaindetails

CALCITE‑4325 issue: https://issues.apache.org/jira/browse/CALCITE-4325

CALCITE‑4352 issue: https://issues.apache.org/jira/browse/CALCITE-4352

FLINK‑29849, FLINK‑30006, FLINK‑30841 (checkpoint optimizer fixes): https://issues.apache.org/jira/browse/FLINK-29849, https://issues.apache.org/jira/browse/FLINK-30006, https://issues.apache.org/jira/browse/FLINK-30841

FLINK‑30836 (RocksDBStateBackend memory config): https://issues.apache.org/jira/browse/FLINK-30836

Release notes: https://nightlies.apache.org/flink/flink-docs-release-1.17/release-notes/flink-1.17/
