Key Updates in Apache Flink 1.17: Batch and Streaming Enhancements

The article reviews Apache Flink 1.17's major batch and streaming improvements, including new Delete/Update APIs, performance boosts, SQL client gateway, checkpoint and watermark enhancements, StateBackend upgrades, and practical use‑case scenarios for data engineers.

Big Data Technology & Architecture

Mar 27, 2023

Batch Section

Flink 1.17 improves batch processing in three main areas.

Streaming Warehouse API (FLIP‑282) adds Delete and Update operations that work only in batch mode, enabling row‑level modifications in external stores such as Flink Table Store. ALTER TABLE is also extended so that columns, primary keys, and watermarks can be added, modified, and dropped.
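As a rough illustration of the row‑level statements described above, the sketch below assembles a small Flink SQL script in Python. The table `orders` and its columns are hypothetical, and whether UPDATE and DELETE succeed depends on the connector supporting row‑level modification, so treat this as a sketch of the syntax rather than a tested job.

```python
# Hypothetical table and column names, used only for illustration.
STATEMENTS = [
    # Row-level UPDATE and DELETE are only available in batch mode.
    "SET 'execution.runtime-mode' = 'batch'",
    "UPDATE orders SET status = 'CANCELLED' WHERE order_id = 42",
    "DELETE FROM orders WHERE order_date < DATE '2022-01-01'",
    # Extended ALTER TABLE: columns, primary keys, and watermarks.
    # (Assumes the table has no primary key or watermark defined yet.)
    "ALTER TABLE orders ADD note STRING",
    "ALTER TABLE orders ADD PRIMARY KEY (order_id) NOT ENFORCED",
    "ALTER TABLE orders ADD WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND",
]


def to_script(statements):
    """Join statements into a script that could be pasted into the SQL Client."""
    return "\n".join(stmt + ";" for stmt in statements)


script = to_script(STATEMENTS)
print(script)
```

Keeping the statements in a list makes it easy to submit them one at a time through whichever client is in use.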

Batch performance optimizations deliver a 26% speedup on TPC‑DS through improved join‑reorder algorithms, adaptive local hash aggregation, Hive aggregation improvements, and the hybrid shuffle mode. Stability improves with speculative execution, which now covers all operators, and with adaptive batch scheduling. Usability benefits because adaptive batch scheduling is enabled by default and configuration is simpler.
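The idea behind adaptive local hash aggregation can be sketched in a few lines: pre‑aggregate a sample of the input, and stop pre‑aggregating if the sample shows that almost every key is distinct, since local aggregation then costs more than it saves. The function below is a toy model under that assumption. Its name, parameters, and thresholds are hypothetical and are not Flink's actual implementation or defaults.

```python
from collections import defaultdict


def local_aggregate(records, sample_size=4, distinct_ratio_threshold=0.5):
    """Toy adaptive local SUM aggregation over (key, value) pairs.

    The first `sample_size` records are pre-aggregated. If the sample has too
    many distinct keys (little reduction), the remaining records are forwarded
    unchanged instead of being aggregated locally.
    """
    partial = defaultdict(int)
    output = []
    seen = 0
    bypass = False
    for key, value in records:
        if bypass:
            output.append((key, value))
            continue
        partial[key] += value
        seen += 1
        if seen == sample_size and len(partial) / seen > distinct_ratio_threshold:
            # Flush what was aggregated so far, then stop pre-aggregating.
            output.extend(partial.items())
            partial.clear()
            bypass = True
    output.extend(partial.items())
    return output, bypass


skewed, skewed_bypass = local_aggregate([("a", 1)] * 6)
unique, unique_bypass = local_aggregate([(f"k{i}", 1) for i in range(6)])
print(skewed, skewed_bypass)
print(len(unique), unique_bypass)
```

In both cases a downstream global aggregation still produces the same totals; only the amount of work done locally changes.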

The SQL Client now supports a gateway mode, so SQL jobs can be submitted to a remote SQL Gateway, and running jobs can be queried and stopped directly from the client.

These changes make Flink Batch a mature, stable solution, with many large companies replacing tools like DataX for offline tasks. Two typical scenarios are highlighted: (1) loading historical data into dimension tables (e.g., Hive→HBase or Hive→Redis) with daily updates, and (2) handling complex dimension‑table logic directly in Flink Batch SQL.

Streaming Section

Key streaming enhancements in Flink 1.17 include:

Streaming SQL semantics are strengthened: non‑deterministic operations that could produce incorrect results or exceptions are addressed, and an experimental PLAN_ADVICE option for EXPLAIN warns about correctness risks and suggests optimizations.
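To see why non‑determinism matters in a changelog stream, consider an update that is emitted as a retraction of the old row followed by an insertion of the new one. If one column comes from a non‑deterministic function and is recomputed for the retraction, the retraction no longer matches the row it is supposed to cancel. The toy model below shows the effect. All names in it are hypothetical and it does not reflect Flink's internal data structures.

```python
import itertools


def apply_changelog(changes):
    """Materialize (op, row) pairs: '+' adds a row, '-' removes an equal row."""
    rows = []
    for op, row in changes:
        if op == "+":
            rows.append(row)
        elif row in rows:
            rows.remove(row)
    return rows


def changelog_for_update(key, old, new, stamp):
    """Emit insert, retract, insert for one update; `stamp` runs per message."""
    return [
        ("+", (key, old, stamp())),
        ("-", (key, old, stamp())),  # the retraction is recomputed, not replayed
        ("+", (key, new, stamp())),
    ]


def deterministic():
    return "fixed"


counter = itertools.count()


def non_deterministic():
    # Stand-in for a clock or random value: differs on every call.
    return next(counter)


ok = apply_changelog(changelog_for_update("u1", 10, 20, deterministic))
stale = apply_changelog(changelog_for_update("u1", 10, 20, non_deterministic))
print(ok)     # one row: the old value was retracted
print(stale)  # two rows: the retraction matched nothing
```

Running `EXPLAIN PLAN_ADVICE` on a query is the article's suggested way to surface this class of risk before the job is deployed.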

Checkpoint improvements feature Generic Incremental Checkpoint (GIC) for faster, more stable checkpoints, enhanced Unaligned Checkpoint stability under backpressure, and a new REST API for manually triggering checkpoints with a custom checkpoint type.
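A minimal sketch of calling the checkpoint‑trigger REST API might look like the following. The endpoint path `/jobs/<jobid>/checkpoints` and the `checkpointType` field reflect my reading of the Flink 1.17 REST documentation and should be verified against the version in use. The address and job id are placeholders, and the request is only built here, not sent.

```python
import json
import urllib.request

# Placeholder values; replace with the real JobManager address and job id.
BASE_URL = "http://localhost:8081"
JOB_ID = "a1b2c3d4e5f60718293a4b5c6d7e8f90"


def build_checkpoint_trigger(base_url, job_id, checkpoint_type="FULL"):
    """Build (but do not send) a POST request asking Flink to take a checkpoint."""
    body = json.dumps({"checkpointType": checkpoint_type}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/jobs/{job_id}/checkpoints",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_checkpoint_trigger(BASE_URL, JOB_ID)
print(request.get_method(), request.full_url, request.data.decode("utf-8"))
# To actually send it against a running cluster:
# urllib.request.urlopen(request)
```

One use for this is to periodically force a full checkpoint for a job that otherwise takes incremental ones.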

Watermark alignment (FLIP‑217) now aligns watermarks across the splits inside a source operator: splits that run too far ahead are paused until slower ones catch up, which reduces downstream buffering and improves overall stream efficiency.
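The pausing rule behind split‑level alignment can be sketched as follows: a split is held back while its watermark is more than an allowed drift ahead of the slowest split. This is a simplified model with hypothetical names. In Flink itself, alignment is configured on the watermark strategy (via `withWatermarkAlignment`) rather than implemented by hand.

```python
def splits_to_pause(split_watermarks, max_allowed_drift):
    """Return the ids of splits that are too far ahead of the slowest split.

    split_watermarks: dict mapping split id -> current watermark (epoch millis).
    A split is paused while its watermark exceeds min watermark + max_allowed_drift.
    """
    if not split_watermarks:
        return set()
    limit = min(split_watermarks.values()) + max_allowed_drift
    return {split for split, wm in split_watermarks.items() if wm > limit}


# Hypothetical split ids and watermark values.
watermarks = {"partition-0": 1_000, "partition-1": 1_200, "partition-2": 9_000}
paused = splits_to_pause(watermarks, max_allowed_drift=500)
print(paused)
```

Here only `partition-2` is paused, because it is the only split beyond the limit of 1,500. It resumes once the slowest split advances far enough.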

The StateBackend upgrade moves the bundled RocksDB (FRocksDB) to version 6.20.3‑ververica‑2.0, adds Apple Silicon support, and expands configuration so that RocksDB memory can be shared between the slots of a TaskManager for better memory utilization.

The article notes that although streaming capabilities are already strong, the continuing convergence of batch and stream processing will keep challenging developers, and it urges them to master these mature features first. It also mentions that Flink Table Store has become an independent project, Apache Paimon, in the Apache Incubator, and advises readers not to chase trends prematurely.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
