What’s New in Apache Flink 1.9.0? Deep Dive into Architecture, Table API & Hive Integration
Apache Flink 1.9.0, released on August 22, 2019, incorporates Alibaba's Blink engine and overhauls the architecture. It also enriches the Table API & SQL, improves both batch and stream processing, and integrates with Hive, making it a significant milestone for large‑scale data processing.
Apache Flink 1.9.0 was officially released on August 22, 2019. Earlier that year, Alibaba open‑sourced its Blink engine and contributed it to Flink, and the release contains more than 1.5 million lines of code changes.
Key Statistics
Issue count and commit volume in 1.9.0 exceed the combined totals of the two previous releases.
Approximately 1.5 million lines of code were modified, making it the most active Flink version to date.
The contributor base continues to grow, with increasing participation from Chinese developers.
Architecture Upgrade
The release brings a fundamental architecture shift, especially in the convergence of stream and batch processing.
Previously Flink maintained two largely independent APIs: DataStream for streaming and DataSet for batch. Both APIs had separate translation, optimization, and execution paths, leading to duplicated effort and limited code reuse.
In the new architecture, the DataSet API will be retired. Users will primarily work with:
DataStream API – a "what‑you‑see‑is‑what‑you‑get" model where users directly describe operator relationships.
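Table API & SQL – a relational‑style API whose queries the engine optimizes before translating them into an execution plan. Earlier releases translated such queries into DataStream or DataSet programs; in the new architecture they target the same stack as the DataStream API.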
Both APIs will share a common technical stack, including a unified DAG representation and a shared StreamOperator implementation.
Table API & SQL
The Blink‑derived Table module was the first to adopt the new architecture in Flink 1.9. To maintain compatibility with existing users, the community split the Table module (FLIP‑32) and introduced a Planner interface that supports multiple planner implementations. In 1.9 the pre‑existing Flink planner remains the default, and the Blink planner is opt‑in.
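The idea of an API layer that depends only on a planner contract can be sketched in a few lines. This is a conceptual model, not Flink code: the class names `Planner`, `LegacyPlanner`, `BlinkStylePlanner`, the function `create_planner`, and the step strings are all hypothetical and chosen only to illustrate how two implementations can sit behind one interface.

```python
from abc import ABC, abstractmethod
from typing import List


class Planner(ABC):
    """Hypothetical planner contract; not Flink's actual interface."""

    @abstractmethod
    def translate(self, query: str) -> List[str]:
        """Return the ordered steps this planner applies to the query."""


class LegacyPlanner(Planner):
    def translate(self, query: str) -> List[str]:
        # Models the pre-existing path that targets the older runtime APIs.
        return ["parse: " + query, "optimize",
                "translate to DataSet or DataStream program"]


class BlinkStylePlanner(Planner):
    def translate(self, query: str) -> List[str]:
        # Models the new path that targets the shared DAG representation.
        return ["parse: " + query, "optimize", "translate to unified DAG"]


# Registry of available implementations (assumed names).
_PLANNERS = {"legacy": LegacyPlanner, "blink": BlinkStylePlanner}


def create_planner(name: str = "legacy") -> Planner:
    """Pick an implementation by name; the caller only sees the Planner type."""
    try:
        return _PLANNERS[name]()
    except KeyError:
        raise ValueError("unknown planner: " + name) from None
```

Calling `create_planner("blink").translate("SELECT 1")` returns the three steps of the second implementation, while code written against `Planner` does not change when the implementation is swapped.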
Key enhancements include:
FLIP‑37: Refactored Table API type system.
FLIP‑29: Added multi‑row, multi‑column operations.
FLINK‑10232: Initial SQL DDL support.
FLIP‑30: Unified Catalog API.
FLIP‑38: Python bindings for Table API.
Batch Processing Improvements
FLIP‑1 (Fine‑Grained Recovery) now enables Flink to compute the minimal fail‑over region for a failed batch task, avoiding unnecessary re‑execution of unaffected operators.
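One way to picture a failover region is as a group of tasks linked by pipelined data exchanges, with blocking (fully materialized) exchanges forming the region boundaries. The sketch below is a simplified model under that assumption, not Flink's scheduler code: the task names, the edge tuples, and the functions `failover_regions` and `tasks_to_restart` are hypothetical, and the model deliberately ignores cases a real scheduler must handle, such as lost intermediate results or restarting downstream consumers.

```python
from collections import defaultdict


def failover_regions(tasks, edges):
    """Group tasks into regions connected by pipelined edges.

    edges: iterable of (producer, consumer, kind) where kind is
    "pipelined" or "blocking". Blocking edges separate regions.
    """
    parent = {t: t for t in tasks}

    def find(t):
        # Union-find lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for producer, consumer, kind in edges:
        if kind == "pipelined":
            parent[find(producer)] = find(consumer)

    regions = defaultdict(set)
    for t in tasks:
        regions[find(t)].add(t)
    return list(regions.values())


def tasks_to_restart(failed, tasks, edges):
    """Return the region containing the failed task (simplified model)."""
    for region in failover_regions(tasks, edges):
        if failed in region:
            return region
    raise KeyError(failed)


# Illustrative batch job: two pipelined branches feeding a join
# through blocking exchanges.
TASKS = ["src1", "map1", "src2", "map2", "join"]
EDGES = [
    ("src1", "map1", "pipelined"),
    ("src2", "map2", "pipelined"),
    ("map1", "join", "blocking"),
    ("map2", "join", "blocking"),
]
```

In this model, a failure of `map2` yields the region containing only `src2` and `map2`, so the other branch and the join are left alone.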
FLIP‑31 introduces a pluggable shuffle service, allowing users to choose between network‑based or file‑based shuffle implementations, which can be backed by YARN auxiliary services or custom distributed services.
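What "pluggable" means here can be illustrated with a toy interface that has one in‑memory and one file‑based implementation. This is only a sketch of the design idea: `ShuffleService`, `InMemoryShuffle`, `FileShuffle`, and `run_stage` are hypothetical names, the in‑memory class merely stands in for a network‑based exchange, and Flink's real shuffle API is different and considerably richer.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Dict, List


class ShuffleService(ABC):
    """Hypothetical shuffle contract; not Flink's actual interface."""

    @abstractmethod
    def write(self, partition_id: str, records: List[str]) -> None:
        """Store the records of one result partition."""

    @abstractmethod
    def read(self, partition_id: str) -> List[str]:
        """Return the records of one result partition."""


class InMemoryShuffle(ShuffleService):
    """Stand-in for a network-based shuffle: data stays with the producer."""

    def __init__(self) -> None:
        self._partitions: Dict[str, List[str]] = {}

    def write(self, partition_id: str, records: List[str]) -> None:
        self._partitions[partition_id] = list(records)

    def read(self, partition_id: str) -> List[str]:
        return list(self._partitions[partition_id])


class FileShuffle(ShuffleService):
    """File-based shuffle: each partition is written to its own file."""

    def __init__(self, directory: str) -> None:
        self._directory = directory

    def _path(self, partition_id: str) -> str:
        return os.path.join(self._directory, partition_id + ".json")

    def write(self, partition_id: str, records: List[str]) -> None:
        with open(self._path(partition_id), "w") as f:
            json.dump(records, f)

    def read(self, partition_id: str) -> List[str]:
        with open(self._path(partition_id)) as f:
            return json.load(f)


def run_stage(shuffle: ShuffleService, records: List[str]) -> List[str]:
    """Producer writes a partition, consumer reads it back and sorts it."""
    shuffle.write("p0", [r.upper() for r in records])
    return sorted(shuffle.read("p0"))
```

Both `run_stage(InMemoryShuffle(), ["b", "a"])` and `run_stage(FileShuffle(tempfile.mkdtemp()), ["b", "a"])` go through the same job logic; only the storage behind the interface differs.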
Stream Processing Improvements
FLIP‑43 adds a State Processor API, offering flexible access to Flink state and savepoints. It enables use cases such as:
Loading external data into a savepoint before job start, reducing cold‑start latency.
Analyzing state data with batch APIs.
Correcting dirty data in state.
Migrating state when job logic changes.
FLIP‑34 (Stop with Savepoint) ensures that a job being stopped takes a consistent global snapshot as part of the shutdown. Because no further output is produced after that snapshot, restarting from it does not emit duplicates.
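The difference between taking a savepoint and cancelling afterwards, versus stopping with a savepoint, can be shown with a small simulation. This is a simplified model under stated assumptions, not Flink code: the job is reduced to a source offset and a non‑transactional output list, and the functions `run`, `cancel_after_savepoint`, `stop_with_savepoint`, and `resume` are hypothetical.

```python
def run(records, start_offset, steps, output):
    """Emit up to `steps` records starting at start_offset; return the new offset."""
    end = min(start_offset + steps, len(records))
    output.extend(records[start_offset:end])
    return end


def cancel_after_savepoint(records, output, extra_steps):
    """Savepoint and cancel are separate actions; the job runs in between."""
    offset = run(records, 0, 3, output)
    savepoint = offset  # snapshot of the source position
    # Output produced after the snapshot is not reflected in it.
    run(records, offset, extra_steps, output)
    return savepoint


def stop_with_savepoint(records, output):
    """The job stops emitting, then snapshots; nothing runs afterwards."""
    offset = run(records, 0, 3, output)
    return offset


def resume(records, savepoint, output):
    """Restart from the saved source position and process the rest."""
    run(records, savepoint, len(records), output)


RECORDS = list("abcdef")

# Case 1: two records are emitted between savepoint and cancel.
out_cancel = []
resume(RECORDS, cancel_after_savepoint(RECORDS, out_cancel, 2), out_cancel)

# Case 2: stop with savepoint.
out_stop = []
resume(RECORDS, stop_with_savepoint(RECORDS, out_stop), out_stop)
```

In the first case `out_cancel` contains `d` and `e` twice, because they were emitted after the snapshot and replayed on restart. In the second case `out_stop` equals the input sequence.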
Hive Integration
Flink 1.9 can access the Hive MetaStore through the unified Catalog API (FLIP‑30) and provides a Hive connector for the CSV, SequenceFile, ORC, and Parquet formats. Hive UDFs, UDTFs, and UDAFs can also be called from Flink SQL.
Conclusion
Flink 1.9.0 represents a major step forward after six months of intensive development, with substantial code contributions and growing community involvement, especially from Chinese developers. The roadmap continues to focus on expanding functionality and ecosystem integration.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
