What’s New in Apache Flink 1.9.0? Deep Dive into Architecture, Table API & Hive Integration

Apache Flink 1.9.0, released on August 22, 2019, merges Alibaba's Blink engine, overhauls the architecture, extends the Table API & SQL, improves both batch and stream processing, and integrates with Hive, marking a significant milestone for large‑scale data processing.

Alibaba Cloud Developer

Aug 23, 2019

Apache Flink 1.9.0 was officially released on August 22, 2019. Earlier in the year, Alibaba open‑sourced its Blink engine and contributed it to Flink, and this release changes more than 1.5 million lines of code as a result.

Key Statistics

Issue count and commit volume in 1.9.0 exceed the combined totals of the two previous releases.

Approximately 1.5 million lines of code were modified, making it the most active Flink version to date.

The contributor base continues to grow, with increasing participation from Chinese developers.

Architecture Upgrade

The release brings a fundamental architecture shift, especially in the convergence of stream and batch processing.

Previously Flink maintained two largely independent APIs: DataStream for streaming and DataSet for batch. Both APIs had separate translation, optimization, and execution paths, leading to duplicated effort and limited code reuse.

In the new architecture, the DataSet API will be retired. Users will primarily work with:

DataStream API – a "what‑you‑see‑is‑what‑you‑get" model where users directly describe operator relationships.

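Table API & SQL – a relational‑style API whose queries the engine optimizes into an execution plan. In 1.9 the legacy planner still translates queries into DataStream or DataSet programs, while the Blink planner targets the shared operator stack directly.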

Both APIs will share a common technical stack, including a unified DAG representation and a shared StreamOperator implementation.
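To make the idea of a shared stack concrete, here is a minimal Python sketch. It is not Flink code: the `Operator` class and the two front-end helpers `datastream_pipeline` and `table_query` are hypothetical names chosen for illustration. The point is only that an imperative front end and a relational front end can both lower to the same kind of operator chain.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class Operator:
    """One node of a toy operator chain (stand-in for a shared operator abstraction)."""
    name: str
    fn: Callable[[Iterable], Iterable]
    upstream: Optional["Operator"] = None

    def run(self, source: Iterable) -> List:
        # Evaluate the upstream operator first, then apply this operator's function.
        data = self.upstream.run(source) if self.upstream else source
        return list(self.fn(data))


def datastream_pipeline() -> Operator:
    # Imperative front end: the user wires the operators explicitly.
    filt = Operator("filter", lambda rows: (r for r in rows if r["amount"] > 10))
    return Operator("map", lambda rows: (r["user"] for r in rows), upstream=filt)


def table_query(select: str, where_gt: Tuple[str, int]) -> Operator:
    # Relational front end: a (trivial) planner decides which operators to build.
    column, threshold = where_gt
    filt = Operator("filter", lambda rows: (r for r in rows if r[column] > threshold))
    return Operator("project", lambda rows: (r[select] for r in rows), upstream=filt)


rows = [{"user": "a", "amount": 5}, {"user": "b", "amount": 20}]
print(datastream_pipeline().run(rows))
print(table_query("user", ("amount", 10)).run(rows))
```

Both calls run through the same `Operator.run` path, which is the property the unified architecture aims for at a much larger scale.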

Table API & SQL

The Blink‑derived Table module was the first to adopt the new architecture in Flink 1.9. To maintain compatibility with existing users, the community split the Table module (FLIP‑32) and introduced a Planner interface that supports multiple planner implementations.
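The role of a planner interface can be sketched in a few lines of Python. Everything below is a toy under stated assumptions: `Planner`, `LegacyPlanner`, `BlinkStylePlanner`, and this `TableEnvironment` are hypothetical classes that mimic the shape of the design, not Flink's actual classes, and the "plans" are just lists of step names.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class Planner(ABC):
    """Hypothetical pluggable planner: query text in, list of plan steps out."""

    @abstractmethod
    def plan(self, query: str) -> List[str]:
        ...


class LegacyPlanner(Planner):
    def plan(self, query: str) -> List[str]:
        return ["legacy-parse", "legacy-optimize", f"execute({query})"]


class BlinkStylePlanner(Planner):
    def plan(self, query: str) -> List[str]:
        return ["parse", "optimize", "generate-operators", f"execute({query})"]


class TableEnvironment:
    """Front end that depends only on the Planner interface, not on an implementation."""

    _PLANNERS: Dict[str, Type[Planner]] = {
        "legacy": LegacyPlanner,
        "blink": BlinkStylePlanner,
    }

    def __init__(self, planner: str = "legacy") -> None:
        # The default stays on the legacy planner so existing users see no change.
        self.planner: Planner = self._PLANNERS[planner]()

    def explain(self, query: str) -> List[str]:
        return self.planner.plan(query)


print(TableEnvironment().explain("SELECT 1"))
print(TableEnvironment("blink").explain("SELECT 1"))
```

Because user code talks only to the front end, switching planners is a one-argument change here, which is the kind of compatibility the module split is meant to preserve.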

Key enhancements include:

FLIP‑37: Refactored Table API type system.

FLIP‑29: Added Table API operations that work on multiple rows and columns at once, such as map, flatMap, aggregate, and flatAggregate.

FLINK‑10232: Initial SQL DDL support.

FLIP‑30: Unified Catalog API.

FLIP‑38: Python bindings for Table API.
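As an illustration of what a unified catalog layer buys, the following Python sketch resolves one-, two-, and three-part table names against several registered catalogs. `InMemoryCatalog` and `CatalogManager` here are hypothetical toy classes written for this example; they only mirror the general idea of FLIP‑30, not its real interfaces.

```python
from typing import Dict, Tuple

Schema = Dict[str, str]  # column name -> type name


class InMemoryCatalog:
    """Toy catalog: database name -> table name -> schema."""

    def __init__(self) -> None:
        self.databases: Dict[str, Dict[str, Schema]] = {"default": {}}

    def create_table(self, database: str, table: str, schema: Schema) -> None:
        self.databases.setdefault(database, {})[table] = schema

    def get_table(self, database: str, table: str) -> Schema:
        return self.databases[database][table]


class CatalogManager:
    """Resolves table paths against registered catalogs and a current catalog/database."""

    def __init__(self) -> None:
        self.catalogs: Dict[str, InMemoryCatalog] = {}
        self.current: Tuple[str, str] = ("", "default")

    def register(self, name: str, catalog: InMemoryCatalog) -> None:
        self.catalogs[name] = catalog
        if not self.current[0]:
            # The first registered catalog becomes the current one.
            self.current = (name, "default")

    def resolve(self, path: str) -> Schema:
        parts = path.split(".")
        catalog, database = self.current
        if len(parts) == 3:
            catalog, database, table = parts
        elif len(parts) == 2:
            database, table = parts
        elif len(parts) == 1:
            table = parts[0]
        else:
            raise ValueError(f"unsupported table path: {path}")
        return self.catalogs[catalog].get_table(database, table)


manager = CatalogManager()
builtin, hive_like = InMemoryCatalog(), InMemoryCatalog()
manager.register("builtin", builtin)
manager.register("hive_like", hive_like)
builtin.create_table("default", "clicks", {"user": "STRING", "ts": "TIMESTAMP"})
hive_like.create_table("warehouse", "orders", {"id": "BIGINT", "amount": "DOUBLE"})

print(manager.resolve("clicks"))
print(manager.resolve("hive_like.warehouse.orders"))
```

A single query can then mix tables from different metadata stores simply by qualifying their names.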

Batch Processing Improvements

FLIP‑1 (Fine‑Grained Recovery) now enables Flink to compute the minimal fail‑over region for a failed batch task, avoiding unnecessary re‑execution of unaffected operators.
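One way to picture a fail‑over region is as a connected component of tasks linked by pipelined data exchanges, with blocking (fully materialized) exchanges acting as boundaries. The Python sketch below implements that simplified model with a union-find structure. The task names, the edge labels, and the function `failover_regions` are assumptions made for this example, and the model ignores the case where a materialized intermediate result is lost and the upstream region must rerun as well.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Edge = Tuple[str, str, str]  # (producer, consumer, "pipelined" | "blocking")


def failover_regions(tasks: List[str], edges: List[Edge]) -> Dict[str, FrozenSet[str]]:
    """Map every task to its region: connected components over pipelined edges only."""
    parent = {t: t for t in tasks}

    def find(t: str) -> str:
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving keeps the trees shallow
            t = parent[t]
        return t

    for producer, consumer, kind in edges:
        if kind == "pipelined":
            parent[find(producer)] = find(consumer)

    groups: Dict[str, Set[str]] = {}
    for t in tasks:
        groups.setdefault(find(t), set()).add(t)
    return {t: frozenset(groups[find(t)]) for t in tasks}


tasks = ["source", "map", "agg", "sink"]
edges = [
    ("source", "map", "pipelined"),
    ("map", "agg", "blocking"),  # map's output is materialized before agg starts
    ("agg", "sink", "pipelined"),
]
regions = failover_regions(tasks, edges)

# If "sink" fails, only the tasks in its own region need to be restarted.
print(sorted(regions["sink"]))
```

In this toy job a failure in `sink` restarts `agg` and `sink` but leaves `source` and `map` alone, provided the materialized output of `map` is still available.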

FLIP‑31 introduces a pluggable shuffle service, allowing users to choose between network‑based or file‑based shuffle implementations, which can be backed by YARN auxiliary services or custom distributed services.
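The value of making shuffle pluggable is that the stages of a job do not need to know how intermediate data is stored or transported. The Python sketch below shows that separation with a hypothetical `ShuffleService` contract and two toy backends. The class and method names are invented for this example, and the in-memory backend is only a loose stand-in for a network-based shuffle.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Dict, List


class ShuffleService(ABC):
    """Hypothetical contract between a producing stage and a consuming stage."""

    @abstractmethod
    def write(self, partition: str, records: List[dict]) -> None:
        ...

    @abstractmethod
    def read(self, partition: str) -> List[dict]:
        ...


class InMemoryShuffle(ShuffleService):
    """Keeps partitions in process memory; data is lost when the process exits."""

    def __init__(self) -> None:
        self._partitions: Dict[str, List[dict]] = {}

    def write(self, partition: str, records: List[dict]) -> None:
        self._partitions[partition] = list(records)

    def read(self, partition: str) -> List[dict]:
        return list(self._partitions[partition])


class FileShuffle(ShuffleService):
    """Persists each partition as a JSON file so it can outlive the producer."""

    def __init__(self, directory: str) -> None:
        self._directory = directory

    def _path(self, partition: str) -> str:
        return os.path.join(self._directory, f"{partition}.json")

    def write(self, partition: str, records: List[dict]) -> None:
        with open(self._path(partition), "w", encoding="utf-8") as handle:
            json.dump(records, handle)

    def read(self, partition: str) -> List[dict]:
        with open(self._path(partition), encoding="utf-8") as handle:
            return json.load(handle)


def run_two_stages(shuffle: ShuffleService) -> int:
    # Stage 1 writes its result; stage 2 reads it back without knowing the backend.
    shuffle.write("p0", [{"k": 1}, {"k": 2}])
    return sum(record["k"] for record in shuffle.read("p0"))


print(run_two_stages(InMemoryShuffle()))
with tempfile.TemporaryDirectory() as directory:
    print(run_two_stages(FileShuffle(directory)))
```

The job logic in `run_two_stages` is identical for both backends; only the object passed in changes.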

Stream Processing Improvements

FLIP‑43 adds a State Processor API, offering flexible access to Flink state and savepoints. It enables use cases such as:

Loading external data into a savepoint before job start, reducing cold‑start latency.

Analyzing state data with batch APIs.

Correcting dirty data in state.

Migrating state when job logic changes.
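The use cases above share one idea: state in a savepoint is treated as ordinary data that can be created, read, and rewritten offline. The Python sketch below models a savepoint as a nested dictionary to show that idea. The type alias `Savepoint` and the helpers `bootstrap`, `read_state`, and `transform` are hypothetical and unrelated to the real State Processor API classes.

```python
from typing import Callable, Dict, Iterable, Tuple

Savepoint = Dict[str, Dict[str, int]]  # operator uid -> key -> state value


def bootstrap(uid: str, rows: Iterable[Tuple[str, int]]) -> Savepoint:
    """Build a savepoint from external data before the job has ever run."""
    return {uid: dict(rows)}


def read_state(savepoint: Savepoint, uid: str) -> Iterable[Tuple[str, int]]:
    """Expose one operator's keyed state as a plain batch of (key, value) records."""
    return savepoint[uid].items()


def transform(savepoint: Savepoint, uid: str, fix: Callable[[int], int]) -> Savepoint:
    """Return a new savepoint with one operator's state rewritten; the input is untouched."""
    patched = dict(savepoint)
    patched[uid] = {key: fix(value) for key, value in savepoint[uid].items()}
    return patched


# 1. Load external data into state to avoid a cold start.
sp = bootstrap("counter", [("alice", 3), ("bob", -1)])

# 2. Analyze the state as a batch.
total = sum(value for _, value in read_state(sp, "counter"))

# 3. Correct dirty data; here negative counts are assumed to be invalid.
clean = transform(sp, "counter", lambda value: max(value, 0))

print(total)
print(clean)
```

State migration after a change in job logic follows the same pattern as step 3, with a `fix` function that converts the old state layout into the new one.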

FLIP‑34 (Stop with Savepoint) takes a globally consistent savepoint as part of stopping a job, so that a job restarted from that savepoint does not emit duplicate output.
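Why the timing of the snapshot matters can be shown with a toy simulation in Python. The `CountingJob` class is hypothetical and assumes a sink that cannot roll back what it has already received. In the first run the snapshot is taken at the moment the job stops; in the second, the job keeps running after the snapshot and is cancelled later.

```python
from typing import Dict, List, Optional


class CountingJob:
    """Toy job: reads records by offset, keeps a running sum, and emits it to a sink."""

    def __init__(self, source: List[int], sink: List[int],
                 snapshot: Optional[Dict[str, int]] = None) -> None:
        snapshot = snapshot or {"offset": 0, "total": 0}
        self.source, self.sink = source, sink
        self.offset, self.total = snapshot["offset"], snapshot["total"]

    def process(self, n: int) -> None:
        # Consume up to n records starting at the current offset.
        for value in self.source[self.offset:self.offset + n]:
            self.total += value
            self.sink.append(self.total)
            self.offset += 1

    def snapshot(self) -> Dict[str, int]:
        return {"offset": self.offset, "total": self.total}


# Stop with savepoint: nothing is processed between the snapshot and the stop.
source, sink = [1, 2, 3, 4], []
job = CountingJob(source, sink)
job.process(2)
savepoint = job.snapshot()
CountingJob(source, sink, savepoint).process(2)
print(sink)

# Snapshot, keep running, then cancel: the restart replays one record.
source2, sink2 = [1, 2, 3, 4], []
job2 = CountingJob(source2, sink2)
job2.process(1)
stale = job2.snapshot()
job2.process(1)  # output emitted after the snapshot was taken
CountingJob(source2, sink2, stale).process(3)
print(sink2)
```

The first sink receives each running total once, while the second receives the total for the second record twice because that record was processed both before the cancel and after the restart.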

Hive Integration

Flink 1.9 can access the Hive MetaStore through the unified Catalog API (FLIP‑30) and ships a Hive connector that handles the CSV, SequenceFile, ORC, and Parquet formats. Hive UDFs, UDTFs, and UDAFs can also be called from Flink SQL. The Flink 1.9.0 release announcement labels this integration a preview.

Conclusion

Flink 1.9.0 represents a major step forward after six months of intensive development, with substantial code contributions and growing community involvement, especially from Chinese developers. The roadmap continues to focus on expanding functionality and ecosystem integration.
