
What’s New in Apache Flink 2.0? Key Features and Cloud‑Native Upgrades for 2025

This article summarizes the major Apache Flink 2.0 updates released in the first half of 2025, covering architecture separation, cloud‑native deployment, AI‑driven agents, SQL enhancements, data integration, operational tools, and performance optimizations for real‑time intelligent computing.


Architecture Upgrade: Compute-Storage Separation and Cloud-Native Evolution

Flink 2.0 introduces a decoupled state-management architecture to meet TB-scale state storage and high-throughput, low-latency access requirements. The new ForSt state backend, paired with an asynchronous execution framework and tiered storage, separates state from computation and places it on inexpensive object storage, enabling flexible resource scheduling, higher scalability, and lightweight fault tolerance.

State management upgrade

Flink 2.0 adds an asynchronous execution model and a separated state backend, redesigns SQL operators for parallel and asynchronous state access, introduces incremental checkpoints, and synchronizes state increments to distributed storage (e.g., S3, HDFS) so that checkpoint time stays under 10 seconds even with TB‑scale state.
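
A minimal SQL-client sketch of how these pieces could be switched on together. The configuration keys used here (state.backend.type, table.exec.async-state.enabled, and the execution.checkpointing.* family) are assumptions based on recent releases and may vary by minor version; the S3 bucket is a placeholder.

-- Enable the ForSt disaggregated state backend and asynchronous state
-- access for SQL operators (keys assumed; verify against your version).
SET 'state.backend.type' = 'forst';
SET 'table.exec.async-state.enabled' = 'true';

-- Incremental checkpoints to object storage: only state deltas are uploaded.
SET 'execution.checkpointing.incremental' = 'true';
SET 'execution.checkpointing.interval' = '30s';
SET 'execution.checkpointing.dir' = 's3://my-bucket/flink/checkpoints';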

Elastic scaling optimization

Supports in-place rescaling in combination with the Kubernetes Operator, reducing state migration time by more than 50%.
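
With the Flink Kubernetes Operator this could look roughly like the FlinkDeployment below: the adaptive scheduler performs the in-place rescale while the operator's autoscaler decides when. The image tag, jar path, resource figures, and flinkVersion value are illustrative assumptions, not a verified manifest.

apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: rescale-demo                      # placeholder
spec:
  image: flink:2.0                        # placeholder image tag
  flinkVersion: v2_0
  serviceAccount: flink
  flinkConfiguration:
    jobmanager.scheduler: adaptive        # rescale tasks in place, no full restart
    job.autoscaler.enabled: "true"        # let the operator drive parallelism
  jobManager:
    resource: { memory: "2048m", cpu: 1 }
  taskManager:
    resource: { memory: "2048m", cpu: 1 }
  job:
    jarURI: local:///opt/flink/examples/streaming/TopSpeedWindowing.jar  # placeholder
    parallelism: 2
    upgradeMode: last-state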

Hardware adaptability improvement

Supports networks above 25 Gbps, bringing remote state access latency close to that of a local SSD and driving Flink's full transition to a cloud-native architecture.

Flink Agents Sub‑project

The Apache Flink community has launched Flink Agents, an agent-programming framework for event-driven AI agents. It encapsulates LLM, memory, tool, and prompt abstractions and provides dynamic execution plans, looping, shared state, and observability.

This enables future integration of SQL and large‑model calls, allowing end‑to‑end real‑time data cleaning, analysis, and AI inference within Flink/Spark SQL.

Deep Optimizations for SQL and Unified Batch/Stream Processing

Flink SQL feature enhancements

Dynamic parallelism adjustment: the scan.parallelism option lets sources such as Kafka or DataGen define their own parallelism; combined with the Adaptive Scheduler's on-demand resource allocation, this boosts throughput by roughly 30%.
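
For example, a source can declare its own parallelism directly in the table DDL (table and field names are illustrative):

CREATE TABLE clicks (
    user_id BIGINT,
    url     STRING
) WITH (
    'connector'        = 'datagen',
    'scan.parallelism' = '4'   -- source runs with 4 subtasks regardless of the job default
);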

Fine-grained state TTL control: the STATE_TTL SQL hint allows separate TTL settings for Join and Aggregation operators, as the examples below show.

-- set state ttl for join
SELECT /*+ STATE_TTL('Orders'='1d', 'Customers'='20d') */ *
FROM Orders LEFT OUTER JOIN Customers
    ON Orders.o_custkey = Customers.c_custkey;

-- set state ttl for aggregation
SELECT /*+ STATE_TTL('o'='1d') */ o_orderkey, SUM(o_totalprice) AS revenue
FROM Orders AS o
GROUP BY o_orderkey;

Async UDF support: the new AsyncScalarFunction type handles remote calls (e.g., AI model inference) without blocking the task thread, cutting latency by 50%.
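
Registration and invocation look like any scalar UDF; the Java class here, which would extend AsyncScalarFunction, is a hypothetical name:

-- 'com.example.udf.RemoteScore' is hypothetical: a class extending
-- AsyncScalarFunction that calls a remote model endpoint per row.
CREATE TEMPORARY FUNCTION remote_score AS 'com.example.udf.RemoteScore';

-- Calls are issued asynchronously instead of blocking the task thread.
SELECT o_orderkey, remote_score(o_comment) AS risk_score
FROM Orders;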

Materialized tables

Materialized tables are the cornerstone of unified stream‑batch processing, allowing declarative management of real‑time and historical data in a single pipeline.

Flink 2.0 also enhances lifecycle management, adding support for changing a materialized table's defining query, Kubernetes/YARN deployment, and integration with the Paimon lake storage format.
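
A sketch of the declarative style, following the materialized-table DDL introduced around Flink 1.20/2.0: the FRESHNESS interval states how stale the result may be, and the engine picks a continuous streaming job or a scheduled batch refresh accordingly. Table and column names are illustrative.

-- Declare what to maintain and how fresh it must be; the engine owns the
-- refresh pipeline (streaming or batch) behind the scenes.
CREATE MATERIALIZED TABLE daily_revenue
FRESHNESS = INTERVAL '1' MINUTE
AS SELECT o_orderkey, SUM(o_totalprice) AS revenue
   FROM Orders
   GROUP BY o_orderkey;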

Adaptive query optimization

Improvements include adaptive broadcast join and automatic data-skew join optimization, which dynamically switch join strategies based on runtime statistics to improve efficiency and reduce tail latency.
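
Both behaviors are toggled through planner options; the keys below are assumptions drawn from the Flink 2.0 batch-optimization work and may differ by version:

-- Switch to a broadcast join at runtime when one input turns out to be small.
SET 'table.optimizer.adaptive-broadcast-join.strategy' = 'auto';
-- Detect skewed join keys from runtime statistics and rebalance them.
SET 'table.optimizer.skewed-join-optimization.strategy' = 'auto';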

Streaming lakehouse: deep integration with Paimon boosts real-time processing capabilities for lakehouse workloads.

Data Integration

Batch mode support

Flink CDC 3.4 adds an execution.runtime-mode parameter; setting it to BATCH creates a Flink batch job, reducing resource consumption for full‑data sync scenarios.
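
A pipeline-definition sketch with the new option; connection details and table patterns are placeholders, and placing execution.runtime-mode in the pipeline block is an assumption to verify against the CDC 3.4 documentation. The paimon sink also previews the lake-format support described below.

# Hypothetical MySQL-to-Paimon sync pipeline.
source:
  type: mysql
  hostname: mysql.example.internal   # placeholder
  port: 3306
  username: cdc_user                 # placeholder
  password: "${MYSQL_PASSWORD}"
  tables: app_db.\.*

sink:
  type: paimon
  catalog.properties.warehouse: s3://my-bucket/paimon   # placeholder

pipeline:
  name: full-sync-to-paimon
  parallelism: 4
  execution.runtime-mode: BATCH   # new in Flink CDC 3.4: bounded batch sync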

Schema evolution optimization

Reduces coordination time during multi‑table sync initialization, fixes occasional job hangs on frequent schema changes, and enriches error logs with more table and data details for easier troubleshooting.

Lake format support

Flink CDC now supports Iceberg and Paimon connectors for seamless lake‑table writes.

Operations and Deployment Upgrades

Deployment modes

Flink 2.0's deployment story centers on compute-storage separation and cloud-native optimization: Kubernetes support evolves from mere compatibility to deep optimization and becomes the preferred target, while Application mode is reinforced as the default deployment choice.

Monitoring and observability

The metrics framework has been rebuilt with unified collection, aggregation, and export interfaces; the Prometheus reporter now exposes operator-level latency, back-pressure, and state-size metrics, enabling real-time health dashboards in Grafana.
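
Export still goes through a metrics reporter in the Flink configuration; a minimal Prometheus sketch, with the port range as a placeholder:

# config.yaml: expose metrics over HTTP for Prometheus to scrape.
metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
metrics.reporter.prom.port: 9249-9260   # one port per JobManager/TaskManager process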

Diagnostic tools: the new flink-diagnose CLI automatically detects common issues (back-pressure, resource exhaustion, state bloat) and generates diagnostic reports.

AI Support

Flink CDC 3.3 adds dynamic AI model invocation in Transform expressions, natively supporting OpenAI models.

Flink SQL introduces dedicated syntax for AI models, letting users define models as catalog objects and invoke them as functions or table functions inside SQL statements, as sketched below.
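
A sketch of that direction, following the model-DDL proposal (FLIP-437); the provider options and the exact ML_PREDICT call shape are assumptions that may differ across releases, and all names are illustrative.

-- Define a model as a catalog object; option keys are illustrative.
CREATE MODEL sentiment
INPUT (text STRING)
OUTPUT (label STRING)
WITH (
    'provider' = 'openai',
    'endpoint' = 'https://api.openai.com/v1/chat/completions',
    'model'    = 'gpt-4o-mini',
    'api-key'  = '${OPENAI_API_KEY}'
);

-- Call it as a table function; predicted columns are appended to input rows.
SELECT id, text, label
FROM ML_PREDICT(TABLE reviews, MODEL sentiment, DESCRIPTOR(text));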

Conclusion

Other optimizations include API refinements, configuration tweaks, and serialization improvements. Overall, the 2025 Flink updates center on cloud‑native architecture, deep AI integration, and lakehouse unification, driving Flink from a stream engine toward a real‑time intelligent computing platform.

Tags: cloud native, Big Data, Flink, stream processing, SQL, AI integration
Written by Big Data Technology & Architecture
Wang Zhiwu, a big data expert dedicated to sharing big data technology.