What’s New in Apache Flink 2.0? Key Features and Cloud‑Native Upgrades for 2025
This article summarizes the major Apache Flink 2.0 updates released in the first half of 2025, covering architecture separation, cloud‑native deployment, AI‑driven agents, SQL enhancements, data integration, operational tools, and performance optimizations for real‑time intelligent computing.
Architecture Upgrade: Flink 2.0 Separation of Compute and Storage, Cloud‑Native Evolution
Flink 2.0 introduces a disaggregated state‑management architecture to meet TB‑scale state storage and high‑throughput, low‑latency access requirements. The new ForSt state backend, an asynchronous execution framework, and a layered storage system separate state from computation, leveraging inexpensive object storage for flexible resource scheduling, higher scalability, and lightweight fault tolerance.
State management upgrade
Flink 2.0 adds an asynchronous execution model and a disaggregated state backend, and redesigns SQL operators for parallel, asynchronous state access. Incremental checkpoints synchronize only state deltas to distributed storage (e.g., S3, HDFS), keeping checkpoint times under 10 seconds even with TB‑scale state.
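As a rough sketch, the disaggregated state backend and incremental checkpointing are enabled through configuration; the key names below follow the Flink 2.x convention and the S3 path is illustrative, so verify both against your release:

```yaml
# Flink configuration sketch for disaggregated state (paths illustrative)
state.backend.type: forst                     # ForSt disaggregated state backend
execution.checkpointing.incremental: true     # sync only state deltas at checkpoint time
execution.checkpointing.dir: s3://my-bucket/checkpoints  # cheap remote object storage
```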
Elastic scaling optimization
Supports in‑place rescaling combined with the Kubernetes Operator, reducing state migration time by more than 50 %.
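A minimal FlinkDeployment sketch with the Kubernetes Operator's autoscaler turned on; the resource name, version tag, and utilization target are illustrative assumptions:

```yaml
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: orders-pipeline            # illustrative
spec:
  flinkVersion: v2_0
  flinkConfiguration:
    job.autoscaler.enabled: "true"           # let the operator rescale the job
    job.autoscaler.target.utilization: "0.7" # rescale toward ~70 % busy
```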
Hardware adaptability improvement
Supports networks of 25 Gbps and above, achieving remote state access latency close to local SSD and driving Flink’s full transition to a cloud‑native architecture.
Flink Agents Sub‑project
The Apache Flink community has launched Flink Agents, a programming framework for event‑driven AI agents. It encapsulates LLM, memory, tool, and prompt abstractions, and provides dynamic execution plans, looping, shared state, and observability.
This paves the way for integrating SQL with large‑model calls, enabling end‑to‑end real‑time data cleaning, analysis, and AI inference within Flink SQL.
Deep Optimizations for Unified SQL and Batch/Streaming
Flink SQL feature enhancements
Dynamic parallelism adjustment: the scan.parallelism option lets sources such as Kafka or DataGen define their own parallelism; combined with the Adaptive Scheduler’s on‑demand resource allocation, this boosts throughput by roughly 30 %.
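A minimal sketch of a source with its own parallelism, using the DataGen connector (the table and field names are illustrative):

```sql
-- This source runs with 4 subtasks regardless of the job's default parallelism
CREATE TABLE orders_src (
  o_orderkey   BIGINT,
  o_totalprice DOUBLE
) WITH (
  'connector' = 'datagen',
  'scan.parallelism' = '4'
);
```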
Fine‑grained state TTL control: the STATE_TTL SQL hint allows separate TTL settings for join and aggregation operators.

```sql
-- Set a per-input state TTL for a join
SELECT /*+ STATE_TTL('Orders' = '1d', 'Customers' = '20d') */ *
FROM Orders LEFT OUTER JOIN Customers
  ON Orders.o_custkey = Customers.c_custkey;

-- Set a state TTL for an aggregation
SELECT /*+ STATE_TTL('o' = '1d') */ o_orderkey, SUM(o_totalprice) AS revenue
FROM Orders AS o
GROUP BY o_orderkey;
```

Async UDF support: the new AsyncScalarFunction type processes remote calls (e.g., AI model inference) without blocking the operator, cutting latency by 50 %.
Materialized tables
Materialized tables are the cornerstone of unified stream‑batch processing, allowing declarative management of real‑time and historical data in a single pipeline.
Flink 2.0 also enhances their lifecycle management, adding support for query modification, Kubernetes/YARN deployment, and integration with the Paimon lake storage format.
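A sketch of a materialized table declaration; the table name, query, and freshness interval are illustrative, and the exact clauses may differ across versions:

```sql
-- Flink keeps the result fresh within the declared interval, choosing
-- a streaming or batch refresh strategy itself
CREATE MATERIALIZED TABLE order_revenue
FRESHNESS = INTERVAL '1' MINUTE
AS SELECT o_orderkey, SUM(o_totalprice) AS revenue
   FROM Orders
   GROUP BY o_orderkey;
```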
Adaptive query optimization
Improvements include adaptive broadcast join and automatic data‑skew join optimization, dynamically switching join strategies based on runtime statistics to improve efficiency and reduce tail latency.
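These optimizations are driven by optimizer settings; the key names below reflect options introduced around Flink 2.0 and should be checked against your release:

```sql
-- Let the planner switch to a broadcast join at runtime when one side turns out small
SET 'table.optimizer.adaptive-broadcast-join.strategy' = 'auto';
-- Mitigate skewed join keys based on runtime statistics
SET 'table.optimizer.skewed-join-optimization.strategy' = 'auto';
```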
Streaming lakehouse: deep integration with Paimon boosts real‑time processing capabilities for lakehouse workloads.
Data Integration
Batch mode support
Flink CDC 3.4 adds an execution.runtime-mode parameter; setting it to BATCH creates a Flink batch job, reducing resource consumption for full‑data sync scenarios.
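A sketch of a Flink CDC pipeline definition running in batch mode; all connection values are illustrative, and the placement of execution.runtime-mode under the pipeline section is an assumption to verify against the CDC 3.4 documentation:

```yaml
source:
  type: mysql
  hostname: localhost
  port: 3306
  username: flink_user           # illustrative credentials
  password: "******"
  tables: app_db.\.*

sink:
  type: paimon                   # lake-format sink
  catalog.properties.warehouse: s3://my-bucket/warehouse

pipeline:
  name: one-off-full-sync
  execution.runtime-mode: BATCH  # bounded job for full-data synchronization
```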
Schema evolution optimization
Reduces coordination time during multi‑table sync initialization, fixes occasional job hangs on frequent schema changes, and enriches error logs with more table and data details for easier troubleshooting.
Lake format support
Flink CDC now supports Iceberg and Paimon connectors for seamless lake‑table writes.
Operations and Deployment Upgrades
Deployment modes
Flink 2.0’s deployment story centers on compute‑storage separation and cloud‑native optimization. Kubernetes support evolves from basic compatibility to deep optimization, making it the preferred target, and Application mode is reinforced as the default deployment choice.
Monitoring and observability
The metrics framework has been rebuilt with unified collection, aggregation, and export interfaces; the Prometheus reporter now exposes operator‑level latency, back‑pressure, and state‑size metrics, enabling real‑time health dashboards in Grafana.
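Wiring Flink metrics into Prometheus is a configuration change; a minimal sketch, where the reporter name prom and the port range are illustrative:

```yaml
metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
metrics.reporter.prom.port: 9250-9260   # one port per TaskManager on the host
```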
Diagnostic tools: the new flink-diagnose CLI automatically detects common issues (back‑pressure, resource exhaustion, state bloat) and generates diagnostic reports.
AI Support
Flink CDC 3.3 adds dynamic AI model invocation in Transform expressions, natively supporting OpenAI models.
Flink SQL introduces dedicated syntax for AI models, allowing users to define models like catalogs and call them as functions or table functions within SQL statements.
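A sketch of this model syntax; the model name, provider options, and input table are illustrative, and the exact option keys depend on the provider integration:

```sql
-- Define a model as a catalog object
CREATE MODEL sentiment_model
INPUT (review STRING)
OUTPUT (label STRING)
WITH (
  'provider' = 'openai',                 -- illustrative provider options
  'endpoint' = 'https://api.openai.com/v1/chat/completions',
  'model'    = 'gpt-4o-mini'
);

-- Invoke it as a table function over a stream
SELECT review, label
FROM ML_PREDICT(TABLE reviews, MODEL sentiment_model, DESCRIPTOR(review));
```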
Conclusion
Other optimizations include API refinements, configuration tweaks, and serialization improvements. Overall, the 2025 Flink updates center on cloud‑native architecture, deep AI integration, and lakehouse unification, driving Flink from a stream engine toward a real‑time intelligent computing platform.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
