Flink’s AI Agents and Disaggregated State: Transforming Big Data

This article reviews key topics from the Flink Forward Asia (FFA) 2025 conference in Singapore: Flink's new AI-focused Agents framework, the disaggregated state architecture introduced in Flink 2.0, lake storage solutions such as Paimon, and the Fluss streaming table store. Together they show how big-data platforms are evolving to serve AI workloads.

Big Data Technology & Architecture

Subtopics

Flink ecosystem fully embraces AI: Flink Agents

The Apache Flink community has launched a new subproject, Flink Agents, an agent programming framework for event-driven AI agents. It wraps the core concepts of LLM, Memory, Tool, and Prompt, and provides dynamic execution plans, looping, shared state, and observability.
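To make the event-driven idea concrete, here is a minimal sketch in plain Python. It is not the Flink Agents API: `Event`, `Agent`, `fake_llm`, and `run` are hypothetical names invented for illustration, and the "model" is a stub. It only shows how a prompt, a tool, and shared memory can be tied together by events, with the number of loop iterations decided at runtime.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# All names below are hypothetical; this is NOT the Flink Agents API.


@dataclass
class Event:
    kind: str      # "input", "tool_request", "tool_response", or "output"
    payload: str


@dataclass
class Agent:
    tools: Dict[str, Callable[[str], str]]
    prompt: str = "Answer the question: {question}"
    memory: List[str] = field(default_factory=list)  # state shared across steps

    def fake_llm(self, text: str) -> Event:
        # Stub for a real model call: ask for a tool once, then answer.
        if not any(m.startswith("tool:") for m in self.memory):
            return Event("tool_request", "lookup")
        return Event("output", f"answer based on {self.memory[-1]}")

    def handle(self, event: Event) -> Event:
        # Each handler consumes one event and emits the next one.
        if event.kind == "input":
            self.memory.append(f"user:{event.payload}")
            return self.fake_llm(self.prompt.format(question=event.payload))
        if event.kind == "tool_request":
            result = self.tools[event.payload](self.memory[0])
            return Event("tool_response", result)
        if event.kind == "tool_response":
            self.memory.append(f"tool:{event.payload}")
            return self.fake_llm(event.payload)
        raise ValueError(f"unexpected event kind: {event.kind}")


def run(agent: Agent, question: str, max_steps: int = 10) -> str:
    # The loop is driven by events, so the plan is not fixed in advance.
    event = Event("input", question)
    for _ in range(max_steps):
        if event.kind == "output":
            return event.payload
        event = agent.handle(event)
    raise RuntimeError("agent did not finish within max_steps")


if __name__ == "__main__":
    agent = Agent(tools={"lookup": lambda q: "42"})
    print(run(agent, "what?"))
    print(agent.memory)
```

In a real streaming job the events would arrive from a stream and the memory would live in managed state; the sketch keeps both in process to stay self-contained.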

Until now, large-model applications have mostly been built as server-side calls, where supporting services such as MCP are already mature. Server-side development therefore led large-model adoption, while data-side development was left with inefficient patterns such as calling the model over RPC from inside a UDF.
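The article does not say exactly where the RPC-via-UDF inefficiency comes from. One plausible factor, assumed here, is that a scalar UDF issues one blocking round trip per record. The sketch below uses a hypothetical `CountingModelClient` (a stand-in, not a real client library) to compare per-record calls with simple micro-batching.

```python
from typing import List


class CountingModelClient:
    """Hypothetical stand-in for a remote model endpoint; counts round trips."""

    def __init__(self) -> None:
        self.round_trips = 0

    def infer(self, texts: List[str]) -> List[str]:
        self.round_trips += 1  # one simulated network round trip per call
        return [t.upper() for t in texts]


def per_record_udf(client: CountingModelClient, rows: List[str]) -> List[str]:
    # RPC-via-UDF pattern: every row triggers its own remote call.
    return [client.infer([row])[0] for row in rows]


def micro_batched(client: CountingModelClient, rows: List[str],
                  batch_size: int = 4) -> List[str]:
    # Group rows so that one remote call serves several records.
    out: List[str] = []
    batch: List[str] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            out.extend(client.infer(batch))
            batch = []
    if batch:  # flush the final partial batch
        out.extend(client.infer(batch))
    return out


if __name__ == "__main__":
    rows = [f"row-{i}" for i in range(10)]
    a, b = CountingModelClient(), CountingModelClient()
    per_record_udf(a, rows)
    micro_batched(b, rows)
    print(a.round_trips, b.round_trips)
```

Both functions return the same results; only the number of round trips differs, which is the cost a framework-level integration could hide from the SQL author.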

Flink’s forward‑looking approach could enable SQL‑based interaction with large models, allowing end‑to‑end real‑time AI inference via Flink/Spark SQL.

Related reads: Data Agent: Data + AI typical scenario; Core technologies for big data + large models.

Flink 2.0 under disaggregated architecture

Flink 2.0 addresses three long-standing problems: costly snapshots, slow state recovery, and tight coupling between state and compute. Its disaggregated state management architecture separates state storage from compute and keeps state on cheap object storage, which allows flexible scheduling, better scalability, and lightweight fault tolerance.

Flink 1.x used an integrated storage-compute design, where "storage" means state storage and "compute" means the subtasks that execute business logic. Keeping state next to the subtasks minimized access latency, but it came with drawbacks: poor observability of state, high resource consumption, and slow recovery.

The community introduced ForSt DB, a remote state store that offers optimized state access, fast checkpoints, and fast recovery. Its role is similar to that of a remote shuffle service in Spark.
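The following toy model illustrates why separating state from compute can make checkpoints and recovery cheap. It is a sketch under strong assumptions, not how ForSt DB is implemented: `RemoteStore` and `DisaggregatedState` are hypothetical classes, the "object storage" is an in-memory dict, and there is no compaction or concurrency. The point is that a checkpoint is only a list of file names, and a restored task starts with an empty cache and reads state lazily.

```python
from typing import Dict, List, Optional


class RemoteStore:
    """Hypothetical stand-in for object storage holding the primary state copy."""

    def __init__(self) -> None:
        self.files: Dict[str, Dict[str, int]] = {}


class DisaggregatedState:
    def __init__(self, remote: RemoteStore,
                 manifest: Optional[List[str]] = None) -> None:
        self.remote = remote
        self.manifest: List[str] = list(manifest or [])  # file names, oldest first
        self.buffer: Dict[str, int] = {}  # writes not yet flushed to remote
        self.cache: Dict[str, int] = {}   # local read cache, filled lazily
        self.remote_reads = 0

    def put(self, key: str, value: int) -> None:
        self.buffer[key] = value
        self.cache[key] = value

    def get(self, key: str) -> Optional[int]:
        if key in self.cache:
            return self.cache[key]
        for name in reversed(self.manifest):  # newest file wins
            self.remote_reads += 1
            data = self.remote.files[name]
            if key in data:
                self.cache[key] = data[key]
                return data[key]
        return None

    def checkpoint(self) -> List[str]:
        # Flush pending writes as one new immutable file. The snapshot itself
        # is just the list of file names, so no state is copied.
        if self.buffer:
            name = f"sst-{len(self.remote.files)}"
            self.remote.files[name] = dict(self.buffer)
            self.manifest.append(name)
            self.buffer = {}
        return list(self.manifest)

    @classmethod
    def restore(cls, remote: RemoteStore,
                manifest: List[str]) -> "DisaggregatedState":
        # Recovery only needs the manifest; data is fetched on first access.
        return cls(remote, manifest)


if __name__ == "__main__":
    remote = RemoteStore()
    state = DisaggregatedState(remote)
    state.put("a", 1)
    state.put("b", 2)
    snapshot = state.checkpoint()
    recovered = DisaggregatedState.restore(remote, snapshot)
    print(recovered.cache, recovered.get("a"), recovered.remote_reads)
```

The trade-off visible even in this toy is that the first read of each key after recovery pays a remote access, which is the latency cost the integrated 1.x design avoided.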

Multimodal unified lake storage Paimon

Data lake projects such as Hudi, Iceberg, and Paimon have mature deployments; see linked articles for summaries of Flink+Paimon/Hudi+Doris architectures and production experiences.

Fluss: streaming table store for real‑time analytics and AI workloads

See the referenced article for details on the problems Fluss aims to solve.

Conclusion

Since 2025, major data‑development frameworks have reached milestone releases, marking a new era. The rise of large models has injected fresh dynamics into a previously stagnant field, presenting both challenges and opportunities for practitioners.

Glad to share this moment with the community.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
