Flink’s AI Agents and Disaggregated State: Transforming Big Data
This article reviews key topics from Flink Forward Asia 2025 (FFA2025) in Singapore: Flink's new AI-focused Flink Agents framework, the disaggregated state architecture introduced in Flink 2.0, emerging lake storage such as Paimon, and the Fluss streaming table store. Together they illustrate how big-data platforms are evolving to serve AI workloads.
Subtopics
Flink ecosystem fully embraces AI: Flink Agents
The Apache Flink community has launched a new subproject, Flink Agents, a programming framework for building event-driven AI agents. It wraps the essential agent concepts (LLMs, memory, tools, and prompts) and provides dynamic execution plans, looping, shared state, and observability.
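To make these concepts concrete, here is a minimal, framework-free sketch of an event-driven agent loop in plain Python. It is not the Flink Agents API: `EventDrivenAgent`, `Decision`, `fake_llm`, and the `lookup_order` tool are hypothetical names, and the LLM is a deterministic stub so the control flow stays visible. Each incoming event starts a loop in which the model decides at runtime whether to call a tool or answer (a dynamic execution plan), outcomes are kept in keyed memory (shared state), and every step is logged (observability).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Decision:
    """Output of the (stubbed) LLM: either a tool call or a final answer."""
    tool: Optional[str] = None
    tool_arg: str = ""
    answer: Optional[str] = None


@dataclass
class EventDrivenAgent:
    """Hypothetical agent; names are illustrative, not the Flink Agents API."""
    prompt_template: str                                 # Prompt
    llm: Callable[[str, List[str]], Decision]            # LLM (a stub below)
    tools: Dict[str, Callable[[str], str]]               # Tools
    memory: Dict[str, List[str]] = field(default_factory=dict)  # keyed memory
    trace: List[str] = field(default_factory=list)       # observability log
    max_steps: int = 5                                   # bound on the loop

    def on_event(self, key: str, payload: str) -> str:
        """React to one event: loop LLM -> tool -> LLM until an answer appears."""
        prompt = self.prompt_template.format(input=payload)
        steps: List[str] = []  # tool results gathered while handling this event
        for i in range(self.max_steps):
            decision = self.llm(prompt, steps)
            if decision.answer is not None:
                self.trace.append(f"{key}:step{i}:answer")
                # Persist the outcome so later events for this key can use it.
                self.memory.setdefault(key, []).append(decision.answer)
                return decision.answer
            # The plan is decided at runtime: the model chooses which tool runs next.
            result = self.tools[decision.tool](decision.tool_arg)
            self.trace.append(f"{key}:step{i}:tool:{decision.tool}")
            steps.append(result)
        self.trace.append(f"{key}:gave-up")
        return "no answer"


def fake_llm(prompt: str, steps: List[str]) -> Decision:
    """Deterministic stand-in for a model call: look up first, then answer."""
    if not steps:
        return Decision(tool="lookup_order", tool_arg=prompt.split()[-1])
    return Decision(answer=f"Order status: {steps[-1]}")


ORDERS = {"A1": "shipped", "B2": "pending"}

agent = EventDrivenAgent(
    prompt_template="Customer asks about order {input}",
    llm=fake_llm,
    tools={"lookup_order": lambda oid: ORDERS.get(oid, "unknown")},
)
print(agent.on_event("user-1", "A1"))  # Order status: shipped
print(agent.trace)  # ['user-1:step0:tool:lookup_order', 'user-1:step1:answer']
```

In a real deployment the stub would be replaced by an actual model call and the dictionary-based memory by the stream processor's managed keyed state; how Flink Agents maps these pieces onto Flink is not shown here.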
Until now, large-model applications were built mainly as server-side calls, where tooling such as MCP (Model Context Protocol) had already matured. Server-side development led large-model adoption, while data-side development lagged behind and suffered from inefficiencies such as calling models via RPC wrapped inside UDFs.
Flink's forward-looking approach could make it possible to interact with large models directly from SQL, enabling end-to-end real-time AI inference through Flink SQL or Spark SQL.
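One reason RPC-via-UDF is inefficient: a scalar UDF typically issues one remote call per row, so network round trips grow with the row count. The sketch below illustrates that under simplifying assumptions; `FakeModelService`, `per_row_udf`, and `batched` are hypothetical names rather than Flink APIs, and the uppercase conversion is a placeholder for real inference. It only counts round trips, comparing per-row calls with grouping rows into batches.

```python
from typing import List


class FakeModelService:
    """Hypothetical remote model endpoint that only counts round trips."""

    def __init__(self) -> None:
        self.round_trips = 0

    def infer(self, texts: List[str]) -> List[str]:
        self.round_trips += 1  # every call pays one network round trip
        return [t.upper() for t in texts]  # placeholder "inference"


def per_row_udf(service: FakeModelService, rows: List[str]) -> List[str]:
    """RPC-via-UDF style: the scalar function calls the service once per row."""
    return [service.infer([r])[0] for r in rows]


def batched(service: FakeModelService, rows: List[str], batch_size: int) -> List[str]:
    """Group rows so one remote call serves up to batch_size rows."""
    out: List[str] = []
    for start in range(0, len(rows), batch_size):
        out.extend(service.infer(rows[start:start + batch_size]))
    return out


rows = [f"review {i}" for i in range(1000)]
a, b = FakeModelService(), FakeModelService()
assert per_row_udf(a, rows) == batched(b, rows, batch_size=64)
print(a.round_trips, b.round_trips)  # 1000 16
```

Both paths produce identical results; only the number of remote calls differs. When model invocation is integrated into the engine's SQL layer, optimizations like this can be applied by the engine rather than hand-written in each UDF.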
Related reads: Data Agent: Data + AI typical scenario; Core technologies for big data + large models.
Flink 2.0: disaggregated state architecture
Flink 2.0 addresses long-standing problems (expensive snapshots, slow state recovery, and tight coupling between state and compute) with a disaggregated state management architecture. State storage is separated from compute and placed on inexpensive object storage, which enables flexible scheduling, better scalability, and lightweight fault tolerance.
Flink 1.x used an integrated storage-compute design, where “storage” refers to state storage and “compute” refers to the subtasks executing business logic. This design minimized state-access latency, but it had drawbacks: state was hard to observe, consumed significant local resources, and was slow to recover.
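The difference can be sketched with a toy keyed-state store in which remote storage is the primary copy and the compute node keeps only a bounded LRU cache. This is an illustration under stated assumptions, not Flink's implementation: `RemoteStore` and `DisaggregatedState` are hypothetical names, a dictionary stands in for object storage, and writes go through to remote storage synchronously for simplicity, which a production engine would likely avoid. The usage shows that a replacement node can attach to the same remote state and read values on demand instead of restoring everything locally first.

```python
from collections import OrderedDict
from typing import Dict, Optional


class RemoteStore:
    """Stands in for object storage: durable, shared, slower to access."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}
        self.reads = 0

    def get(self, key: str) -> Optional[int]:
        self.reads += 1
        return self.data.get(key)

    def put(self, key: str, value: int) -> None:
        self.data[key] = value


class DisaggregatedState:
    """Hypothetical keyed state: remote store is primary, local LRU is a cache."""

    def __init__(self, remote: RemoteStore, cache_size: int) -> None:
        self.remote = remote
        self.cache: "OrderedDict[str, int]" = OrderedDict()
        self.cache_size = cache_size

    def get(self, key: str) -> int:
        if key in self.cache:
            self.cache.move_to_end(key)  # cache hit: no remote access
            return self.cache[key]
        value = self.remote.get(key)  # cache miss: fetch lazily from remote
        if value is None:
            value = 0
        self._cache_put(key, value)
        return value

    def put(self, key: str, value: int) -> None:
        self.remote.put(key, value)  # write-through keeps remote authoritative
        self._cache_put(key, value)

    def _cache_put(self, key: str, value: int) -> None:
        self.cache[key] = value
        self.cache.move_to_end(key)
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict least recently used


remote = RemoteStore()
node1 = DisaggregatedState(remote, cache_size=2)
for word in ["a", "b", "a", "c", "a"]:
    node1.put(word, node1.get(word) + 1)  # word count kept in keyed state

# Simulated failover: a fresh node attaches to the same remote store.
# Nothing is downloaded up front; each key is read on first access.
node2 = DisaggregatedState(remote, cache_size=2)
print(node2.get("a"), node2.get("c"))  # 3 1
```

The local cache size bounds the compute node's resource use independently of total state size, which is the decoupling the paragraph above describes.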
To implement this, the community introduced ForSt DB, a remote state store that offers optimized state access, fast checkpointing, and fast recovery. The idea is similar to Spark’s remote shuffle service, which likewise moves data off the compute nodes.
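Why can a remote state store make checkpoints fast? One plausible mechanism, assumed here for illustration rather than taken from ForSt's documentation, is that state is written as immutable files that already live in remote storage, so a checkpoint only needs to record which files make up the current state instead of copying them. `SegmentStore`, `flush`, `checkpoint`, and `materialize` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SegmentStore:
    """Hypothetical remote state store that writes immutable segment files."""
    files: Dict[str, Dict[str, int]] = field(default_factory=dict)  # remote "bucket"
    live: List[str] = field(default_factory=list)  # files making up current state
    memtable: Dict[str, int] = field(default_factory=dict)  # unflushed updates
    next_id: int = 0

    def put(self, key: str, value: int) -> None:
        self.memtable[key] = value

    def flush(self) -> None:
        """Write buffered updates as a new immutable file directly to remote storage."""
        if self.memtable:
            name = f"seg-{self.next_id}"
            self.next_id += 1
            self.files[name] = dict(self.memtable)
            self.live.append(name)
            self.memtable = {}

    def checkpoint(self) -> Tuple[str, ...]:
        """Files already live remotely, so a checkpoint only records their names."""
        self.flush()
        return tuple(self.live)

    def materialize(self, manifest: Tuple[str, ...]) -> Dict[str, int]:
        """Rebuild the key/value view of a checkpoint; later files win."""
        state: Dict[str, int] = {}
        for name in manifest:
            state.update(self.files[name])
        return state


store = SegmentStore()
store.put("a", 1)
store.put("b", 2)
cp1 = store.checkpoint()  # ('seg-0',)
store.put("a", 5)
cp2 = store.checkpoint()  # ('seg-0', 'seg-1'): seg-0 is referenced, not copied
print(store.materialize(cp2))  # {'a': 5, 'b': 2}
```

Each checkpoint costs only the flush of recent updates plus a short list of names. A real store would also compact files over time and would need to keep any file still referenced by a retained checkpoint.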
Paimon: multimodal unified lake storage
Data lake projects such as Hudi, Iceberg, and Paimon already have mature production deployments. See the linked articles for summaries of Flink + Paimon/Hudi + Doris architectures and production experience.
Fluss: streaming table store for real‑time analytics and AI workloads
See the referenced article for details on the problems Fluss aims to solve.
Conclusion
In 2025, the major data-development frameworks have reached milestone releases, marking a new era. The rise of large models has brought fresh momentum to a field that had been stagnating, presenting practitioners with both challenges and opportunities.
Glad to share this moment with the community.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
