
Introducing Fluss: The Next‑Gen Real‑Time Stream Storage for Flink

Alibaba unveiled the open‑source Fluss project, a next‑generation real‑time stream storage built for Apache Flink that tackles traditional Kafka‑Flink limitations with millisecond‑level reads, columnar pruning, CDC support, and seamless Lakehouse integration, aiming to boost low‑latency analytics at scale.

Alibaba Cloud Developer

Nov 29, 2024


At the Flink Forward Asia 2024 keynote on November 29, Alibaba announced the open‑source release of the Fluss project (https://github.com/alibaba/fluss). Wang Feng, Vice Chair of the Alibaba Open Source Committee, presented the project to an enthusiastic audience.

Fluss, developed by the Flink team at Alibaba Cloud Intelligence, is a next‑generation stream storage system designed for analytics. It serves as a real‑time storage foundation for Apache Flink and extends Flink’s stream‑processing capabilities. The name is an abbreviation of “FLink Unified Streaming Storage” and is also the German word for “river,” symbolizing continuous data flow.

In the Data + AI era, enterprises demand real‑time analytics, yet the traditional architecture, typically Kafka paired with Flink, was not built with analytics as a primary focus. For analytical workloads, Kafka has several limitations: it does not support data updates, it is inefficient to query, its data is hard to reuse, replaying history is difficult, and network costs are high. Together these constrain how broadly Flink can be applied.

Fluss addresses these challenges by fusing columnar storage formats with real‑time update capabilities and tightly integrating with Flink, enabling high‑throughput, low‑latency, cost‑effective streaming data warehouses. Its core features include:

Real‑time read/write: Millisecond‑level streaming read and write.

Column pruning: Data is stored in a columnar format, so a read can skip the columns it does not need, which can boost read performance by up to 10× while reducing network usage.

Streaming updates: Supports large‑scale real‑time updates, including partial column updates for low‑cost wide‑table joins.

CDC subscription: Generates complete change logs; Flink can consume CDC streams for end‑to‑end real‑time data flow.

Real‑time point lookup: High‑performance primary‑key lookups suitable for dimension table joins in streaming pipelines.

Lake‑stream integration: Seamlessly integrates with Lakehouse, providing a real‑time data layer that enhances both lakehouse analytics and stream storage capabilities.
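Three of the features above (partial column updates, CDC change logs, and primary‑key lookups) can be pictured with a small in‑memory model. The sketch below is purely illustrative and is not the Fluss API: the class `ToyPrimaryKeyTable`, its methods, and the `+I` / `-U` / `+U` change kinds (borrowed from Flink's changelog row kinds) are assumptions made for this example. The `scan` method only mimics the result of a projected read; a real columnar store would avoid reading and transferring the unrequested columns at all.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]
Change = Tuple[str, Row]  # (kind, row image); kinds used here: "+I", "-U", "+U"


@dataclass
class ToyPrimaryKeyTable:
    """Hypothetical in-memory model for illustration only; NOT the Fluss API."""

    key: str                                   # name of the primary-key column
    rows: Dict[Any, Row] = field(default_factory=dict)
    changelog: List[Change] = field(default_factory=list)

    def upsert(self, partial: Row) -> None:
        """Merge the given columns into the row identified by the primary key."""
        pk = partial[self.key]
        old = self.rows.get(pk)
        if old is None:
            new = dict(partial)
            self.changelog.append(("+I", dict(new)))   # first write: insert
        else:
            new = {**old, **partial}                   # untouched columns keep old values
            if new == old:
                return                                 # no-op update, nothing to log
            self.changelog.append(("-U", dict(old)))   # image before the update
            self.changelog.append(("+U", dict(new)))   # image after the update
        self.rows[pk] = new

    def lookup(self, pk: Any) -> Optional[Row]:
        """Point lookup by primary key, as a dimension-table join would do."""
        row = self.rows.get(pk)
        return dict(row) if row is not None else None

    def scan(self, columns: List[str]) -> List[Row]:
        """Projected read: return only the requested columns of every row."""
        return [{c: r.get(c) for c in columns} for r in self.rows.values()]


if __name__ == "__main__":
    orders = ToyPrimaryKeyTable(key="order_id")
    orders.upsert({"order_id": 1, "amount": 30})       # written by one source
    orders.upsert({"order_id": 1, "status": "paid"})   # written by another source
    print(orders.lookup(1))
    for kind, row in orders.changelog:
        print(kind, row)
```

In this model, two writers that each know only some of the columns build up one wide row without an explicit join, and a downstream consumer reading `changelog` sees every before and after image of the row.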

For more details, visit the project website: https://alibaba.github.io/fluss-docs. Alibaba, a major contributor to the Apache Flink community, has previously donated Flink CDC to the Apache Software Foundation and seen Apache Paimon graduate to a top‑level project. Fluss is now open‑sourced under the Apache 2.0 license, and Alibaba plans to donate it to the Apache Software Foundation in 2025. The community is invited to join the Fluss open‑source group (DingTalk ID: 109135004351) to help shape the next generation of stream storage technology.
