How Fluss Redefines Real‑Time Stream Storage for Flink

Fluss, an open‑source real‑time stream storage project from Alibaba, integrates columnar formats and low‑latency updates with Apache Flink to address the limitations of traditional Kafka‑Flink pipelines, offering high throughput, low cost, and seamless lakehouse support for modern data analytics.

Alibaba Cloud Big Data AI Platform

Nov 29, 2024

At the Flink Forward Asia 2024 keynote on November 29, 2024, Alibaba announced the open-source release of the Fluss project (https://github.com/alibaba/fluss). Wang Feng, Vice Chair of the Alibaba Open Source Committee, made the announcement, which the audience received enthusiastically.

Fluss is a next-generation stream storage system developed by the Flink team at Alibaba Cloud Intelligence to solve long-standing problems of stream storage in analytical workloads. It serves as a real-time storage layer for Apache Flink and extends Flink's streaming capabilities. The name is an acronym of "FLink Unified Streaming Storage" and is also the German word for "river", a reference to continuously flowing data.

In the era of Data + AI, enterprises increasingly demand real-time data analysis, yet traditional stream storage was not designed with analytical scenarios in mind. Pairing Kafka with Flink therefore runs into several problems: no support for data updates, inefficient queries, data that is hard to reuse, difficult backtracking over historical data, and high network costs.

Fluss addresses these problems by combining columnar storage with real-time updates and integrating tightly with Flink, so that users can build high-throughput, low-latency, low-cost streaming data warehouses. Its core features include:

Real‑time read/write: Millisecond‑level streaming read/write.

Column pruning: Columnar storage lets a reader fetch only the columns a query needs, which improves read performance by up to 10× and reduces network overhead.

Streaming updates: Supports large-scale real-time updates, including partial-column updates that allow wide tables to be assembled at low cost.

CDC subscription: Updates generate a complete changelog, and Flink can consume this CDC stream for end-to-end real-time data flow.

Real‑time point lookup: High‑performance primary‑key point queries for dimension table joins.

Lakehouse integration: Integrates seamlessly with lakehouse architectures, providing a real-time data layer for low-latency analytics.
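To make the streaming update, CDC subscription, and point lookup features above more concrete, here is a minimal in-memory sketch in Python. It is a toy model, not the Fluss API: the class `ToyPrimaryKeyTable`, its methods, and the `orders` example data are all hypothetical. The change kinds `+I`, `-U`, and `+U` follow Flink's `RowKind` notation for insert, update-before, and update-after; that Fluss emits exactly these records is an assumption of this sketch.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]
Change = Tuple[str, Row]  # (kind, row), where kind is "+I", "-U" or "+U"


class ToyPrimaryKeyTable:
    """In-memory stand-in for a primary-key table (hypothetical, not Fluss)."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.rows: Dict[Any, Row] = {}
        self.changelog: List[Change] = []

    def upsert(self, partial: Row) -> None:
        """Merge the supplied columns into the row identified by the key."""
        pk = partial[self.key]
        old = self.rows.get(pk)
        if old is None:
            new = dict(partial)
            self.changelog.append(("+I", dict(new)))
        else:
            # Partial-column update: columns not supplied keep their old values.
            new = {**old, **partial}
            if new == old:
                return  # nothing changed, so no change record is emitted
            self.changelog.append(("-U", dict(old)))
            self.changelog.append(("+U", dict(new)))
        self.rows[pk] = new

    def lookup(self, pk: Any) -> Optional[Row]:
        """Point lookup by primary key, as a dimension-table join would do."""
        row = self.rows.get(pk)
        return dict(row) if row is not None else None


orders = ToyPrimaryKeyTable(key="order_id")
orders.upsert({"order_id": 1, "amount": 30})         # e.g. from an orders stream
orders.upsert({"order_id": 1, "status": "shipped"})  # e.g. from a logistics stream

print(orders.lookup(1))
for kind, row in orders.changelog:
    print(kind, row)
```

Two writers that each know only some of the columns end up building one wide row without a join, and every update leaves a before and after image in the changelog that a downstream consumer could replay.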

Alibaba is a major contributor to the Apache Flink community. It has also donated the Flink CDC project to the Apache Software Foundation and brought the lake storage project Apache Paimon through to graduation. Fluss is now open source on GitHub under the Apache 2.0 license, and Alibaba plans to donate it to the Apache Software Foundation in 2025. Developers are invited to join the Fluss project, contribute, and help build the next generation of stream storage technology.
