Interview with Wang Feng (Mo Wen): The Future of Apache Flink and Streaming Warehouses

In an exclusive InfoQ interview, Apache Flink community leader Wang Feng (alias Mo Wen) outlines Flink's evolution toward a Streaming Warehouse. He covers recent technical advances, representative use cases, and the upcoming Dynamic Table storage, which is intended to unify stream and batch processing for real-time data-warehouse workloads.

DataFunTalk

Jan 11, 2022

Author: Cai Fangfang

Interviewee: Wang Feng (alias Mo Wen), Alibaba open‑source big‑data platform lead.

InfoQ's 2021 technology trend outlook highlighted that integration across big-data systems is accelerating, and Apache Flink is taking a new step in that direction with the “Streaming Warehouse”.

At Flink Forward Asia 2021, Wang Feng presented the concept of a Streaming Warehouse, arguing that Flink should evolve from pure stream processing to cover broader data‑warehouse scenarios.

He explained the “stream-batch integration” (流批一体) philosophy and the Flink 1.14 advances that support it: mixed bounded/unbounded streams, a unified Source/Sink API, and combined use of the DataStream and Table APIs. He also described two representative use cases: full-plus-incremental data integration with Flink CDC, and a unified real-time and offline data-warehouse architecture.
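The idea behind stream-batch integration can be illustrated with a small sketch in plain Python. This is not Flink code: running_counts, bounded_source, and unbounded_source are hypothetical names chosen for illustration. The sketch only shows that one piece of processing logic can consume a finite (batch) input and an endless (streaming) input.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, Tuple


def running_counts(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit an updated (key, count) pair for every incoming event."""
    counts: Dict[str, int] = {}
    for key in events:
        counts[key] = counts.get(key, 0) + 1
        yield key, counts[key]


def bounded_source() -> Iterator[str]:
    # A finite "batch" input: iteration ends on its own.
    return iter(["click", "view", "click"])


def unbounded_source() -> Iterator[str]:
    # An endless "streaming" input: the caller decides when to stop reading.
    while True:
        yield "click"
        yield "view"


# Batch run: consume the bounded input fully; the last update per key wins.
batch_result = dict(running_counts(bounded_source()))

# Streaming run: same logic, but only a prefix of the endless input is observed.
stream_prefix = list(islice(running_counts(unbounded_source()), 4))
```

In the batch run the final counts are the result; in the streaming run every intermediate update is itself a result, which is why the same logic can serve both modes.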

Flink CDC now supports many databases (MySQL, PostgreSQL, MongoDB, Oracle, etc.) and lets a single SQL statement synchronize a table's full snapshot together with its subsequent incremental changes.
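The full-plus-incremental pattern can be sketched as a toy model in plain Python. This is not the Flink CDC API: sync_table, the Change tuple layout, and the sample rows are assumptions made for illustration. The sketch loads a full snapshot first and then applies change events in order.

```python
from typing import Dict, Iterable, Tuple

Row = Dict[str, object]
Change = Tuple[str, int, Row]  # (operation, primary key, row payload)


def sync_table(snapshot: Dict[int, Row], changes: Iterable[Change]) -> Dict[int, Row]:
    """Load the full snapshot first, then apply incremental changes in order."""
    # Full phase: copy every existing row into the target.
    target = {key: dict(row) for key, row in snapshot.items()}
    # Incremental phase: replay the change log on top of the snapshot.
    for op, key, row in changes:
        if op in ("insert", "update"):
            target[key] = dict(row)
        elif op == "delete":
            target.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return target


source_snapshot = {1: {"name": "alice"}, 2: {"name": "bob"}}
change_log = [
    ("update", 2, {"name": "bobby"}),
    ("insert", 3, {"name": "carol"}),
    ("delete", 1, {}),
]
replica = sync_table(source_snapshot, change_log)
```

A real CDC pipeline also has to hand over from the snapshot to the log without losing or duplicating changes; the sketch sidesteps that by assuming the change log starts exactly where the snapshot ends.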

The emerging “Streaming Warehouse” aims to provide end‑to‑end real‑time analytics with a single API, eliminating the split between streaming and batch pipelines and reducing system complexity.

To realize this, the community is developing a “Dynamic Table” storage layer that offers both a file store for batch reads and a log store for streaming updates, fully integrated with Flink SQL.
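The two-part storage idea can be pictured with a toy model in plain Python. This is an assumption-laden sketch, not the community's design: ToyDynamicTable and its methods are hypothetical names, the "file store" is just a dictionary holding the latest state, and the "log store" is an in-memory append-only list.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Change = Tuple[str, Optional[str]]  # (key, new value); None marks a delete


class ToyDynamicTable:
    """Toy table that keeps a current snapshot and an append-only change log."""

    def __init__(self) -> None:
        self._file_store: Dict[str, str] = {}  # latest state, for batch reads
        self._log_store: List[Change] = []     # ordered changes, for streaming

    def upsert(self, key: str, value: str) -> None:
        self._log_store.append((key, value))
        self._file_store[key] = value

    def delete(self, key: str) -> None:
        self._log_store.append((key, None))
        self._file_store.pop(key, None)

    def batch_read(self) -> Dict[str, str]:
        # Batch consumers see a copy of the current snapshot.
        return dict(self._file_store)

    def stream_read(self, from_offset: int = 0) -> Iterator[Change]:
        # Streaming consumers replay changes starting at a given log offset.
        yield from self._log_store[from_offset:]


table = ToyDynamicTable()
table.upsert("order-1", "created")
table.upsert("order-1", "paid")
table.upsert("order-2", "created")
table.delete("order-2")

snapshot = table.batch_read()
changes = list(table.stream_read(from_offset=2))
```

Both read paths are fed by the same writes, so a batch query and a streaming subscriber observe the same table rather than two separately maintained pipelines.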

Wang forecasts that a mature Streaming Warehouse solution will appear within a year, with a preview expected in Flink 1.15, and emphasizes that Flink’s strength lies in its stateful streaming combined with a unified storage system.

Overall, the interview underscores the industry trend toward integrated, one‑stop data‑processing platforms and positions Flink’s upcoming features as a key driver of that evolution.
