Highlights of Flink Forward Asia 2020: Stream‑Batch Integration, AI Fusion, and Cloud‑Native Advances

The 2020 Flink Forward Asia conference showcased Apache Flink's rapid growth, community milestones, industry adoption, and technical breakthroughs such as unaligned checkpoints, approximate failover, the Nexmark benchmark, stream‑batch unification, AI integration via PyFlink and Alink, and deep cloud‑native support on Kubernetes, illustrated through case studies from Alibaba, Meituan, Kuaishou, and Dell.

DataFunTalk

Jan 5, 2021

Flink Forward Asia 2020, held December 13‑15, 2020, was the largest conference in China dedicated to an Apache top‑level project. It drew more than 92,000 unique viewers and over 40 participating companies, and it marked a new era for Apache Flink.

Community Growth – According to the Apache Software Foundation's fiscal‑year report, Flink remained the most active Apache project in 2020, with star counts and contributor numbers each growing more than 30% year over year. The Chinese user mailing list surpassed the English list in activity, the official WeChat account passed 30,000 followers, and a Chinese learning portal (https://flink-learning.org.cn/) was launched.

Industry Impact – Flink has become the de facto standard for real‑time computation worldwide. More than 40 leading tech firms presented use cases spanning online education, finance, video, e‑commerce, and recommendation systems.

Engine Innovations – In 2020 Flink delivered major advances in four pillars: stream‑processing kernel, stream‑batch unification, AI integration, and cloud‑native deployment. Highlights include:

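Unaligned Checkpoint: lets checkpoints complete under back‑pressure by allowing checkpoint barriers to overtake buffered records and snapshotting that in‑flight data (input channel state and output buffers) together with the operator state.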

Approximate Failover: a more flexible fault‑tolerance mode in which only the failed node restarts rather than the whole job, preserving pipeline continuity for workloads such as AI training and recommendation that can tolerate it.

Nexmark Benchmark: the first Flink streaming benchmark with 16 ANSI‑SQL queries, available at https://github.com/nexmark/nexmark.
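To make the unaligned checkpoint idea above more concrete, here is a toy single-operator model in plain Python. It is not Flink code: the `BARRIER` marker, the running-sum operator, and all function names are hypothetical, chosen only to show why storing in-flight records lets a snapshot finish before the backlog is processed while still recovering to the same state.

```python
from collections import deque

BARRIER = "BARRIER"  # hypothetical marker standing in for a checkpoint barrier


def aligned_checkpoint(state, channel):
    """Process every record queued ahead of the barrier, then snapshot.

    Returns (snapshot, number of records processed before the snapshot).
    """
    processed = 0
    while channel:
        item = channel.popleft()
        if item == BARRIER:
            break
        state += item  # the toy operator keeps a running sum
        processed += 1
    return {"state": state, "in_flight": []}, processed


def unaligned_checkpoint(state, channel):
    """Let the barrier overtake queued records and store them as channel state.

    The records stay in the channel for normal processing afterwards.
    """
    items = list(channel)
    in_flight = items[:items.index(BARRIER)]
    channel.remove(BARRIER)
    return {"state": state, "in_flight": in_flight}, 0


def restore(snapshot):
    """Recover by reloading the state and replaying stored in-flight records."""
    state = snapshot["state"]
    for record in snapshot["in_flight"]:
        state += record
    return state


backlog = [1, 2, 3, 4]  # records stuck in the channel under back-pressure
snap_aligned, work_aligned = aligned_checkpoint(10, deque(backlog + [BARRIER]))
snap_unaligned, work_unaligned = unaligned_checkpoint(10, deque(backlog + [BARRIER]))

print(work_aligned, work_unaligned)
print(restore(snap_aligned), restore(snap_unaligned))
```

In this sketch the aligned variant has to work through the whole backlog before it can snapshot, while the unaligned variant snapshots immediately at the cost of a larger snapshot. Both restore to the same running sum.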

Stream‑Batch Unification – Flink 1.10 and 1.11 unified the SQL and Table APIs across batch and streaming, and Flink 1.12 extended the unification to the DataStream API. Batch performance improved as well: the TPC‑DS benchmark on a 10 TB dataset completes in under 10,000 seconds on a 20‑node, 64‑core cluster, which makes Flink's batch mode competitive with leading batch engines.
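The core promise of stream‑batch unification is that one piece of business logic serves both bounded and unbounded input. The following plain-Python sketch illustrates that idea only; it does not use any Flink API, and `running_counts`, `final_counts`, and `click_stream` are hypothetical names introduced for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def running_counts(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Shared logic: emit an updated (key, count) pair for every event."""
    counts: Counter = Counter()
    for key in events:
        counts[key] += 1
        yield key, counts[key]


def final_counts(events: Iterable[str]) -> Dict[str, int]:
    """Batch view: drain a bounded input and keep the last update per key."""
    result: Dict[str, int] = {}
    for key, count in running_counts(events):
        result[key] = count
    return result


def click_stream() -> Iterator[str]:
    """Hypothetical source; a real stream would not terminate."""
    yield from ["home", "cart", "home", "pay", "home"]


# Bounded input: only the final result matters.
batch_result = final_counts(["home", "cart", "home", "pay", "home"])

# Unbounded-style input: every incremental update is observable.
stream_updates = list(running_counts(click_stream()))

print(batch_result)
print(stream_updates)
```

The batch result equals the last streaming update for each key, which is the property that lets a single job definition replace separate real-time and offline implementations.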

The unified architecture also simplifies data‑lake and data‑warehouse integration: a single pipeline can perform the initial full load and then continue with incremental change data capture (CDC), and it supports Iceberg and Hudi connectors.
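A full-load-then-incremental pipeline can be pictured as a snapshot followed by a changelog applied to the same keyed table. The sketch below is a toy model in plain Python, not a Flink CDC connector; the event shape `(operation, key, row)` and the helper names are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical change event: (operation, primary key, row); row is None for deletes.
ChangeEvent = Tuple[str, int, Optional[dict]]


def load_snapshot(rows: Iterable[dict]) -> Dict[int, dict]:
    """Full-load phase: materialize the source table keyed by primary key."""
    return {row["id"]: dict(row) for row in rows}


def apply_changes(table: Dict[int, dict], changes: Iterable[ChangeEvent]) -> Dict[int, dict]:
    """Incremental phase: apply inserts, updates, and deletes in arrival order."""
    for operation, key, row in changes:
        if operation in ("insert", "update"):
            table[key] = dict(row)
        elif operation == "delete":
            table.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {operation}")
    return table


snapshot_rows: List[dict] = [
    {"id": 1, "status": "paid"},
    {"id": 2, "status": "new"},
]
changelog: List[ChangeEvent] = [
    ("update", 2, {"id": 2, "status": "paid"}),
    ("insert", 3, {"id": 3, "status": "new"}),
    ("delete", 1, None),
]

orders = apply_changes(load_snapshot(snapshot_rows), changelog)
print(orders)
```

Because both phases write into the same keyed table, the downstream consumer does not need to know where the full load ended and the incremental stream began.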

AI Fusion – PyFlink now offers full Python support for the DataStream, Table, and SQL APIs, including Python UDFs and UDTFs and Pandas integration. Alibaba’s Alink library provides a rich set of machine‑learning algorithms built on Flink’s stream‑batch model, and the open‑source Flink AI Extended project (https://github.com/alibaba/flink-ai-extended) integrates TensorFlow and PyTorch and offers a unified workflow engine.

Cloud‑Native Deployment – Since Flink 1.10, native Kubernetes support has matured: Flink can provide high availability without ZooKeeper, and it supports dynamic scaling, GPU scheduling, and direct communication between the JobManager and Kubernetes.

Case Studies – Four companies presented production experience:

Alibaba: demonstrated Flink’s stream‑batch unification in the core Double‑11 marketing analytics screen, processing 4 billion records per second at peak and doubling business capacity without additional resources.

Meituan: highlighted “incremental production” and unified real‑time and offline data pipelines, using Flink for Kafka‑to‑Hive ingestion, real‑time analytics, and data synchronization.

Kuaishou: shared its Flink‑based real‑time ETL, reporting up to 600 million records per second, and introduced the high‑performance SlimBase state store.

Dell: presented Pravega, a stream‑storage system in the CNCF sandbox with a native Flink connector.

The conference concluded with a forward‑looking summary: Flink’s four focus areas—kernel, stream‑batch, AI, and cloud‑native—have delivered strong results in 2020 and will continue to drive innovation in 2021 and beyond.
