What’s New in Apache Flink 2022? Highlights from the Flink Forward Asia Summit

The 2022 Flink Forward Asia summit showcased Apache Flink’s rapid community growth, key technical breakthroughs such as distributed snapshot upgrades, cloud‑native state storage, hybrid shuffle, Flink CDC 2.0, and Flink ML 2.0, and real‑world deployments at companies like Midea, miHoYo and Disney.

Alibaba Cloud Big Data AI Platform

Nov 30, 2022

Flink Forward Asia 2022 Overview

Held online on November 26‑27, the Flink Forward Asia (FFA) summit, authorized by the Apache Software Foundation and hosted by Alibaba Cloud, gathered thousands of participants and featured the award ceremony of the fourth Tianchi Real‑Time Computing Flink Challenge.

Community Growth

In 2022 Apache Flink’s GitHub stars surpassed 20 000, contributors exceeded 1 600, and monthly downloads topped 14 million. Chinese developers contributed 45% of all PRs, the official WeChat account published over 130 technical articles, and the new video channel attracted nearly 4 000 followers.

Keynote Themes

Cloud and Open‑Source Collaboration

Alibaba Vice President Jia Yangqing emphasized that the cloud provides the best environment for deploying and accessing open‑source software, and that this symbiotic relationship drives the evolution of cloud‑native technologies.

Distributed Consistent Snapshot Upgrade

Over successive releases, Flink added Unaligned Checkpoint, Buffer Debloating, and Log‑based Checkpoint to reduce snapshot latency and storage costs. With Flink 1.16, these three mechanisms together form a new generation of distributed snapshot architecture.
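As a rough illustration of how the three snapshot mechanisms are switched on, the sketch below renders a set of options in flink-conf.yaml style. The option keys are those documented for Flink 1.16 and should be checked against the version in use; the interval, the storage path, and the helper name `render_flink_conf` are assumptions made for this example.

```python
# Sketch only: keys follow the Flink 1.16 configuration docs; verify for your version.
CHECKPOINT_OPTIONS = {
    "execution.checkpointing.interval": "30s",  # illustrative value
    # Unaligned Checkpoint: barriers may overtake in-flight buffers.
    "execution.checkpointing.unaligned": "true",
    # Buffer Debloating: shrink in-flight data under backpressure.
    "taskmanager.network.memory.buffer-debloat.enabled": "true",
    # Log-based Checkpoint: continuously upload state changes to a changelog.
    "state.backend.changelog.enabled": "true",
    "state.backend.changelog.storage": "filesystem",
    "dstl.dfs.base-path": "s3://my-bucket/changelog",  # hypothetical path
}


def render_flink_conf(options):
    """Render options as `key: value` lines, one per option, in insertion order."""
    return "\n".join(f"{key}: {value}" for key, value in options.items())


if __name__ == "__main__":
    print(render_flink_conf(CHECKPOINT_OPTIONS))
```

The three features are independent switches, so a job can adopt them one at a time and measure the effect of each on checkpoint duration.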

Cloud‑Native State Storage

To meet elastic scaling demands, Flink’s state backend was optimized, achieving 2‑10× performance gains, and a tiered state storage architecture is planned to fully separate compute and storage.

Hybrid Shuffle for Stream‑Batch Fusion

Flink 1.16 launched Hybrid Shuffle, combining the low‑latency pipelined shuffle of streaming with the robustness of batch blocking shuffle, improving resource utilization and performance.
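The trade-off can be made concrete with a toy scheduling model for a two-stage job. This is a simplified sketch, not Flink's scheduler: the function `plan` and its slot arithmetic are assumptions chosen to show why a hybrid mode can use spare slots without requiring them. In Flink 1.16 the mode itself is selected through the `execution.batch-shuffle-mode` option (for example `ALL_EXCHANGES_HYBRID_FULL`).

```python
def plan(mode, upstream, downstream, slots):
    """Toy model: return (can_run, downstream_tasks_started_early).

    upstream/downstream are stage parallelisms; slots is the cluster capacity.
    """
    if mode == "pipelined":
        # Data is streamed between stages, so both must run at the same time.
        can_run = slots >= upstream + downstream
        return can_run, downstream if can_run else 0
    if mode == "blocking":
        # Upstream output is fully persisted first; stages run one after another.
        return slots >= max(upstream, downstream), 0
    if mode == "hybrid":
        # Downstream tasks start early on whatever slots are left over;
        # data they cannot consume in time is spilled and read later.
        can_run = slots >= max(upstream, downstream)
        spare = max(0, slots - upstream)
        return can_run, min(downstream, spare) if can_run else 0
    raise ValueError(f"unknown shuffle mode: {mode}")


if __name__ == "__main__":
    for mode in ("pipelined", "blocking", "hybrid"):
        print(mode, plan(mode, upstream=4, downstream=4, slots=6))
```

With 6 slots and two stages of parallelism 4, the pipelined mode cannot run at all, the blocking mode runs but leaves 2 slots idle during the first stage, and the hybrid mode runs and starts 2 downstream tasks early.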

Flink CDC 2.0

Flink CDC 2.0 adds a generic incremental snapshot framework, high‑performance parallel reads, checkpoint‑based fault‑tolerance, and lock‑free source connectors, supporting a wide range of databases and already earning over 3 000 GitHub stars.
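The idea behind parallel, lock-free snapshot reads can be sketched as splitting a table's primary-key range into chunks and handing the chunks to several readers. The code below is a minimal illustration under strong assumptions (integer keys, evenly spread values, hypothetical function names); real connectors also handle non-numeric keys, skewed data, and reconciling changes that arrive while a chunk is being read.

```python
def split_into_chunks(min_key, max_key, chunk_size):
    """Split the inclusive key range [min_key, max_key] into half-open chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = []
    start = min_key
    while start <= max_key:
        # Each chunk covers [start, end); the last one is clipped to the range.
        end = min(start + chunk_size, max_key + 1)
        chunks.append((start, end))
        start = end
    return chunks


def assign_chunks(chunks, num_readers):
    """Distribute chunks round-robin across parallel readers."""
    assignment = {reader: [] for reader in range(num_readers)}
    for index, chunk in enumerate(chunks):
        assignment[index % num_readers].append(chunk)
    return assignment


if __name__ == "__main__":
    chunks = split_into_chunks(1, 10, 4)
    print(chunks)
    print(assign_chunks(chunks, 2))
```

Because each chunk is a small, independent unit of work, a finished chunk can be recorded in a checkpoint, so a restart only re-reads the chunks that were still in flight.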

Flink ML 2.0

Rebuilt on the DataStream API, Flink ML 2.0 offers online training, checkpoint‑based recovery, and a growing library of algorithms for feature engineering and low‑latency inference.
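To show what online training with checkpoint-based recovery means in practice, here is a self-contained sketch of a one-weight linear model updated record by record with stochastic gradient descent. It does not use the Flink ML API; the class `OnlineLinearModel` and its methods are hypothetical names for illustration only.

```python
class OnlineLinearModel:
    """Toy online learner for y = weight * x, updated one record at a time."""

    def __init__(self, learning_rate):
        self.weight = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.weight * x

    def update(self, x, y):
        # One SGD step on the squared error of a single record.
        error = y - self.predict(x)
        self.weight += self.learning_rate * error * x

    def snapshot(self):
        # What a checkpoint would need to persist for this model.
        return {"weight": self.weight}

    def restore(self, state):
        self.weight = state["weight"]


if __name__ == "__main__":
    model = OnlineLinearModel(learning_rate=0.5)
    for step in range(100):
        x = 1.0 if step % 2 == 0 else 0.5
        model.update(x, 2.0 * x)  # stream of records drawn from y = 2x
    recovered = OnlineLinearModel(learning_rate=0.5)
    recovered.restore(model.snapshot())
    print(model.weight, recovered.predict(3.0))
```

The model state is small and explicit, which is what makes it cheap to snapshot alongside the stream position and to restore after a failure without retraining from scratch.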

Streaming Data Warehouse Vision

The community proposes a unified streaming‑batch storage layer, embodied by the Flink Table Store project, which combines LSM‑based LakeStore and LogStore to deliver high‑performance, cloud‑native, real‑time data warehousing.
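The pairing of an LSM-style store with a log can be pictured with a small in-memory model: every write is appended to a log for streaming readers and also lands in sorted runs that batch queries merge on read. This is a conceptual sketch only, not the Flink Table Store API; the class `ToyTable`, its methods, and the mapping of sorted runs to the LakeStore and the list to the LogStore are assumptions made for illustration.

```python
class ToyTable:
    """Toy table that keeps an append-only log next to LSM-style sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}          # latest unflushed writes
        self.sorted_runs = []       # flushed runs, oldest first
        self.log = []               # append-only changelog
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Persist the memtable as one immutable run sorted by key.
        if self.memtable:
            self.sorted_runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def scan(self):
        # Batch read: merge runs oldest to newest so the latest value wins.
        merged = {}
        for run in self.sorted_runs:
            merged.update(run)
        merged.update(self.memtable)
        return sorted(merged.items())

    def changelog(self, from_offset=0):
        # Streaming read: replay changes from a given log offset.
        return self.log[from_offset:]


if __name__ == "__main__":
    table = ToyTable(memtable_limit=2)
    table.write("a", 1)
    table.write("b", 2)
    table.write("a", 3)
    print(table.scan())
    print(table.changelog(from_offset=1))
```

A batch query sees one consistent, deduplicated view through `scan`, while a streaming job follows `changelog` from its last offset, which is the dual access pattern the streaming data warehouse vision relies on.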

Industry Deployments

Midea

Midea uses Flink for long‑cycle analytics in its business‑facing (B2B) operations, factory production monitoring, and real‑time promotional dashboards, integrating batch Hive data with Kafka streams for comprehensive insights.

miHoYo

miHoYo processes billions of game logs daily with Flink, powering real‑time dashboards, near‑real‑time data lakes via Iceberg, and real‑time risk control, while continuously expanding platform capabilities such as auto‑scaling and resource elasticity.

Disney

Disney’s streaming‑media advertising platform relies on Flink for ad decision funnels, exposure monitoring, and operational dashboards, running on Kubernetes with Flink Operator and employing techniques like gang scheduling and mixed‑mode job placement.

Conclusion

The summit demonstrated Apache Flink’s vibrant ecosystem, continuous innovation across state management, fault tolerance, shuffle, data integration, and machine learning, and its expanding adoption across diverse industries, signaling a strong future for real‑time big‑data processing.


