Why Apache Flink Became the Fastest‑Growing Open‑Source Big Data Engine in 2019

Apache Flink, the open‑source stream‑and‑batch processing engine, has surged to become one of the most active Apache projects, with rapid community growth in China, unified SQL capabilities, AI‑focused extensions, Kubernetes integration, and benchmark results that outperform Hive by up to seven times.

Alibaba Cloud Developer

Dec 16, 2019

Apache Flink: A Rapidly Growing Open‑Source Engine

Apache Flink is widely regarded as a new-generation open-source big-data computation engine that runs both batch and streaming jobs. It has become one of the most active projects in the Apache Software Foundation and on GitHub. In 2019, Wang Feng, a senior real-time computing expert at Alibaba, summarized Flink's development in China, Alibaba's contributions, and future directions.

Community Growth and Adoption in China

Since joining the Apache Software Foundation in 2014, Flink's community has expanded quickly. It now ranks among the top three Apache projects by GitHub stars; its star count doubled in 2019, and the number of contributors keeps rising, with many of them coming from China. Many Chinese internet companies have adopted Flink as their primary real-time computing solution, and global companies such as Uber, Netflix, Microsoft, and Amazon also use it.

Future Directions

Flink's core use cases remain real-time data analysis, real-time risk control, and real-time ETL. The community aims to evolve Flink into a unified data engine along two directions:

Batch-stream unification: bring batch and stream processing closer together so that a single platform serves all data analysis.

Online, event-driven computation: build on Flink's event-driven functions, state management, and other strengths to serve online analytics.
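
A minimal sketch of what these two directions mean in practice, written in plain Python rather than against Flink's API. The event shape, the `click_stream` source, and both function names are hypothetical. One stateful function handles events one at a time; a bounded input is treated as a stream that happens to end, so the batch answer is just the last update per key.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[str, int]  # (key, value); a hypothetical event shape


def running_totals(events: Iterable[Event]) -> Iterator[Tuple[str, int]]:
    """Emit the updated per-key total after every event.

    The dictionary plays the role of keyed state: it persists between
    events and is the only thing the function remembers.
    """
    state: Dict[str, int] = defaultdict(int)
    for key, value in events:
        state[key] += value
        yield key, state[key]


def final_totals(events: Iterable[Event]) -> Dict[str, int]:
    """Batch view: consume a bounded input, keep the last update per key."""
    result: Dict[str, int] = {}
    for key, total in running_totals(events):
        result[key] = total
    return result


def click_stream() -> Iterator[Event]:
    """Hypothetical event source; a real stream would not end."""
    yield from [("alice", 1), ("bob", 2), ("alice", 3)]


# Streaming use: react to each update as it arrives.
stream_updates = list(running_totals(click_stream()))

# Batch use: same logic, bounded input, only the final answer matters.
batch_result = final_totals([("alice", 1), ("bob", 2), ("alice", 3)])
```

Because the batch path reuses the streaming function instead of reimplementing it, the two results cannot drift apart, which is the practical appeal of a unified engine.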

Unified SQL Architecture

Flink 1.9 introduced the first unified SQL features. Flink 1.10 adds a query processor shared by batch and streaming jobs, full DDL support, and Python UDFs. Flink SQL can now run the complete TPC-H and TPC-DS query sets with production-grade performance, and it integrates tightly with the Hive metastore, so users can switch seamlessly between Hive SQL and Flink SQL.

Benchmark Results

On a 10 TB TPC‑DS benchmark (Hive ORC format), Flink 1.10 executed all 99 queries and achieved up to 7× the performance of Hive 3.0, demonstrating both functional completeness and high efficiency.

AI Integration

Flink is expanding into AI scenarios. The ML Pipeline API introduces three core concepts, Transformer, Estimator, and Model, that let developers build machine-learning workflows on Flink. PyFlink now supports the Python Table API and Python UDFs, and the community is collaborating with the Beam project to improve Python support.
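
To make the three concepts concrete, here is a toy sketch in plain Python. It mirrors only the roles of Transformer, Estimator, and Model; it is not the Flink ML Pipeline API, and `Table`, `MeanCenter`, and `MeanCenterModel` are names invented for illustration.

```python
from typing import Dict, List

Row = Dict[str, float]
Table = List[Row]  # stand-in for a Flink Table; an assumption of this sketch


class Transformer:
    """Maps one table to another."""

    def transform(self, table: Table) -> Table:
        raise NotImplementedError


class Model(Transformer):
    """A Transformer whose parameters were learned from data."""


class Estimator:
    """Learns from a table and returns a fitted Model."""

    def fit(self, table: Table) -> Model:
        raise NotImplementedError


class MeanCenterModel(Model):
    """Subtracts a previously learned mean from one column."""

    def __init__(self, column: str, mean: float) -> None:
        self.column = column
        self.mean = mean

    def transform(self, table: Table) -> Table:
        return [{**row, self.column: row[self.column] - self.mean} for row in table]


class MeanCenter(Estimator):
    """Hypothetical estimator: learns the mean of one column."""

    def __init__(self, column: str) -> None:
        self.column = column

    def fit(self, table: Table) -> MeanCenterModel:
        values = [row[self.column] for row in table]
        return MeanCenterModel(self.column, sum(values) / len(values))


training = [{"x": 1.0}, {"x": 2.0}, {"x": 3.0}]
model = MeanCenter("x").fit(training)      # Estimator -> Model
centered = model.transform([{"x": 5.0}])   # Model used as a Transformer
```

The point of the split is that fitting and applying are separate steps: an Estimator sees training data once, and the Model it returns can then be applied to any later table like an ordinary Transformer.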

Alibaba Alink and AI Flow

Alibaba’s internal Flink‑based machine‑learning library Alink, open‑sourced in 2019, provides a distributed batch‑stream unified ML platform that supports both offline and online training. AI Flow, a forthcoming data‑and‑AI workflow platform, will allow users to define data relationships and metadata, leveraging Flink’s batch‑stream capabilities.

Kubernetes (Cloud‑Native) Integration

Flink 1.10 will ship native integration with Kubernetes, offering multi‑tenant support, unified resource management, and improved operational reliability in production environments.

Blink Contribution and Ververica Platform

Alibaba’s Blink, open‑sourced in March 2019, contributed over one million lines of code back to Flink, enhancing runtime, SQL, PyFlink, and ML components. The joint effort with the Flink founders also produced the Ververica Platform, an enterprise‑grade Flink distribution used by Alibaba Cloud.
