Why Apache Flink Became the Fastest‑Growing Open‑Source Big Data Engine in 2019
Apache Flink, the open‑source stream‑and‑batch processing engine, has surged to become one of the most active Apache projects, with rapid community growth in China, unified SQL capabilities, AI‑focused extensions, Kubernetes integration, and benchmark results that outperform Hive by up to seven times.
Apache Flink: A Rapidly Growing Open‑Source Engine
Apache Flink is widely regarded as a new‑generation open‑source big‑data computation engine that runs both batch and streaming jobs. It has become one of the most active projects in the Apache Software Foundation and on GitHub. In 2019, Wang Feng, a senior real‑time computing expert at Alibaba, summarized Flink’s development in China, Alibaba’s contributions, and future directions.
Community Growth and Adoption in China
Since Flink was contributed to the Apache Software Foundation in 2014, its community has expanded quickly. It now ranks among the top three Apache projects by GitHub stars; the star count doubled in 2019, and the number of contributors has grown steadily, especially among developers in China. Many Chinese internet companies have adopted Flink as their primary real‑time computing solution, and global companies such as Uber, Netflix, Microsoft, and Amazon also use it.
Future Directions
Flink’s core use cases remain real‑time data analysis, real‑time risk control, and real‑time ETL. The community aims to evolve Flink into a unified data engine along two directions:
Further unify batch and stream processing to provide a single data‑analysis platform.
Leverage Flink’s event‑driven functions, state management, and other strengths for online analytics.
Unified SQL Architecture
Flink 1.9 introduced initial unified SQL features, and Flink 1.10 adds a query processor unified across batch and streaming, full DDL support, and Python UDFs. Flink SQL now runs the TPC‑H and TPC‑DS benchmark queries with production‑grade performance and integrates tightly with the Hive metastore, allowing users to switch between Hive SQL and Flink SQL.
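The idea behind a batch‑stream unified query can be illustrated without Flink itself. The sketch below is plain Python, not Flink's API: the record shape and the function names are hypothetical. It computes the equivalent of a grouped SUM once, and shows that the batch answer is simply the last value of the streaming computation over the same bounded input.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical record shape for illustration: (key, amount).
Event = Tuple[str, int]


def running_totals(events: Iterable[Event]) -> Iterator[Dict[str, int]]:
    """Streaming view: yield the per-key totals after every event."""
    totals: Dict[str, int] = defaultdict(int)
    for key, amount in events:
        totals[key] += amount
        # Emit a snapshot so callers cannot mutate internal state.
        yield dict(totals)


def final_totals(events: Iterable[Event]) -> Dict[str, int]:
    """Batch view: the last snapshot of the streaming computation."""
    result: Dict[str, int] = {}
    for result in running_totals(events):
        pass
    return result


if __name__ == "__main__":
    events = [("a", 1), ("b", 2), ("a", 3)]
    for snapshot in running_totals(events):
        print("stream:", snapshot)
    print("batch: ", final_totals(events))
```

The same aggregation logic serves both modes; only the moment at which results are read differs. A real engine adds concerns this sketch ignores, such as distributed state, event time, and fault tolerance.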
Benchmark Results
On a 10 TB TPC‑DS benchmark (Hive ORC format), Flink 1.10 executed all 99 queries and achieved up to 7× the performance of Hive 3.0, demonstrating both functional completeness and high efficiency.
AI Integration
Flink is expanding into AI scenarios. The ML Pipeline API introduces core concepts such as Transformer, Estimator, and Model, enabling developers to build machine‑learning workflows on Flink. PyFlink now supports the Python Table API and Python UDFs, and the community is collaborating with the Beam project to improve Python support.
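To make the three concepts concrete, here is a minimal plain‑Python sketch of how a Transformer, an Estimator, and a Model typically relate in a pipeline. It borrows only the concept names; the classes, method signatures, and the MeanCenter and Scale stages are assumptions for illustration and are not Flink's actual interfaces, which operate on tables rather than Python lists.

```python
from typing import List, Sequence, Union


class Transformer:
    """Maps input data to output data without learning anything."""

    def transform(self, data: Sequence[float]) -> List[float]:
        raise NotImplementedError


class Model(Transformer):
    """A Transformer whose parameters were learned by an Estimator."""


class Estimator:
    """Learns from data and returns a fitted Model."""

    def fit(self, data: Sequence[float]) -> Model:
        raise NotImplementedError


class MeanCenterModel(Model):
    def __init__(self, mean: float) -> None:
        self.mean = mean

    def transform(self, data: Sequence[float]) -> List[float]:
        return [x - self.mean for x in data]


class MeanCenter(Estimator):
    """Hypothetical estimator: learns the mean of the training data."""

    def fit(self, data: Sequence[float]) -> Model:
        return MeanCenterModel(sum(data) / len(data))


class Scale(Transformer):
    """Hypothetical stateless stage: multiplies by a fixed factor."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def transform(self, data: Sequence[float]) -> List[float]:
        return [x * self.factor for x in data]


class PipelineModel(Model):
    def __init__(self, stages: List[Transformer]) -> None:
        self.stages = stages

    def transform(self, data: Sequence[float]) -> List[float]:
        current = list(data)
        for stage in self.stages:
            current = stage.transform(current)
        return current


class Pipeline(Estimator):
    """Fits each Estimator stage in order on the output of earlier stages."""

    def __init__(self, stages: List[Union[Estimator, Transformer]]) -> None:
        self.stages = stages

    def fit(self, data: Sequence[float]) -> PipelineModel:
        fitted: List[Transformer] = []
        current = list(data)
        for stage in self.stages:
            if isinstance(stage, Estimator):
                stage = stage.fit(current)
            fitted.append(stage)
            current = stage.transform(current)
        return PipelineModel(fitted)


if __name__ == "__main__":
    model = Pipeline([MeanCenter(), Scale(2.0)]).fit([1.0, 2.0, 3.0])
    print(model.transform([1.0, 2.0, 3.0]))
```

Separating fit from transform lets the same fitted model be applied to new data, which is what makes the abstraction a natural fit for an engine that handles both offline training and online inference.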
Alibaba Alink and AI Flow
Alibaba’s internal Flink‑based machine‑learning library Alink, open‑sourced in 2019, provides a distributed batch‑stream unified ML platform that supports both offline and online training. AI Flow, a forthcoming data‑and‑AI workflow platform, will allow users to define data relationships and metadata, leveraging Flink’s batch‑stream capabilities.
Kubernetes (Cloud‑Native) Integration
Flink 1.10 will ship native integration with Kubernetes, offering multi‑tenant support, unified resource management, and improved operational reliability in production environments.
Blink Contribution and Ververica Platform
Alibaba’s Blink, open‑sourced in March 2019, contributed over one million lines of code back to Flink, enhancing runtime, SQL, PyFlink, and ML components. The joint effort with the Flink founders also produced the Ververica Platform, an enterprise‑grade Flink distribution used by Alibaba Cloud.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
