
How Alibaba’s Blink Engine Redefines Real‑Time Big Data Processing

This article explains how Alibaba’s Blink, built on Apache Flink, transforms batch‑oriented big‑data platforms into a unified, high‑performance real‑time computing engine, detailing its architecture, state management, checkpointing, and successful deployment in e‑commerce, search, recommendation, and online machine‑learning scenarios.

Alibaba Cloud Developer

Real‑Time Computing Era

As internet applications and smart hardware spread, the variety and volume of data keep growing rapidly. Industries want deeper insight from that data, which demands powerful computing platforms, and the value of big data increasingly depends on how quickly it can be processed.

Stream Computing Overview

Batch processing operates on a fixed, bounded data set, whereas stream computing processes unbounded data streams, which makes it the natural model for real‑time workloads. A batch can be viewed as a finite segment of a stream, so the same computation can in principle be expressed over either one.
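To make the "batch is a finite segment of a stream" idea concrete, here is a minimal sketch in plain Python (not Flink or Blink code). The names `running_sum` and `unbounded_source` are illustrative assumptions; the point is only that one piece of incremental logic can consume either a bounded slice or an endless source.

```python
from itertools import count, islice
from typing import Iterable, Iterator


def running_sum(events: Iterable[int]) -> Iterator[int]:
    """Emit an updated total after every event (streaming-style logic)."""
    total = 0
    for value in events:
        total += value
        yield total


def unbounded_source() -> Iterator[int]:
    """Hypothetical endless stream: 1, 2, 3, ..."""
    return count(1)


# Batch: a finite segment of the stream; only the final result matters.
batch = list(islice(unbounded_source(), 5))      # [1, 2, 3, 4, 5]
batch_result = list(running_sum(batch))[-1]      # 15

# Stream: the same logic over the unbounded source; results arrive continuously.
stream_results = running_sum(unbounded_source())
first_three = list(islice(stream_results, 3))    # [1, 3, 6]
```

The batch run terminates because its input does; the streaming run never does, so the caller decides how many results to pull.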

Robust stream processing must also manage state, such as running aggregations or machine‑learning features. That requires state storage, backup, recovery, versioning, and consistent read/write APIs.
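The requirements listed above (storage, backup, recovery, versioning, read/write access) can be illustrated with a toy in-memory keyed state. This is a sketch under simplifying assumptions, not the state API of Flink or Blink; the class `KeyedCountState` and its methods are hypothetical.

```python
import copy
from collections import defaultdict


class KeyedCountState:
    """Toy keyed state: per-key counters plus versioned snapshots."""

    def __init__(self):
        self._counts = defaultdict(int)
        self._snapshots = {}  # version -> copy of the counters at that time

    def update(self, key):
        """Write path: increment the counter for one key."""
        self._counts[key] += 1
        return self._counts[key]

    def get(self, key):
        """Read path: return the current counter, 0 if the key is unknown."""
        return self._counts.get(key, 0)

    def snapshot(self, version):
        """Backup: store an independent copy under a version number."""
        self._snapshots[version] = copy.deepcopy(dict(self._counts))

    def restore(self, version):
        """Recovery: replace the live state with a stored version."""
        self._counts = defaultdict(int, copy.deepcopy(self._snapshots[version]))


state = KeyedCountState()
for user in ["alice", "bob", "alice"]:
    state.update(user)
state.snapshot(version=1)   # back up the state
state.update("alice")       # progress made after the backup
state.restore(version=1)    # roll back, for example after a failure
```

A production system would keep such snapshots on durable storage rather than in the same process; the in-memory dictionary here only stands in for that.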

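Because events may arrive in a different order than they were generated, each event carries a timestamp (event time), and the system uses watermarks to track how far event time has progressed. Together these make it possible to handle out‑of‑order data and to compute time windows correctly.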
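The interplay of event time, watermarks, and windows can be sketched as follows. This is a simplified model, not Flink's implementation: it assumes tumbling windows, a watermark that trails the largest timestamp seen by a fixed bound, and that events for already-closed windows are simply set aside. `WINDOW_SIZE`, `MAX_OUT_OF_ORDER`, and the function name are illustrative.

```python
from collections import defaultdict

WINDOW_SIZE = 10        # tumbling window length, in event-time units (assumed)
MAX_OUT_OF_ORDER = 3    # assumed bound on how late an event may arrive


def tumbling_window_counts(events):
    """Count events per tumbling window.

    events: iterable of (event_time, key) in ARRIVAL order.
    Returns (closed_windows, late_event_times, still_open_windows).
    """
    open_windows = defaultdict(int)
    closed, late = [], []
    watermark = float("-inf")
    for event_time, _key in events:
        window_start = event_time - event_time % WINDOW_SIZE
        if window_start + WINDOW_SIZE <= watermark:
            late.append(event_time)      # its window has already fired
            continue
        open_windows[window_start] += 1
        # The watermark asserts: no event older than this is still expected.
        watermark = max(watermark, event_time - MAX_OUT_OF_ORDER)
        for start in sorted(open_windows):
            if start + WINDOW_SIZE <= watermark:
                closed.append((start, open_windows.pop(start)))
    return closed, late, dict(open_windows)


events = [(1, "a"), (4, "b"), (2, "a"), (11, "c"),
          (9, "a"), (13, "b"), (5, "c"), (22, "a")]
closed, late, still_open = tumbling_window_counts(events)
# closed     == [(0, 4)]   window [0, 10) fired once the watermark reached 10
# late       == [5]        arrived after its window had fired
# still_open == {10: 2, 20: 1}
```

Note that the event with timestamp 9 arrives after the one with timestamp 11 but is still counted, because the watermark had only advanced to 8 at that point.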

Flink Introduction

Flink is a stream‑first processing framework that supports both streaming and batch workloads, and it is widely regarded as one of the most capable runners for Apache Beam.

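Flink’s state management and consistency guarantees rely on a variant of the Chandy‑Lamport distributed snapshot algorithm: barriers are injected into the data streams to trigger checkpoints, and each operator snapshots its state when the barriers reach it, producing a consistent snapshot that can later be used for recovery.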
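The barrier mechanism can be illustrated with a toy two-input operator. This sketch models only aligned barriers in a single process and is not Flink's actual code; `SummingOperator`, `BARRIER`, and the channel numbering are assumptions made for illustration.

```python
BARRIER = "BARRIER"


class SummingOperator:
    """Two-input operator that sums records and snapshots on aligned barriers."""

    def __init__(self, num_channels=2):
        self.total = 0
        self.num_channels = num_channels
        self.blocked = set()     # channels whose barrier has already arrived
        self.buffered = []       # items held back from blocked channels
        self.snapshots = []      # state captured at each checkpoint

    def receive(self, channel, item):
        if channel in self.blocked:
            # Items behind a barrier belong to the next checkpoint: hold them.
            self.buffered.append((channel, item))
        elif item == BARRIER:
            self.blocked.add(channel)
            if len(self.blocked) == self.num_channels:
                self._checkpoint()
        else:
            self.total += item

    def _checkpoint(self):
        # All inputs are aligned, so the state reflects exactly the
        # records that preceded the barrier on every channel.
        self.snapshots.append(self.total)
        self.blocked.clear()
        pending, self.buffered = self.buffered, []
        for channel, item in pending:
            self.receive(channel, item)


op = SummingOperator()
arrivals = [(0, 1), (1, 10), (0, BARRIER), (0, 2),
            (1, 20), (1, BARRIER), (1, 30)]
for channel, item in arrivals:
    op.receive(channel, item)
# op.snapshots == [31]  (1 + 10 + 20: only pre-barrier records)
# op.total     == 63    (the buffered 2 and the later 30 are applied afterwards)
```

The record `2` arrives before the second barrier in wall-clock terms but after the barrier on its own channel, so it is excluded from the snapshot. That exclusion is what makes the snapshot a consistent cut.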

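When a failure occurs, the job is restored from the most recent completed checkpoint: operators reload their state from the saved snapshot, and sources resume reading from the positions recorded in it.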
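A minimal sketch of snapshot-based recovery, assuming a replayable source (such as a log with offsets) and a single summing operator. The `Job` class is hypothetical and greatly simplified; it only shows why restoring state and source position together yields the same result as a failure-free run.

```python
class Job:
    """Toy job: reads from a replayable source and keeps a running total."""

    def __init__(self, records):
        self.records = records       # replayable source (assumed)
        self.offset = 0              # position of the next record to read
        self.total = 0               # operator state
        self.checkpoint = (0, 0)     # last completed snapshot: (offset, total)

    def take_checkpoint(self):
        # Source position and operator state are captured together.
        self.checkpoint = (self.offset, self.total)

    def restore(self):
        # Roll both back to the snapshot; later work is discarded.
        self.offset, self.total = self.checkpoint

    def process(self, n):
        """Process up to n records starting at the current offset."""
        for value in self.records[self.offset:self.offset + n]:
            self.total += value
            self.offset += 1


job = Job(records=[5, 1, 4, 2, 8, 3])
job.process(2)            # total = 6, offset = 2
job.take_checkpoint()     # snapshot (2, 6)
job.process(3)            # total = 20, offset = 5, then assume a crash
job.restore()             # back to offset 2, total 6
job.process(4)            # replays 4, 2, 8 and continues with 3
```

After recovery the total is 23, the same as summing the input once, because the records processed between the checkpoint and the crash are replayed against the restored state rather than counted twice.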

Blink Introduction

In 2015, Alibaba’s search data team faced the challenge of maintaining both nightly batch pipelines and daytime incremental pipelines for massive product catalogs. To unify these workloads, they evaluated Spark and Flink, ultimately choosing Flink for its streaming‑first design.

Because Flink’s maturity at the time was insufficient for Alibaba’s scale, the Blink project was launched to extend, optimize, and stabilize Flink for large‑scale real‑time scenarios, with all improvements contributed back to the open‑source community.

Blink Contributions to Flink

Architecture upgrade with native plugin support for various schedulers and Hadoop YARN integration.

Failover stability enhancements for Task, TaskManager, and JobManager components.

Incremental checkpoint design that dramatically speeds up checkpoint/recovery and reduces cost.

Async Operator implementation that boosts performance of I/O‑intensive nodes.

Comprehensive Table API redesign and unified batch‑stream SQL semantics.
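To illustrate why an incremental checkpoint is cheaper than a full one, here is a toy key-value state that writes only the keys changed since the previous checkpoint. The article does not describe Blink's actual mechanism, so this dirty-key tracking is purely an assumed simplification (it ignores deletions and compaction), and `IncrementalStateBackend` is a hypothetical name.

```python
class IncrementalStateBackend:
    """Toy state store that checkpoints only keys changed since the last checkpoint."""

    def __init__(self):
        self.state = {}
        self.dirty = set()       # keys modified since the last checkpoint
        self.checkpoints = []    # list of deltas, oldest first

    def put(self, key, value):
        self.state[key] = value
        self.dirty.add(key)

    def checkpoint(self):
        """Persist only the changed entries; return how many were written."""
        delta = {key: self.state[key] for key in self.dirty}
        self.checkpoints.append(delta)
        self.dirty.clear()
        return len(delta)

    def recover(self):
        """Rebuild the full state by applying the deltas in order."""
        restored = {}
        for delta in self.checkpoints:
            restored.update(delta)
        return restored


backend = IncrementalStateBackend()
backend.put("a", 1)
backend.put("b", 2)
backend.put("c", 3)
first_size = backend.checkpoint()     # 3 entries written
backend.put("b", 20)
second_size = backend.checkpoint()    # only 1 entry written
recovered = backend.recover()         # {"a": 1, "b": 20, "c": 3}
```

The second checkpoint writes one entry instead of three; with state on the order of terabytes and a small fraction of keys changing between checkpoints, that difference is where the savings come from. The trade-off is that recovery must combine several deltas instead of reading one full snapshot.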

Blink in Alibaba

Blink runs on Hadoop clusters, requesting resources from YARN and persisting state to HDFS for fault tolerance. It supports both DataStream/DataSet APIs for low‑level control and a high‑level Table API/SQL for rapid development.

Key production metrics include:

Over 3,000 machines in total.

Largest single cluster exceeding 1,500 machines.

Billions of real‑time computations per second.

Largest jobs running more than 5,000 concurrent tasks, with state on the order of 10 TB and throughput on the order of 100 million TPS.

During Alibaba’s Double‑11 shopping festival, Blink powered fully real‑time search and recommendation pipelines, enabling instantaneous product updates and online machine‑learning models that significantly improved conversion rates.

Typical Use Cases

Real‑time A/B Test – User behavior logs are streamed, parsed, aggregated, and written to OLAP systems, allowing algorithms and operations teams to adjust models instantly.
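The parse-and-aggregate step of such an A/B pipeline might look like the sketch below. The log format (`user_id,bucket,event`), the event names, and the `ABTestAggregator` class are all assumptions for illustration; the article does not specify them, and a real job would write the aggregates to an OLAP system instead of keeping them in memory.

```python
from collections import defaultdict


def parse(line):
    """Parse a hypothetical log line of the form 'user_id,bucket,event'."""
    user_id, bucket, event = line.strip().split(",")
    return user_id, bucket, event


class ABTestAggregator:
    """Maintains per-bucket view and purchase counts as events stream in."""

    def __init__(self):
        self.views = defaultdict(int)
        self.purchases = defaultdict(int)

    def on_event(self, line):
        _user, bucket, event = parse(line)
        if event == "view":
            self.views[bucket] += 1
        elif event == "purchase":
            self.purchases[bucket] += 1

    def conversion_rate(self, bucket):
        views = self.views[bucket]
        return self.purchases[bucket] / views if views else 0.0


logs = [
    "u1,A,view", "u2,B,view", "u1,A,purchase", "u3,A,view",
    "u4,B,view", "u5,B,view", "u4,B,purchase", "u6,B,view",
]
agg = ABTestAggregator()
for line in logs:
    agg.on_event(line)      # the rate is queryable after every event
rate_a = agg.conversion_rate("A")   # 1 purchase / 2 views
rate_b = agg.conversion_rate("B")   # 1 purchase / 4 views
```

Because the counters are updated per event, the comparison between buckets is available at any moment rather than after a nightly batch.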

Product Index Construction – Real‑time product updates are synchronized from MySQL to HBase, processed, and indexed for the search engine, using the same Blink logic for both streaming and batch workloads.

Porsche Online Machine‑Learning Platform – Built on Blink’s real‑time capabilities, the Porsche platform extracts features and trains models on massive user‑product interaction streams, updating the search and recommendation engines as new results arrive.

Blink Architecture

Blink’s architecture has two parts. The first is the core framework shared with Flink (the green area of the original architecture diagram). The second (the blue area) contains Alibaba‑specific extensions such as resource management, state storage, monitoring, debugging tools, and custom I/O connectors. This separation lets open‑source and proprietary requirements coexist in a single engine.

Future of Blink

Alibaba plans to further invest in Blink, deepen collaboration with the open‑source community, expand its scale, and offer Blink as a unified real‑time computing service both internally and via Alibaba Cloud, bringing high‑performance streaming capabilities to more industries.

