How Flink Is Powering Real‑Time AI: From Lambda Architecture to Stream‑Batch Unification

This article examines how Apache Flink embraces AI by leveraging the Lambda architecture and stream‑batch unification to enable real‑time data processing across preprocessing, model training, and inference, discusses the challenges of model updates and code maintenance, and outlines ongoing Flink initiatives that support AI real‑timeization.

Alibaba Cloud Developer
Alibaba Cloud Developer
Alibaba Cloud Developer
How Flink Is Powering Real‑Time AI: From Lambda Architecture to Stream‑Batch Unification

Lambda Architecture, Stream‑Batch Unification and Real‑Time AI

At Flink Forward Asia 2019 the community announced a future direction of embracing AI, recognizing the crowded AI landscape and questioning how Flink will integrate AI and what value it will bring.

Flink's AI value is closely tied to the concepts of Lambda architecture and stream‑batch unification, which have already brought real‑time capabilities to big data and will similarly benefit AI.

Evolution of Big Data and the Need for Lambda Architecture

Initially, after Google’s seminal "Three Horsemen" papers, big data was dominated by batch processing. The rise of stream engines like Storm highlighted the importance of data timeliness, leading many enterprises to adopt the Lambda architecture, which combines batch and speed layers to balance cost, fault tolerance, and latency.

However, Lambda architecture suffers from high system complexity and maintainability because developers must maintain separate codebases for batch and speed layers.

Figure 1
Figure 1

To address this, many engines, including Flink and Spark, have pursued stream‑batch unification, allowing a single engine to execute both types of jobs. Flink achieves this by using the same SQL statements for both streaming and batch queries, guaranteeing result consistency, while Spark relies on micro‑batch streaming.

Figure 2
Figure 2

AI Real‑Time Requirements Across Stages

AI can be divided into three stages: data preprocessing (feature engineering), model training, and inference. Each stage may require either batch or stream processing depending on downstream needs, making stream‑batch unified engines advantageous, especially for maintaining consistent preprocessing logic.

Data Preprocessing

Preprocessing is often a big‑data problem; using a unified engine avoids maintaining separate code for batch and streaming, simplifying development.

Model Training

Training is typically offline batch, producing static models. However, data distribution drift and the need for rapid model updates in scenarios like Alibaba's Double‑11 or high‑frequency trading highlight the importance of online learning and continuous monitoring, which are inherently stream processing tasks.

Inference

Inference can be offline (batch) or online/near‑line (stream). Online inference demands millisecond latency, while near‑line tolerates sub‑second to second latency. Flink’s Stateful Functions enable ultra‑low‑latency online prediction using stateful functions.

Typical AI Architecture and Its Challenges

The common AI architecture pairs offline training with online inference, but suffers from long model update cycles and duplicated preprocessing code.

Model update cycles are typically long.

Separate codebases are needed for offline and online preprocessing.

Introducing a real‑time training pipeline generates samples from live data for online model updates, reducing latency and improving adaptability.

Figure 4
Figure 4

Combining pure online and offline pipelines with Lambda‑style hybrid approaches can address diverse AI scenarios.

Figure 5
Figure 5

Using a stream‑batch unified engine like Flink for data preprocessing reduces system complexity and supports various latency requirements across offline, near‑line, and online scenarios.

Figure 6
Figure 6

Flink’s Ongoing AI‑Related Efforts

Alink: a unified algorithm library offering both offline and online learning algorithms.

flink‑ai‑extended: integration of deep learning frameworks like TensorFlow and PyTorch into Flink.

Iterative semantics and high‑performance implementation for AI training loops.

Figure 7
Figure 7

Supplementary Initiatives

Flink ML Pipeline for reusable machine‑learning pipelines.

PyFlink, providing a Python API for AI developers.

Notebook integration (Zeppelin) for interactive AI experimentation.

Native Kubernetes support for cloud‑native deployments.

Improvements

Redesign and optimization of connectors to simplify integration with external data sources.

Innovations

AI Flow: a top‑level workflow abstraction for AI real‑time architectures (upcoming open‑source).

Stateful Functions: ultra‑low‑latency data preprocessing and inference.

These efforts aim to reduce system complexity, enhance performance, and provide a comprehensive platform for AI real‑timeization.

Flink AI, the future looks promising.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

FlinkAI
Alibaba Cloud Developer
Written by

Alibaba Cloud Developer

Alibaba's official tech channel, featuring all of its technology innovations.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.