Stateful Stream Processing and Fault‑Tolerance Mechanisms in Apache Flink
This article explains the concept of stateful computation in stream processing, highlights the shortcomings of traditional systems, and details how Apache Flink provides rich state access, various state backends, and robust checkpointing mechanisms to achieve scalable, fault‑tolerant real‑time analytics.
1. Stateful Stream Data Processing
1.1 What is Stateful Computation
In a stateful computation, the result depends not only on the current input but also on the operator's internal state. Most real-world computations are stateful. Word count is a typical example: the count for each word is state that is updated with every incoming record.
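The word-count case can be sketched in a few lines. This is a minimal illustration in plain Python, not Flink code; the class name `WordCounter` and its `process` method are hypothetical names chosen for the example.

```python
from collections import defaultdict


class WordCounter:
    """Toy stateful operator: output depends on the input and on stored counts."""

    def __init__(self):
        # Internal state: running count per word.
        self.counts = defaultdict(int)

    def process(self, word):
        # Update the state, then emit a result that reflects the whole history.
        self.counts[word] += 1
        return word, self.counts[word]


counter = WordCounter()
outputs = [counter.process(w) for w in ["flink", "storm", "flink"]]
print(outputs)  # [('flink', 1), ('storm', 1), ('flink', 2)]
```

The same input word produces a different output the second time it is seen, which is what distinguishes a stateful operator from a stateless one such as a filter.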
1.2 Traditional Stream Systems Lack Effective State Support
A stream system needs state support in three areas:
State storage and access
State backup and recovery
State partitioning and dynamic scaling
Batch systems process data in fixed partitions, requiring little state. In contrast, stream systems run continuously on unbounded data, demanding robust state management. Early systems like Storm offered no native state support, forcing workarounds such as Storm+HBase, which introduced performance penalties, difficult backup/recovery, and scaling challenges.
1.3 Flink: Rich State Access and Efficient Fault‑Tolerance
Flink was designed with built-in state handling and fault tolerance: it provides extensive state APIs and a high-performance checkpointing mechanism.
2. State Management in Flink
Flink classifies state into two major types based on partitioning and scaling:
Keyed States
Operator States
2.1 Keyed States
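Keyed state is scoped to the key of the current record, so it is only available on a keyed stream, and it supports multiple data structures.
Flink offers several keyed state types, including ValueState, ListState, MapState, ReducingState, and AggregatingState.
Keyed state can be rescaled dynamically: keys are hashed into a fixed number of key groups, and when the parallelism changes, whole key groups are reassigned to the new parallel instances.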
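Rescaling of keyed state can be sketched as a key-group assignment. The snippet below is a simplified model, not Flink's implementation: the CRC32 hash is a stand-in (Flink applies its own hash to the key), and the value 128 for the number of key groups is an assumption for the example.

```python
import zlib

MAX_PARALLELISM = 128  # assumed number of key groups; fixed for the job's lifetime


def key_group_for(key):
    # Stand-in hash (CRC32). The key group of a key never depends on parallelism.
    return zlib.crc32(key.encode("utf-8")) % MAX_PARALLELISM


def subtask_for(key_group, parallelism):
    # Each subtask owns one contiguous range of key groups.
    return key_group * parallelism // MAX_PARALLELISM


def owned_groups(subtask, parallelism):
    # All key groups assigned to the given subtask at the given parallelism.
    return [g for g in range(MAX_PARALLELISM)
            if subtask_for(g, parallelism) == subtask]


# Scaling from 2 to 4 subtasks splits each old range into two new ranges.
before = owned_groups(0, 2)
after_first = owned_groups(0, 4)
after_second = owned_groups(1, 4)
print(before[0], before[-1])              # 0 63
print(after_first[0], after_first[-1])    # 0 31
print(after_second[0], after_second[-1])  # 32 63
```

Because state moves in whole key groups, a restored subtask only has to read the ranges it now owns instead of scanning every key.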
2.2 Operator States
Operator States Usage
Currently only ListState is supported for operator state.
Operator State Redistribution Methods
Three redistribution strategies are provided for rescaling:
ListState – merges the lists from the old parallel instances and redistributes the entries evenly across the new ones.
UnionListState – concatenates the lists and gives every new instance the complete union, leaving it to the user to select the entries each instance keeps.
BroadcastState – replicates the same data to all parallel instances, useful for small-table joins.
These options let users choose the most suitable approach for their workload.
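The three strategies can be compared side by side with a small model. This is a sketch under stated assumptions, not Flink's repartitioning code: the function names are hypothetical, the even split uses a simple round-robin whose exact order may differ from Flink's, and the union variant is modeled as handing every instance the full concatenated list.

```python
def redistribute_even(old_lists, new_parallelism):
    """ListState style: merge all entries, then split them round-robin."""
    merged = [item for lst in old_lists for item in lst]
    return [merged[i::new_parallelism] for i in range(new_parallelism)]


def redistribute_union(old_lists, new_parallelism):
    """UnionListState style: every instance receives the full merged list."""
    merged = [item for lst in old_lists for item in lst]
    return [list(merged) for _ in range(new_parallelism)]


def redistribute_broadcast(old_states, new_parallelism):
    """BroadcastState style: all instances hold identical state, so copy one."""
    return [dict(old_states[0]) for _ in range(new_parallelism)]


# Two old instances, each tracking two (hypothetical) source partitions.
old = [["p0", "p1"], ["p2", "p3"]]
even = redistribute_even(old, 3)
union = redistribute_union(old, 3)
rules = redistribute_broadcast([{"vip": True}, {"vip": True}], 3)
print(even)      # [['p0', 'p3'], ['p1'], ['p2']]
print(union[0])  # ['p0', 'p1', 'p2', 'p3']
print(rules)     # [{'vip': True}, {'vip': True}, {'vip': True}]
```

The even split suits state whose entries are independent units of work, the union suits cases where each instance must decide for itself which entries are relevant, and the broadcast copy suits small lookup data that every instance needs in full.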
Using Checkpoints to Improve Reliability
Enabling checkpoints causes Flink to periodically snapshot the state of all tasks. Upon failure, the job restarts from the latest completed checkpoint, which gives either at-least-once or exactly-once state semantics, depending on the configured checkpointing mode.
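The snapshot-and-restore cycle can be modeled with a replayable source and a single counter. This is a single-process sketch, not Flink's distributed mechanism; the function `run` and its parameters `checkpoint_every` and `fail_at` are hypothetical names introduced for the example.

```python
import copy


def run(records, checkpoint_every, fail_at=None):
    """Sum records with periodic snapshots; optionally crash once at an offset."""
    state = {"sum": 0}
    snapshot = (0, copy.deepcopy(state))  # (source offset, copy of the state)
    offset = 0
    failed = False
    while offset < len(records):
        if fail_at is not None and offset == fail_at and not failed:
            # Simulated crash: drop the live state, restore the latest snapshot,
            # and rewind the replayable source to the snapshot's offset.
            failed = True
            offset, state = snapshot[0], copy.deepcopy(snapshot[1])
            continue
        state["sum"] += records[offset]
        offset += 1
        if offset % checkpoint_every == 0:
            snapshot = (offset, copy.deepcopy(state))
    return state["sum"]


records = [1, 2, 3, 4, 5, 6, 7]
print(run(records, checkpoint_every=3))             # 28
print(run(records, checkpoint_every=3, fail_at=5))  # 28
```

The run with a failure produces the same sum because the state and the source position are restored together: records 4 and 5 are replayed, but their earlier effect on the state was discarded with the crash.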
Flink also supports two advanced recovery mechanisms:
Savepoint – a manually triggered, version‑compatible snapshot for upgrades.
Externalized Checkpoint – an additional copy of a regular checkpoint, stored in a user-specified directory so that it remains available after the job stops.
3. State Backend and Fault‑Tolerance Implementation
Flink offers three StateBackend implementations:
MemoryStateBackend
FsStateBackend
RocksDBStateBackend
For small state sizes, MemoryStateBackend or FsStateBackend is sufficient, since both keep working state on the JVM heap; large state benefits from RocksDBStateBackend, which keeps state on local disk.
Two keyed state backends are highlighted:
HeapKeyedStateBackend
RocksDBKeyedStateBackend
Checkpoint Execution Flow
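Flink coordinates distributed snapshots with a variant of the Chandy-Lamport algorithm: checkpoint barriers are injected at the sources and flow downstream with the data, and each operator snapshots its state once it has received the barrier on all of its inputs.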
Checkpoint Barrier Alignment
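Barrier alignment for an operator with several input channels can be sketched as follows. This is a simplified single-checkpoint model, not Flink's implementation; `aligned_sum` and the `BARRIER` marker are hypothetical names, and events are given as an already interleaved list of (channel, item) pairs.

```python
BARRIER = "barrier"


def aligned_sum(events, num_channels):
    """Process (channel, item) events; snapshot once all channels delivered a barrier."""
    total = 0
    blocked = set()   # channels whose barrier has already arrived
    buffered = []     # records held back from blocked channels
    snapshot = None
    for channel, item in events:
        if item == BARRIER:
            blocked.add(channel)
            if len(blocked) == num_channels:
                snapshot = total        # state reflects only pre-barrier records
                blocked.clear()
                for value in buffered:  # release the records that were held back
                    total += value
                buffered = []
        elif channel in blocked:
            buffered.append(item)       # post-barrier record: keep it out of the snapshot
        else:
            total += item
    return snapshot, total


events = [(0, 1), (1, 10), (0, BARRIER), (0, 2), (1, 20), (1, BARRIER), (1, 30)]
snapshot, total = aligned_sum(events, num_channels=2)
print(snapshot, total)  # 31 63
```

Record 2 arrives on channel 0 after that channel's barrier, so it is buffered until channel 1 catches up. The snapshot therefore contains exactly the pre-barrier records from both channels (1 + 10 + 20), which is what makes the snapshot consistent.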
Full Checkpoint
A full checkpoint copies all state to external storage, which can impact performance; incremental techniques mitigate this.
RocksDB Incremental Checkpoint
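RocksDB first writes new data to an in-memory table and flushes it to disk as an immutable file once the table is full. Because existing files never change, an incremental checkpoint only needs to persist the files created since the previous checkpoint, which reduces I/O.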
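The bookkeeping behind an incremental checkpoint can be sketched as a set difference over file names. This is a toy model that assumes files are immutable once written; the function name and the `.sst` file names are hypothetical, and no real RocksDB or Flink API is involved.

```python
def incremental_checkpoint(local_files, already_uploaded):
    """Return the files to upload now and the full file set the checkpoint references."""
    to_upload = sorted(set(local_files) - set(already_uploaded))
    referenced = sorted(local_files)
    return to_upload, referenced


uploaded = set()

# Checkpoint 1: two immutable files exist after the first flushes.
upload1, refs1 = incremental_checkpoint({"001.sst", "002.sst"}, uploaded)
uploaded |= set(upload1)

# Checkpoint 2: one more flush happened; the older files are unchanged.
upload2, refs2 = incremental_checkpoint({"001.sst", "002.sst", "003.sst"}, uploaded)
uploaded |= set(upload2)

print(upload1)  # ['001.sst', '002.sst']
print(upload2)  # ['003.sst']
print(refs2)    # ['001.sst', '002.sst', '003.sst']
```

The second checkpoint still references all three files, so it can be restored on its own, but only the new file has to be copied to external storage.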
4. Related Work at Alibaba
Alibaba began evaluating Flink in 2015, launched the Blink project in October 2015, and deployed it for large‑scale production. By the 2016 Double‑11 event, Blink powered search, recommendation, and advertising, and by May 2017 it became Alibaba’s real‑time compute engine.
Alibaba’s contributions focus on state management and fault tolerance, including ongoing work to refactor window operators using state, and plans to enhance asynchronous checkpointing in collaboration with the Flink community.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.