Big Data 10 min read

Choosing and Using Flink State Backends: MemoryStateBackend, FsStateBackend, and RocksDBStateBackend

This article explains how Flink checkpoints persist state, compares the three built‑in state backends (MemoryStateBackend, FsStateBackend, RocksDBStateBackend), discusses their configurations, advantages, limitations, and provides guidance on selecting the appropriate backend for different big‑data streaming scenarios.

Big Data Technology & Architecture
Big Data Technology & Architecture
Big Data Technology & Architecture
Choosing and Using Flink State Backends: MemoryStateBackend, FsStateBackend, and RocksDBStateBackend

When a Flink program's checkpoint is activated, the state is persisted to prevent data loss and enable seamless recovery; how the state is organized and where it is persisted depends on the chosen state backend.

The location of checkpoint storage is determined by the configured state backend (JobManager memory, filesystem, database, etc.) and can be set programmatically via StreamExecutionEnvironment.setStateBackend(...) or through flink-conf.yaml.

Flink provides three out‑of‑the‑box state backends: MemoryStateBackend , FsStateBackend , and RocksDBStateBackend . By default, Flink uses the backend defined in flink-conf.yaml, but each job can override this setting.

MemoryStateBackend keeps state in the Java heap. It is suitable for small local debugging or jobs with minimal state. Example constructors:

public MemoryStateBackend() { this(5242880); }
public MemoryStateBackend(int maxStateSize) { this(maxStateSize, true); }
public MemoryStateBackend(boolean asynchronousSnapshots) { this(5242880, asynchronousSnapshots); }
public MemoryStateBackend(int maxStateSize, boolean asynchronousSnapshots) { this.maxStateSize = maxStateSize; this.asynchronousSnapshots = asynchronousSnapshots; }

Limitations: default maximum state size is 5 MB (configurable), cannot exceed Akka frame size, and the aggregated state must fit in JobManager memory.

FsStateBackend stores in‑flight data in TaskManager memory and writes checkpoints to a configured filesystem (e.g., HDFS or local file system). Example URIs:

hdfs://namenode:40010/flink/checkpoints
file:///data/flink/checkpoints

It is appropriate for large state, long windows, and high‑availability scenarios.

RocksDBStateBackend uses an embedded RocksDB instance on local disk, supports incremental checkpoints, and stores minimal metadata in JobManager memory. It is ideal for very large state, long windows, and HA deployments, but has a per‑key/value size limit of 2 GB and higher latency.

Throughput and latency comparisons show that Memory and Fs backends have high throughput and low latency, while RocksDB exhibits lower throughput and higher latency, especially in Yarn mode.

Summary table:

StateBackend

in‑flight

checkpoint

throughput

recommended scenario

MemoryStateBackend

TM Memory

JM Memory

High

Debugging, small state, no strict exactly‑once requirement

FsStateBackend

TM Memory

FS/HDFS

High

Ordinary state, windows, KV structures

RocksDBStateBackend

RocksDB on TM

FS/HDFS

Low

Very large state, long windows, large KV structures

To set a job‑level backend, use:

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStateBackend(new FsStateBackend("hdfs://namenode:40010/flink/checkpoints"));

The default backend can be configured in flink-conf.yaml with the state.backend property (e.g., jobmanager, filesystem, rocksdb or a fully‑qualified factory class).

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Big DataFlinkRocksDBState BackendCheckpointMemoryStateBackend
Big Data Technology & Architecture
Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.