Big Data 16 min read

Understanding State TTL and Continuous Cleanup in Apache Flink 1.8.0

This article explains how Apache Flink's State TTL feature works, demonstrates configuring TTL for state size control and automatic cleanup, and details the continuous cleanup mechanisms introduced in Flink 1.8.0 for both heap and RocksDB state backends.

Big Data Technology & Architecture
Big Data Technology & Architecture
Big Data Technology & Architecture
Understanding State TTL and Continuous Cleanup in Apache Flink 1.8.0
Guide: Time‑based state access and controlling the size of application state are common challenges in stateful stream processing. Flink 1.8.0 improves the State TTL feature by adding continuous background cleanup of expired state objects, reducing the need for manual cleanup. State TTL lets you control the size of application state so developers can focus on core business logic.

When developing Flink applications, a common requirement for stateful stream jobs is automatic cleanup of state to manage state size effectively or to control how long state can be accessed. The TTL (Time‑to‑Live) feature was introduced in Flink 1.6.0 and enables state cleanup and efficient state size management.

In this article we discuss state TTL, present use cases, and show how to configure and use TTL.

Temporality of State

State can only exist for a limited time for two main reasons. For example, a Flink job may store each user's last login time to enable a seamless login experience.

Controlling State Size

Controlling state size allows effective management of ever‑growing state. Typical scenarios require temporary retention of data, such as a user session. After the session ends, the state is no longer needed, but it still occupies storage. Flink 1.8.0 introduces TTL‑based cleanup of expired state, eliminating the need for manual deletion, which was error‑prone and inefficient.

Confidentiality‑Driven Retention

If data must be inaccessible after a certain period, TTL can enforce that restriction.

Continuous Cleanup of Application State

Flink 1.6.0 introduced State TTL, allowing developers to set an expiration time after which state is cleaned up. Flink 1.8.0 extends this feature to support continuous cleanup of historic entries for RocksDB and heap state backends (FSStateBackend and MemoryStateBackend) based on the TTL setting.

In the DataStream API, application state is defined by a StateDescriptor. Providing a StateTtlConfiguration to the descriptor configures TTL. The following Java example creates a TTL configuration and attaches it to a descriptor that stores the last login time as a Long value:

<code style='font-family: Menlo, Monaco, Consolas, "Courier New", monospace; color: inherit; background-color: transparent; vertical-align: middle'><span style="font-size: 15px"><span style="font-size: 15px; color: rgb(198, 120, 221)">import</span> org.apache.flink.api.common.state.StateTtlConfig;<br/><span style="font-size: 15px; color: rgb(198, 120, 221)">import</span> org.apache.flink.api.common.time.Time;<br/><span style="font-size: 15px; color: rgb(198, 120, 221)">import</span> org.apache.flink.api.common.state.ValueStateDescriptor;<br/><br/>StateTtlConfig ttlConfig = StateTtlConfig<br/>    .newBuilder(Time.days(<span style="font-size: 15px; color: rgb(209, 154, 102)">7</span>))<br/>    .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)<br/>    .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)<br/>    .build();<br/><br/>ValueStateDescriptor<Long> lastUserLogin = <br/>    <span style="font-size: 15px; color: rgb(198, 120, 221)">new</span> ValueStateDescriptor<>(<span style="font-size: 15px; color: rgb(152, 195, 121)">"lastUserLogin"</span>, Long.class);<br/><br/>lastUserLogin.enableTimeToLive(ttlConfig);<br/></span></code>

Flink provides several options to configure TTL behavior.

When is the TTL reset?

By default, any state update refreshes the TTL. It can also be refreshed on read, at the cost of extra write operations.

Can expired data be accessed?

TTL uses a lazy cleanup strategy, so an application may read expired but not yet deleted data. Once accessed, the expired entry is immediately removed.

Which time semantics define TTL?

In Flink 1.8.0, only processing time can be used for TTL; event‑time support is planned for future releases.

Internally, TTL stores an additional timestamp alongside the actual state value, incurring some storage overhead but enabling queries, checkpointing, and recovery to consider expiration.

Avoiding "Garbage" Data

When a state object is read, Flink checks its timestamp and removes it if expired (depending on the configured visibility). Expired data that is never accessed will remain until garbage‑collected.

To delete expired state without explicit application logic, different backend cleanup strategies can be configured.

Full‑snapshot automatic deletion of expired state

Since Flink 1.6.0, full snapshots (checkpoints or savepoints) automatically delete expired state, but this does not apply to incremental checkpoints. The following example enables deletion for full snapshots:

<code style='font-family: Menlo, Monaco, Consolas, "Courier New", monospace; color: inherit; background-color: transparent; vertical-align: middle'><span style="font-size: 15px">StateTtlConfig ttlConfig = StateTtlConfig<br/>    .newBuilder(Time.days(7))<br/>    .cleanupFullSnapshot()<br/>    .build();<br/></span></code>

This keeps the local state size unchanged while reducing the size of full snapshots. Expired local state is cleared only when the snapshot is reloaded.

Because of these limitations, Flink 1.6.0 still required manual deletion after expiration. Flink 1.8.0 introduces two autonomous cleanup strategies for the two state backend types.

Incremental Cleanup for Heap State Backend

This method applies to heap backends (FSStateBackend and MemoryStateBackend). The backend maintains a lazy global iterator over all state entries; certain events (e.g., state access) trigger incremental cleanup, deleting entries visited by the iterator.

<code style='font-family: Menlo, Monaco, Consolas, "Courier New", monospace; color: inherit; background-color: transparent; vertical-align: middle'><span style="font-size: 15px">StateTtlConfig ttlConfig = StateTtlConfig<br/>    .newBuilder(Time.days(7))<br/>    // check 10 keys <span style="font-size: 15px; color: rgb(198, 120, 221)">for</span> every state access<br/>    .cleanupIncrementally(10, <span style="font-size: 15px; color: rgb(86, 182, 194)">false</span>)<br/>    .build();<br/></span></code>

Each state access triggers a cleanup step that checks a configurable number of entries for expiration.

Two parameters are configurable: the number of entries examined per cleanup step, and a flag indicating whether cleanup should also be triggered after data processing.

Note that incremental cleanup adds latency to processing and will not delete expired state if there is no state access.

RocksDB background compaction can filter expired state

If the application uses RocksDB as the state backend, another cleanup strategy based on a Flink‑specific compaction filter can be enabled. RocksDB runs asynchronous compaction to merge updates and reduce storage; the filter discards entries whose TTL has expired.

Enable this by setting the following Flink configuration option:

state.backend.rocksdb.ttl.compaction.filter.enabled

After configuring the RocksDB backend, the compaction filter cleanup strategy can be activated as shown below:

<code style='font-family: Menlo, Monaco, Consolas, "Courier New", monospace; color: inherit; background-color: transparent; vertical-align: middle'><span style="font-size: 15px">StateTtlConfig ttlConfig = StateTtlConfig<br/>    .newBuilder(Time.days(7))<br/>    .cleanupInRocksdbCompactFilter()<br/>    .build();<br/></span></code>

Timer‑based Deletion

Another approach under evaluation is timer‑based cleanup, where a cleanup timer is registered for each state access. This method offers predictable deletion but incurs high storage overhead due to timers and frequent state reads.

Future Outlook

Beyond timer‑based strategies, the Flink community plans further improvements to TTL, notably adding event‑time support.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Javastream processingApache FlinkRocksDBState TTLContinuous Cleanup
Big Data Technology & Architecture
Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.