Understanding State TTL and Continuous Cleanup in Apache Flink 1.8.0
This article explains how Apache Flink's State TTL feature works, demonstrates configuring TTL for state size control and automatic cleanup, and details the continuous cleanup mechanisms introduced in Flink 1.8.0 for both heap and RocksDB state backends.
Guide: Time‑based state access and controlling the size of application state are common challenges in stateful stream processing. Flink 1.8.0 improves the State TTL feature by adding continuous background cleanup of expired state objects, reducing the need for manual cleanup. State TTL lets you control the size of application state so developers can focus on core business logic.
When developing Flink applications, a common requirement for stateful stream jobs is automatic cleanup of state to manage state size effectively or to control how long state can be accessed. The TTL (Time‑to‑Live) feature was introduced in Flink 1.6.0 and enables state cleanup and efficient state size management.
In this article we discuss state TTL, present use cases, and show how to configure and use TTL.
Temporality of State
State can only exist for a limited time for two main reasons. For example, a Flink job may store each user's last login time to enable a seamless login experience.
Controlling State Size
Controlling state size allows effective management of ever‑growing state. Typical scenarios require temporary retention of data, such as a user session. After the session ends, the state is no longer needed, but it still occupies storage. Flink 1.8.0 introduces TTL‑based cleanup of expired state, eliminating the need for manual deletion, which was error‑prone and inefficient.
Confidentiality‑Driven Retention
If data must be inaccessible after a certain period, TTL can enforce that restriction.
Continuous Cleanup of Application State
Flink 1.6.0 introduced State TTL, allowing developers to set an expiration time after which state is cleaned up. Flink 1.8.0 extends this feature to support continuous cleanup of historic entries for RocksDB and heap state backends (FSStateBackend and MemoryStateBackend) based on the TTL setting.
In the DataStream API, application state is defined by a StateDescriptor. Providing a StateTtlConfiguration to the descriptor configures TTL. The following Java example creates a TTL configuration and attaches it to a descriptor that stores the last login time as a Long value:
<code style='font-family: Menlo, Monaco, Consolas, "Courier New", monospace; color: inherit; background-color: transparent; vertical-align: middle'><span style="font-size: 15px"><span style="font-size: 15px; color: rgb(198, 120, 221)">import</span> org.apache.flink.api.common.state.StateTtlConfig;<br/><span style="font-size: 15px; color: rgb(198, 120, 221)">import</span> org.apache.flink.api.common.time.Time;<br/><span style="font-size: 15px; color: rgb(198, 120, 221)">import</span> org.apache.flink.api.common.state.ValueStateDescriptor;<br/><br/>StateTtlConfig ttlConfig = StateTtlConfig<br/> .newBuilder(Time.days(<span style="font-size: 15px; color: rgb(209, 154, 102)">7</span>))<br/> .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)<br/> .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)<br/> .build();<br/><br/>ValueStateDescriptor<Long> lastUserLogin = <br/> <span style="font-size: 15px; color: rgb(198, 120, 221)">new</span> ValueStateDescriptor<>(<span style="font-size: 15px; color: rgb(152, 195, 121)">"lastUserLogin"</span>, Long.class);<br/><br/>lastUserLogin.enableTimeToLive(ttlConfig);<br/></span></code>Flink provides several options to configure TTL behavior.
When is the TTL reset?
By default, any state update refreshes the TTL. It can also be refreshed on read, at the cost of extra write operations.
Can expired data be accessed?
TTL uses a lazy cleanup strategy, so an application may read expired but not yet deleted data. Once accessed, the expired entry is immediately removed.
Which time semantics define TTL?
In Flink 1.8.0, only processing time can be used for TTL; event‑time support is planned for future releases.
Internally, TTL stores an additional timestamp alongside the actual state value, incurring some storage overhead but enabling queries, checkpointing, and recovery to consider expiration.
Avoiding "Garbage" Data
When a state object is read, Flink checks its timestamp and removes it if expired (depending on the configured visibility). Expired data that is never accessed will remain until garbage‑collected.
To delete expired state without explicit application logic, different backend cleanup strategies can be configured.
Full‑snapshot automatic deletion of expired state
Since Flink 1.6.0, full snapshots (checkpoints or savepoints) automatically delete expired state, but this does not apply to incremental checkpoints. The following example enables deletion for full snapshots:
<code style='font-family: Menlo, Monaco, Consolas, "Courier New", monospace; color: inherit; background-color: transparent; vertical-align: middle'><span style="font-size: 15px">StateTtlConfig ttlConfig = StateTtlConfig<br/> .newBuilder(Time.days(7))<br/> .cleanupFullSnapshot()<br/> .build();<br/></span></code>This keeps the local state size unchanged while reducing the size of full snapshots. Expired local state is cleared only when the snapshot is reloaded.
Because of these limitations, Flink 1.6.0 still required manual deletion after expiration. Flink 1.8.0 introduces two autonomous cleanup strategies for the two state backend types.
Incremental Cleanup for Heap State Backend
This method applies to heap backends (FSStateBackend and MemoryStateBackend). The backend maintains a lazy global iterator over all state entries; certain events (e.g., state access) trigger incremental cleanup, deleting entries visited by the iterator.
<code style='font-family: Menlo, Monaco, Consolas, "Courier New", monospace; color: inherit; background-color: transparent; vertical-align: middle'><span style="font-size: 15px">StateTtlConfig ttlConfig = StateTtlConfig<br/> .newBuilder(Time.days(7))<br/> // check 10 keys <span style="font-size: 15px; color: rgb(198, 120, 221)">for</span> every state access<br/> .cleanupIncrementally(10, <span style="font-size: 15px; color: rgb(86, 182, 194)">false</span>)<br/> .build();<br/></span></code>Each state access triggers a cleanup step that checks a configurable number of entries for expiration.
Two parameters are configurable: the number of entries examined per cleanup step, and a flag indicating whether cleanup should also be triggered after data processing.
Note that incremental cleanup adds latency to processing and will not delete expired state if there is no state access.
RocksDB background compaction can filter expired state
If the application uses RocksDB as the state backend, another cleanup strategy based on a Flink‑specific compaction filter can be enabled. RocksDB runs asynchronous compaction to merge updates and reduce storage; the filter discards entries whose TTL has expired.
Enable this by setting the following Flink configuration option:
state.backend.rocksdb.ttl.compaction.filter.enabled
After configuring the RocksDB backend, the compaction filter cleanup strategy can be activated as shown below:
<code style='font-family: Menlo, Monaco, Consolas, "Courier New", monospace; color: inherit; background-color: transparent; vertical-align: middle'><span style="font-size: 15px">StateTtlConfig ttlConfig = StateTtlConfig<br/> .newBuilder(Time.days(7))<br/> .cleanupInRocksdbCompactFilter()<br/> .build();<br/></span></code>Timer‑based Deletion
Another approach under evaluation is timer‑based cleanup, where a cleanup timer is registered for each state access. This method offers predictable deletion but incurs high storage overhead due to timers and frequent state reads.
Future Outlook
Beyond timer‑based strategies, the Flink community plans further improvements to TTL, notably adding event‑time support.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
