Remote StateBackend for Flink: Design, Optimizations, and Cloud‑Native Migration
To enable Bilibili’s cloud‑native migration, the team built a RemoteStateBackend that moves Flink’s keyed state to the Taishan KV store. The design combines deterministic KeyGroup placement, per‑shard snapshots, asynchronous write batching, off‑heap caching with Bloom filters, and a fixed‑size memory model. Together these reduce checkpoint overhead, improve disk utilization, and accelerate rescaling for more than one hundred production jobs.
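
To make the idea of deterministic KeyGroup placement concrete, here is a minimal sketch under stated assumptions, not the team's implementation. The contiguous KeyGroup-to-subtask formula follows Flink's key-group range assignment; CRC32 stands in for Flink's murmur hash, and `shard_for` with its modulo rule is a hypothetical placement onto remote shards that the summary does not specify.

```python
import zlib

MAX_PARALLELISM = 128  # number of KeyGroups; fixed for the lifetime of the job


def key_group_for(key: str, max_parallelism: int = MAX_PARALLELISM) -> int:
    """Map a key to a KeyGroup with a stable hash.

    CRC32 is a stand-in here; Flink itself applies a murmur hash to hashCode().
    """
    return zlib.crc32(key.encode("utf-8")) % max_parallelism


def operator_index_for(key_group: int, parallelism: int,
                       max_parallelism: int = MAX_PARALLELISM) -> int:
    """Assign contiguous KeyGroup ranges to subtasks (Flink's range formula)."""
    return key_group * parallelism // max_parallelism


def shard_for(key_group: int, num_shards: int) -> int:
    """Hypothetical rule: place a KeyGroup on a remote shard by modulo."""
    return key_group % num_shards


if __name__ == "__main__":
    kg = key_group_for("user-42")
    for parallelism in (4, 8):
        # The owning subtask changes with parallelism; the KeyGroup and its
        # shard do not, so rescaling only reassigns whole KeyGroups.
        print(parallelism, kg, operator_index_for(kg, parallelism), shard_for(kg, 16))
```

Because both the KeyGroup and its shard depend only on the key and on fixed constants, a rescale in this sketch changes which subtask owns a KeyGroup without moving or rewriting any stored entry.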
