Tagged articles
17 articles
Bilibili Tech
Apr 9, 2024 · Big Data

Optimizing Flink State Performance with RocksDB KV Separation and BlobDB

In large-scale Flink dual-stream joins, terabyte-sized RocksDB state caused severe compaction latency and CPU spikes. Enabling RocksDB BlobDB KV separation, together with an inner-compaction patch, dramatically shrank SST files, reduced read/write latencies to sub-millisecond levels, and cut CPU spikes by about half.

Flink · KV Separation · Performance Optimization
12 min read
dbaplus Community
Dec 10, 2023 · Big Data

How Bilibili Built a Remote State Backend for Flink Using Taishan KV Store

This article explains Bilibili's design and implementation of a remote state backend for Flink, detailing the motivations, pain points of the existing RocksDBStateBackend, the architecture of TaishanStateBackend, and the performance optimizations applied to achieve storage‑compute separation and faster rescaling.

Big Data · Flink · Remote Storage
21 min read
WeiLi Technology Team
Jun 2, 2023 · Big Data

Flink RocksDB State Backend: Practical Tuning Guide for Large Jobs

This article explains how to optimize Flink’s RocksDB state backend for large‑scale streaming jobs, covering state types, enabling latency tracking, incremental checkpoints, predefined options, and advanced memory and thread settings, with practical configuration examples and performance comparisons.

Big Data · Flink · RocksDB
16 min read
Big Data Technology & Architecture
May 11, 2023 · Big Data

Remote State Backend for Flink: Design, Optimization, and Deployment with Taishan KV Store

This article describes the motivation, challenges, design, and performance optimizations of a remote state backend for Flink that leverages Bilibili's Taishan distributed KV store to achieve storage‑compute separation, lighter checkpoints, faster rescaling, and improved resource utilization in large‑scale streaming jobs.

Big Data · Flink · Performance Optimization
20 min read
Bilibili Tech
Nov 4, 2022 · Big Data

Advancements and Optimizations of FlinkSQL at Bilibili

Bilibili's FlinkSQL team has enhanced its Flink engine, now based on 1.11 with features back-ported from 1.15. Additions include Delay-Join, table-valued functions, projection push-down, UDF and object reuse, automatic mini-batch and two-phase aggregation, key-group skew fixes, connector slot groups, real-time projection with Hudi, and RocksDB state-performance tweaks. Planned work covers remote state backends and deeper stream-batch integration.

FlinkSQL · Performance Optimization · Real-time Projection
29 min read
Big Data Technology & Architecture
Jan 9, 2021 · Big Data

Comprehensive 2021 Flink Interview Questions and Answers

This article presents a detailed collection of 2021 Flink interview questions covering checkpoint mechanisms, watermarks, state backends, join types, fault tolerance, resource configuration, and recent Flink 1.10 features, providing concise explanations and code examples for each topic.

Checkpoint · Flink · State Backend
23 min read
Big Data Technology Architecture
Jun 16, 2020 · Big Data

Real-time Multi-dimensional Analytics and SlimBase State Backend at Kuaishou: Flink Applications and Optimizations

This article describes how Kuaishou leverages Apache Flink for large‑scale real‑time multi‑dimensional analytics, details the architecture of its analytics platform using Kudu storage and KwaiBI, and introduces SlimBase—a lightweight, embedded shared state backend that replaces RocksDB to reduce I/O, latency, and CPU overhead.

Flink · Kuaishou · Kudu
17 min read
DataFunTalk
Jun 11, 2020 · Big Data

Real-time Multi-dimensional Analytics and SlimBase State Backend at Kuaishou: Flink Applications and Optimizations

This article presents Kuaishou's extensive use of Apache Flink for real-time multi-dimensional analytics, detailing the platform's architecture, cluster scale, data processing pipelines, the design of a shared state storage engine called SlimBase, and performance improvements achieved through replacing RocksDB with a customized HBase‑based solution.

Big Data · Flink · Kuaishou
15 min read
dbaplus Community
Feb 25, 2020 · Backend Development

How to Merge Small Files in Flink Checkpoints to Reduce HDFS Load

This article explains a small‑file‑merging technique for Apache Flink checkpoints that reuses FSDataOutputStreams to combine multiple state files into a single HDFS file, detailing design considerations such as concurrent checkpoint support, reference‑counted deletion, space amplification reduction, fault handling, compatibility, and observed production performance gains.

Apache Flink · Checkpoint · HDFS
13 min read
Big Data Technology & Architecture
Oct 9, 2019 · Big Data

Choosing and Using Flink State Backends: MemoryStateBackend, FsStateBackend, and RocksDBStateBackend

This article explains how Flink checkpoints persist state, compares the three built‑in state backends (MemoryStateBackend, FsStateBackend, RocksDBStateBackend), discusses their configurations, advantages, limitations, and provides guidance on selecting the appropriate backend for different big‑data streaming scenarios.

Big Data · Checkpoint · Flink
10 min read
Big Data Technology & Architecture
Sep 16, 2019 · Big Data

Comprehensive Flink Interview Guide: Architecture, APIs, Operators, and Advanced Topics

This guide provides a detailed overview of Apache Flink covering its core streaming engine, APIs (DataSet, DataStream, Table), architectural components, comparison with Spark Streaming, partitioning, parallelism, restart strategies, state backends, time semantics, watermarks, SQL processing, fault‑tolerance mechanisms, memory management, serialization, RPC framework, back‑pressure handling, operator chaining, and practical tips for interview preparation.

Apache Flink · Big Data · Dataflow
22 min read