How Bilibili Scaled User Behavior Analytics with ClickHouse, Flink, and Iceberg
This article details Bilibili's 北极星 (Polaris) user behavior analysis platform and traces its evolution from early Spark‑Jar models to Flink‑ClickHouse pipelines and Iceberg‑based full aggregation. It also explains the technical solutions for event, retention, funnel, and path analysis, data ingestion, cluster rebalancing, and the performance optimizations that enable real‑time analytics on billions of daily events.
Background
Data‑driven concepts are now common across industries, with core steps including data collection, event tracking, modeling, analysis, and metric construction. Early tools such as Google Analytics and Baidu's big‑data platform paved the way, and later companies like Sensors Data and GrowingIO launched independent analytics platforms. Bilibili began building its own data platform in 2019, eventually delivering a mature product called 北极星 that supports event collection, testing, management, and behavior analysis.
Technical Evolution
The 北极星 User Behavior Analysis (UBA) module has undergone three major iterations:
2019‑2020 (Partial Model + Spark Jar) : User queries triggered Spark Jar jobs that returned results from pre‑aggregated model tables. This approach suffered from inflexible model tables, fixed YARN resources, and limited concurrency, leading to query times of several minutes for funnel and path analysis.
2020‑2021 (No Model + Flink + ClickHouse) : ClickHouse was introduced as the storage engine. Raw data were ingested via Flink, enriched with Redis dimension tables and a dictionary service, and stored in ClickHouse using RoaringBitmap for set operations. Query latency dropped dramatically—90% of event queries returned within 5 seconds and 90% of funnel queries within 30 seconds, but resource consumption rose sharply, with peak Flink workloads reaching 1,200 CPU cores.
2021‑present (Iceberg Full‑Model + ClickHouse) : A full‑model aggregation built on Iceberg reduced raw data from hundreds of billions to tens of billions of rows per day. Sharding, primary‑key indexing, and push‑down parameters cut resource usage by 1,400 CPU cores, saved 400 GB of Redis memory, and reduced query times to an average of 2.77 seconds for event analysis, 0.58 seconds for funnel retention, and 10 seconds for cross‑day path queries.
Event and Retention Analysis
Event analysis aggregates user‑triggered events, while retention analysis measures how many users continue to perform a target action after an initial event. The previous detail‑level model required scanning billions of rows, resulting in 30‑50 second query latencies and a 30‑day window limit. The new approach pre‑aggregates data by user, event, and time, compressing daily raw data from billions to hundreds of millions of rows, expanding the query window to 45 days and reducing average query time to under 10 seconds.
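The pre‑aggregation idea can be illustrated with a minimal Python sketch. It is not Bilibili's pipeline: the event tuples, the function names `pre_aggregate` and `retention`, and the sample data are all assumptions made for illustration. It shows how collapsing detail rows into one row per (user, event, day) shrinks the data while still answering a next‑day retention question.

```python
from collections import Counter
from datetime import date

# Hypothetical raw detail rows: (user_id, event_name, event_date).
raw_events = [
    (1, "play", date(2023, 1, 1)),
    (1, "play", date(2023, 1, 1)),
    (1, "play", date(2023, 1, 2)),
    (2, "play", date(2023, 1, 1)),
    (2, "like", date(2023, 1, 1)),
    (3, "play", date(2023, 1, 2)),
]


def pre_aggregate(events):
    """Collapse detail rows into one entry per (user, event, day) with a count."""
    return Counter(events)


def retention(agg, first_event, return_event, start_day, later_day):
    """Share of users who did first_event on start_day and return_event on later_day."""
    cohort = {u for (u, e, d) in agg if e == first_event and d == start_day}
    returned = {u for (u, e, d) in agg if e == return_event and d == later_day}
    return len(cohort & returned) / len(cohort) if cohort else 0.0


agg = pre_aggregate(raw_events)
print(len(raw_events), "detail rows ->", len(agg), "aggregated rows")
print(retention(agg, "play", "play", date(2023, 1, 1), date(2023, 1, 2)))
```

Because repeated events by the same user on the same day collapse into a single counted row, the compression ratio grows with how often users repeat an event, which is consistent with the reduction in daily row counts described above.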
Funnel and Path Analysis
Funnel analysis tracks conversion steps, while path analysis visualizes user navigation using Sankey diagrams. By pre‑aggregating user paths and storing them in ClickHouse with RoaringBitmap (RBM)‑based materialized views, daily data volume dropped from hundreds of billions to tens of billions of rows, enabling queries to return within seconds. ClickHouse’s windowFunnel function computes funnel levels, and custom SQL scripts generate path trees for visualization.
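To make the funnel computation concrete, here is a simplified Python re‑implementation of the idea behind windowFunnel: for each user, find the deepest step reached in order within a time window that starts at the first step. This is a sketch of the default behavior only, not ClickHouse's code, and the names `window_funnel` and `funnel_counts` and the sample events are hypothetical.

```python
def window_funnel(events, steps, window):
    """Deepest funnel step one user reaches within `window` of a first-step event.

    events: list of (timestamp, event_name) for a single user.
    steps:  ordered list of event names that define the funnel.
    """
    events = sorted(events, key=lambda e: e[0])  # stable sort by timestamp only
    best = 0
    for i, (start_ts, name) in enumerate(events):
        if name != steps[0]:
            continue
        level = 1
        for ts, ev in events[i + 1:]:
            if ts - start_ts > window:
                break  # later events fall outside the sliding window
            if level < len(steps) and ev == steps[level]:
                level += 1
        best = max(best, level)
    return best


def funnel_counts(user_events, steps, window):
    """Number of users who reached at least step 1, step 2, ... of the funnel."""
    levels = [window_funnel(ev, steps, window) for ev in user_events.values()]
    return [sum(1 for lv in levels if lv >= k) for k in range(1, len(steps) + 1)]


user_events = {
    "u1": [(0, "view"), (10, "cart"), (20, "pay")],
    "u2": [(0, "view"), (50, "cart"), (500, "pay")],  # pay is outside the window
    "u3": [(0, "cart"), (5, "view")],                 # cart happened before view
}
print(funnel_counts(user_events, ["view", "cart", "pay"], window=100))
```

The per‑user level is the expensive part; the final counts are a cheap aggregation over those levels, which is why computing the level close to the data pays off.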
Tag and Audience Segmentation
北极星 leverages ClickHouse RBM for tag and audience generation, and a high‑availability dictionary service built on gRPC, LoadCache, Redis, and a custom RocksDB KV store. The service supports >500k QPS, provides >70% cache hit rates, and can recover 2 billion+ attribute records within 40 minutes after failure. Tags and AB‑test audiences are created by encoding user IDs and attributes, then performing bitmap set operations.
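The encode‑then‑combine pattern can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the production service: a plain Python integer stands in for a RoaringBitmap, and `IdEncoder`, `to_bitmap`, and the tag names are hypothetical.

```python
class IdEncoder:
    """Stand-in for the dictionary service: maps raw IDs to dense integers."""

    def __init__(self):
        self._codes = {}

    def encode(self, raw_id):
        # Assign the next free integer the first time an ID is seen.
        return self._codes.setdefault(raw_id, len(self._codes))


def to_bitmap(encoded_ids):
    """Build a bitmap (here: a Python int) with one bit set per encoded user."""
    bitmap = 0
    for code in encoded_ids:
        bitmap |= 1 << code
    return bitmap


def cardinality(bitmap):
    return bin(bitmap).count("1")


def members(bitmap):
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]


enc = IdEncoder()
male = to_bitmap(enc.encode(u) for u in ["a", "b", "c"])
active_7d = to_bitmap(enc.encode(u) for u in ["b", "c", "d"])
ab_group_a = to_bitmap(enc.encode(u) for u in ["c", "d"])

# Audience: male AND active in the last 7 days AND NOT already in AB group A.
audience = male & active_7d & ~ab_group_a
print(cardinality(audience), members(audience))
```

Dense integer codes matter here: bitmaps compress and intersect efficiently only when user IDs are compact integers rather than arbitrary strings, which is the role the dictionary service plays in the description above.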
ClickHouse Data Ingestion Evolution
JDBC Write : Direct JDBC writes from Flink/Spark caused high server‑side CPU/memory usage and merge pressure.
BulkLoad via HDFS : Spark generated ClickHouse data parts locally and uploaded them to HDFS, and ClickHouse then fetched them with ALTER TABLE … FETCH PART. This reduced server load but added HDFS latency.
Direct BulkLoad (DataReceive) : A custom HTTP‑based DataReceive service lets Spark push data parts directly to ClickHouse, bypassing HDFS and doubling ingestion throughput.
ClickHouse Rebalancing
To handle petabyte‑scale data, the team introduced a balancing degree metric based on the coefficient of variation. Two algorithms were provided:
Bin‑Packing (Best‑Fit + AVL) : Sort parts by size and place them into nodes using an AVL tree for fast lookup.
Greedy : Repeatedly move the largest part from the most loaded node to the least loaded node until the balancing degree reaches an acceptable level.
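The metric and the greedy strategy can be sketched as follows. This is a minimal illustration, not the team's implementation: the function names, the `target` threshold, and the rule of stopping when a move no longer lowers the metric are assumptions, since the source does not spell out the stopping criterion.

```python
from statistics import mean, pstdev


def balancing_degree(sizes):
    """Coefficient of variation (std / mean) of per-node size; 0 is perfectly balanced."""
    sizes = list(sizes)
    avg = mean(sizes)
    return pstdev(sizes) / avg if avg else 0.0


def greedy_rebalance(nodes, target=0.1, max_moves=1000):
    """nodes: {node_name: [part_size, ...]}. Returns (moves, resulting layout).

    Each move is (part_size, source_node, destination_node).
    `target` is a hypothetical threshold for an acceptable balancing degree.
    """
    nodes = {name: sorted(parts) for name, parts in nodes.items()}
    moves = []
    for _ in range(max_moves):
        load = {name: sum(parts) for name, parts in nodes.items()}
        before = balancing_degree(load.values())
        if before <= target:
            break  # balanced enough
        src = max(load, key=load.get)  # most loaded node
        dst = min(load, key=load.get)  # least loaded node
        if not nodes[src]:
            break
        part = nodes[src][-1]  # largest part on the most loaded node
        load[src] -= part
        load[dst] += part
        if balancing_degree(load.values()) >= before:
            break  # assumed rule: stop when the move would not improve balance
        nodes[dst].append(nodes[src].pop())
        nodes[dst].sort()
        moves.append((part, src, dst))
    return moves, nodes


layout = {"ch1": [50, 30, 20], "ch2": [10], "ch3": [20]}
moves, balanced = greedy_rebalance(layout)
print(moves)
print(balancing_degree([100, 10, 20]),
      balancing_degree(sum(p) for p in balanced.values()))
```

Always moving the largest part keeps the plan short but can stall early, as it does in this example after one move; that trade‑off is one reason to offer a bin‑packing alternative alongside it.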
Balancing plans operate at the table level, moving individual parts. Execution follows a safe sequence: pre‑check → fetch → detach → attach → detach‑cleanup → drop, with retry and rollback mechanisms and network‑bandwidth throttling to protect cluster stability.
ClickHouse Optimizations
Query Push‑Down : By rewriting distributed queries to execute heavy aggregations (e.g., windowFunnel) on each shard, query latency improved >5×, and a second version using cluster + view added another 30% gain.
Array/Map Indexes : Bloom‑filter based skip indexes were added to Array and Map columns, accelerating low‑frequency private‑attribute filters by several times.
Compression Choice : ZSTD(1) was selected over LZ4. It saves more than 30% of storage with negligible impact on queries, at the cost of a 20% drop in write performance.
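The skip‑index idea from the Array/Map item above can be illustrated with a toy Bloom filter in Python. This is a conceptual sketch, not ClickHouse's index format: the `BloomFilter` class, its sizes, the granule layout, and the sample data are all assumptions. One filter is built per block of rows (a "granule"), and a query for a rare value only reads granules whose filter says the value might be present.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; may report false positives but never false negatives."""

    def __init__(self, size_bits=256, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, value):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all(self.bits >> pos & 1 for pos in self._positions(value))


def build_index(granules):
    """One filter per granule, covering every element of the array column."""
    index = []
    for rows in granules:
        bf = BloomFilter()
        for array_value in rows:
            for element in array_value:
                bf.add(element)
        index.append(bf)
    return index


def search(granules, index, needle):
    """Return matching rows and how many granules actually had to be scanned."""
    scanned, hits = 0, []
    for rows, bf in zip(granules, index):
        if not bf.might_contain(needle):
            continue  # skip the whole granule without reading its rows
        scanned += 1
        hits.extend(r for r in rows if needle in r)
    return hits, scanned


# Each row holds the value of a hypothetical Array(String) attribute column.
granules = [
    [["sports", "music"], ["music"]],
    [["anime", "games"], ["games"]],
    [["rare_tag", "music"], ["sports"]],
]
index = build_index(granules)
hits, scanned = search(granules, index, "rare_tag")
print(hits, "scanned", scanned, "of", len(granules), "granules")
```

The benefit is largest for values that appear in few granules, which matches the low‑frequency private‑attribute filters mentioned above; for values present almost everywhere, the filter skips nothing.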
Future Work
Plans include extending the unified behavior model to support additional business logs (e.g., community comments, likes) and deploying Z‑Order indexes to enable efficient multi‑dimensional filtering across different analysis modules.
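For readers unfamiliar with Z‑Order, the core of the technique is bit interleaving (a Morton code): values from several dimensions are woven into one sort key so that rows close in every dimension end up close on disk. The Python sketch below shows the two‑dimensional case; the function name and the sample points are hypothetical, and it says nothing about how the planned indexes will actually be built.

```python
def interleave_bits(x, y, bits=16):
    """Morton code: bit i of x goes to position 2*i, bit i of y to position 2*i + 1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


# Hypothetical (dimension_a, dimension_b) pairs, e.g. two encoded filter columns.
points = [(3, 0), (0, 3), (1, 1), (2, 3), (0, 0)]
for point in sorted(points, key=lambda p: interleave_bits(*p)):
    print(point, interleave_bits(*point))
```

Sorting by the interleaved key, rather than by one column and then the other, is what lets a filter on either dimension prune data, since neither dimension is relegated to a secondary sort position.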
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
