How Bilibili Scaled Live Guild Data with ClickHouse and a Custom Unique Engine
This article details Bilibili's transition from MySQL to ClickHouse for live guild analytics, describing data volume challenges, the design of a Unique Engine to overcome ReplacingMergeTree limitations, and performance results for query and write workloads.
Background: Bilibili's live guild system manages millions of streamers, requiring near‑real‑time revenue, view, and business metrics at daily granularity. The platform stores per‑streamer and per‑guild data at a scale of millions of rows per day, leading to heavy read and write loads.
Data Volume Estimation
By December 2023, the system handled over one million streamers, generating daily data in the millions of rows. Aggregations such as annual streamer data (1 × 365 = 365 rows) and guild‑level daily, weekly, monthly, and yearly aggregates (up to 73 million rows per year) were projected.
Why MySQL Fell Short
Scanning 6 million rows for monthly guild aggregates took >20 seconds on a 4‑core, 8 GB instance.
Storing a year of hot data required 365 million rows, stressing storage even with sharding.
OLAP queries could not be satisfied efficiently due to the need for frequent full‑table scans.
These constraints motivated a migration to ClickHouse.
ClickHouse Migration Strategy
The team evaluated query QPS (max ~20, target <50) and data isolation requirements (1:N relationships between streamers, agents, and admins). ClickHouse was chosen for its columnar, OLAP‑optimized architecture.
Initial ClickHouse Design with ReplacingMergeTree
ReplacingMergeTree was used to support UPSERT‑style updates via a merge‑on‑read approach, with a primary key on (record_date, uid) and mtime as the version column. Data were partitioned by month and stored on hot/cold media.
Problems identified:
Single‑threaded merge caused high latency.
Primary key had to match the deduplication key, limiting index flexibility.
Use of FINAL prevented jump‑index and PREWHERE optimizations.
Introducing the Unique Engine
To overcome these issues, a new Unique Engine was built using a Delete‑On‑Insert model. Key innovations include:
Separate unique‑key min‑max index, unique‑key index, and delete bitmap stored inside each data part.
During writes, historic parts are scanned; matching unique keys are marked in the delete bitmap, avoiding merge‑on‑read.
Three index types (map, unordered_map, StringHashMap) with hash‑based bucket loading to limit memory usage.
LevelDB‑backed unique‑key storage for fast ordered iteration and skip‑list style comparisons.
Table definition example:
CREATE TABLE bili_live.ads_guild (
`id` Int64,
`uid` Int64,
`guild_id` Int64,
`record_date` String,
`mtime` DateTime,
...
INDEX id_idx id TYPE bloom_filter(0.025) GRANULARITY 1,
INDEX uid_idx uid TYPE bloom_filter(0.025) GRANULARITY 1
) ENGINE = ReplicatedUniqueMergeTree('/clickhouse/tables/{layer}-{shard}/bili_live/ads_guild', '{replica}', mtime)
PARTITION BY substring(record_date, 1, 6)
ORDER BY (record_date, guild_id)
UNIQUE KEY (record_date, uid)
SETTINGS index_granularity = 8192, storage_policy = 'hot_and_cold', enable_unique_key_bucket = 1, unique_key_deduplicate_level = 1, unique_key_index_type = 1;Performance Results
Query latency improved 10‑20× across seven business queries, with p90 latency reduced to sub‑second and daily scanned data dropping from >60 TB to a fraction of that. Write latency for Unique Engine was ~10× slower than ReplacingMergeTree, but after LevelDB‑based optimizations, write cost became proportional to new data size rather than total historic size, yielding several‑fold speedups.
Further Optimizations
Identified bottlenecks:
Loading the unique‑key index from disk.
Comparing new keys against all historic keys.
Solution: store unique keys in LevelDB, sort incoming keys, and perform ordered iterator merges, dramatically reducing both load and comparison work.
Future Work
Replace remaining map‑based unique‑key indexes with LevelDB for full coverage.
Streamline the MySQL‑to‑ClickHouse pipeline by adopting Flink CDC and a unified ingestion service, reducing operational complexity and lowering entry barriers for new business scenarios.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
