Databases 18 min read

How Bilibili Scaled Live Guild Data with ClickHouse and a Custom Unique Engine

This article details Bilibili's transition from MySQL to ClickHouse for live guild analytics, describing data volume challenges, the design of a Unique Engine to overcome ReplacingMergeTree limitations, and performance results for query and write workloads.

dbaplus Community

Feb 27, 2024

How Bilibili Scaled Live Guild Data with ClickHouse and a Custom Unique Engine

Background: Bilibili's live guild system manages millions of streamers, requiring near‑real‑time revenue, view, and business metrics at daily granularity. The platform stores per‑streamer and per‑guild data at a scale of millions of rows per day, leading to heavy read and write loads.

Data Volume Estimation

By December 2023, the system handled over one million streamers, generating daily data in the millions of rows. Aggregations such as annual streamer data (1 × 365 = 365 rows) and guild‑level daily, weekly, monthly, and yearly aggregates (up to 73 million rows per year) were projected.

Why MySQL Fell Short

Scanning 6 million rows for monthly guild aggregates took >20 seconds on a 4‑core, 8 GB instance.

Storing a year of hot data required 365 million rows, stressing storage even with sharding.

OLAP queries could not be satisfied efficiently due to the need for frequent full‑table scans.

These constraints motivated a migration to ClickHouse.

ClickHouse Migration Strategy

The team evaluated query QPS (max ~20, target <50) and data isolation requirements (1:N relationships between streamers, agents, and admins). ClickHouse was chosen for its columnar, OLAP‑optimized architecture.

Initial ClickHouse Design with ReplacingMergeTree

ReplacingMergeTree was used to support UPSERT‑style updates via a merge‑on‑read approach, with a primary key on (record_date, uid) and mtime as the version column. Data were partitioned by month and stored on hot/cold media.

Problems identified:

Single‑threaded merge caused high latency.

Primary key had to match the deduplication key, limiting index flexibility.

Use of FINAL prevented jump‑index and PREWHERE optimizations.

Introducing the Unique Engine

To overcome these issues, a new Unique Engine was built using a Delete‑On‑Insert model. Key innovations include:

Separate unique‑key min‑max index, unique‑key index, and delete bitmap stored inside each data part.

During writes, historic parts are scanned; matching unique keys are marked in the delete bitmap, avoiding merge‑on‑read.

Three index types (map, unordered_map, StringHashMap) with hash‑based bucket loading to limit memory usage.

LevelDB‑backed unique‑key storage for fast ordered iteration and skip‑list style comparisons.

Table definition example:

CREATE TABLE bili_live.ads_guild (
    `id` Int64,
    `uid` Int64,
    `guild_id` Int64,
    `record_date` String,
    `mtime` DateTime,
    ...
    INDEX id_idx id TYPE bloom_filter(0.025) GRANULARITY 1,
    INDEX uid_idx uid TYPE bloom_filter(0.025) GRANULARITY 1
) ENGINE = ReplicatedUniqueMergeTree('/clickhouse/tables/{layer}-{shard}/bili_live/ads_guild', '{replica}', mtime)
PARTITION BY substring(record_date, 1, 6)
ORDER BY (record_date, guild_id)
UNIQUE KEY (record_date, uid)
SETTINGS index_granularity = 8192, storage_policy = 'hot_and_cold', enable_unique_key_bucket = 1, unique_key_deduplicate_level = 1, unique_key_index_type = 1;

Performance Results

Query latency improved 10‑20× across seven business queries, with p90 latency reduced to sub‑second and daily scanned data dropping from >60 TB to a fraction of that. Write latency for Unique Engine was ~10× slower than ReplacingMergeTree, but after LevelDB‑based optimizations, write cost became proportional to new data size rather than total historic size, yielding several‑fold speedups.

Further Optimizations

Identified bottlenecks:

Loading the unique‑key index from disk.

Comparing new keys against all historic keys.

Solution: store unique keys in LevelDB, sort incoming keys, and perform ordered iterator merges, dramatically reducing both load and comparison work.

Future Work

Replace remaining map‑based unique‑key indexes with LevelDB for full coverage.

Streamline the MySQL‑to‑ClickHouse pipeline by adopting Flink CDC and a unified ingestion service, reducing operational complexity and lowering entry barriers for new business scenarios.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

PerformanceOptimization ClickHouse OLAP DataWarehouse StreamingAnalytics UniqueEngine

Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.