
How Bilibili Scaled Its OLAP Platform with ClickHouse and Lakehouse Integration

At Bilibili, the OLAP platform evolved through three phases: consolidating data services onto ClickHouse, migrating text search to ClickHouse, and integrating a lake‑house architecture. Together these phases delivered large cost reductions, sub‑second query latency, and scalable analytics for billions of daily events.

dbaplus Community

Background

Two years ago, Bilibili ran many independent OLAP engines (Apache Kylin, Elasticsearch, Druid, ClickHouse, Presto, and others), which led to high maintenance costs and unstable performance.

Phase 1 – Consolidation to ClickHouse

All data‑service workloads were migrated to a ClickHouse cluster, retiring Kylin and Druid. ClickHouse was chosen for its native file‑system storage, vectorized execution engine, rich set of built‑in functions, MergeTree family of table engines, materialized views, and indexes.

Typical workloads include user‑behavior analysis, tag selection, and content analysis, processing billions of events daily. After the migration, a 64‑node ClickHouse cluster (≈5 PB) achieves a P90 query latency of about 4 seconds, compared with 10–30 minutes on Spark.

ClickHouse cluster architecture

To reduce write amplification and MergeTree merge overhead, the team built a Spark‑to‑ClickHouse bulk‑load pipeline. Spark tasks generate ClickHouse data files locally, and a two‑phase commit then uploads those files to the ClickHouse cluster, moving most of the write‑path I/O off the OLAP nodes.
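
The article does not show how the two-phase commit is implemented. The sketch below is a minimal in-memory model of the idea, assuming a coordinator that stages pre-built parts on every shard and makes them visible only if every shard staged successfully. `Shard`, `bulk_load`, and the part names are hypothetical. In a real deployment the commit step might map to something like ClickHouse's `ALTER TABLE ... ATTACH PART` for parts copied into a table's `detached` directory, but that is an assumption; the source does not say.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Hypothetical stand-in for one ClickHouse shard's local storage."""
    name: str
    fail_on_stage: bool = False  # lets us simulate an upload failure
    staged: dict[str, list[str]] = field(default_factory=dict)  # batch_id -> parts, not yet visible
    active: list[str] = field(default_factory=list)  # parts visible to queries

    def stage(self, batch_id: str, parts: list[str]) -> bool:
        """Phase 1 (prepare): upload pre-built parts to a staging area."""
        if self.fail_on_stage:
            return False
        self.staged[batch_id] = list(parts)
        return True

    def commit(self, batch_id: str) -> None:
        """Phase 2 (commit): make the staged parts visible to queries."""
        self.active.extend(self.staged.pop(batch_id))

    def abort(self, batch_id: str) -> None:
        """Roll back: drop staged parts; already-visible data is untouched."""
        self.staged.pop(batch_id, None)


def bulk_load(batch_id: str, plan: dict[str, list[str]], shards: dict[str, Shard]) -> bool:
    """Stage every shard's parts; commit everywhere only if all stages succeed."""
    prepared: list[Shard] = []
    for shard_name, parts in plan.items():
        shard = shards[shard_name]
        if not shard.stage(batch_id, parts):
            for s in prepared:  # one failure aborts the whole batch
                s.abort(batch_id)
            return False
        prepared.append(shard)
    for s in prepared:
        s.commit(batch_id)
    return True
```

The point of the split is that the heavy work (building and shipping files) happens in phase 1, while phase 2 is a cheap metadata step, so a failed batch leaves no partially visible data. Failures during the commit phase itself, which a production pipeline would need to retry or reconcile, are not modeled here.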

Spark‑ClickHouse bulk load workflow

Phase 2 – Text Search Migration to ClickHouse

Log‑based text search and search‑ranking workloads were moved from Elasticsearch to ClickHouse. For log search, ClickHouse primary‑key range pruning and token‑bloom‑filter indexes on the message field provide:

≈10× write speedup

≈⅓ of the original storage cost

≈2× query speedup (P90 ≈ 3 s)
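
To make the two pruning steps concrete, here is a toy model of a log table, assuming rows sorted by timestamp (standing in for the primary key) and split into fixed-size granules, each carrying a min/max timestamp and a bloom filter over message tokens. It is not ClickHouse's `tokenbf_v1` implementation; the class and function names, the hashing scheme, and the granule size are illustrative assumptions.

```python
import hashlib
import re
from dataclasses import dataclass, field

TOKEN_RE = re.compile(r"[A-Za-z0-9]+")


def tokens(text: str) -> set[str]:
    """Split a message on non-alphanumeric characters (case-sensitive)."""
    return set(TOKEN_RE.findall(text))


class BloomFilter:
    """Tiny bloom filter; a Python int serves as the bit array."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, token: str):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{token}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, token: str) -> None:
        for pos in self._positions(token):
            self.bits |= 1 << pos

    def might_contain(self, token: str) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(token))


@dataclass
class Granule:
    """A block of rows sorted by timestamp, with min/max bounds and a token filter."""
    rows: list[tuple[int, str]]  # (timestamp, message)
    min_ts: int = 0
    max_ts: int = 0
    bloom: BloomFilter = field(default_factory=BloomFilter)

    def __post_init__(self) -> None:
        self.min_ts, self.max_ts = self.rows[0][0], self.rows[-1][0]
        for _, message in self.rows:
            for tok in tokens(message):
                self.bloom.add(tok)


def build_granules(rows: list[tuple[int, str]], granule_size: int = 4) -> list[Granule]:
    rows = sorted(rows)  # the timestamp acts as the primary (sort) key
    return [Granule(rows[i:i + granule_size]) for i in range(0, len(rows), granule_size)]


def search(granules: list[Granule], start_ts: int, end_ts: int, token: str):
    """Return (matching rows, number of granules actually scanned)."""
    hits: list[tuple[int, str]] = []
    scanned = 0
    for g in granules:
        if g.max_ts < start_ts or g.min_ts > end_ts:
            continue  # primary-key range pruning: time range does not overlap
        if not g.bloom.might_contain(token):
            continue  # skip index: token is definitely not in this granule
        scanned += 1
        hits.extend(r for r in g.rows
                    if start_ts <= r[0] <= end_ts and token in tokens(r[1]))
    return hits, scanned
```

A granule is skipped if its time range misses the query or its bloom filter rules the token out; only the surviving granules are scanned row by row. Because a bloom filter can return false positives but never false negatives, skipping a granule on a negative answer is always safe.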

For search ranking, relevance scoring still runs on Elasticsearch, while exact‑match log queries run on ClickHouse.

Log platform migration diagram

Key techniques:

Primary‑key range pruning to limit the data ranges scanned.

Token‑bloom‑filter index on the message field for lightweight full‑text filtering.

On‑the‑fly computation for aggregations (bitmap, set operations, etc.).
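
The article does not detail the bitmap computation. As a hedged illustration, the sketch below represents each set of user IDs as a bitmap (a Python `int`, one bit per ID), so that tag selection or overlap analysis reduces to bitwise AND/OR/AND-NOT plus a population count. ClickHouse's own bitmap functions (such as `groupBitmapState`, `bitmapAnd`, and `bitmapCardinality`) are built on Roaring bitmaps rather than this naive encoding; the helper names and sample sets here are hypothetical.

```python
def to_bitmap(user_ids) -> int:
    """Set bit `uid` for every user ID (IDs must be non-negative ints)."""
    bitmap = 0
    for uid in user_ids:
        bitmap |= 1 << uid
    return bitmap


def cardinality(bitmap: int) -> int:
    """Number of users in the set (population count)."""
    return bin(bitmap).count("1")


def members(bitmap: int) -> list[int]:
    """Decode a non-negative bitmap back into sorted user IDs."""
    ids, uid = [], 0
    while bitmap:
        if bitmap & 1:
            ids.append(uid)
        bitmap >>= 1
        uid += 1
    return ids


# Hypothetical per-tag / per-behavior user sets, e.g. precomputed per day.
likes_anime = to_bitmap([1, 2, 3, 5, 8])
active_today = to_bitmap([2, 3, 4, 8])

both = likes_anime & active_today         # tag selection: intersection
either = likes_anime | active_today       # union
anime_only = likes_anime & ~active_today  # difference (AND-NOT)
```

Here `members(both)` is `[2, 3, 8]` and `cardinality(either)` is `6`. The appeal of bitmaps for these workloads is that set operations over millions of users become word-level bitwise operations instead of joins.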

Log query optimization

Phase 3 – Lake‑House Integration (Iceberg + Trino)

A lake‑house stack centered on Apache Iceberg was introduced. Spark and Flink write to Iceberg tables; a custom service named Magnus receives write events and triggers asynchronous optimizations (small‑file merging, data sorting, index creation). Trino serves queries on Iceberg, while Alluxio caches Iceberg metadata and indexes.
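
Magnus is an internal service, and the article describes its logic only at the level above. The sketch below illustrates just the small-file-merging step, assuming Magnus receives a commit event with the table's current data files and plans rewrite groups by first-fit-decreasing bin packing. The thresholds, `DataFile`, `plan_compaction`, and `on_commit_event` are all hypothetical. Iceberg itself ships a comparable compaction action (the Spark `rewrite_data_files` procedure), which a service like this could invoke to do the actual rewrite.

```python
from dataclasses import dataclass

MB = 1 << 20


@dataclass(frozen=True)
class DataFile:
    path: str
    size_bytes: int


def plan_compaction(files: list[DataFile], small_threshold: int,
                    target_size: int) -> list[list[DataFile]]:
    """Bin-pack small files into rewrite groups of at most target_size bytes
    using first-fit decreasing; files >= small_threshold are left alone."""
    small = sorted((f for f in files if f.size_bytes < small_threshold),
                   key=lambda f: f.size_bytes, reverse=True)
    groups: list[list[DataFile]] = []
    totals: list[int] = []
    for f in small:
        for i, total in enumerate(totals):
            if total + f.size_bytes <= target_size:
                groups[i].append(f)
                totals[i] += f.size_bytes
                break
        else:  # no existing group has room: open a new one
            groups.append([f])
            totals.append(f.size_bytes)
    # Rewriting a single file on its own would not reduce the file count.
    return [g for g in groups if len(g) > 1]


def on_commit_event(table: str, files: list[DataFile], queue: list,
                    small_threshold: int = 32 * MB, target_size: int = 128 * MB) -> None:
    """Hypothetical event handler: enqueue asynchronous rewrite jobs."""
    for group in plan_compaction(files, small_threshold, target_size):
        queue.append((table, [f.path for f in group]))
```

Handling the event by enqueuing work, rather than rewriting inline, keeps the write path fast: writers commit small files immediately, and compaction catches up in the background.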

Benefits:

ACID guarantees and near‑real‑time visibility for historical logs.

Lower storage cost than ClickHouse for long‑term data.

The metric service handles ~200k queries per day with a P90 latency of ≈1.2 s.

Iceberg‑based lake‑house architecture

Engine Selection Guidance

Three engines coexist:

Elasticsearch – used for search‑ranking where relevance scoring is required.

ClickHouse – preferred for sub‑second interactive analytical queries (e.g., user‑behavior analysis, log search).

Lake‑house (Iceberg + Trino) – chosen for cost‑effective, near‑real‑time analytics where sub‑second latency is not mandatory (e.g., metric service, long‑term log storage).

Selection is based on business type and latency requirements.
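
The guidance above can be read as a simple decision rule. This is a minimal sketch of that rule, assuming each workload is described by just two flags; the source names only business type and latency as criteria, and `choose_engine` and the flag names are hypothetical.

```python
from enum import Enum


class Engine(Enum):
    ELASTICSEARCH = "Elasticsearch"
    CLICKHOUSE = "ClickHouse"
    LAKEHOUSE = "Iceberg + Trino"


def choose_engine(needs_relevance_scoring: bool, needs_subsecond_latency: bool) -> Engine:
    """Rule of thumb distilled from the engine-selection guidance (hypothetical helper)."""
    if needs_relevance_scoring:
        # Search ranking depends on relevance scoring, which Elasticsearch provides.
        return Engine.ELASTICSEARCH
    if needs_subsecond_latency:
        # Interactive analysis such as user-behavior analysis and log search.
        return Engine.CLICKHOUSE
    # Cost-sensitive, near-real-time analytics where latency is less critical.
    return Engine.LAKEHOUSE
```

For example, the metric service (no relevance scoring, no hard sub-second requirement) maps to `Engine.LAKEHOUSE`. Checking relevance scoring first reflects that it is the one capability only Elasticsearch offers among the three engines.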

Conclusions

ClickHouse provides a powerful, versatile OLAP engine that covers most interactive analytical scenarios with low latency and reduced cost. The lake‑house complements ClickHouse by offering cheaper storage, ACID guarantees, and real‑time visibility for offline‑oriented workloads. Future work aims to unify the compute layer, following the direction taken by StarRocks and Doris.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
