Bilibili's Iceberg‑Based Lakehouse Platform: Technical Practices for Sub‑Second Query Response
This article details Bilibili's implementation of an Iceberg‑based lakehouse platform that unifies storage and analytics. It addresses Hive's performance and latency issues through multidimensional sorting, several kinds of file‑level indexes, cube pre‑aggregation, star‑tree structures, and an automated service called Magnus for intelligent optimization, achieving near‑second query responses.
To overcome Hive's shortcomings—insufficient interactive query performance, complex data pipelines, data silos, and hour‑level latency—Bilibili built a lakehouse platform on Apache Iceberg, aiming for interoperable, high‑performance, and user‑friendly analytics.
The architecture stores Iceberg tables on HDFS, with data ingestion via Flink, Spark, or a Java API; a service called Magnus continuously optimizes Iceberg metadata, while Alluxio provides caching and Trino serves as the interactive query engine. Some workloads still fall back to ClickHouse or Elasticsearch for millisecond‑level latency.
Iceberg manages file‑level metadata using snapshots and manifests, offering an open storage format that facilitates future extensions.
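The value of this file‑level metadata is that a query planner can skip whole data files using the column statistics recorded in manifests, before reading any data. The following is a minimal pure‑Python sketch of that idea; the class and field names (`Snapshot`, `Manifest`, `DataFile`, `min_val`/`max_val`) are simplified stand‑ins for Iceberg's actual metadata structures, not its real API.

```python
from dataclasses import dataclass

@dataclass
class DataFile:
    path: str
    min_val: int  # column lower bound recorded in the manifest
    max_val: int  # column upper bound recorded in the manifest

@dataclass
class Manifest:
    files: list  # list of DataFile entries

@dataclass
class Snapshot:
    manifests: list  # a snapshot points at a set of manifests

def plan_files(snapshot, lo, hi):
    """Keep only data files whose [min, max] stats overlap the query range."""
    hits = []
    for manifest in snapshot.manifests:
        for f in manifest.files:
            if f.max_val >= lo and f.min_val <= hi:
                hits.append(f.path)
    return hits

snap = Snapshot([
    Manifest([DataFile("a.parquet", 0, 99),
              DataFile("b.parquet", 100, 199)]),
    Manifest([DataFile("c.parquet", 200, 299)]),
])
print(plan_files(snap, 150, 250))  # "a.parquet" is pruned without being read
```

Real Iceberg manifests carry per‑column value bounds, null counts, and more, but the pruning principle is the same: metadata is consulted first, data files second.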
For query acceleration, the team applies multidimensional sorting (preferring the Hilbert curve over Z‑order) and a Boundary Index to handle non‑integer columns. They limit sorting fields to four to maintain clustering effectiveness.
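Multidimensional sorting works by mapping several columns onto a single space‑filling‑curve key and sorting files by that key, so rows that are close in any sorted dimension end up in the same files. The sketch below implements the simpler Z‑order variant (bit interleaving of two integer dimensions); the Hilbert curve that the team prefers preserves locality better but is considerably more involved to compute, so Z‑order serves here only to illustrate the principle.

```python
def zorder_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of two dimensions into one Z-order sort key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # x contributes even bit positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # y contributes odd bit positions
    return key

# Sorting points by their Z-order key clusters nearby (x, y) pairs together,
# which keeps min/max statistics tight for BOTH columns at once.
points = [(3, 7), (0, 0), (5, 2), (1, 1)]
points.sort(key=lambda p: zorder_key(*p))
print(points)
```

Because each extra dimension dilutes how tightly the curve clusters any single column, capping the number of sort fields (four, in Bilibili's case) keeps the per‑file min/max ranges narrow enough to stay useful for pruning.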
Supported file‑level indexes include BloomFilter (simple, space‑efficient, equality only), Bitmap (supports equality and range, larger footprint), BloomRF (multi‑segment hash, similar to BloomFilter), and token‑based indexes (TokenBloomFilter, TokenBitmap, Ngram variants) for log data.
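To make the trade‑off concrete, here is a minimal Bloom filter in pure Python: a compact bit array that answers "definitely absent" or "possibly present" for equality predicates, which is exactly why it is cheap but cannot serve range queries. This is an illustrative toy, not Bilibili's implementation; the sizes and hash scheme are arbitrary choices for the sketch.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: space-efficient membership test with
    possible false positives but no false negatives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = 0  # the bit array, stored as one big integer

    def _positions(self, item: str):
        # Derive k bit positions by salting the hash with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # True means "possibly in the set"; False means "definitely not".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

bf = BloomFilter()
bf.add("uid_42")
print(bf.might_contain("uid_42"))  # True
```

A per‑file filter like this lets the engine skip files where `might_contain` returns False for an equality predicate; Bitmap indexes pay more space to additionally support range predicates, and the token‑based variants apply the same idea to tokenized log text.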
Pre‑aggregation is realized through Cubes (or AggIndex) that store aggregated results at the file level, supporting both single‑table and star‑schema queries. When Cube data is incomplete, the engine merges existing Cubes with on‑the‑fly aggregation. A star‑tree index is built on top of Cubes to balance storage cost and query coverage.
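The key property that makes this merging work is that the chosen aggregates (sums, counts, and the like) are decomposable: a pre‑aggregated cube from one file and a freshly computed aggregate from an uncovered file combine with the same operation. A hedged sketch, with hypothetical function names standing in for what the engine does internally:

```python
from collections import defaultdict

def aggregate(rows):
    """On-the-fly aggregation over raw rows: sum(value) grouped by dim."""
    out = defaultdict(int)
    for dim, value in rows:
        out[dim] += value
    return dict(out)

def merge_cubes(*cubes):
    """Merge per-file pre-aggregated cubes into one query result."""
    out = defaultdict(int)
    for cube in cubes:
        for dim, value in cube.items():
            out[dim] += value
    return dict(out)

# File A already has a cube; file B landed after the last cube build,
# so the engine aggregates it on the fly and merges the two results.
cube_a = {"ios": 10, "android": 7}
raw_b = [("ios", 5), ("web", 2)]
result = merge_cubes(cube_a, aggregate(raw_b))
print(result)  # {'ios': 15, 'android': 7, 'web': 2}
```

The star‑tree layer sits above this: rather than materializing every dimension combination, it materializes a pruned tree of the combinations most worth paying storage for, answering the rest by merging at query time as above.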
The Magnus service automates backend optimizations: it listens to Iceberg write events and triggers Spark jobs for sorting, indexing, and Cube generation; it visualizes table metadata for troubleshooting; and it analyzes Trino query logs to recommend schema or index adjustments.
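The control loop can be pictured as an event‑to‑job mapping: each write commit produces an event describing the table's state, and the service decides which maintenance job (if any) to schedule. The sketch below is a hypothetical simplification in the spirit of Magnus; the event fields and job names are illustrative, not the service's real interface.

```python
def plan_action(event: dict) -> str:
    """Map a table-commit event to a maintenance job (names illustrative)."""
    if event["small_files"] >= 10:
        return "compact_and_sort"        # rewrite files + multidimensional sort
    if event["new_files_without_index"] > 0:
        return "build_file_indexes"      # e.g. BloomFilter / Bitmap indexes
    if event["cube_stale"]:
        return "rebuild_cube"            # refresh file-level pre-aggregation
    return "noop"

# Simulated commit events arriving from the table's write path.
events = [
    {"small_files": 12, "new_files_without_index": 0, "cube_stale": False},
    {"small_files": 2,  "new_files_without_index": 3, "cube_stale": True},
]
plans = [plan_action(e) for e in events]
print(plans)  # ['compact_and_sort', 'build_file_indexes']
```

In production such decisions would trigger Spark jobs asynchronously; the query‑log analysis the article mentions closes the loop by feeding recommended sort keys and indexes back into this planner.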
In production, the platform manages roughly 5 PB of Iceberg tables with a daily growth of 75 TB, handles about 200 k Trino queries per day, and achieves a P95 response time of 5 seconds, targeting sub‑second to 10‑second latency for most workloads.
The presentation concludes with an invitation to follow Bilibili's technical community for further updates.