
How Baidu’s MEG Platform Revamped ClickHouse with a Lakehouse Architecture

This article analyzes the challenges of scaling ClickHouse within Baidu’s MEG data platform and details a lake‑house solution that decouples storage and compute, integrates a meta‑service for transparent data access, optimizes query performance through caching, data roll‑up and layout tuning, and introduces a unified query gateway that gracefully falls back to Spark for complex workloads.

Baidu Geek Talk

Background and Challenges

As the Turing 3.0 ecosystem grew rapidly, ClickHouse (CH) in Baidu's MEG data platform ran into high storage costs, long ad‑hoc exploration pipelines, and slow node recovery. The key issues were:

Expensive long‑term storage and limited horizontal scaling.

Multi‑step data ingestion (create table → load → query) that hindered instant exploration.

Node failures that took hours to recover from, reducing service availability.

Lakehouse Architecture

MEG built a storage‑compute‑separated lakehouse that integrates the Meta service with Turing metadata, enabling “zero‑ingest” transparent queries on lake data. The design uses hot‑cold tiered caching, data roll‑up (aggregation tables), and layout tuning to overcome I/O bottlenecks of shared storage. A unified query gateway automatically downgrades complex CH queries to Spark.

Lakehouse Architecture Diagram

Metadata Service Integration

The Meta service acts as a bridge between CH and external data sources. It classifies metadata into three categories, caches hot entries, compresses similar keys, and provides real‑time subscription. These mechanisms reduced average metadata query latency from >600 ms to <50 ms.

Metadata Service Diagram

Query Performance Optimizations

Hot‑cold tiered caching: Frequently accessed data is cached on local SSD, cutting I/O latency and network traffic.

Data roll‑up (aggregation tables): Lifecycle management, asynchronous sync, and transparent query rewriting reduce query time by ~55% for high‑frequency patterns.

Data layout tuning: Small‑file merging, local sorting, and pre‑bucketed partitioning improve Parquet scan efficiency and can lower query time by up to 80%.

Performance Optimizations Diagram

Unified Query Gateway

Complex CH queries that exceed resource limits are automatically routed to Spark. The gateway identifies candidates through SQL pattern matching and error‑based fallback, then rewrites CH SQL into Spark SQL with SQLGlot, handling differences in function names and array index bases. More than 80% of rewrites succeed; the rest fall back gracefully.
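The production gateway relies on SQLGlot, which parses SQL into an AST before translating dialects. As a rough illustration of the kind of function‑name mapping involved, here is a deliberately naive textual rewriter; the mapping table and helper below are illustrative stand‑ins, not Baidu's actual rules or SQLGlot's implementation:

```python
import re

# Illustrative subset of ClickHouse -> Spark SQL function renames.
# A real gateway would use SQLGlot's AST-based transpilation instead
# of textual substitution.
CH_TO_SPARK_FUNCS = {
    "toDate": "to_date",
    "countIf": "count_if",
}

def naive_ch_to_spark(sql: str) -> str:
    """Rewrite known ClickHouse function names to Spark equivalents."""
    for ch_name, spark_name in CH_TO_SPARK_FUNCS.items():
        # Replace only whole identifiers followed by an opening paren.
        sql = re.sub(rf"\b{ch_name}\s*\(", f"{spark_name}(", sql)
    return sql
```

Textual rewriting breaks down quickly (quoted strings, nested expressions, 1‑based vs 0‑based array access), which is exactly why an AST‑level tool like SQLGlot is used in practice.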

Unified Query Gateway Diagram

Open‑Source ClickHouse Enhancements for Lakehouse

To support external lake sources, CH added table functions icebergCluster and deltaLakeCluster in version 24.11, and CH Cloud enabled full DataLakeCatalog support from version 25.8.

AFS Table Engine

A custom AFS table engine was introduced to read Baidu’s internal AFS distributed file system directly, providing predicate push‑down, parallel reads, and small‑IO merging.

Meta Service Extensions

Unified metadata discovery and real‑time subscription.

Hot‑entry caching in a distributed cache with multi‑threaded access.

Metadata compression by merging common prefixes, reducing network payload.
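The prefix‑merging idea can be sketched in a few lines. This toy encoder (function names and the encoding are my own, assumed for illustration) factors the longest common prefix out of a batch of metadata keys before they are shipped over the network:

```python
import os

def compress_keys(keys):
    """Group metadata keys under their longest common prefix.

    Returns (prefix, suffixes) -- a toy stand-in for the payload
    compression the Meta service applies before shipping metadata.
    """
    if not keys:
        return "", []
    prefix = os.path.commonprefix(keys)
    return prefix, [k[len(prefix):] for k in keys]

def decompress_keys(prefix, suffixes):
    """Reconstruct the original key list on the receiving side."""
    return [prefix + s for s in suffixes]
```

For file listings under a common table path the shared prefix dominates the key length, so the savings grow with the number of partitions and files per batch.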

These extensions lowered metadata query latency to <50 ms.

Cold‑Hot Data Tiering

Hot data is cached on local SSD; cache misses trigger asynchronous fetch from AFS while populating the SSD cache. Business‑level cache pools and eviction policies prevent cross‑business interference.
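The read path described above is a classic read‑through cache. A minimal sketch, assuming a `fetch_remote` callable standing in for the AFS client (class and method names are hypothetical; eviction policies and per‑business cache pools are omitted):

```python
from pathlib import Path
import threading

class TieredCache:
    """Toy read-through cache: serve from local SSD if present,
    otherwise fetch from remote storage (AFS in the article) and
    populate the SSD cache for subsequent reads."""

    def __init__(self, cache_dir, fetch_remote):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch_remote = fetch_remote  # callable: key -> bytes
        self.lock = threading.Lock()

    def read(self, key: str) -> bytes:
        local = self.cache_dir / key.replace("/", "_")
        if local.exists():                # cache hit: local SSD read
            return local.read_bytes()
        data = self.fetch_remote(key)     # cache miss: remote fetch
        with self.lock:
            local.write_bytes(data)       # populate the SSD cache
        return data
```

Note the article describes the remote fetch as asynchronous; this sketch fetches synchronously for brevity.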

Consistent Hashing for Cache Locality

Consistent hashing distributes file fragments evenly across shards while keeping the same file on a fixed shard, minimizing cache reshuffling during scaling.
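A small consistent‑hash ring shows why scaling moves only a fraction of the cache. This is a generic textbook implementation with virtual nodes (the vnode count and hash choice are my assumptions, not Baidu's parameters):

```python
import bisect
import hashlib

def _hash(s: str) -> int:
    """Stable hash for ring placement (md5 keeps it deterministic)."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class ConsistentRing:
    """Map file paths to shards via a consistent-hash ring with
    virtual nodes: the same file always lands on the same shard,
    and adding a shard relocates only ~1/N of the keys."""

    def __init__(self, shards, vnodes=64):
        self.ring = sorted(
            (_hash(f"{shard}#{i}"), shard)
            for shard in shards for i in range(vnodes)
        )
        self.points = [p for p, _ in self.ring]

    def shard_for(self, path: str) -> str:
        # First ring point clockwise from the key's hash (with wrap).
        idx = bisect.bisect(self.points, _hash(path)) % len(self.ring)
        return self.ring[idx][1]
```

With a naive `hash(path) % num_shards` scheme, adding a shard would remap almost every file and invalidate most of the SSD cache; the ring keeps the vast majority of placements stable.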

Data Roll‑Up (Aggregation Tables)

Roll‑up tables are built on remote storage to keep the compute layer stateless. Lifecycle management includes manual creation, platform‑driven automation, and async synchronization (Sync vs Async modes). Versioning guarantees consistency; an external compute queue isolates sync load from query load.

Transparent Query Rewrite for Roll‑Up

Roll‑up hit check: match query dimensions/metrics against existing roll‑up patterns.

Optimal roll‑up selection: use version comparison and CBO to pick the best aggregation table.

Query rewrite: replace base‑table partitions with roll‑up partitions before execution.

Roll‑up usage reduced average query latency by ~55 % for high‑frequency workloads.
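The three steps above (hit check, selection, rewrite) can be sketched as follows. The data shapes and the "fewest dimensions wins" heuristic are my own simplifications standing in for the version comparison and CBO costing described in the article:

```python
def pick_rollup(query_dims, query_metrics, rollups):
    """Return the best-matching roll-up table, or None.

    A roll-up 'hits' when it covers every dimension and metric the
    query needs; among hits, prefer the narrowest table (fewest
    dimensions) as a stand-in for a real cost-based comparison.
    """
    hits = [
        r for r in rollups
        if set(query_dims) <= set(r["dims"])
        and set(query_metrics) <= set(r["metrics"])
    ]
    return min(hits, key=lambda r: len(r["dims"]), default=None)

def rewrite_table(sql, base_table, rollup):
    """Naive textual swap of the scan target; a real system rewrites
    the query plan (base-table partitions -> roll-up partitions)."""
    return sql.replace(base_table, rollup["name"]) if rollup else sql
```

If no roll‑up hits, the query simply runs against the base table, which is what makes the rewrite transparent to users.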

Data Layout Optimizations

Three rules are applied during ETL:

Small‑file merge: Consolidate many tiny Parquet files into larger ones to reduce I/O fragmentation.

Local sorting: Sort data on disk to improve compression ratios and enable more effective Parquet min‑max pruning.

Pre‑bucket (NoMerge) partitioning: Partition data by high‑cardinality keys and keep each bucket on a single CH node, avoiding costly shuffles for distinct‑count queries.

These optimizations achieved up to 80 % reduction in query time for some workloads and >60 % improvement for high‑cardinality distinct queries.
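The payoff from local sorting comes from min‑max pruning: when data is sorted, each Parquet row group covers a narrow, non‑overlapping value range, so most groups can be skipped from footer statistics alone. A toy model of that mechanic (not actual Parquet reader code):

```python
def build_row_groups(values, group_size):
    """Split sorted values into row groups with min/max stats,
    mimicking the column statistics in a Parquet footer."""
    groups = []
    for i in range(0, len(values), group_size):
        chunk = values[i:i + group_size]
        groups.append({"min": chunk[0], "max": chunk[-1], "rows": chunk})
    return groups

def scan_eq(groups, target):
    """Scan only groups whose [min, max] range can contain target;
    returns the matches and how many groups were actually read."""
    scanned = 0
    hits = []
    for g in groups:
        if g["min"] <= target <= g["max"]:
            scanned += 1
            hits.extend(v for v in g["rows"] if v == target)
    return hits, scanned
```

On unsorted data every group's range tends to span the whole domain, so nothing is pruned; on sorted data a point lookup touches a single group.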

Results and Outlook

More than 30 business lines and 300 datasets now run on the CH lakehouse, handling over 20k daily PV with an average response time under 6 s (P90 < 13 s). Future work includes:

Elastic compute orchestration via Kubernetes operators and a resource‑management service.

Automated statistics services for better optimizer decisions on open table/file formats.

Support for emerging formats such as Lance to improve point‑lookup performance.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Performance Optimization, Data Platform, ClickHouse, Spark, Lakehouse, Metadata Service, Query Gateway
Written by Baidu Geek Talk