How REDck Transforms ClickHouse into a Scalable Cloud‑Native Real‑Time Data Warehouse
Xiaohongshu built REDck, a cloud‑native, storage‑compute separated real‑time OLAP warehouse on ClickHouse, addressing scaling, cost, and reliability challenges through a unified metadata service, object‑storage optimizations, multi‑level caching, distributed task scheduling, bucketing, and exactly‑once transaction support.
Background
ClickHouse is a high‑performance OLAP database used for ad‑tech, community, live streaming and e‑commerce workloads. Its native shared‑nothing MPP architecture provides sub‑second query latency but suffers from high operational overhead, limited elastic scaling, and fragile fault‑tolerance.
Challenges of the Original ClickHouse Deployment
Elastic scaling difficulty: Compute and storage are tightly coupled; adding nodes requires manual data rebalancing and weeks of migration.
Low resource utilization: Multi‑replica storage inflates CPU and storage costs; compute capacity often exceeds storage needs.
Stability issues: ZooKeeper‑based coordination becomes a single point of failure at large scale; query latency spikes under load.
Lack of distributed transactions: No exactly‑once guarantees for data ingestion pipelines, leading to duplicate or inconsistent data.
REDck Architecture Overview
REDck (Real‑time Elastic Data warehouse on ClickHouse) is a cloud‑native redesign that separates compute from storage, introduces a stateless unified metadata service, and uses object storage as the primary data lake.
Unified Metadata Service (Metastore)
Metadata is centralized in a stateless Metastore. Internal metadata is stored in MySQL (transactional, consistent), while external catalogs such as Hive or Iceberg can be integrated. Compute nodes retrieve up‑to‑date schema and partition information from the Metastore, which removes the need for per‑node local metadata and ZooKeeper coordination.
Object‑Storage Access Optimizations
Data resides in cloud object storage (e.g., S3, OSS) which offers virtually unlimited capacity but higher latency and lower single‑thread throughput. REDck mitigates these drawbacks through:
Multi‑level caching: In‑memory → local‑disk → distributed cache. Cached reads can be up to 100× faster; parallel downloads achieve ~10× speedup for uncached data.
Query‑plan reordering: Parts are read in parallel and HTTP round‑trips are minimized by grouping mark ranges per connection.
Robust access module: Timeout detection, retry logic, and data‑integrity checks improve stability.
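The round-trip reduction can be illustrated with a small range-coalescing routine. This is a minimal sketch, not REDck's implementation: it assumes each mark range has already been translated into a half-open byte range within one object, and the function name `coalesce_ranges` and the `max_gap` parameter are hypothetical.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open byte range [start, end) within one object


def coalesce_ranges(ranges: List[Range], max_gap: int = 0) -> List[Range]:
    """Merge ranges that overlap or lie within max_gap bytes of each other.

    Each merged range can be served by a single ranged GET, trading a few
    wasted bytes (the gaps) for fewer HTTP round-trips.
    """
    merged: List[Range] = []
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous range: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged
```

A larger `max_gap` issues fewer requests at the cost of downloading bytes the query does not need; the right value depends on the object store's per-request latency.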
Multi‑Level Caching Strategy
REDck provides two caching policies:
Passive cache: Data is cached on‑demand during query execution.
Active cache: Hot data is pre‑loaded based on user‑defined rules and query history. Eviction uses LRU or Clock‑Sweep.
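The two policies can be sketched on top of one LRU structure. The example below is an illustration under stated assumptions rather than REDck's cache: the class name `LRUCache`, the `fetch` callback standing in for an object-storage read, and the `preload` method are all hypothetical, and only the LRU eviction option is shown.

```python
from collections import OrderedDict
from typing import Callable, Iterable, List


class LRUCache:
    """Single cache tier with LRU eviction (hypothetical sketch)."""

    def __init__(self, capacity: int, fetch: Callable[[str], bytes]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.fetch = fetch  # stand-in for a read from the slower tier
        self.misses = 0
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()

    def _put(self, key: str, value: bytes) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)  # mark as most recently used
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used

    def get(self, key: str) -> bytes:
        """Passive policy: fill the cache on demand when a query misses."""
        if key in self._entries:
            self._entries.move_to_end(key)
            return self._entries[key]
        self.misses += 1
        value = self.fetch(key)
        self._put(key, value)
        return value

    def preload(self, keys: Iterable[str]) -> None:
        """Active policy: warm the cache before any query asks for the data."""
        for key in keys:
            if key not in self._entries:
                self._put(key, self.fetch(key))

    def keys(self) -> List[str]:
        """Cached keys, ordered from least to most recently used."""
        return list(self._entries)
```

In this sketch, a rule such as "keep the last seven days of partitions hot" would translate into a periodic `preload` call with the matching keys.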
Distributed Task Scheduling
A single Server is elected to the global Master role and coordinates cluster‑wide tasks (compaction, mutation, inserts, cache refresh). Scheduling is bucket‑based: tasks are assigned per bucket, so ownership adapts automatically to scale‑out or scale‑in events and conflicting tasks are avoided.
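One way to picture bucket-based scheduling is to give every bucket exactly one owning server and recompute ownership when membership changes. The source does not say how REDck maps buckets to servers; the sketch below swaps in rendezvous (highest-random-weight) hashing as an assumption, and the names `assign_buckets` and `_weight` are hypothetical.

```python
import hashlib
from typing import Dict, List


def _weight(server: str, bucket: int) -> int:
    """Deterministic pseudo-random score for a (server, bucket) pair."""
    digest = hashlib.sha256(f"{server}:{bucket}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def assign_buckets(servers: List[str], num_buckets: int) -> Dict[int, str]:
    """Give each bucket to the server with the highest score for it.

    Because every bucket has a single owner, background tasks for that
    bucket (for example compaction) are never scheduled on two servers.
    """
    if not servers:
        raise ValueError("at least one server is required")
    return {
        bucket: max(servers, key=lambda s: _weight(s, bucket))
        for bucket in range(num_buckets)
    }
```

With this choice, adding a server only moves the buckets that the new server wins; all other buckets keep their owner, which keeps rescheduling cheap during scale-out.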
Data Bucketing
Tables can be bucketed on a chosen key (e.g., user_id). A hash function maps rows to a fixed number of buckets, enabling:
Fast point‑lookups using bucket keys.
Reduced shuffle for joins and aggregations.
Bucket‑level task scheduling that supports elastic scaling.
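The mapping from rows to buckets can be sketched as follows. This is an illustration only: the source does not name the hash function REDck uses, so CRC32 is an assumption chosen because it is stable across processes (unlike Python's built-in `hash` for strings), and `bucket_for` and `partition_rows` are hypothetical names.

```python
import zlib
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def bucket_for(key: Any, num_buckets: int) -> int:
    """Map a bucket-key value to a bucket index in [0, num_buckets)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def partition_rows(
    rows: Iterable[Dict[str, Any]], key: str, num_buckets: int
) -> Dict[int, List[Dict[str, Any]]]:
    """Group rows by the bucket that their key column hashes to."""
    buckets: Dict[int, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        buckets[bucket_for(row[key], num_buckets)].append(row)
    return dict(buckets)
```

Because equal keys always land in the same bucket, a point lookup on `user_id` needs to read one bucket, and two tables bucketed on the same key with the same bucket count can be joined bucket by bucket without a shuffle.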
Exactly‑Once Distributed Transactions
REDck implements a two‑phase commit (2PC) protocol managed by the Metastore. The protocol provides exactly‑once semantics for ingestion pipelines such as Hive→REDck, Spark→REDck, and Flink→REDck (via Flink checkpoint integration). This eliminates duplicate writes and ensures global visibility of committed data.
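The commit flow can be sketched with an in-memory coordinator. The source gives no protocol details, so everything here is an assumption for illustration: the `Metastore` class and its methods are hypothetical, writers are modeled as callables that stage parts and return their names, and the transaction id is assumed to be stable across retries (for example, derived from a Flink checkpoint id).

```python
from typing import Callable, Dict, List


class Metastore:
    """In-memory stand-in for the transaction coordinator (hypothetical API)."""

    def __init__(self) -> None:
        self._prepared: Dict[str, List[str]] = {}
        self._committed: Dict[str, List[str]] = {}
        self.visible_parts: List[str] = []

    def is_committed(self, txn_id: str) -> bool:
        return txn_id in self._committed

    def prepare(self, txn_id: str, parts: List[str]) -> None:
        self._prepared[txn_id] = list(parts)

    def commit(self, txn_id: str) -> None:
        # All parts of the transaction become visible together.
        parts = self._prepared.pop(txn_id)
        self._committed[txn_id] = parts
        self.visible_parts.extend(parts)

    def abort(self, txn_id: str) -> None:
        self._prepared.pop(txn_id, None)


def run_transaction(
    metastore: Metastore, txn_id: str, writers: List[Callable[[], List[str]]]
) -> bool:
    """Return True if this call committed txn_id, False otherwise."""
    if metastore.is_committed(txn_id):
        return False  # replay of a finished transaction: nothing to do
    staged: List[str] = []
    try:
        # Phase 1: every writer stages its parts; any failure is a "no" vote.
        for write in writers:
            staged.extend(write())
    except Exception:
        metastore.abort(txn_id)
        return False
    metastore.prepare(txn_id, staged)
    # Phase 2: all writers voted yes, so publish the parts atomically.
    metastore.commit(txn_id)
    return True
```

Replaying the same transaction id after a job restart leaves `visible_parts` unchanged, which is the property that turns at-least-once delivery from the upstream engine into exactly-once visibility.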
Offline Sync Optimizations
Batch ingestion is performed with Spark instead of Flink micro‑batches, simplifying the pipeline, removing compaction‑induced write amplification, and supporting INSERT OVERWRITE semantics to avoid reading partially loaded data.
Performance and Operational Impact
After two years in production, REDck serves more than 10 business lines, stores more than 30 PB of data, and runs on clusters of tens of thousands of CPU cores. Compared with the original ClickHouse deployment:
CPU efficiency improved ~10× (more data processed per core).
Storage cost per TB reduced ~10× thanks to object‑storage and elimination of multi‑replica overhead.
Query latency remains comparable to native ClickHouse despite object‑storage latency.
Elastic scaling time reduced from weeks to minutes; cluster availability >99.9%.
These gains enable data retention extensions from months to years and support a rapidly growing set of analytical use cases.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
