REDck: A Cloud‑Native Real‑Time OLAP Data Warehouse Built on ClickHouse
REDck is a cloud‑native, real‑time OLAP data warehouse built on ClickHouse. It adds elastic compute and storage scaling, object‑storage optimizations, multi‑level caching, and exactly‑once ingestion, delivering petabyte‑scale interactive analytics with a ten‑fold gain in CPU efficiency, a ten‑fold cost reduction, and 99.9% availability.
ClickHouse is one of the most performant OLAP systems in the industry and is widely used inside Xiaohongshu for advertising, community, live streaming, and e‑commerce. However, the native MPP architecture of ClickHouse has significant limitations in operational cost, elastic scaling, and fault recovery.
To address these challenges, the Xiaohongshu data‑flow team independently developed a cloud‑native real‑time data warehouse called REDck (RED ClickHouse) based on the open‑source ClickHouse codebase. REDck retains ClickHouse’s ultra‑high performance while adding deep cloud‑native transformations that enable elastic scaling of both compute and storage layers, reducing operational burden and cost.
REDck now supports petabyte‑scale interactive analysis and has been deployed in more than ten business scenarios, with total storage exceeding 30 PB. On the experiment platform, storage retention grew from 2 months to 2 years, query availability reached 99.9%, and the system runs on tens of thousands of cores over multiple petabytes of data.
Architecture Overview
The cloud‑native architecture consists of three layers: a unified metadata service, a compute layer, and a storage layer. The metadata service centralizes metadata in a stateless service backed by MySQL (internal) and Hive/Iceberg (external), replacing the distributed local‑disk metadata of vanilla ClickHouse and eliminating the reliance on ZooKeeper.
The compute layer is organized into compute groups (distributed clusters) managed by a master role that coordinates task scheduling. This design enables both horizontal and vertical elastic scaling.
The storage layer leverages object storage as the primary data store, providing virtually unlimited capacity and low cost while allowing the system to become stateless.
Object‑Storage Optimizations
To mitigate the higher latency and lower single‑thread throughput of object storage, REDck implements caching, parallel download strategies, query plan reordering, and robust retry mechanisms, achieving up to 100× query speed improvements for cached data and 10× for uncached data.
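The parallel‑download and retry ideas can be sketched as follows. This is an illustrative Python sketch, not REDck's implementation: the in‑memory `OBJECT` stands in for an object in S3‑style storage, and `fetch_range` stands in for an HTTP range request.

```python
import concurrent.futures
import time

# Hypothetical stand-in for object storage: in REDck this would be an
# HTTP range read against the object store; here it is simulated in memory.
OBJECT = bytes(range(256)) * 4  # 1 KiB fake object

def fetch_range(offset: int, size: int) -> bytes:
    """Simulated range read (real code would issue a ranged GET)."""
    return OBJECT[offset:offset + size]

def fetch_with_retry(offset: int, size: int, retries: int = 3,
                     backoff: float = 0.01) -> bytes:
    """Retry transient object-storage errors with exponential backoff."""
    for attempt in range(retries):
        try:
            return fetch_range(offset, size)
        except IOError:
            if attempt == retries - 1:
                raise
            time.sleep(backoff * 2 ** attempt)

def parallel_download(total_size: int, chunk: int = 256,
                      workers: int = 4) -> bytes:
    """Split one object into ranges and fetch them concurrently to
    work around the low single-stream throughput of object storage."""
    offsets = range(0, total_size, chunk)
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda off: fetch_with_retry(off, chunk),
                              offsets))
    return b"".join(parts)  # map() preserves range order

assert parallel_download(len(OBJECT)) == OBJECT
```

The key point is that many small concurrent range reads hide per‑request latency, while the retry wrapper absorbs the transient errors object stores routinely return.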
Multi‑Level Caching
A three‑tier cache hierarchy (memory → local disk → distributed cache) is employed. Passive caching stores data on‑demand, while active caching pre‑loads hot data based on usage patterns. LRU and Clock‑Sweep eviction policies are used, with a catalog in memory to accelerate cleanup.
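The passive path of such a hierarchy can be sketched with two tiers. This is a minimal illustration under assumed semantics (LRU eviction from memory demotes to the disk tier; misses fall through to object storage); REDck's real hierarchy adds a distributed cache tier and active pre‑loading.

```python
from collections import OrderedDict

class TwoTierCache:
    """Sketch of a memory -> local-disk cache with LRU eviction.
    Illustrative only; names and policy details are assumptions."""

    def __init__(self, mem_capacity, fetch_from_storage):
        self.mem = OrderedDict()          # tier 1: LRU in memory
        self.disk = {}                    # tier 2: stands in for local disk
        self.capacity = mem_capacity
        self.fetch = fetch_from_storage   # slow path: object storage read

    def get(self, key):
        if key in self.mem:               # memory hit: refresh recency
            self.mem.move_to_end(key)
            return self.mem[key]
        if key in self.disk:              # disk hit: promote to memory
            value = self.disk.pop(key)
        else:                             # miss: passive caching on demand
            value = self.fetch(key)
        self._put_mem(key, value)
        return value

    def _put_mem(self, key, value):
        self.mem[key] = value
        self.mem.move_to_end(key)
        if len(self.mem) > self.capacity:  # LRU eviction demotes to disk
            old_key, old_val = self.mem.popitem(last=False)
            self.disk[old_key] = old_val

cache = TwoTierCache(2, fetch_from_storage=lambda k: f"part-{k}")
cache.get("a"); cache.get("b"); cache.get("c")   # "a" is evicted to disk
assert "a" in cache.disk and "a" not in cache.mem
```

Keeping an in‑memory catalog of cached entries, as the article notes, means eviction and cleanup never need to scan the disk tier itself.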
Distributed Task Scheduling and Data Bucketing
A global master elects a server to coordinate tasks such as compaction, mutation, and inserts, ensuring ordered execution and avoiding conflicts during scaling. Data is partitioned into buckets using hash‑based keys (e.g., user ID), which improves query filtering, reduces shuffle, and provides a natural unit for elastic scaling.
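The bucketing scheme can be illustrated with a stable hash of the bucketing key. The bucket count and key names below are assumptions for the sketch, not REDck settings; the point is that the mapping is deterministic, so one user's rows always co‑locate in one bucket.

```python
import hashlib

NUM_BUCKETS = 16  # illustrative; the real bucket count is a table property

def bucket_of(user_id: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a bucketing key (e.g. user ID) to a bucket number.
    A stable hash (not Python's seeded hash()) keeps the mapping
    identical across processes and restarts."""
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets

# All rows for one user land in one bucket, so a per-user filter can
# prune to a single bucket, and a join on user_id needs no shuffle.
assert bucket_of("user-42") == bucket_of("user-42")
assert 0 <= bucket_of("user-42") < NUM_BUCKETS
```

Because buckets are the unit of placement, scaling a compute group reduces to reassigning whole buckets to servers rather than resharding individual rows.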
Two‑Phase Commit and Exactly‑Once Guarantees
REDck introduces a two‑phase commit protocol managed by the unified metadata service, enabling exactly‑once semantics for data ingestion from Hive/Spark and Flink, thus eliminating duplicate writes and improving reliability.
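The protocol can be sketched as two metadata operations. All names here (`prepare`/`commit`, `txn_id`) are illustrative, not REDck's API; the sketch shows only the core idea that parts sit invisibly in object storage until one atomic, idempotent metadata commit publishes them, so a replayed write (e.g. a Flink checkpoint retry) cannot produce duplicates.

```python
class MetadataService:
    """Minimal sketch of a 2PC commit driven by a unified metadata
    service. Single-process and unsynchronized, for illustration only."""

    def __init__(self):
        self.staged = {}      # txn_id -> parts written but not yet visible
        self.committed = {}   # txn_id -> parts visible to queries

    def prepare(self, txn_id: str, parts: list) -> bool:
        # Phase 1: the writer has uploaded parts to object storage,
        # but they stay invisible until the metadata commit.
        if txn_id in self.committed:
            return False      # duplicate delivery: already applied
        self.staged[txn_id] = parts
        return True

    def commit(self, txn_id: str) -> bool:
        # Phase 2: one metadata update makes all parts visible at once.
        if txn_id in self.committed:
            return False      # idempotent: a replayed commit is a no-op
        self.committed[txn_id] = self.staged.pop(txn_id)
        return True

meta = MetadataService()
meta.prepare("flink-ckpt-7", ["part_0", "part_1"])
assert meta.commit("flink-ckpt-7") is True
assert meta.commit("flink-ckpt-7") is False   # retry is deduplicated
```

Keying transactions by the ingestion source's checkpoint or job attempt ID is what turns at‑least‑once delivery from Flink or Spark into exactly‑once visibility.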
Performance Impact
Since its launch, REDck has reduced storage costs by an order of magnitude, increased CPU efficiency by 10×, and enabled minute‑level elastic scaling. Query latency remains comparable to native ClickHouse despite the added metadata and object‑storage layers, and overall availability exceeds 99.9%.
The system now supports over 10 business lines, handling PB‑scale data with thousands of cores, and provides interactive analysis for A/B testing, user behavior, and ad segmentation.
Xiaohongshu Tech REDtech
Official account of the Xiaohongshu tech team, sharing tech innovations and problem insights, advancing together.