REDck: A Cloud‑Native Real‑Time Data Warehouse Built on ClickHouse

REDck is a cloud‑native, storage‑compute separated real‑time OLAP data warehouse derived from ClickHouse that addresses scalability, operational cost, and reliability challenges through a unified metadata service, object‑storage optimizations, multi‑level caching, distributed task scheduling, and two‑phase commit transactions.

DataFunTalk

ClickHouse offers industry‑leading OLAP performance but suffers from high operational cost, limited elasticity, and weak fault tolerance. To overcome these problems, Xiaohongshu's data‑flow team created REDck, a cloud‑native real‑time data warehouse that retains ClickHouse's speed while allowing compute and storage to scale elastically.

REDck adopts a storage‑compute separation architecture with three layers: a unified metadata service, a compute layer, and a storage layer. Metadata is centralized in a stateless Metastore that uses MySQL/Redis internally and integrates with Hive/Iceberg externally. Compute groups are orchestrated by a master‑worker model, and the storage layer relies on object storage for virtually unlimited capacity.

To mitigate object‑storage latency, REDck introduces enhanced caching (memory → local disk → distributed cache) and parallel download strategies, dramatically improving read throughput and reducing HTTP overhead.
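The lookup order and the parallel download idea can be illustrated with a minimal Python sketch. This is not REDck's implementation: the names `TieredCache`, `parallel_download`, and `RangeFetch` are hypothetical, plain dicts stand in for the memory, local‑disk, and distributed tiers, and the promote‑on‑hit policy and chunk size are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical type: fetch(key, start, end) returns bytes [start, end) of an object.
RangeFetch = Callable[[str, int, int], bytes]


def parallel_download(fetch: RangeFetch, key: str, size: int,
                      chunk_size: int = 4, workers: int = 4) -> bytes:
    """Split one object into byte ranges and fetch them concurrently."""
    ranges = [(start, min(start + chunk_size, size))
              for start in range(0, size, chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so the chunks join back correctly.
        chunks = pool.map(lambda r: fetch(key, r[0], r[1]), ranges)
        return b"".join(chunks)


class TieredCache:
    """Check memory, then local disk, then a distributed cache, then object storage."""

    def __init__(self, tiers: List[Dict[str, bytes]], fetch: RangeFetch,
                 sizes: Dict[str, int]) -> None:
        self.tiers = tiers    # fastest first; dicts stand in for real tiers
        self.fetch = fetch
        self.sizes = sizes    # assumption: object sizes are known from metadata
        self.remote_reads = 0

    def get(self, key: str) -> bytes:
        for level, tier in enumerate(self.tiers):
            if key in tier:
                data = tier[key]
                # Promote the hit into every faster tier (assumed policy).
                for faster in self.tiers[:level]:
                    faster[key] = data
                return data
        # Miss in every tier: go to object storage once, then fill all tiers.
        self.remote_reads += 1
        data = parallel_download(self.fetch, key, self.sizes[key])
        for tier in self.tiers:
            tier[key] = data
        return data


# Usage with an in-memory stand-in for object storage.
store = {"part_0.bin": b"0123456789abcdef"}
cache = TieredCache(
    tiers=[{}, {}, {}],
    fetch=lambda key, start, end: store[key][start:end],
    sizes={k: len(v) for k, v in store.items()},
)
first = cache.get("part_0.bin")    # misses every tier, downloads in parallel
second = cache.get("part_0.bin")   # served from the memory tier
```

In this sketch only the first read reaches object storage. Later reads stop at the fastest tier that holds the data, which is the effect the caching layers are meant to achieve.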

Distributed task scheduling relies on a globally elected master that coordinates compaction, mutation, insert, and cache tasks. Data bucketing (hash‑based partitioning) improves query pruning, aggregation, and join performance, and it also supports elastic scaling.
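A small Python sketch can show how hash bucketing helps with pruning and scaling. It rests on several assumptions that the article does not confirm: CRC32 as the hash function, a fixed bucket count, and round‑robin assignment of buckets to workers. The names `bucket_of` and `assign_buckets` are hypothetical.

```python
import zlib
from typing import Dict, List, Tuple


def bucket_of(key: str, num_buckets: int) -> int:
    """Stable hash bucket for a key (CRC32 is an illustrative choice)."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def assign_buckets(num_buckets: int, workers: List[str]) -> Dict[str, List[int]]:
    """Spread buckets round-robin over the current worker list."""
    plan: Dict[str, List[int]] = {w: [] for w in workers}
    for b in range(num_buckets):
        plan[workers[b % len(workers)]].append(b)
    return plan


NUM_BUCKETS = 8
rows: List[Tuple[str, int]] = [
    ("user_42", 10), ("user_7", 5), ("user_42", 3), ("user_99", 8),
]

# Write path: each row lands in the bucket chosen by its key.
buckets: Dict[int, List[Tuple[str, int]]] = {}
for user_id, amount in rows:
    buckets.setdefault(bucket_of(user_id, NUM_BUCKETS), []).append((user_id, amount))

# Pruning: an equality filter on the bucketing key only has to read one bucket.
target = bucket_of("user_42", NUM_BUCKETS)
matches = [r for r in buckets.get(target, []) if r[0] == "user_42"]

# Scaling: the data keeps its bucket; only the bucket-to-worker plan changes.
plan_small = assign_buckets(NUM_BUCKETS, ["w1", "w2"])
plan_large = assign_buckets(NUM_BUCKETS, ["w1", "w2", "w3", "w4"])
```

Because rows with the same key always share a bucket, aggregations and joins on that key can run per bucket without shuffling data between workers.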

REDck implements a two‑phase commit protocol and integrates with Flink checkpointing to provide exactly‑once semantics for data ingestion, addressing ClickHouse's lack of mature distributed transaction support.
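The way a two‑phase commit lines up with checkpoints can be sketched with a toy sink in Python. This is a simulation under stated assumptions, not REDck's or Flink's actual API: the class `TwoPhaseSink` and its methods are hypothetical, staging is modeled as an in‑memory dict, and recovery is simplified to discarding uncommitted data before the source replays it.

```python
from typing import Dict, List, Set


class TwoPhaseSink:
    """Toy sink: rows become visible only after their checkpoint is committed."""

    def __init__(self) -> None:
        self.buffer: List[str] = []              # rows since the last checkpoint
        self.staged: Dict[int, List[str]] = {}   # phase 1: written but invisible
        self.committed_ids: Set[int] = set()
        self.visible: List[str] = []             # what queries can read

    def write(self, row: str) -> None:
        self.buffer.append(row)

    def pre_commit(self, checkpoint_id: int) -> None:
        """Phase 1, run when a checkpoint is taken: stage the buffered rows."""
        self.staged[checkpoint_id] = self.buffer
        self.buffer = []

    def commit(self, checkpoint_id: int) -> None:
        """Phase 2, run once the checkpoint is confirmed; safe to call twice."""
        if checkpoint_id in self.committed_ids:
            return
        self.visible.extend(self.staged.pop(checkpoint_id, []))
        self.committed_ids.add(checkpoint_id)

    def abort(self, checkpoint_id: int) -> None:
        """Drop staged and buffered rows for a checkpoint that never completed."""
        self.staged.pop(checkpoint_id, None)
        self.buffer = []


sink = TwoPhaseSink()
sink.write("a")
sink.write("b")
sink.pre_commit(1)
sink.commit(1)
sink.commit(1)        # a duplicate notification does not duplicate rows

sink.write("c")
sink.pre_commit(2)
sink.abort(2)         # the job fails before checkpoint 2 completes

sink.write("c")       # the source replays "c" after recovery
sink.pre_commit(3)
sink.commit(3)
```

Each row ends up visible exactly once even though "c" was written twice, because the staged copy from the failed checkpoint was never made visible.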

After two years in production, REDck serves more than ten business lines and stores more than 30 PB of data. It achieves 99.9% availability, a ten‑fold improvement in CPU efficiency, and a ten‑fold reduction in cost, and it has cut cluster scaling time from weeks to minutes.

cloud native, caching, ClickHouse, distributed transactions, Real-time OLAP, storage-compute separation, metadata service
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
