Design and Advantages of a Cloud‑Native ClickHouse OLAP System
This article presents the architecture, key features, and operational benefits of a cloud‑native ClickHouse OLAP platform, describing how storage‑compute separation, a unified master node, and shared storage reduce cost, improve availability, and simplify management while remaining fully compatible with the open‑source ClickHouse ecosystem.
The document introduces a cloud‑native ClickHouse solution that builds on ClickHouse’s high‑performance OLAP engine and incorporates design ideas from Snowflake to provide a one‑stop data‑analysis platform for multiple scenarios.
Key advantages include:
Simplicity and easy maintenance through unified cluster management and shared distributed task scheduling.
High availability and scalability, supporting more than five million tables.
At least a 50% reduction in storage cost.
Full compatibility with ClickHouse protocols, syntax, and storage formats.
Open-source ClickHouse today has gaps in usability, stability, maintainability, and features: users must understand the distinction between local and distributed tables, heavy reliance on ZooKeeper creates bottlenecks, and there is no true MPP query layer.
The proposed architecture adopts a three‑layer design:
Cluster Management Layer: the brain of the system, providing metadata management and a shared distributed task scheduler built on a consistency protocol.
Compute Layer: multiple compute clusters where user queries run; all clusters share the management layer.
Storage Layer: shared storage accessible by every compute cluster, offering cheap, on-demand, effectively unlimited capacity.
Data flow connects directly to ClickHouse nodes, bypassing the master node to avoid a central bottleneck. Control flow is coordinated by a lightweight master that handles DDL tasks, schema storage, and node join/leave, and achieves high availability through multi-replica consensus.
Storage-compute separation enables strong consistency and multi-read/multi-write, and eliminates ZooKeeper as a single point of failure. Parts live in shared storage, and a commit log records part changes, providing conflict handling, replay, and snapshot mechanisms.
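The commit-log mechanism above can be sketched as follows. This is a minimal illustration, not the platform's actual implementation: part changes (add/remove) are appended to a log in shared storage, any replica can replay the log from the last snapshot to reconstruct the current part set, and snapshots compact the log. Conflict handling is omitted for brevity; all names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class CommitLog:
    entries: list = field(default_factory=list)  # ordered (op, part) changes
    snapshot: frozenset = frozenset()            # part set as of snapshot_lsn
    snapshot_lsn: int = 0                        # log position of the snapshot

    def append(self, op: str, part: str) -> None:
        """Record a part change ('add' or 'remove') in the log."""
        self.entries.append((op, part))

    def replay(self) -> set:
        """Rebuild the visible part set: start from the snapshot,
        then apply every change logged after it."""
        parts = set(self.snapshot)
        for op, part in self.entries[self.snapshot_lsn:]:
            if op == "add":
                parts.add(part)
            elif op == "remove":
                parts.discard(part)
        return parts

    def take_snapshot(self) -> None:
        """Compact the log so future replays start from the current state."""
        self.snapshot = frozenset(self.replay())
        self.snapshot_lsn = len(self.entries)

log = CommitLog()
log.append("add", "part_1")
log.append("add", "part_2")
log.append("remove", "part_1")  # e.g. part_1 was merged away
print(log.replay())  # {'part_2'}
```

Because the log and parts live in shared storage, a replica that takes over only needs to replay metadata, not copy data.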
Benefits of this design include:
At least 50% reduction in storage cost due to shared physical storage among replicas.
Elimination of dedicated ZooKeeper clusters, saving resources.
Higher resource utilization with no read‑only replica waste.
Improved fault tolerance: any replica can take over reads and writes instantly.
Operationally, cluster scaling becomes seconds‑level: new nodes fetch schema from the master and part metadata from shared storage, while removed nodes can be shut down without data migration.
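The seconds-level scale-out path can be sketched as follows. This is an illustrative sketch only: `join_cluster`, `FakeMaster`, `FakeStorage`, and their method names are hypothetical stand-ins for the platform's internals, showing why no data migration is needed.

```python
def join_cluster(master, shared_storage):
    """A joining node pulls table schemas from the master and part
    metadata from shared storage; no data blocks are copied."""
    schemas = master.fetch_schemas()              # table definitions (DDL)
    parts = shared_storage.list_part_metadata()   # references to shared parts
    return {"schemas": schemas, "parts": parts}   # node can now serve queries

# Toy stand-ins for the master node and shared storage.
class FakeMaster:
    def fetch_schemas(self):
        return ["CREATE TABLE t1 (...) ENGINE=MergeTree() ..."]

class FakeStorage:
    def list_part_metadata(self):
        return ["t1/part_1", "t1/part_2"]

state = join_cluster(FakeMaster(), FakeStorage())
print(len(state["parts"]))  # 2
```

Removal is the mirror image: since the node holds no exclusive data, it can simply be shut down.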
Compatibility with the open‑source ClickHouse ecosystem is preserved; only minimal, non‑intrusive changes are made, allowing seamless upgrades to upstream ClickHouse releases.
Future work includes adding an MPP query engine with distributed joins and aggregations, and removing shard concepts to provide a fully abstracted distributed system.
The article concludes with a recruitment call for engineers interested in high‑performance OLAP system development.
Example command to add a backend node via the master node:
ALTER CLUSTER cluster_name ADD BACKEND 'ip:port' TO SHARD 2;

Example query to list clusters:
SELECT * FROM system.clusters;

Example table creation using the new architecture:
CREATE TABLE t1 (
partition_col_1 String,
tc1 Int32,
tc2 Int32
) ENGINE=MergeTree()
PARTITION BY partition_col_1
ORDER BY tc1;

Tencent Architect
We share insights on storage, computing, and networking, and explore leading industry technologies together.