
How a Cloud‑Native MPP Query Layer Turns ClickHouse into a Snowflake‑Like Data Warehouse

This article explains the design and implementation of a cloud‑native MPP query layer for ClickHouse, detailing its architecture, core features, execution flow, performance advantages, SQL compatibility, and future development plans to create a high‑performance, multi‑source OLAP data platform.

Tencent Architect

Background

Following Snowflake's success, the OLAP market has seen a surge of open‑source projects. ClickHouse stands out for its performance in user‑behavior analysis, A/B testing, and online reporting, but it still falls short in functionality, ease of use, and multi‑source support. The goal is to build a high‑performance, cloud‑native OLAP warehouse on top of ClickHouse, borrowing Snowflake's design ideas.

Core Features of the MPP Query Layer

Powerful functionality – supports complex multi‑table joins and aggregations.

Zero‑copy memory and an end‑to‑end vectorized MPP implementation.

SQL‑standard and MySQL protocol compatibility.

Continuous compatibility with the open‑source ecosystem.

Design Options and Chosen Architecture

Two solutions were considered: (1) improve the existing ClickHouse query layer, which would require invasive changes to the parser; (2) implement a brand‑new query layer that treats ClickHouse as a single‑node engine. The second option was chosen to keep the query layer independent and evolvable.

Query layer architecture diagram

Execution Flow

The user connects to a ClickHouse node and sends a SQL statement; the node acts as the Initiator and forwards the query to the Master.

The Master parses the SQL and uses the catalog to generate a physical query plan based on data distribution.

The Initiator distributes the plan to the appropriate ClickHouse nodes for execution.

Each ClickHouse node runs the MPP module, scanning data, performing joins/aggregations, and exchanging intermediate results via RPC.

The final result is returned to the Initiator, formatted, and sent back to the client.
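The flow above can be sketched as a toy, single-process simulation. This is only an illustration under stated assumptions, not the engine's actual code: the two-node cluster, the `orders` and `users` tables, the stage names, and the modulo partitioning are all hypothetical, and plain Python lists stand in for ClickHouse blocks and RPC.

```python
# Hypothetical toy cluster: each node holds one shard of two tables.
# Table names, columns, and rows are invented for illustration.
NODES = {
    "node1": {"orders": [(1, 10.0), (2, 5.0)], "users": [(2, "bob")]},
    "node2": {"orders": [(1, 7.5), (3, 1.0)], "users": [(1, "ann"), (3, "cy")]},
}

def master_plan(tables, nodes):
    """Master role: describe the join as stages based on where data lives."""
    names = sorted(nodes)
    return [
        {"stage": "scan_and_shuffle", "tables": tables, "nodes": names},
        {"stage": "local_join", "nodes": names},
        {"stage": "gather", "node": "initiator"},
    ]

def owner(key, names):
    """Pick the node that owns a join key (simple modulo partitioning)."""
    return names[key % len(names)]

def run_shuffle(nodes, stage):
    """Each node scans its shard and sends every row to the key's owner."""
    inbox = {n: {t: [] for t in stage["tables"]} for n in stage["nodes"]}
    for shard in nodes.values():
        for table in stage["tables"]:
            for row in shard[table]:
                inbox[owner(row[0], stage["nodes"])][table].append(row)
    return inbox

def run_local_join(inbox):
    """Each node joins only the rows it received during the exchange."""
    out = {}
    for node, data in inbox.items():
        users = dict(data["users"])
        out[node] = [
            (uid, users[uid], amount)
            for uid, amount in data["orders"]
            if uid in users
        ]
    return out

def execute(nodes):
    """Initiator role: obtain the plan, drive the stages, gather the result."""
    plan = master_plan(["orders", "users"], nodes)
    inbox = run_shuffle(nodes, plan[0])
    joined = run_local_join(inbox)
    return sorted(row for rows in joined.values() for row in rows)

result = execute(NODES)
print(result)
```

Because rows with the same key always land on the same node, each node can finish its part of the join without consulting the others, which is what lets the join run in parallel rather than on a single coordinator.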

Query execution flow diagram

Advantages of the Integrated MPP Engine

No data serialization between the storage and query layers because the MPP engine runs in the same process as ClickHouse.

Zero‑copy data exchange using ClickHouse's Block format reduces overhead.

Reuses ClickHouse's vectorized operators, achieving comparable performance.

Pushes simple functions, filters, and eventually single‑table aggregations down to ClickHouse, leveraging its indexes, statistics, and parallel aggregation.
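To make the pushdown idea concrete, here is a minimal sketch of a plan rewrite that moves filters from the query layer into the scan. The `Scan` and `Filter` classes, `push_down`, and `storage_scan` are hypothetical names invented for this example; the real engine's plan representation is not described in this article.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Row = Tuple[int, str]

@dataclass
class Scan:
    """Leaf operator; `pushed` holds predicates handed to the storage engine."""
    table: str
    pushed: List[Callable[[Row], bool]] = field(default_factory=list)

@dataclass
class Filter:
    """Query-layer filter sitting above a child operator."""
    predicate: Callable[[Row], bool]
    child: object

def push_down(plan):
    """Rewrite Filter(Scan) into a Scan that carries the predicate."""
    if isinstance(plan, Filter):
        child = push_down(plan.child)
        if isinstance(child, Scan):
            return Scan(child.table, child.pushed + [plan.predicate])
        return Filter(plan.predicate, child)
    return plan

def storage_scan(scan, tables):
    """Stand-in for the storage engine: applies pushed predicates while reading."""
    return [row for row in tables[scan.table] if all(p(row) for p in scan.pushed)]

# Hypothetical data and query: events with id > 1 and kind == "click".
TABLES = {"events": [(1, "click"), (2, "view"), (3, "click")]}
plan = Filter(lambda r: r[1] == "click", Filter(lambda r: r[0] > 1, Scan("events")))
optimized = push_down(plan)
rows = storage_scan(optimized, TABLES)
print(rows)
```

After the rewrite no `Filter` node remains in the query layer, so rows that fail the predicates never leave the storage side. In the real system that is the point at which ClickHouse's indexes and statistics can be applied.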

Compatibility and Performance

The engine fully supports the SQL standard and MySQL protocol, allowing existing BI tools (e.g., Tableau) to connect without code changes. It has passed all TPC‑H queries and over 90% of TPC‑DS tests. The following diagram compares ClickHouse's native Scatter‑Gather model with the new multi‑stage MPP framework.

Execution framework comparison
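The difference between the two models can be illustrated with a toy GROUP BY count. This is a sketch under assumptions, not a benchmark or the engine's implementation: the three shards, the single-character keys, and the `ord(key) % n` partitioning are invented for the example, and `merge_work` simply counts the partial results each node has to merge.

```python
from collections import Counter

# Hypothetical shards: each list is the grouping column on one node.
SHARDS = [
    ["a", "b", "a", "c"],
    ["b", "c", "c", "d"],
    ["a", "d", "d", "b"],
]

def scatter_gather(shards):
    """Each shard pre-aggregates; one initiator merges every partial result."""
    partials = [Counter(s) for s in shards]
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    merge_work = {"initiator": sum(len(p) for p in partials)}
    return dict(merged), merge_work

def multi_stage(shards):
    """Shards pre-aggregate, then shuffle partials by key so that every node
    performs the final merge for its own slice of the key space."""
    n = len(shards)
    inbox = [[] for _ in range(n)]
    for s in shards:
        for key, count in Counter(s).items():
            inbox[ord(key) % n].append((key, count))
    result, merge_work = {}, {}
    for i, entries in enumerate(inbox):
        final = Counter()
        for key, count in entries:
            final[key] += count
        result.update(final)
        merge_work[f"node{i}"] = len(entries)
    return result, merge_work

sg_result, sg_work = scatter_gather(SHARDS)
ms_result, ms_work = multi_stage(SHARDS)
print(sg_work, ms_work)
```

Both functions return the same counts. What changes is where the final merge happens: in the scatter-gather version every partial result funnels through the initiator, while in the multi-stage version the same total work is spread across the nodes. That matters most when the number of distinct groups is large.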

Future Work

Local cache optimization to further improve performance.

Development of a cost‑based optimizer (CBO) for complex queries.

Support for multiple data sources (OLTP, object storage, Elasticsearch, MongoDB) and semi‑structured data.

Full distribution of the system, abstracting shards and nodes from users.

Configuration to switch between the native and MPP engines:

SET use_mpp_engine = true