How a Cloud‑Native MPP Query Layer Turns ClickHouse into a Snowflake‑Like Data Warehouse

This article explains the design and implementation of a cloud‑native MPP query layer for ClickHouse, detailing its architecture, core features, execution flow, performance advantages, SQL compatibility, and future development plans to create a high‑performance, multi‑source OLAP data platform.

Tencent Architect

Dec 10, 2021

Background

Following Snowflake's success, the OLAP market has exploded with many open‑source projects. ClickHouse stands out for its performance in user‑behavior analysis, A/B testing, and online reporting, but it still falls short in functionality, ease of use, and multi‑source support. The goal is to build a high‑performance, cloud‑native OLAP warehouse based on ClickHouse, borrowing Snowflake's design ideas.

Core Features of the MPP Query Layer

Powerful functionality – supports complex multi‑table joins and aggregations.

Zero‑copy memory and full‑link vectorized MPP implementation.

SQL‑standard and MySQL protocol compatibility.

Continuous compatibility with the open‑source ecosystem.

Design Options and Chosen Architecture

Two solutions were considered: (1) improve the existing ClickHouse query layer, which would require invasive changes to the parser; (2) implement a brand‑new query layer that treats ClickHouse as a single‑node engine. The second option was chosen to keep the query layer independent and evolvable.

Execution Flow

A user connects to a ClickHouse node and sends an SQL statement; the node acts as the Initiator and forwards the query to the Master.

The Master parses the SQL and uses the catalog to generate a physical query plan based on data distribution.

The Initiator distributes the plan to the appropriate ClickHouse nodes for execution.

Each ClickHouse node runs the MPP module, scanning data, performing joins/aggregations, and exchanging intermediate results via RPC.

The final result is returned to the Initiator, formatted, and sent back to the client.
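The flow above can be sketched as a small in-process simulation. This is only an illustration under simplifying assumptions: the names `Fragment`, `master_plan`, `worker_execute`, and `initiator_run` are hypothetical, the catalog is a plain dictionary mapping node names to local rows, and partial results are merged at the Initiator instead of being exchanged between nodes via RPC.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """One unit of a physical plan, bound to the node that holds the data."""
    node: str
    shard_rows: list


def master_plan(catalog: dict) -> list:
    """Master: build one scan + partial-aggregate fragment per node in the catalog."""
    return [Fragment(node, rows) for node, rows in sorted(catalog.items())]


def worker_execute(fragment: Fragment) -> dict:
    """Worker: scan the local shard and compute a partial COUNT per key."""
    partial = {}
    for key in fragment.shard_rows:
        partial[key] = partial.get(key, 0) + 1
    return partial


def initiator_run(catalog: dict) -> dict:
    """Initiator: obtain the plan, dispatch fragments, merge the partial results."""
    plan = master_plan(catalog)
    final = {}
    for fragment in plan:
        for key, count in worker_execute(fragment).items():
            final[key] = final.get(key, 0) + count
    return final


# Hypothetical data distribution: two nodes, each holding a few keys.
catalog = {"node-1": ["a", "b", "a"], "node-2": ["b", "c"]}
result = initiator_run(catalog)
```

Here `result` holds the merged per-key counts, roughly what a `GROUP BY key` with `COUNT(*)` would return once every node has contributed its partial aggregate.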

Advantages of the Integrated MPP Engine

No data serialization between the storage and query layers because the MPP engine runs in the same process as ClickHouse.

Zero‑copy data exchange using ClickHouse's Block format reduces overhead.

Reuses ClickHouse's vectorized operators, achieving comparable performance.

Pushes simple functions, filters, and eventually single‑table aggregations down to ClickHouse, leveraging its indexes, statistics, and parallel aggregation.
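To make the pushdown idea concrete, the sketch below splits a query's predicates into those that reference a single table (candidates for pushing down to the ClickHouse scan) and those that span tables (kept in the MPP layer as join conditions). The `Predicate` type and `split_pushdown` function are hypothetical names for illustration; the article does not describe the actual planner rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    text: str          # SQL text of the predicate, e.g. "o.price > 100"
    tables: frozenset  # aliases of the tables the predicate references


def split_pushdown(predicates):
    """Return (pushed, retained).

    pushed   maps a table alias to the predicates its scan can evaluate locally.
    retained lists predicates that need rows from more than one table.
    """
    pushed, retained = {}, []
    for p in predicates:
        if len(p.tables) == 1:
            (table,) = p.tables
            pushed.setdefault(table, []).append(p.text)
        else:
            retained.append(p.text)
    return pushed, retained


preds = [
    Predicate("o.price > 100", frozenset({"o"})),
    Predicate("o.user_id = u.id", frozenset({"o", "u"})),
    Predicate("u.country = 'CN'", frozenset({"u"})),
]
pushed, retained = split_pushdown(preds)
```

Filters that land in `pushed` can be evaluated by the storage engine, where indexes can skip data before it reaches the join; the join condition stays in `retained` for the MPP operators.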

Compatibility and Performance

The engine fully supports the SQL standard and MySQL protocol, allowing existing BI tools (e.g., Tableau) to connect without code changes. It has passed all TPC‑H queries and over 90% of TPC‑DS tests. The following diagram compares ClickHouse's native Scatter‑Gather model with the new multi‑stage MPP framework.
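As a rough illustration of what a multi-stage plan adds over Scatter‑Gather, the sketch below runs a distributed join in two stages: an exchange stage that routes rows to nodes by join key, followed by a local hash join on each node. It is a toy model, not the engine's implementation: the function names are hypothetical, rows are `(key, value)` tuples with integer keys, and a modulo stands in for the partitioning hash.

```python
def shuffle(shards, key_index, n_nodes):
    """Exchange stage: route every row to the node that owns its join key."""
    out = [[] for _ in range(n_nodes)]
    for shard in shards:
        for row in shard:
            out[row[key_index] % n_nodes].append(row)
    return out


def local_hash_join(left, right):
    """Join stage: build a hash table on the left rows, probe with the right rows."""
    table = {}
    for key, value in left:
        table.setdefault(key, []).append(value)
    return [(key, lv, rv) for key, rv in right for lv in table.get(key, [])]


def mpp_join(left_shards, right_shards):
    """Repartition both inputs by key, then join each partition independently."""
    n = len(left_shards)
    left_parts = shuffle(left_shards, 0, n)
    right_parts = shuffle(right_shards, 0, n)
    joined = []
    for node in range(n):
        joined.extend(local_hash_join(left_parts[node], right_parts[node]))
    return sorted(joined)


# Hypothetical data: matching keys initially live on different nodes.
users = [[(1, "ann"), (2, "bob")], [(3, "cy")]]
orders = [[(3, 30), (1, 10)], [(1, 11), (2, 20)]]
rows = mpp_join(users, orders)
```

Because matching keys start out on different nodes, no single node could produce the full join without the exchange stage. After the shuffle, each node joins only its own partition, so the work stays spread across the nodes instead of converging on the one that gathers the results.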

Future Work

Local cache optimization to further improve performance.

Development of a cost‑based optimizer (CBO) for complex queries.

Support for multiple data sources (OLTP, object storage, Elasticsearch, MongoDB) and semi‑structured data.

Full distribution of the system, abstracting shards and nodes from users.

Configuration to switch between the native and MPP engines:

SET use_mpp_engine = true
