ClickHouse Projection: Concepts, Use Cases, Implementation and Production Benefits
This article gives an in-depth overview of ClickHouse's Projection feature: its background and definition, how projections are stored and queried, practical use cases, a comparison with other open-source OLAP systems, and production results from Kuaishou that show both its advantages and its limitations.
The talk, delivered by Dr. Zheng Tianqi of Kuaishou, introduces Projection, one of the latest contributions to the ClickHouse open-source community. After a brief introduction to ClickHouse's origins, architecture and core characteristics, the speaker explains why ClickHouse is so widely adopted for OLAP workloads in large-scale internet services.
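A Projection, a concept borrowed from Vertica, is a set of columns materialized either with its own sort order or as a pre-aggregation. It can be added to an existing table with ALTER TABLE ... ADD PROJECTION (or declared in CREATE TABLE), and comes in two variants: normal projections, which store the same rows re-ordered by a different key, and aggregate projections, which store pre-aggregated results. Each projection is stored as a sub-part inside the table's data parts, so it inherits the table's partitioning and is merged and mutated together with the part that contains it, which keeps it consistent with the base data.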
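To make the storage model concrete, here is a minimal in-memory sketch, assuming a hypothetical row layout (event_hour, domain, device_id, bytes_sent) and hypothetical projection names `by_device` and `hourly_domain`. It is not ClickHouse's implementation: it only shows that each part carries projection sub-parts built from its own rows, and that merging two parts of one partition also produces merged projections. For simplicity the sketch rebuilds projections from the merged rows instead of merging them incrementally.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical row layout: (event_hour, domain, device_id, bytes_sent)
Row = tuple[int, str, int, int]


def normal_projection(rows: list[Row]) -> list[Row]:
    """Same rows, re-sorted by device_id instead of the base sort key."""
    return sorted(rows, key=lambda r: (r[2], r[0]))


def aggregate_projection(rows: list[Row]) -> dict[tuple[int, str], int]:
    """sum(bytes_sent) pre-aggregated by (event_hour, domain)."""
    agg: dict[tuple[int, str], int] = defaultdict(int)
    for hour, domain, _device, nbytes in rows:
        agg[(hour, domain)] += nbytes
    return dict(agg)


@dataclass
class Part:
    partition: str
    rows: list[Row]  # base data, sorted by the (assumed) primary key
    projections: dict[str, object] = field(default_factory=dict)

    @classmethod
    def from_insert(cls, partition: str, rows: list[Row]) -> "Part":
        base = sorted(rows)  # assumed primary key: (event_hour, domain, ...)
        part = cls(partition, base)
        # Projection sub-parts are built from the same rows, inside the same part,
        # so they cannot disagree with that part's base data.
        part.projections = {
            "by_device": normal_projection(base),
            "hourly_domain": aggregate_projection(base),
        }
        return part


def merge_parts(a: Part, b: Part) -> Part:
    """Merge two parts of one partition; projections are rebuilt with the merged rows."""
    if a.partition != b.partition:
        raise ValueError("only parts of the same partition are merged")
    return Part.from_insert(a.partition, a.rows + b.rows)
```

Because projections live inside the part, any operation that rewrites the part (insert, merge, mutation) rewrites its projections in the same step, which is the source of the consistency guarantee described above.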
The talk walks through several practical examples. In one, a video-log table with high-cardinality columns gets a normal Projection ordered by device_id: a query filtering on device_id, which previously needed a full-table scan taking about 8 seconds, drops into the millisecond range, a reported 153× speedup. In another, an aggregate Projection grouped by hour and domain speeds up dashboard queries from tens of seconds to under one second.
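Why a re-ordered projection helps a device_id filter can be illustrated with a toy model, assuming synthetic (timestamp, device_id) rows and a base table sorted by timestamp. The row counts are idealized: real ClickHouse reads whole granules (8192 rows by default) located through a sparse primary index, so it reads somewhat more than just the matching rows. The names `full_scan` and `projection_lookup` are hypothetical.

```python
import bisect
import random

# Hypothetical video-log rows: (timestamp, device_id); the base part is sorted by timestamp.
random.seed(0)
base = sorted((ts, random.randrange(100_000)) for ts in range(200_000))

# Normal projection: the same rows, re-sorted by device_id.
by_device = sorted(base, key=lambda r: (r[1], r[0]))
device_keys = [r[1] for r in by_device]


def full_scan(rows, device_id):
    """The base sort key does not help a device_id filter, so every row is examined."""
    hits = [r for r in rows if r[1] == device_id]
    return hits, len(rows)


def projection_lookup(device_id):
    """Binary-search the device_id-sorted projection and read only the matching range."""
    lo = bisect.bisect_left(device_keys, device_id)
    hi = bisect.bisect_right(device_keys, device_id)
    return by_device[lo:hi], hi - lo
```

Both functions return the same rows, but the scan touches all 200,000 rows while the projection lookup touches only the matching range. An aggregate projection applies the same idea to GROUP BY queries: it stores one row per group instead of one row per event.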
The implementation has three components: projection definition (a projection is written as a query, which can be derived from user queries or inferred automatically), projection storage (projection data is kept strongly consistent with the base table), and projection query analysis (the query plan is rewritten to read from the best projection, so user queries do not need to change). During analysis the optimizer checks which candidate projections can answer the query, estimates how much data each would scan, and selects the one with the least I/O, falling back to the base table when no projection fits.
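The selection step can be sketched as a cost comparison. This is a deliberately simplified model, not ClickHouse's analyzer: it assumes a candidate is usable whenever it supplies every column the query needs, and that a row estimate (`estimated_rows`, a hypothetical field) is already available from index analysis. The real analyzer also checks filters, GROUP BY keys and aggregate functions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    columns: frozenset[str]  # columns this source can supply
    estimated_rows: int      # rows it would read after index analysis (assumed known)


def choose_source(required: set[str], candidates: list[Candidate], base: Candidate) -> Candidate:
    """Pick the cheapest source that can answer the query; the base table is always a fallback."""
    usable = [c for c in candidates if required <= c.columns]
    # min() keeps the first minimum, so a projection wins a tie with the base table.
    return min(usable + [base], key=lambda c: c.estimated_rows)
```

Keeping the base table in the candidate list means a query that no projection can answer still runs, just without acceleration, which matches the "no query changes required" property.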
A comparison with other open-source OLAP engines (Kylin, Druid, Doris) highlights what the talk presents as ClickHouse's strengths (read/write performance, vectorized execution and rich analytical functions) while noting two historical gaps: no materialized views that stay consistent with the base table, and no transaction support. Projection addresses the first gap by providing consistent, low-overhead materialization.
Production measurements from Kuaishou show that Projections can handle daily volumes of hundreds of billions of rows, speed up dashboards under concurrent load, and add roughly 20-40% storage overhead depending on the aggregation functions used. Limitations include part-level granularity (each part holds its own projection, so pre-aggregated results are not combined across parts until those parts merge, and queries must still merge the partial results), the inability to move a projection to a different storage tier independently of its base table, and the lack of join support.
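Part-level granularity means each part's aggregate projection holds partial aggregation states that the query still has to combine. The sketch below assumes hypothetical per-part states for avg(latency_ms) grouped by (hour, domain), stored as (sum, count) pairs; the names and numbers are illustrative only. It shows why partial states rather than final averages must be kept: averaging the per-part averages would give the wrong answer.

```python
from collections import defaultdict

# Hypothetical per-part aggregate projections: (hour, domain) -> (sum_latency_ms, count).
# Each part keeps partial states, because the same key can appear in several
# parts that have not been merged yet.
part_states = [
    {(10, "a.com"): (300, 3), (11, "b.com"): (50, 1)},
    {(10, "a.com"): (20, 1)},
]


def finalize_avg(parts):
    """Combine partial (sum, count) states across parts, then compute the averages."""
    merged = defaultdict(lambda: [0, 0])
    for states in parts:
        for key, (total, count) in states.items():
            merged[key][0] += total
            merged[key][1] += count
    return {key: total / count for key, (total, count) in merged.items()}
```

Here the correct average for (10, "a.com") is 320 / 4 = 80, whereas averaging the two per-part averages (100 and 20) would give 60. State size also varies by function (an average needs both a sum and a count), which is consistent with the storage overhead depending on the aggregation functions used.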
In summary, Projection extends ClickHouse with production-grade materialized view functionality: automatic query acceleration, strong consistency across inserts, queries and mutations (UPDATE/DELETE), and flexible definitions taken directly from workload queries, strengthening ClickHouse's position in the big-data OLAP landscape.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.