How ClickHouse Projections Supercharge Query Performance
The article explains ClickHouse's new Projection feature, how it overcomes MergeTree's single‑sort limitation and materialized view drawbacks, provides step‑by‑step commands to create, materialize, and query projections, demonstrates massive performance gains, and outlines the rules for automatic projection selection.
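ClickHouse introduced the Projection feature to address two common problems. First, a MergeTree table supports only one sort order (its ORDER BY key), so queries that filter on other columns cannot benefit from the primary index. Second, materialized views are separate tables that queries must target explicitly, which creates maintenance and consistency problems.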
Projections are inspired by the paper "C-Store: A Column-oriented DBMS" by Mike Stonebraker et al. They let a table keep an additional copy of selected columns, either stored in a different sort order or pre-aggregated for aggregate queries.
Key characteristics of ClickHouse Projections include:
Part-level storage: projection data is stored inside each data part directory of the base table. Both detailed (re-sorted) and pre-aggregated projections are supported.
Seamless usage: multiple projections can be created on one MergeTree table, and ClickHouse automatically selects the best one at query time.
Single data source: projection data is written, merged, and updated together with the base table's parts. This eliminates the consistency issues of a separately maintained materialized view.
Example using the hits_100m_obfuscated table (about 100 million rows):
SELECT count(*) FROM hits_100m_obfuscated;
Without a projection, a query that filters on a non-primary-key column scans the entire table:
SELECT WatchID FROM hits_100m_obfuscated WHERE WatchID = 5814563137538961516;
Result: 800 MB scanned (roughly 100 million 8-byte WatchID values), 0.262 s.
Creating a projection to accelerate this query:
ALTER TABLE hits_100m_obfuscated ADD PROJECTION p1 (SELECT WatchID, Title ORDER BY WatchID);
Only data inserted after the projection is created is materialized automatically. Existing data must be materialized manually:
ALTER TABLE hits_100m_obfuscated MATERIALIZE PROJECTION p1;
Then enable the experimental projection optimization:
SET allow_experimental_projection_optimization = 1;
Re-executing the same query now scans only 65 KB and finishes in 0.006 s. That is a more than 40× speedup (0.262 s vs. 0.006 s).
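MATERIALIZE PROJECTION rebuilds the projection for existing parts as a background mutation, so the timing above is only meaningful once that mutation has finished. The following is a minimal sketch of checking its progress through the system.mutations table. It assumes the table name from this example.

```sql
-- Lists unfinished mutations on the example table, including the
-- MATERIALIZE PROJECTION p1 command issued above.
-- parts_to_do counts the parts not yet rewritten; a mutation drops out
-- of this result once its is_done flag becomes 1.
SELECT mutation_id, command, parts_to_do, is_done
FROM system.mutations
WHERE table = 'hits_100m_obfuscated' AND is_done = 0;
```

An empty result means no mutation on the table is still pending, including the projection materialization.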
Projections also support pre‑aggregation. After creating an aggregated projection:
ALTER TABLE hits_100m_obfuscated ADD PROJECTION agg_p2 (SELECT UserID, SearchPhrase, count() GROUP BY UserID, SearchPhrase);
ALTER TABLE hits_100m_obfuscated MATERIALIZE PROJECTION agg_p2;
Queries that aggregate by these columns then scan about three-quarters less data than the same queries against the base table.
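The query below sketches the kind of query agg_p2 is designed to serve: an aggregation whose grouping keys and selected columns all appear in the projection. The LIMIT and the hits alias are illustrative choices, not part of the original benchmark.

```sql
-- Every column referenced (UserID, SearchPhrase) and the aggregate (count())
-- appear in agg_p2, so the optimizer may read the pre-aggregated projection
-- instead of the raw rows.
SELECT UserID, SearchPhrase, count() AS hits
FROM hits_100m_obfuscated
GROUP BY UserID, SearchPhrase
ORDER BY hits DESC
LIMIT 10;
```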
ClickHouse provides system tables to inspect projection storage:
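SELECT name, partition, formatReadableSize(bytes_on_disk) AS bytes, formatReadableSize(parent_bytes_on_disk) AS parent_bytes, parent_rows, rows / parent_rows AS ratio FROM system.projection_parts WHERE table = 'hits_100m_obfuscated' AND active;
Projections are essentially a space-for-time trade-off. They consume extra disk space (shown in the bytes column) in exchange for reading far less data at query time, which is usually a good bargain.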
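The same system table can also be aggregated to show the space cost per projection rather than per part. This is a minimal sketch that counts only active parts. It assumes the example table name and, as in the query above, that `name` identifies the projection.

```sql
-- Total on-disk size of each projection compared with the parent parts
-- that contain it; size_ratio is projection bytes / parent bytes.
SELECT
    name AS projection_name,
    count() AS part_count,
    formatReadableSize(sum(bytes_on_disk)) AS projection_size,
    formatReadableSize(sum(parent_bytes_on_disk)) AS parent_size,
    round(sum(bytes_on_disk) / sum(parent_bytes_on_disk), 4) AS size_ratio
FROM system.projection_parts
WHERE table = 'hits_100m_obfuscated' AND active
GROUP BY name
ORDER BY name;
```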
Rules for automatic projection matching:
The setting allow_experimental_projection_optimization = 1 is enabled.
The query's result row count is smaller than the base table's total row count.
The query covers more than half of the partition's data parts.
The columns in the WHERE clause are a subset of the projection's GROUP BY columns.
The GROUP BY columns are a subset of the projection's GROUP BY columns.
The SELECT list is a subset of the projection's SELECT list.
If multiple projections match, the one reading the fewest parts is chosen.
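The two queries below make the subset rules concrete for agg_p2. The string literal 'clickhouse' is a hypothetical filter value. The first query satisfies the WHERE, GROUP BY, and SELECT subset rules. The second does not, because it filters on Title, which agg_p2 does not contain. Whether the first query actually uses the projection still depends on the row-count and part-coverage conditions.

```sql
-- Eligible for agg_p2: WHERE uses SearchPhrase (a GROUP BY key of agg_p2),
-- GROUP BY UserID is a subset of (UserID, SearchPhrase),
-- and count() is the aggregate stored in the projection.
SELECT UserID, count()
FROM hits_100m_obfuscated
WHERE SearchPhrase = 'clickhouse'
GROUP BY UserID;

-- Not eligible for agg_p2: Title is not part of agg_p2, and p1 lacks UserID,
-- so with only p1 and agg_p2 defined this query is answered from the base table.
SELECT UserID, count()
FROM hits_100m_obfuscated
WHERE Title = 'clickhouse'
GROUP BY UserID;
```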
Verification methods:
Use EXPLAIN to see if the plan includes a projection.
Check execution logs for messages like "Choose xxx projection".
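Both checks can be run from clickhouse-client. The sketch below reuses the earlier point lookup and assumes the projection optimization setting is still enabled in the session. send_logs_level streams server log messages to the client for the current session. The exact EXPLAIN output and log wording vary between ClickHouse versions.

```sql
-- 1. Inspect the query plan; when a projection is chosen, the read step
--    should reference it (exact wording depends on the version).
EXPLAIN
SELECT WatchID FROM hits_100m_obfuscated WHERE WatchID = 5814563137538961516;

-- 2. Forward server logs for this session, rerun the query, and look
--    for a "Choose ... projection" message in the output.
SET send_logs_level = 'trace';
SELECT WatchID FROM hits_100m_obfuscated WHERE WatchID = 5814563137538961516;
```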
Projections can be dropped with DDL:
ALTER TABLE hits_100m_obfuscated DROP PROJECTION p1;
Projections can also be defined directly in a CREATE TABLE statement.
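The sketch below shows the CREATE TABLE form, declaring both projections from this article up front. The table name hits_projection_demo, its reduced column list, and its partitioning and sorting keys are hypothetical, chosen only for illustration.

```sql
-- Hypothetical demo table: projections are declared alongside the columns,
-- so every part written to the table includes them from the first insert onward.
CREATE TABLE hits_projection_demo
(
    WatchID      UInt64,
    UserID       UInt64,
    Title        String,
    SearchPhrase String,
    EventDate    Date,
    PROJECTION p1 (SELECT WatchID, Title ORDER BY WatchID),
    PROJECTION agg_p2 (SELECT UserID, SearchPhrase, count() GROUP BY UserID, SearchPhrase)
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(EventDate)
ORDER BY (EventDate, UserID);
```

Because the projections exist before any data arrives, this table needs no MATERIALIZE PROJECTION step.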
Overall, ClickHouse projections provide the performance benefits of materialized views without their maintenance overhead.
