Overview of ClickHouse Table Engines and Their Usage
This article provides a comprehensive overview of ClickHouse's table engines, explaining the purpose and characteristics of each engine series—including Log, Integration, Special, and the various MergeTree families—while offering detailed SQL examples and guidance on when to use each engine for different data workloads.
Table engines are critical in ClickHouse, determining storage, read/write concurrency, indexing, query types, and replication.
Engine Overview
ClickHouse offers about 28 engines grouped into Log, MergeTree, Integration, and Special series, plus Replicated and Distributed special engines.
Log Series
Designed for fast writes of small tables (≈1M rows) and full scans. Common traits: sequential append, no delete/update, no index, no atomic writes, insert blocks selects. Variants: TinyLog (no concurrent reads), StripLog (concurrent reads, single file), Log (concurrent reads, per‑column files).
Integration Series
Used to import or query external data sources such as Kafka, MySQL, JDBC/ODBC, HDFS.
Special Series
Purpose‑built engines: Memory (in‑memory, non‑persistent), Buffer (in‑memory buffer flushed to disk), File (local file storage), Null (discard writes).
MergeTree Series
The primary engine supporting most ClickHouse features. Includes MergeTree, ReplacingMergeTree, CollapsingMergeTree, VersionedCollapsingMergeTree, SummingMergeTree, AggregatingMergeTree.
MergeTree
Supports partitioning, ordered storage, primary key index, TTL, etc. Example DDL and inserts show that primary key does not enforce uniqueness; compaction does not remove duplicate rows.
CREATE TABLE test_tbl (
id UInt16,
create_time Date,
comment Nullable(String)
) ENGINE = MergeTree()
PARTITION BY create_time
ORDER BY (id, create_time)
PRIMARY KEY (id, create_time)
TTL create_time + INTERVAL 1 MONTH
SETTINGS index_granularity=8192;ReplacingMergeTree
Provides deduplication of rows with the same primary key after a final OPTIMIZE FINAL operation.
CREATE TABLE test_tbl_replacing (
id UInt16,
create_time Date,
comment Nullable(String)
) ENGINE = ReplacingMergeTree()
PARTITION BY create_time
ORDER BY (id, create_time)
PRIMARY KEY (id, create_time)
TTL create_time + INTERVAL 1 MONTH
SETTINGS index_granularity=8192;CollapsingMergeTree
Uses a Sign column to mark insert (+1) and delete (‑1) rows; compaction folds opposite signs. Queries must rewrite aggregates as sum(Sign) and sum(col * Sign).
VersionedCollapsingMergeTree
Adds a Version column to handle out‑of‑order inserts, allowing correct folding when Sign and Version match.
SummingMergeTree
Pre‑aggregates numeric columns on the primary key during compaction, reducing storage and speeding up queries.
AggregatingMergeTree
Supports arbitrary aggregate functions via AggregateFunction types and materialized views; requires …State on insert and …Merge on select.
ClickHouse’s diverse engines let users choose the optimal storage model for different workloads, while advanced engines like Replicated and Distributed are covered in future articles.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
