
ClickHouse Engines: Use Cases, Syntax, and Limitations

This article surveys ClickHouse: its typical application scenarios, limitations, common SQL syntax, default values, data types, materialized and alias columns, and its storage engines (TinyLog, Log, Memory, Merge, Distributed, Null, Buffer, Set, MergeTree, ReplacingMergeTree, SummingMergeTree, AggregatingMergeTree, and CollapsingMergeTree), with practical code examples throughout.


ClickHouse is a column‑oriented database designed for big‑data workloads. It excels in read‑heavy scenarios, batch updates, wide tables, low‑latency queries, and environments where transactions and strong consistency are not required.

Typical application scenarios:

Most requests are read‑only.

Data is updated in large batches (over 1,000 rows) rather than row‑by‑row.

Data is only appended; updates are rarely needed.

Queries read many rows but use only a few columns.

Tables are "wide" with many columns.

Query frequency is relatively low (hundreds of queries per second per server).

Simple queries tolerate ~50 ms latency.

Column values are small numbers or short strings.

High throughput is required for single‑query processing (billions of rows per second per server).

No transaction support is needed.

Data consistency requirements are low.

Each query involves one large table; all other tables joined in are small.

Result sets are much smaller than the source data, often after filtering or aggregation.

ClickHouse limitations:

No true DELETE/UPDATE support; transactions are not available (future versions may add them).

No secondary indexes.

SQL support is limited; JOIN implementation differs from standard databases.

Window functions are not supported.

Metadata management requires manual intervention.

Common SQL Syntax

-- List databases
show databases;

-- List tables in the current database
show tables;

-- Create a database
create database test;

-- Drop a table if it exists
drop table if exists test.t1;

-- Create a table (TEMPORARY and IF NOT EXISTS are optional modifiers)
create table test.m1 (
  id UInt16,
  name String
) ENGINE = Memory;

-- Insert test data
insert into test.m1 (id, name) values (1, 'abc'), (2, 'bbbb');

-- Query the table
select * from test.m1;

Default Values

ClickHouse always supplies a default value when a column is omitted from an INSERT: numeric types default to 0, strings to the empty string, arrays to the empty array, Date to 0000-00-00, and DateTime to 0000-00-00 00:00:00. NULL values are not supported for ordinary columns.
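A minimal sketch of this fallback behavior, using a throwaway Memory table (table name and values are illustrative):

```sql
-- Hypothetical table; any column omitted from the INSERT falls back to its type default.
drop table if exists test.defaults_demo;
create table test.defaults_demo (
  id UInt16,
  name String,
  tags Array(String),
  sdt Date
) ENGINE = Memory;

insert into test.defaults_demo (id) values (1);
-- name, tags, and sdt come back as the defaults described above: '', [], and the zero date.
select * from test.defaults_demo;
```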

Data Types

1. Integer types: UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64.
2. Enum types: Enum8 and Enum16 (e.g., Enum8('hello' = 1, 'test' = -1)).
3. String types: FixedString(N) (fixed byte length) and String (variable length; replaces VARCHAR/BLOB and similar types).
4. Date type.
5. Array type: Array(T), where T is a basic type (multidimensional arrays are discouraged).
6. Tuple type.
7. Nested type: Nested(name1 Type1, name2 Type2, ...), similar to a map.
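Several of these types can be exercised in one table; a hedged sketch (the table name and values are illustrative):

```sql
-- Hypothetical table combining integer, Enum, FixedString, String, Date, and Array columns.
drop table if exists test.types_demo;
create table test.types_demo (
  id UInt32,
  status Enum8('hello' = 1, 'test' = -1),
  code FixedString(4),   -- exactly 4 bytes
  name String,
  sdt Date,
  tags Array(String)
) ENGINE = Memory;

insert into test.types_demo values (1, 'hello', 'ab01', 'demo', '2018-06-01', ['a', 'b']);
select * from test.types_demo;
```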

Materialized Columns

A column defined with MATERIALIZED is computed from an expression and cannot be set via INSERT. It also does not appear in SELECT * results.

drop table if exists test.m2;
create table test.m2 (
  a MATERIALIZED (b + 1),
  b UInt16
) ENGINE = Memory;
insert into test.m2 (b) values (1);
select * from test.m2;
select a, b from test.m2;

Expression (Alias) Columns

An ALIAS column behaves like a materialized column but its value is not stored; it is recomputed on each query.

create table test.m3 (a ALIAS (b + 1), b UInt16) ENGINE = Memory;
insert into test.m3(b) values (1);
select * from test.m3;
select a, b from test.m3;

Engines

TinyLog

The simplest engine stores each column in a separate compressed file and does not support indexes or concurrent writes. It is suitable for write‑once, read‑many scenarios with up to about 1 million rows.

drop table if exists test.tinylog;
create table test.tinylog (a UInt16, b UInt16) ENGINE = TinyLog;
insert into test.tinylog(a,b) values (7,13);

Log

Similar to TinyLog but adds a __marks.mrk file that records block offsets, enabling concurrent reads. Writes are still exclusive.

drop table if exists test.log;
create table test.log (a UInt16, b UInt16) ENGINE = Log;
insert into test.log(a,b) values (7,13);

Memory

Data is kept uncompressed in RAM; it disappears after a server restart. Reads are fast, but no indexes are supported.

Merge

A virtual engine that concatenates multiple tables from the same database, allowing concurrent reads while delegating indexing to the source tables. It does not support writes.

create table test.tinylog1 (id UInt16, name String) ENGINE = TinyLog;
create table test.tinylog2 (id UInt16, name String) ENGINE = TinyLog;
create table test.tinylog3 (id UInt16, name String) ENGINE = TinyLog;
insert into test.tinylog1(id, name) values (1, 'tinylog1');
insert into test.tinylog2(id, name) values (2, 'tinylog2');
insert into test.tinylog3(id, name) values (3, 'tinylog3');
use test;
create table test.merge (id UInt16, name String) ENGINE = Merge(currentDatabase(), '^tinylog[0-9]+');
select _table, * from test.merge order by id desc;

Distributed

Provides a logical table that forwards queries to physical tables on multiple shards. The syntax is Distributed(cluster, database, table [, sharding_key]), where cluster is a remote-server group defined under remote_servers in /etc/clickhouse-server/config.xml.
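The first argument to Distributed() names a cluster defined under remote_servers; a hedged sketch of such a definition, with placeholder host names, might look like:

```xml
<!-- Fragment of /etc/clickhouse-server/config.xml; hosts and the 'log' name are placeholders. -->
<remote_servers>
  <log>
    <shard>
      <replica>
        <host>ch-node1</host>
        <port>9000</port>
      </replica>
    </shard>
    <shard>
      <replica>
        <host>ch-node2</host>
        <port>9000</port>
      </replica>
    </shard>
  </log>
</remote_servers>
```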

-- Assumes a local table test.tinylog_d1 already exists on every shard of the 'log' cluster.
create table test.tinylog_d (id UInt16, name String) ENGINE = Distributed(log, test, tinylog_d1, id);
insert into test.tinylog_d(id, name) values (0,'main'), (1,'main'), (2,'main');
select name, sum(id), count(id) from test.tinylog_d group by name;

Null

All inserted data is discarded; queries always return an empty result set, but the table schema is still enforced.
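A short sketch of the behavior (table name is illustrative):

```sql
-- Inserts are accepted and silently discarded; reads always return nothing.
create table test.null_demo (id UInt16) ENGINE = Null;
insert into test.null_demo values (1), (2);
select count() from test.null_demo;  -- always 0
```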

Buffer

Acts as an in‑memory staging area for another table. Data is flushed to the target table when size, row count, or time thresholds are reached. It inherits the same limitations as the Memory engine.

create table test.mergetree (sdt Date, id UInt16, name String, point UInt16) ENGINE = MergeTree(sdt, (id, name), 10);
create table test.mergetree_buffer as test.mergetree ENGINE = Buffer(test, mergetree, 16, 3, 20, 2, 10, 1, 10000);
insert into test.mergetree (sdt, id, name, point) values ('2017-07-10',1,'a',20);
insert into test.mergetree_buffer (sdt, id, name, point) values ('2017-07-10',1,'b',10);
select * from test.mergetree;
select * from test.mergetree_buffer;

Set

An in‑memory engine used only on the right side of an IN clause. Data is not persisted across restarts unless a materialized view is created.

create table test.set (id UInt16, name String) ENGINE = Set;
insert into test.set(id, name) values (1, 'hello');
select 1 where (toUInt16(1), 'hello') in test.set;

MergeTree

The core engine of ClickHouse. In the legacy syntax MergeTree(date_column, (primary_key_columns), index_granularity), the date column drives monthly partitioning, the tuple defines a sparse primary-key index, and the granularity controls how many rows each index mark covers; sampling is also supported.

create table test.mergetree1 (sdt Date, id UInt16, name String, cnt UInt16) ENGINE = MergeTree(sdt, (id, name), 10);
insert into test.mergetree1(sdt, id, name, cnt) values ('2018-06-01',1,'aaa',10);
insert into test.mergetree1(sdt, id, name, cnt) values ('2018-06-02',4,'bbb',10);
insert into test.mergetree1(sdt, id, name, cnt) values ('2018-06-03',5,'ccc',11);

ReplacingMergeTree

Extends MergeTree by adding a version column to resolve duplicate rows during merges, keeping only the newest version.

create table test.replacingmergetree (sdt Date, id UInt16, name String, cnt UInt16) ENGINE = ReplacingMergeTree(sdt, (name), 10, cnt);
insert into test.replacingmergetree (sdt, id, name, cnt) values ('2018-06-10',1,'a',20);
insert into test.replacingmergetree (sdt, id, name, cnt) values ('2018-06-10',1,'a',30);
optimize table test.replacingmergetree;
select * from test.replacingmergetree;

SummingMergeTree

During merges, numeric columns listed in the engine definition are summed; non‑summable columns keep the first encountered value.

create table test.summingmergetree (sdt Date, name String, a UInt16, b UInt16) ENGINE = SummingMergeTree(sdt, (sdt, name), 8192, (a));
insert into test.summingmergetree (sdt, name, a, b) values ('2018-06-10','a',1,20);
insert into test.summingmergetree (sdt, name, a, b) values ('2018-06-10','b',2,11);
optimize table test.summingmergetree;
select * from test.summingmergetree;

AggregatingMergeTree

Optimizes storage of pre‑aggregated data by storing aggregation states (e.g., uniqState) and merging them with uniqMerge during queries.

create table test.aggregatingmergetree (sdt Date, dim1 String, dim2 String, dim3 String, measure1 UInt64) ENGINE = MergeTree(sdt, (sdt, dim1, dim2, dim3), 8192);
create materialized view test.aggregatingmergetree_view ENGINE = AggregatingMergeTree(sdt,(dim2, dim3),8192) as
select sdt, dim2, dim3, uniqState(dim1) as uv from test.aggregatingmergetree group by sdt, dim2, dim3;
insert into test.aggregatingmergetree values ('2018-06-10','aaaa','a','10',1);
-- ...additional inserts...
optimize table test.aggregatingmergetree_view;
select dim2, uniqMerge(uv) from test.aggregatingmergetree_view group by dim2 order by dim2;

CollapsingMergeTree

Designed for OLAP workloads that need occasional row-level updates or deletes, which ClickHouse cannot do in place: a signed Int8 column marks each row as a state (+1) or a cancellation (-1), and background merges "collapse" matching pairs, leaving only the final state.

create table test.collapsingmergetree(sign Int8, sdt Date, name String, cnt UInt16) ENGINE = CollapsingMergeTree(sdt, (sdt, name), 8192, sign);
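Building on the table above, a hedged sketch of how the sign column is used (values are illustrative):

```sql
-- Record the initial state of a row with sign = +1.
insert into test.collapsingmergetree (sign, sdt, name, cnt) values (1, '2018-06-10', 'a', 5);
-- To "update" it, cancel the old row (-1, same values) and write the new state (+1).
insert into test.collapsingmergetree (sign, sdt, name, cnt)
values (-1, '2018-06-10', 'a', 5), (1, '2018-06-10', 'a', 7);
-- Force a merge; the +1/-1 pair collapses, leaving only the latest state.
optimize table test.collapsingmergetree;
select * from test.collapsingmergetree;
```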

Author: darebeat

Source: https://www.jianshu.com/p/a5bf490247ea



Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Big Data, SQL, Data Modeling, ClickHouse, MergeTree, Database Engines
Written by: Big Data Technology & Architecture (Wang Zhiwu), a column dedicated to sharing big data technology.