
Understanding ClickHouse AggregatingMergeTree, AggregateFunction, and Materialized Views

This article explains how ClickHouse's AggregatingMergeTree engine uses the special AggregateFunction data type to pre‑aggregate data, demonstrates table creation, data insertion, and querying with state and merge functions, and shows how to combine it with materialized views for efficient analytics.

Big Data Technology & Architecture

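ClickHouse provides a special data type, AggregateFunction, that stores intermediate aggregation states in binary form. The AggregatingMergeTree engine combines these states when it merges data parts within a partition, so rows that share the same sorting key are pre‑aggregated in the background.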

The article first introduces the AggregateFunction concept and shows a table definition using this type:

-- Table definition
CREATE TABLE agg_table(
  id String,
  city String,
  code AggregateFunction(uniq,String),
  value AggregateFunction(sum,UInt32),
  create_time DateTime
)
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(create_time)
ORDER BY (id,city)
PRIMARY KEY id;

Two aggregate functions, uniq and sum, are specified. Data is written by calling the corresponding State functions (e.g., uniqState, sumState) and read by calling the matching Merge functions.

-- Insert test data
INSERT INTO TABLE agg_table SELECT 'A000','test',uniqState('code1'),sumState(toUInt32(100)),'2019-08-10 17:00:00';
INSERT INTO TABLE agg_table SELECT 'A001','test',uniqState('code2'),sumState(toUInt32(50)),'2019-08-10 17:00:00';

When querying, the raw AggregateFunction columns appear as binary data, so the Merge functions must be used:

SELECT id,city,uniqMerge(code),sumMerge(value) FROM agg_table GROUP BY id,city;

The result shows that the aggregation works as expected (e.g., uniqMerge(code) returns 1 for id = 'A000').
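
To see what is stored in an AggregateFunction column without grouping, the states can be inspected row by row. The query below is a minimal sketch that assumes the agg_table defined above; the column aliases are illustrative only. toTypeName reports the column's AggregateFunction type, and finalizeAggregation turns a single stored state into its final value.

```sql
-- Inspect each stored row individually (no GROUP BY, no -Merge combinator).
SELECT
    id,
    city,
    toTypeName(value)          AS value_type,     -- the AggregateFunction(...) type of the column
    finalizeAggregation(value) AS value_per_row,  -- final sum held by this row's state
    finalizeAggregation(code)  AS codes_per_row   -- final uniq count held by this row's state
FROM agg_table;
```

This is useful for debugging only: several rows can still exist for the same key until parts are merged, so reporting queries should keep using the Merge functions with GROUP BY.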

Because using AggregatingMergeTree directly can be cumbersome, the article recommends pairing it with a materialized view.

The raw rows are stored in a regular MergeTree table, and a materialized view defined on that table uses AggregatingMergeTree as its engine:

CREATE TABLE agg_table_basic(
  id String,
  city String,
  code String,
  value UInt32
) ENGINE = MergeTree() PARTITION BY city ORDER BY (id,city);

CREATE MATERIALIZED VIEW agg_view ENGINE = AggregatingMergeTree()
PARTITION BY city ORDER BY (id,city) AS
SELECT id, city,
       uniqState(code) AS code,
       sumState(value) AS value
FROM agg_table_basic GROUP BY id, city;

Data is inserted into the base MergeTree table. Each inserted block is passed through the view's SELECT, and the resulting aggregation states are written to the view automatically.

INSERT INTO TABLE agg_table_basic VALUES
  ('A000','wuhan','code1',100),
  ('A000','wuhan','code2',200),
  ('A000','zhuhai','code1',200);

SELECT id, city, sumMerge(value), uniqMerge(code) FROM agg_view GROUP BY id, city;
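
Because the view stores states rather than final values, the same view can also be rolled up at a coarser grain than its sorting key. The following is a sketch that assumes agg_view contains only the three sample rows inserted above; the aliases total_value and distinct_codes are illustrative.

```sql
-- Roll the per-(id, city) states up to one row per id.
SELECT
    id,
    sumMerge(value) AS total_value,     -- combines the sum states of all cities
    uniqMerge(code) AS distinct_codes   -- combines the uniq states, so shared codes are counted once
FROM agg_view
GROUP BY id;
```

With only the sample data, id 'A000' should yield a total of 500 and 2 distinct codes, since 'code1' appears in both cities but is counted once after the states are merged.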

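The article then lists the processing logic of AggregatingMergeTree in an ordered list: the ORDER BY key determines which rows are aggregated together, the aggregated columns are declared with the AggregateFunction type, aggregation happens only among rows within the same partition, and data must be written with State functions and read with Merge functions.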

Finally, it explains the principle that the engine performs pre‑computation during partition merges, which yields high query performance on massive datasets.
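
The merge behavior can be observed directly. The sketch below assumes the agg_table defined earlier and uses a hypothetical key 'A002' that does not clash with the test data. It writes two states for the same sorting key into the same monthly partition and then forces a merge with OPTIMIZE ... FINAL; in normal operation merges run in the background at a time ClickHouse chooses.

```sql
-- Two separate inserts for the same (id, city) key in partition 201908.
INSERT INTO agg_table
SELECT 'A002', 'test', uniqState('code1'), sumState(toUInt32(100)), toDateTime('2019-08-10 17:00:00');

INSERT INTO agg_table
SELECT 'A002', 'test', uniqState('code2'), sumState(toUInt32(200)), toDateTime('2019-08-10 18:00:00');

-- Force the parts to merge now instead of waiting for a background merge.
OPTIMIZE TABLE agg_table FINAL;

-- After the merge, the two rows for this key are expected to collapse into one.
SELECT count() AS stored_rows FROM agg_table WHERE id = 'A002';

-- The Merge functions return the same answer before and after the merge.
SELECT id, city, uniqMerge(code), sumMerge(value)
FROM agg_table
WHERE id = 'A002'
GROUP BY id, city;
```

Note that create_time is neither part of the sorting key nor an AggregateFunction column, so after the merge the surviving row keeps just one of the two inserted timestamps.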

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
