Understanding ClickHouse MergeTree Partitioning and Merging Rules
This article explains how ClickHouse's MergeTree engine creates partition directories, the naming convention for partitions, and the automatic and manual merging processes that consolidate parts while tracking block numbers and merge levels.
Introduction
Previously we introduced ClickHouse, its features, architecture, and use cases. This article dives into ClickHouse's partition concept and the rules for partition merging in the MergeTree engine.
MergeTree Partition Rules
Creating a table partitioned by month:
CREATE TABLE tab_partition(`dt` Date, `v` UInt8) ENGINE = MergeTree PARTITION BY toYYYYMM(dt) ORDER BY v;
INSERT INTO tab_partition(dt,v) VALUES ('2020-02-11',1),('2020-02-13',2);
INSERT INTO tab_partition(dt,v) VALUES ('2020-04-11',3),('2020-04-13',4);
INSERT INTO tab_partition(dt,v) VALUES ('2020-09-11',5),('2020-09-10',6);
INSERT INTO tab_partition(dt,v) VALUES ('2020-10-12',7),('2020-10-09',8);
INSERT INTO tab_partition(dt,v) VALUES ('2020-02-14',9),('2020-02-15',10);
INSERT INTO tab_partition(dt,v) VALUES ('2020-02-11',23),('2020-02-13',45);After data is written, MergeTree creates partition directories named PartitionID_MinBlockNum_MaxBlockNum_Level, e.g., 202002_4_4_0, where 202002 is the partition ID, 4_4 are the minimum and maximum block numbers, and 0 is the current merge level.
Partition ID is derived from the partition‑key value. Four generation rules apply:
If no partition key is specified, the ID defaults to all . Integer key that cannot be formatted as YYYYMMDD is used as its string representation. Date type or integer convertible to YYYYMMDD is formatted as YYYYMMDD . Other types (e.g., Float, String) use a 128‑bit hash of the inserted value.
MinBlockNum and MaxBlockNum are auto‑incrementing block numbers that start equal for a newly created part. After merging, MinBlockNum becomes the smallest among merged parts and MaxBlockNum the largest.
Level indicates how many times a partition has been merged; it starts at 0 and increments per merge for each partition independently.
MergeTree Partition Merging Rules
As data accumulates, many partition directories appear. ClickHouse merges parts belonging to the same partition automatically (default every 10‑15 minutes) or manually via an OPTIMIZE TABLE statement; old parts are later removed by background tasks.
Example of existing directories:
# ls
202002_1_1_0 202004_2_2_0 202009_3_3_0
202002_4_4_0 202002_5_5_0Manual merge command:
OPTIMIZE TABLE tab_partition;
SELECT partition, name, part_type, active FROM system.parts WHERE table='tab_partition';In the query result, active=1 marks the latest merged part, while active=0 marks old parts that will be filtered out automatically.
After merging partition 202002, the new directory follows the rule: same PartitionID, MinBlockNum = smallest MinBlockNum among merged parts, MaxBlockNum = largest MaxBlockNum, Level = max Level + 1.
Resulting directory structure is illustrated below:
Conclusion
This note, part of “ClickHouse Principles and Practice”, demonstrates how MergeTree partitions are generated, named, and merged, and encourages readers to try the process themselves.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
