
Understanding ClickHouse Block and LSM: Batch Processing, Pre‑sorting, and Compression

The article explains how ClickHouse uses block‑based batch processing combined with LSM‑style pre‑sorting and columnar compression to accelerate range queries on massive datasets, while also discussing the trade‑offs such as write latency and limitations for transactional workloads.

Big Data Technology & Architecture

Part 1: Block + LSM

ClickHouse implements batch processing through "blocks" (default up to 8192 rows) and achieves pre‑sorting using an LSM‑like algorithm, which together improve query speed for large‑scale analytical workloads.
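The block-at-a-time idea can be illustrated with a minimal sketch. This is not ClickHouse's actual implementation, just a generator that chunks an arbitrary row stream into blocks; the 8192-row default comes from the article:

```python
from itertools import islice

BLOCK_SIZE = 8192  # default rows per block, per the article

def iter_blocks(rows, block_size=BLOCK_SIZE):
    """Yield successive lists of up to block_size rows from any iterable."""
    it = iter(rows)
    while True:
        block = list(islice(it, block_size))
        if not block:
            return
        yield block

# 20,000 rows are processed as three blocks: 8192 + 8192 + 3616 rows
blocks = list(iter_blocks(range(20_000)))
```

Processing whole blocks rather than single rows amortizes per-call overhead and gives the compressor large, homogeneous byte runs to work on.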

Ordered storage allows a range query to be satisfied with a single sequential disk I/O, whereas unordered storage may require many random I/Os. For a point query on an indexed column, both ordered and unordered storage typically need only one I/O; for a range query, ordered storage still needs one sequential read, while unordered storage may need up to N random reads.

Example query: SELECT avg(price) FROM orders WHERE age BETWEEN 20 AND 30; If the "age" column is stored ordered, reading the relevant rows (10% of 100 million rows, about 10 GB at roughly 1 KB per row) requires a single sequential scan. If unordered, the same query may read roughly 29 GB (≈3× more) due to page-level random accesses.
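The ordered-versus-unordered gap can be simulated by counting the distinct 4 KB pages a range query touches under each layout. This is a toy model with an assumed row size (~1 KB, matching the article's arithmetic), not a measurement of ClickHouse itself:

```python
import random

PAGE_SIZE = 4096        # OS page size, per the appendix
ROW_SIZE = 1024         # ~1 KB per row, matching the article's arithmetic
ROWS_PER_PAGE = PAGE_SIZE // ROW_SIZE

def pages_touched(ages, lo, hi):
    """Distinct pages read to fetch every row with lo <= age <= hi,
    given rows laid out on disk in the order of `ages`."""
    return len({i // ROWS_PER_PAGE
                for i, age in enumerate(ages)
                if lo <= age <= hi})

random.seed(0)
ages = [random.randint(0, 99) for _ in range(100_000)]

ordered = pages_touched(sorted(ages), 20, 30)    # one contiguous run of pages
unordered = pages_touched(ages, 20, 30)          # matches scattered everywhere
```

With the matching rows scattered, almost every page holding even one match must be read, so the unordered layout touches several times as many pages as the ordered one, consistent with the ≈3× read amplification cited above.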

Beyond pre‑sorting, ClickHouse’s block processing shines in compression. Using the official compressor tool on the UserID column of the hits_v1 dataset, the best block compressed from 130,272 bytes to 639 bytes (≈203×), and overall the column compressed from 70.99 MB to 11.60 MB (≈6.1×).

These gains stem from columnar storage, where each column’s data exhibits regular patterns that compress efficiently.
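This effect is easy to reproduce with a general-purpose compressor: a column drawn from a few distinct, sorted values compresses dramatically better than pattern-free bytes. A hedged sketch using Python's zlib (ClickHouse itself uses codecs such as LZ4 and ZSTD, and the data here is synthetic):

```python
import random
import zlib

random.seed(42)

# A column with regular structure: 8192 values drawn from only 16
# distinct keys, stored sorted (as pre-sorting would leave them).
regular = b"".join(v.to_bytes(4, "little")
                   for v in sorted(random.choices(range(16), k=8192)))

# The same number of bytes with no structure at all.
noisy = random.randbytes(len(regular))

ratio_regular = len(regular) / len(zlib.compress(regular))
ratio_noisy = len(noisy) / len(zlib.compress(noisy))
# regular data compresses by orders of magnitude; noise barely at all
```

The same compressor, fed column-wise instead of row-wise data, is what turns the long runs of similar values into the large ratios quoted above.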

Consequently, batch processing plus pre‑sorting reduces disk I/O by roughly 6× via compression, and can cut range‑query time by up to 18× when scanning 10% of the data, or 24× when scanning 1% of a hundred‑billion‑row table.

However, the design has drawbacks:

It favors bulk writes; frequent small writes degrade performance.

Range‑query benefits diminish for small datasets.

Block‑level granularity makes single‑row deletions slow.

Updates, especially on sorted columns, are costly, making ClickHouse unsuitable for OLTP workloads.

Appendix

Unordered storage appears to read 4 KB per page because operating systems read data in page‑size units (default 4 KB). The 27.1% figure in the illustration represents cache‑hit rate, which inversely correlates with the proportion of data accessed.

Part 2: LSM Algorithm in ClickHouse

The LSM-tree algorithm, first described in a 1996 paper by O'Neil et al., is widely used in big‑data storage systems (LevelDB, HBase, Cassandra). ClickHouse adopts a variant to achieve pre‑sorting.

When inserting unordered data, ClickHouse logs the batch, sorts it in memory, and writes the sorted block to disk (level 0). Periodically, overlapping level‑0 files are merged into higher levels, eventually becoming immutable.

Timeline example:

T=0: database empty.

T=1: 500‑row insert – logged, sorted, written as L0.

T=2: 800‑row insert – logged, sorted, written as L0.

T=3: Merge L0 files → new L1, old L0 files marked for deletion.

T=4: Physical deletion of marked files.

T=5: 100‑row insert – logged, sorted, written as L0.

T=6: No overlap between L0 and L1, so L0 is promoted to L1.

This process converts random writes into sequential writes, a hallmark of LSM, though ClickHouse does not exploit all LSM benefits (e.g., it does not keep multiple levels for compaction).
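The timeline above can be sketched as a toy two-level store. The name TinyLSM is hypothetical; this models only the sort-on-insert and overlap-driven merge behavior described in the timeline, not ClickHouse's actual MergeTree parts:

```python
import heapq

def overlaps(a, b):
    """True if two sorted runs have intersecting key ranges."""
    return not (a[-1] < b[0] or b[-1] < a[0])

class TinyLSM:
    def __init__(self):
        self.l0 = []   # freshly written sorted runs (level 0)
        self.l1 = []   # merged, non-overlapping runs (level 1)

    def insert(self, batch):
        # T=1 / T=2 / T=5: log the batch, sort in memory, flush as an L0 run
        self.l0.append(sorted(batch))

    def merge(self):
        # T=3 / T=6: fold each L0 run into L1 -- overlapping runs are
        # k-way merged; a run with no overlap is promoted to L1 as-is
        for run in self.l0:
            for i, existing in enumerate(self.l1):
                if overlaps(run, existing):
                    self.l1[i] = list(heapq.merge(existing, run))
                    break
            else:
                self.l1.append(run)
        self.l0 = []   # old L0 runs marked for deletion, then dropped

db = TinyLSM()
db.insert([5, 3, 9])   # stands in for the 500-row insert
db.insert([8, 1, 4])   # stands in for the 800-row insert
db.merge()             # overlapping key ranges collapse into one L1 run
```

Because every flush and every merge writes a fully sorted run sequentially, random inserts never translate into random disk writes, which is the property the surrounding text highlights.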

Compared with LevelDB, which buffers writes in memory until a threshold is reached before flushing, ClickHouse sorts immediately after logging, favoring read‑heavy analytical scenarios, while LevelDB targets write‑heavy workloads.

Other Remarks

The choice of storage engine must match workload characteristics. ClickHouse excels at read‑intensive analytics with massive batch writes, whereas LevelDB suits write‑intensive use cases. Architects should evaluate trade‑offs rather than applying LSM universally.

The article aims to share the engineering insights behind ClickHouse’s design, illustrating how existing algorithms can be adapted to achieve extreme performance in specific domains.

Tags: big data, ClickHouse, Compression, Columnar Database, LSM, Block
Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
