
Accelerating ClickHouse LowCardinality: Merge Optimizations & Auto Fallback

This article details how ByteDance’s ClickHouse UBA edition improves dictionary encoding for low‑cardinality columns by redesigning the Part‑merge process, introducing a single‑dictionary merge, and implementing an automatic fallback for high‑cardinality columns, resulting in significant storage savings and query‑performance gains across large‑scale applications.

ByteDance Data Platform

ClickHouse UBA is a deeply customized version of the open‑source ClickHouse developed by ByteDance for the Volcano Engine growth‑analysis platform. It focuses on optimizing dictionary encoding for low‑cardinality columns.

Background

Although ClickHouse columnar storage already provides good compression, its disk usage can still be higher than Parquet for massive datasets, especially for low‑cardinality columns where Parquet’s dictionary encoding is more efficient. Many event‑attribute columns (city, gender, brand, etc.) have low cardinality, making dictionary encoding attractive.

ClickHouse offers the LowCardinality type for dictionary encoding, but two critical issues were observed in internal tests:

When many LowCardinality columns exist (average >300), Part‑merge becomes a bottleneck; merge speed cannot keep up with write speed, eventually causing cluster instability.

Dynamic Map columns used for user‑behavior attributes can produce very high‑cardinality values, which are unsuitable for dictionary encoding and degrade storage and compute performance.

To address these problems, a solution was needed that supports massive column dictionary encoding while maintaining fast Part merges and provides a fallback for high‑cardinality columns.

Solution

The optimization introduces a two‑stage merge process:

1) Dictionary Merge

During Part merge, dictionaries from all merging Parts are combined into a single dictionary. A conversion matrix records how each original index maps to the new dictionary.
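The dictionary-merge stage can be sketched as follows. This is an illustrative model, not the actual ClickHouse UBA implementation: each Part's dictionary is treated as a list of distinct values, and the output is one merged dictionary plus a per-Part conversion row mapping each old index to its position in the merged dictionary.

```python
def merge_dictionaries(part_dicts):
    """Combine per-Part dictionaries into one, recording the remapping.

    Returns (merged, conversion) where conversion[p][old_idx] gives the
    new index of Part p's old_idx-th dictionary entry.
    """
    merged = []       # combined dictionary for the new Part
    positions = {}    # value -> index in the merged dictionary
    conversion = []   # one remapping row per input Part
    for d in part_dicts:
        row = []
        for value in d:
            if value not in positions:
                positions[value] = len(merged)
                merged.append(value)
            row.append(positions[value])
        conversion.append(row)
    return merged, conversion

# Two Parts sharing the value "LA":
merged, conv = merge_dictionaries([["NY", "LA"], ["LA", "SF"]])
# merged == ["NY", "LA", "SF"]; conv == [[0, 1], [1, 2]]
```

The conversion matrix is small (one entry per dictionary value, not per row), which is what makes the second stage cheap.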

2) Index Merge

Instead of rebuilding dictionaries for each Part, the stored index values are directly appended to the new Part using the conversion matrix. This eliminates the expensive hash‑table construction step.
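The index-merge stage then reduces to a pure array lookup. The sketch below is illustrative (hypothetical names, not the actual ClickHouse code): each Part's stored index column is remapped through its precomputed conversion row and appended to the new Part, so no hash table over string values is rebuilt per row.

```python
def merge_indexes(part_indexes, conversion):
    """Remap and concatenate per-Part index columns.

    part_indexes[p] is Part p's stored index column; conversion[p] maps
    Part p's old dictionary indexes to merged-dictionary indexes.
    """
    new_indexes = []
    for remap, indexes in zip(conversion, part_indexes):
        # O(1) array lookup per row; no hashing of the underlying values
        new_indexes.extend(remap[i] for i in indexes)
    return new_indexes

# Part 0's indexes refer to ["NY", "LA"]; Part 1's refer to ["LA", "SF"].
# The conversion rows map both into the merged dictionary ["NY", "LA", "SF"].
out = merge_indexes([[0, 1, 1], [0, 1]], [[0, 1], [1, 2]])
# out == [0, 1, 1, 1, 2]
```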

(Figure: the internal LowCardinality storage structure and the new two-stage merge flow.)

A fallback mechanism was added for high‑cardinality columns. When a column’s cardinality exceeds a threshold, the system automatically stores the column in its native format, bypassing dictionary encoding. This is implemented by wrapping the LowCardinality column with a versioned stream that indicates whether fallback has occurred.
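The fallback decision can be sketched as a simple threshold check at serialization time. The threshold value and names below are assumptions for illustration; the article does not specify the exact threshold or API, only that a version marker in the stream tells readers whether the column body is dictionary-encoded or native.

```python
# Assumed threshold: the article's benchmarks show benefits diminishing
# past ~100k distinct values, so that is used here for illustration.
FALLBACK_THRESHOLD = 100_000

def choose_encoding(distinct_count, threshold=FALLBACK_THRESHOLD):
    """Pick the serialization for a LowCardinality column's stream.

    A version marker written at the head of the stream lets readers
    detect whether fallback occurred, so old and new Parts coexist.
    """
    if distinct_count > threshold:
        # High cardinality: store natively, skipping dictionary encoding
        return {"version": 2, "format": "native"}
    return {"version": 1, "format": "dictionary"}

print(choose_encoding(200)["format"])         # dictionary
print(choose_encoding(10_000_000)["format"])  # native
```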

Performance Results

Merge speed improvement (~1 billion rows written):

float64 (distinct 200): 6‑7 MiB/s → 37‑45 MiB/s

string (distinct 100 000): 6‑8 MiB/s → 12‑40.53 MiB/s

string (distinct 1 M): ~25 MiB/s → ~28 MiB/s

string (distinct 10 M): ~45 MiB/s → ~28 MiB/s (performance drop for very high cardinality)

When cardinality exceeds ~100 k, the benefit diminishes, and for >10 M distinct values the merge can even regress, highlighting the need for the fallback strategy.

Fallback performance compared to native columns (100 million rows):

Merge speed: LowCardinality 28‑70 MiB/s → Fallback 125‑190 MiB/s → Native 200‑210 MiB/s

Disk size: LowCardinality 1063 MiB → Fallback 1013 MiB (similar to native)

Large‑scale application tests show substantial storage savings:

6 000+ columns, 400 M rows: LowCardinality 79 GB vs Native 115 GB (native ~45 % larger)

6 000+ columns, 5 B rows: LowCardinality 829 GB vs Native 1248 GB (native ~50 % larger)

4 000+ columns, 150 M rows: LowCardinality 17 GB vs Native 24 GB (native ~41 % larger)

100+ columns, 140 M rows: LowCardinality 6.5 GB vs Native 7.7 GB (native ~18 % larger)

Query performance tests on ten typical business SQL queries show that most run faster on LowCardinality tables; only two queries regressed, due to high‑cardinality columns that had automatically fallen back to native storage.

Disk I/O and memory usage also decreased for the majority of queries, although fallback queries matched native I/O levels and LowCardinality queries consumed more memory due to per‑Part dictionary loading.

Conclusion

The ClickHouse UBA edition now fully adopts dictionary‑encoded LowCardinality columns, delivering up to 50 % storage reduction and notable query‑performance improvements in large‑scale growth‑analysis workloads. Remaining challenges include higher memory consumption during query execution and the inability to compute directly on compressed dictionaries. Future work will focus on sharing dictionaries across Parts and enabling computation in the compressed domain.


Written by

ByteDance Data Platform

The ByteDance Data Platform team empowers all ByteDance business lines by lowering data‑application barriers, aiming to build data‑driven intelligent enterprises, enable digital transformation across industries, and create greater social value. Internally it supports most ByteDance units; externally it delivers data‑intelligence products under the Volcano Engine brand to enterprise customers.
