How UniqueMergeTree Boosts Real-Time Updates in ClickHouse Column Stores
UniqueMergeTree, a new ClickHouse table engine, addresses real‑time data update challenges by combining upsert semantics, unique key enforcement, and efficient delete‑bitmap handling. It delivers much higher query performance at some cost to write throughput. This article covers its design, sharding strategies, conflict resolution, and performance evaluation.
UniqueMergeTree Development Background
Three typical scenarios require real‑time updates: (1) the business needs to analyze transactional data such as orders in real time, so data streams must be synchronized to an OLAP database like ClickHouse; (2) tables from a transactional (TP) database are synchronized to ClickHouse in real time, which requires support for updates and deletions; (3) data streams must be deduplicated, which requires idempotent writes. All three scenarios demand second‑ or minute‑level freshness and can be satisfied with a mini‑batch real‑time sync approach.
Common Column‑Store Real‑Time Update Solutions
Key‑Based Merge on Read
This approach is similar to LSM‑Tree. Data are sorted by key and written as column files with version numbers. Reads merge multiple versions to return the latest value for each key. ClickHouse’s ReplacingMergeTree and Doris use this scheme. It simplifies the write path but suffers from poor read performance due to single‑threaded merging, high memory copy cost, and limited predicate push‑down.
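The read-side merge described above can be sketched as follows. This is a minimal illustration, not ClickHouse or Doris code: the (key, version, columns) row layout and the name merge_on_read are assumptions made for the example, and each run is assumed to be sorted by key and version.

```python
import heapq
from typing import Iterator, List, Tuple

# Hypothetical row layout: (key, version, columns)
Row = Tuple[str, int, dict]


def merge_on_read(runs: List[List[Row]]) -> Iterator[Row]:
    """Merge key-sorted runs and yield only the highest version of each key."""
    # heapq.merge streams the runs in (key, version) order on one thread,
    # which mirrors the single-threaded merge cost mentioned in the text.
    merged = heapq.merge(*runs, key=lambda row: (row[0], row[1]))
    current = None
    for row in merged:
        if current is not None and row[0] != current[0]:
            yield current  # key changed, so `current` was the newest version
        current = row      # later rows for the same key carry a higher version
    if current is not None:
        yield current


older = [("a", 1, {"qty": 1}), ("b", 1, {"qty": 5})]
newer = [("a", 2, {"qty": 3}), ("c", 2, {"qty": 7})]
latest = list(merge_on_read([older, newer]))
```

Every query pays for this merge again, which is why the scheme keeps writes cheap but reads slow.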
Mark‑Delete + Insert
Updates are expressed by marking rows for deletion with a bitmap and inserting new rows. SQL Server's column store is an example: each RowGroup is an immutable column file with an associated DeleteBitmap, and queries filter out the rows flagged in the bitmap. This method sacrifices write speed because each write must locate existing keys and handle write‑write conflicts.
Variants
Both schemes can be enhanced with auxiliary indexes or buffering strategies to accelerate merges.
UniqueMergeTree Features
UniqueMergeTree introduces a UNIQUE KEY clause to enforce uniqueness. Writes follow upsert semantics: new keys are inserted, existing keys are updated. A virtual delete‑flag column enables real‑time row deletions. A version column resolves back‑fill conflicts, and the engine supports multi‑replica synchronization.
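The semantics listed above can be modelled in a few lines. This is a sketch of the behaviour only, under stated assumptions: the Row fields and apply_upsert are hypothetical names, a row whose version is lower than the stored one is treated as stale back-fill and ignored, and the sketch does not remember versions of deleted keys.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Row:
    key: str
    value: int
    version: int
    delete_flag: bool = False  # models the virtual delete-flag column


def apply_upsert(table: Dict[str, Row], row: Row) -> None:
    """Apply one incoming row with upsert semantics keyed on row.key."""
    existing = table.get(row.key)
    # Assumption: a lower version than the stored row is stale back-fill;
    # on equal versions the later write wins.
    if existing is not None and row.version < existing.version:
        return
    if row.delete_flag:
        table.pop(row.key, None)   # delete flag set: remove the key
    else:
        table[row.key] = row       # new key is inserted, existing key updated


table: Dict[str, Row] = {}
apply_upsert(table, Row("order-1", 10, version=1))
apply_upsert(table, Row("order-1", 12, version=3))
apply_upsert(table, Row("order-1", 11, version=2))  # stale back-fill, ignored
apply_upsert(table, Row("order-2", 5, version=1))
apply_upsert(table, Row("order-2", 0, version=2, delete_flag=True))
```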
Distributed Table Write: Sharding Options
Two sharding strategies are available:
Internal sharding: ClickHouse’s distributed table automatically routes data based on a sharding key, providing transparent, consistent partitioning across tables. This is used in ByteHouse Cloud Data Warehouse.
External sharding: The client or SDK determines shard placement, reducing the number of small files in real‑time micro‑batches and improving write throughput, but it requires careful coordination by the user.
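For external sharding, the coordination burden mostly comes down to routing every row with the same unique key to the same shard, assuming uniqueness is enforced per shard. The sketch below shows one way a client could do that; shard_for_key and group_by_shard are hypothetical helpers, not part of any ClickHouse or ByteHouse SDK, and CRC32 is just one stable hash choice.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def shard_for_key(key: str, num_shards: int) -> int:
    """Map a unique key to a shard index deterministically."""
    # crc32 is stable across processes, unlike Python's built-in hash() on str.
    return zlib.crc32(key.encode("utf-8")) % num_shards


def group_by_shard(
    rows: Iterable[Tuple[str, dict]], num_shards: int
) -> Dict[int, List[Tuple[str, dict]]]:
    """Split one micro-batch into per-shard batches so each shard gets one write."""
    batches: Dict[int, List[Tuple[str, dict]]] = defaultdict(list)
    for key, payload in rows:
        batches[shard_for_key(key, num_shards)].append((key, payload))
    return dict(batches)


batch = [("order-1", {"qty": 1}), ("order-2", {"qty": 2}), ("order-1", {"qty": 3})]
per_shard = group_by_shard(batch, num_shards=4)
```

Grouping before sending means each shard receives one larger batch instead of many small ones, which is the small-file reduction the text describes.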
Single‑Node Read/Write Path
Write path: Determine the target part and row number for the incoming key, update the part’s delete bitmap to mark the old row, and write the new data to a new part. Each part maintains a key index for fast lookup and multiple delete files representing different bitmap versions.
Read path: Load the latest delete‑bitmap snapshots for all parts, then filter out rows marked as deleted during part reads, ensuring uniqueness.
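The two paths above can be sketched with an in-memory model. The Part and Table classes are hypothetical, the delete bitmap is modelled as a Python set of row numbers, and the versioned delete files are collapsed into a single mutable bitmap per part for brevity.

```python
from typing import Dict, List, Set, Tuple

Row = Tuple[str, int]  # (unique key, value)


class Part:
    """Immutable rows plus a key index and a delete bitmap (a set here)."""

    def __init__(self, rows: List[Row]):
        self.rows = rows
        self.key_index: Dict[str, int] = {k: i for i, (k, _) in enumerate(rows)}
        self.deleted: Set[int] = set()


class Table:
    def __init__(self) -> None:
        self.parts: List[Part] = []

    def write(self, batch: List[Row]) -> None:
        latest = dict(batch)  # within one batch, the last row per key wins
        for key in latest:
            # Find the part and row number that currently hold this key.
            for part in self.parts:
                row_no = part.key_index.get(key)
                if row_no is not None and row_no not in part.deleted:
                    part.deleted.add(row_no)  # mark the old row as deleted
                    break
        self.parts.append(Part(sorted(latest.items())))  # new data, new part

    def read(self) -> List[Row]:
        # Skip every row whose number is set in its part's delete bitmap.
        return [
            row
            for part in self.parts
            for i, row in enumerate(part.rows)
            if i not in part.deleted
        ]


t = Table()
t.write([("a", 1), ("b", 2)])
t.write([("a", 10), ("c", 3)])
rows = sorted(t.read())
```

The read side never merges rows by key; it only filters, which is what lets parts be scanned independently.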
Write‑Merge Conflict Handling
Two conflict types arise:
Write‑write conflict: Concurrent upserts on the same key may both mark the original row for deletion and write new rows, leading to duplicate keys. In AP scenarios, a simple table‑level lock serializes writes.
Write‑merge conflict: A background merge may not see rows that concurrent foreground writes delete while it is running, so those rows would be resurrected in the merged part. The solution adds a DeleteBuffer to each merge task, recording keys deleted during the merge. Before committing, the merge task incorporates these keys into the new part’s delete bitmap.
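The DeleteBuffer idea can be sketched as below. This is an illustrative model under assumptions: MergeTask, on_concurrent_delete, and commit are hypothetical names, and the merge input is simplified to the list of rows that were live when the merge started.

```python
from typing import List, Set, Tuple

Row = Tuple[str, int]  # (unique key, value)


class MergeTask:
    def __init__(self, live_rows: List[Row]):
        # Snapshot of the rows that were live when the merge started.
        self.snapshot: List[Row] = sorted(live_rows)
        # Keys deleted by foreground writes while the merge is running.
        self.delete_buffer: Set[str] = set()

    def on_concurrent_delete(self, key: str) -> None:
        """Called by the write path when it deletes a key from a source part."""
        self.delete_buffer.add(key)

    def commit(self) -> Tuple[List[Row], Set[int]]:
        """Return the merged part and its initial delete bitmap."""
        merged = self.snapshot
        key_index = {key: i for i, (key, _) in enumerate(merged)}
        # Translate buffered keys into row numbers of the new part so the
        # deletes made during the merge are not lost.
        bitmap = {key_index[k] for k in self.delete_buffer if k in key_index}
        return merged, bitmap


task = MergeTask([("a", 1), ("b", 2), ("c", 3)])
task.on_concurrent_delete("b")  # an upsert replaced "b" during the merge
merged, bitmap = task.commit()
visible = [row for i, row in enumerate(merged) if i not in bitmap]
```

Without the buffer, the merged part would start with an empty bitmap and the old version of "b" would become visible again next to its replacement.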
Performance Evaluation
YCSB benchmarks compare UniqueMergeTree with ReplacingMergeTree and the classic MergeTree. UniqueMergeTree’s write throughput drops by 40‑50% relative to ReplacingMergeTree, but query latency improves by an order of magnitude, matching the performance of the standard MergeTree. Gains stem from parallelized merges, in‑memory delete‑bitmap snapshots, direct skip of marked rows, and combined pre‑where and delete filters.
Conclusion and Future Plans
Since its launch in early 2020, UniqueMergeTree has been adopted by over 1,000 tables in production. Key decisions include sacrificing some write performance for substantially better read speed and avoiding strict data‑size limits on indexes. Future work will focus on partial‑column updates and further write‑throughput optimizations, such as finer‑grained table locks and disk‑based key indexes.
ByteDance Data Platform
The ByteDance Data Platform team empowers all ByteDance business lines by lowering data‑application barriers, aiming to build data‑driven intelligent enterprises, enable digital transformation across industries, and create greater social value. Internally it supports most ByteDance units; externally it delivers data‑intelligence products under the Volcano Engine brand to enterprise customers.