
Real-time Data Update Solutions in TCHouse‑C: Architecture, Schema‑less Design, and Performance Evaluation

This article presents TCHouse‑C, a cloud‑native ClickHouse service, detailing its real‑time data update architecture, schema‑less ingestion, various update strategies such as Delete‑Insert and lightweight‑update/delete, and comprehensive performance tests comparing UniqueMergeTree with standard ClickHouse engines across import, query, and update workloads.

DataFunSummit

TCHouse‑C is Tencent Cloud's fully managed ClickHouse service, currently serving many enterprise customers and over 90% of internal ClickHouse workloads.

The main content covers five parts: 1) real‑time data update scenarios, 2) existing ClickHouse data update methods, 3) ClickHouse Cloud's lightweight‑update/delete feature, 4) industry solutions for real‑time updates in data warehouses, and 5) TCHouse‑C's own Delete‑Insert solution. Performance testing and the future roadmap follow as items 6 and 7.

1. Real‑time data update scenarios focus on two key cases. The first is high‑frequency CRUD operations (e.g., dashboards, IoT monitoring, user behavior tracking, e‑commerce transactions), which require tens of thousands to hundreds of thousands of QPS at low latency. The second is building wide tables with partial‑column updates (UPSERT), which simplifies data integration.
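The partial‑column UPSERT case can be illustrated with a minimal Python sketch. It is a toy model rather than TCHouse‑C code: the table is an in‑memory dict, and the function name `upsert_partial` and all column names are assumptions made for illustration.

```python
from typing import Any

# Hypothetical wide table keyed by primary key; column names are illustrative only.
wide_table: dict[int, dict[str, Any]] = {}

def upsert_partial(table: dict[int, dict[str, Any]], key: int, columns: dict[str, Any]) -> dict[str, Any]:
    """Insert the row if the key is new; otherwise overwrite only the given columns."""
    row = table.setdefault(key, {})
    row.update(columns)
    return row

# Separate upstream sources each contribute a different subset of columns for key 42.
upsert_partial(wide_table, 42, {"name": "alice", "city": "Shenzhen"})
upsert_partial(wide_table, 42, {"last_order_id": 9001})
upsert_partial(wide_table, 42, {"city": "Beijing"})

print(wide_table[42])
```

Each source writes only the columns it owns, so the wide row is assembled without a join in the ingestion pipeline.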

2. Existing ClickHouse update schemes rely on ALTER TABLE … UPDATE/DELETE, which rewrite data parts asynchronously, causing high write cost and non‑real‑time visibility, especially under heavy mutation load.

3. ClickHouse Cloud lightweight‑update/delete stores update expressions in memory (Keeper) and applies them during query execution, providing real‑time visibility while still depending on the mutation mechanism, which may affect query performance and memory usage under heavy updates.
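The idea of applying stored update expressions at query time can be sketched as follows. This is a simplified in‑memory model, not ClickHouse Cloud's implementation: the class `PatchOnReadTable`, its methods, and the row layout are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterator

Row = dict[str, Any]

@dataclass
class PendingUpdate:
    predicate: Callable[[Row], bool]                 # which rows the update targets
    assignments: dict[str, Callable[[Row], Any]]     # column -> expression over the row

class PatchOnReadTable:
    """Toy model: stored rows are never rewritten; updates are applied while scanning."""

    def __init__(self, rows: list[Row]) -> None:
        self._stored = [dict(r) for r in rows]
        self._pending: list[PendingUpdate] = []

    def update(self, predicate, assignments) -> None:
        # Recording the expression is cheap, so the update is visible immediately.
        self._pending.append(PendingUpdate(predicate, assignments))

    def scan(self) -> Iterator[Row]:
        for stored in self._stored:
            row = dict(stored)
            # Every pending update adds per-row work to every query.
            for upd in self._pending:
                if upd.predicate(row):
                    new_values = {col: expr(row) for col, expr in upd.assignments.items()}
                    row.update(new_values)
            yield row

table = PatchOnReadTable([{"id": 1, "price": 10}, {"id": 2, "price": 20}])
table.update(lambda r: r["id"] == 2, {"price": lambda r: r["price"] * 2})
result = list(table.scan())
print(result)
```

The inner loop over pending updates is where the query‑time and memory cost described above comes from: the more unmerged updates accumulate, the more work each scan performs.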

4. Industry real‑time update approaches include Copy‑On‑Write (high write cost, low query cost), Merge‑On‑Read (low write cost, high query cost), Delta‑Store (adds primary‑key index for efficient conflict handling), and Delete‑Insert (uses logical delete marks and bitmap indexing to achieve fast queries, especially in OLAP vectorized engines).
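The write/read trade‑off between Copy‑On‑Write and Merge‑On‑Read can be sketched in a few lines. This is a minimal model under stated assumptions: a part is a list of (key, value) pairs, only updates to existing keys are covered, and the function names are hypothetical.

```python
def cow_update(part, key, value):
    """Copy-On-Write: rewrite the whole part with the change applied."""
    return [(k, value if k == key else v) for k, v in part]

def mor_update(delta_log, key, value):
    """Merge-On-Read: append the change; the base part is left untouched."""
    delta_log.append((key, value))

def mor_read(part, delta_log):
    """Merge base rows with the delta log at query time; later entries win."""
    latest = dict(delta_log)
    return [(k, latest.get(k, v)) for k, v in part]

base = [(1, "a"), (2, "b"), (3, "c")]

# COW: one changed row costs a rewrite of all three rows, but reads need no merge.
cow_part = cow_update(base, 2, "B")

# MOR: each write appends one entry, but every read pays for the merge.
delta = []
mor_update(delta, 2, "B")
mor_update(delta, 2, "BB")
mor_view = mor_read(base, delta)

print(cow_part)
print(mor_view)
```

Delete‑Insert aims to avoid both costs: the write only flags the old row as deleted, and the read only skips flagged rows instead of merging versions.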

5. TCHouse‑C's Delete‑Insert solution implements Upsert semantics at the SQL layer, row‑level indexes with bitmap delete marks, multi‑replica synchronization via ZooKeeper logs, versioning for conflict resolution, and a “tombstone” mechanism for delete propagation.

The talk also details data writes and deduplication using row‑level indexes, the virtual column exists_row for fast existence checks, and multi‑replica data consistency.
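A minimal single‑node sketch of the Delete‑Insert write path, assuming a row‑level index from key to row location, one delete bitmap per part, and a "higher or equal version wins" rule. The class `DeleteInsertTable` and its methods are hypothetical. Replica synchronization through ZooKeeper logs and the exists_row column are not modeled, and stale rows are simply dropped here, which may differ from the real engine.

```python
class DeleteInsertTable:
    def __init__(self):
        self.parts = []     # immutable parts: lists of (key, value)
        self.deleted = []   # one int per part, used as a bitmap of deleted row offsets
        self.index = {}     # key -> (version, location); location None marks a tombstone

    def _mark_old_deleted(self, key, version):
        """Return False if the write is stale; otherwise flag the old row as deleted."""
        current = self.index.get(key)
        if current is not None:
            cur_version, location = current
            if version < cur_version:
                return False
            if location is not None:
                part_no, row_no = location
                self.deleted[part_no] |= 1 << row_no
        return True

    def upsert(self, rows):
        """Write one new part from (key, version, value) tuples."""
        part_no = len(self.parts)
        part = []
        self.parts.append(part)
        self.deleted.append(0)
        for key, version, value in rows:
            if not self._mark_old_deleted(key, version):
                continue  # stale write loses the version comparison
            self.index[key] = (version, (part_no, len(part)))
            part.append((key, value))

    def delete(self, key, version):
        if self._mark_old_deleted(key, version):
            self.index[key] = (version, None)  # tombstone blocks older late writes

    def scan(self):
        """Read path: skip rows whose bit is set; no merge or dedup is needed."""
        for part, bitmap in zip(self.parts, self.deleted):
            for row_no, row in enumerate(part):
                if not (bitmap >> row_no) & 1:
                    yield row

t = DeleteInsertTable()
t.upsert([("a", 1, "v1"), ("b", 1, "v1")])
t.upsert([("a", 2, "v2")])      # flags the old "a" row, writes the new one
t.delete("b", 3)                # flags "b" and leaves a tombstone at version 3
t.upsert([("b", 2, "late")])    # older than the tombstone, so it is ignored
live_rows = list(t.scan())
print(live_rows)
```

The write does one index lookup and one bit flip per key, and existing parts are never rewritten; the scan filters with the bitmap, which is the property that suits vectorized OLAP execution.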

6. Performance testing compares UniqueMergeTree (TCHouse‑C) with ReplacingMergeTree across three scenarios: bulk import (SSB lineorder_fat, 600 M rows), concurrent updates (same dataset), and specific dataset updates (NY Taxi data, 600 M rows). Results show UniqueMergeTree offers superior import speed, several‑fold query speedup without FINAL, and markedly lower latency for concurrent updates, small‑scale updates, and high‑concurrency deletes.

7. Future roadmap includes optimizing in‑memory hash indexes, enhancing point‑lookup for market data, extending indexing to cold data, and improving query engine efficiency.

The article also discusses schema‑less handling of semi‑structured JSON data, automatic schema expansion, query rewriting for fuzzy field/path searches, and the benefits of this approach in terms of performance, cost, and operational simplicity.
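Automatic schema expansion for semi‑structured JSON can be sketched as below. This is a toy model under stated assumptions: nested objects are flattened into dotted column paths, new paths are added to the schema on first sight, and "fuzzy path search" is approximated by suffix matching on the path. The class `SchemalessTable` and its methods are hypothetical, and type evolution is not handled (the first‑seen type is kept).

```python
from typing import Any

def flatten(doc: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Turn nested JSON objects into a flat mapping of dotted paths to leaf values."""
    flat: dict[str, Any] = {}
    for name, value in doc.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

class SchemalessTable:
    def __init__(self) -> None:
        self.schema: dict[str, str] = {}   # column path -> type name seen first
        self.rows: list[dict[str, Any]] = []

    def ingest(self, doc: dict[str, Any]) -> None:
        flat = flatten(doc)
        for path, value in flat.items():
            # Unknown paths expand the schema; no DDL is needed beforehand.
            self.schema.setdefault(path, type(value).__name__)
        self.rows.append(flat)

    def select(self, path_suffix: str) -> list[dict[str, Any]]:
        """Rewrite a partial field name into the full column paths that match it."""
        cols = [c for c in self.schema
                if c == path_suffix or c.endswith("." + path_suffix)]
        return [{c: row.get(c) for c in cols} for row in self.rows]

table = SchemalessTable()
table.ingest({"user": {"id": 1}, "event": "click"})
table.ingest({"user": {"id": 2, "geo": {"city": "SZ"}}, "event": "view"})

print(sorted(table.schema))
print(table.select("city"))
```

Rows ingested before a column existed read back as missing values for that column, which is why earlier data does not need to be rewritten when the schema grows.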

Q&A sections address primary‑key index scope, Copy‑On‑Write vs. Merge‑On‑Read trade‑offs, row‑level index storage overhead, partition‑level indexing strategies, concurrency handling, and JSON type evolution handling.

Overall, TCHouse‑C provides a robust, high‑performance, and cost‑effective solution for real‑time data updates and semi‑structured data processing in large‑scale analytical workloads.

performance testing, ClickHouse, Data Warehouse, real-time updates, Schema-less, Delete-Insert, UniqueMergeTree
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
