Databases 7 min read

Understanding ClickHouse Replication Mechanism

This article explains the ClickHouse replication mechanism, covering the Replication engine family, table‑level operation, Zookeeper dependency, data synchronization, insert quorum, and data consistency guarantees, providing practical guidance for configuring and using replicated MergeTree tables.

Big Data Technology & Architecture
Big Data Technology & Architecture
Big Data Technology & Architecture
Understanding ClickHouse Replication Mechanism

ClickHouse provides a Replication engine that works only with the MergeTree family, offering a set of engines such as ReplicatedMergeTree, ReplicatedSummingMergeTree, ReplicatedReplacingMergeTree, and others.

The replication mechanism operates at the table level, not at the database or node level; each node can host both replicated and non‑replicated tables. Replication does not depend on sharding, so tables with the same name on different shards are not synchronized.

DDL statements like CREATE, DROP, ATTACH, DETACH and RENAME do not trigger replication. When a CREATE TABLE with a Replication engine is executed on a node, a new replica is added if the table already exists elsewhere in the cluster.

Replication relies on Zookeeper (version 3.4.5+). Zookeeper stores metadata for each replica; if Zookeeper is not configured, Replication tables cannot be created and existing ones become read‑only. Queries on local replicated tables do not involve Zookeeper, while distributed replicated tables use Zookeeper‑controlled parameters such as max_replica_delay_for_distributed_queries and fallback_to_stale_replicas_for_distributed_queries to improve performance.

Data insertion is asynchronous and multi‑master: INSERT statements are sent to any node handling the query, then propagated to other replicas of the same shard. The background_schedule_pool_size setting controls the number of threads used for synchronization. By default, INSERT waits for the local replica; to require acknowledgment from all replicas, the insert_quorum option can be used.

INSERT operations are atomic at the block level: if the total rows are below max_insert_block_size, the insert is atomic. The system also ensures idempotency by discarding duplicate blocks, allowing safe retries of failed INSERTs.

Replication guarantees data consistency across replicas, with automatic monitoring and failover. When differences are minor, failover is automatic; larger discrepancies (e.g., schema changes) may require manual intervention.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

ClickHouseReplicationMergeTree
Big Data Technology & Architecture
Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.