Understanding ClickHouse Distributed Tables, Replication, and Sharding
This article explains the concepts of ClickHouse local and distributed tables, why writing directly to distributed tables can be problematic, and how replication, sharding, and the ReplicatedMergeTree engine work together with ZooKeeper to provide high‑availability and scalable query processing.
ClickHouse distinguishes between two kinds of tables: distributed tables , which act as logical views routing queries to underlying local tables, and local tables , which actually store the data.
1. Reasons to Avoid Writing Directly to Distributed Tables
Data is split into many parts and forwarded to other servers, increasing network traffic and merge workload, which slows writes and raises the risk of "Too many parts".
Write consistency is weak: data is first persisted on the node hosting the distributed table, then asynchronously sent to the node with the local table, so failures can cause data loss.
Writes are asynchronous, leading to temporary inconsistencies.
Heavy load on ZooKeeper for coordination.
2. Replication & Sharding
ClickHouse uses the ReplicatedMergeTree engine family together with ZooKeeper to implement table replication, providing high availability. It also supports sharding (data partitioning) similar to Elasticsearch, allowing parallel reads and writes across multiple physical nodes.
3. Replicated Table & ReplicatedMergeTree Engines
Replication is defined at the table level. When creating a table, you can decide whether it should be highly available. Example of creating a replicated local table:
CREATE TABLE IF NOT EXISTS {local_table} ({columns})
ENGINE = ReplicatedMergeTree('/clickhouse/tables/#_tenant_id_#/#__appname__#/#_at_date_#/{shard}/hits', '{replica}')
partition by toString(_at_date_)
sample by intHash64(toInt64(toDateTime(_at_timestamp_)))
order by (_at_date_, _at_timestamp_, intHash64(toInt64(toDateTime(_at_timestamp_))))The engine requires two parameters: the ZooKeeper path where table metadata is stored (e.g., /clickhouse/tables/{shard}/[database_name]/[table_name]) and the replica name (commonly {replica}).
ZooKeeper stores extensive metadata, operation logs, replica states, and checksums, and it also acts as a metadata store, log service, and distributed coordination service, which can become a bottleneck as the cluster grows.
3.1 Data Synchronization Process
Write to one node.
Synchronize to other instances via the inter‑server HTTP port.
Update ZooKeeper records.
3.2 Problems Caused by Heavy ZooKeeper Dependence
When data volume and node count increase, the amount of information stored in ZooKeeper grows linearly, leading to instability. Community solutions include a "mini‑checksum" approach and moving most metadata out of ZooKeeper, leaving it only for coordination, block ID allocation, and sequence numbers.
4. Distributed Table & Distributed Engine
A distributed table is essentially a distributed view over multiple local physical tables (shards) and does not store data itself. It is created with the Distributed engine:
CREATE TABLE IF NOT EXISTS {distributed_table} AS {local_table}
ENGINE = Distributed({cluster}, '{local_database}', '{local_table}', rand())The engine requires the cluster identifier, the database name of the local table, the local table name, and optionally a sharding key (e.g., rand() or a hash of a column) to determine data routing.
4.1 Query Flow
Each instance exchanges data from its shards.
The results are aggregated on a single instance and returned to the user.
Understanding these mechanisms helps design efficient ClickHouse deployments and avoid common pitfalls related to replication lag, ZooKeeper overload, and write performance.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
