
Setting Up a ClickHouse Cluster with Replication and Distributed Tables

This tutorial walks through building a three‑node ClickHouse cluster, configuring host mappings and XML files, setting up ZooKeeper for internal replication, creating local and distributed tables, testing data insertion and queries, and demonstrating how to achieve high availability with replicated tables.


In this article, senior DBA Deng Yanyun explains how to build a three‑node ClickHouse cluster, configure host mappings, create the necessary XML configuration files, and set up ZooKeeper for internal replication.
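The host mappings let the nodes address one another by name. As a sketch, each node's /etc/hosts could contain entries like the following (the IP addresses and the hostnames ch-node1 through ch-node3 are placeholders, not values from the original article):

```
192.168.0.1 ch-node1
192.168.0.2 ch-node2
192.168.0.3 ch-node3
```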

Each node runs ClickHouse and shares the same /etc/clickhouse-server/config.xml and /etc/metrika.xml. The metrika.xml file defines <clickhouse_remote_servers> with three shards, each containing a single replica, and also holds the ZooKeeper server definitions and the per-node macro settings.
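A minimal sketch of what metrika.xml could look like for this layout (the hostnames ch-node1 through ch-node3 are placeholders; the cluster name matches the one used in the Distributed DDL, and the structure follows ClickHouse's standard remote-servers format):

```xml
<yandex>
    <clickhouse_remote_servers>
        <perftest_3shards_1replicas>
            <shard>
                <replica><host>ch-node1</host><port>9000</port></replica>
            </shard>
            <shard>
                <replica><host>ch-node2</host><port>9000</port></replica>
            </shard>
            <shard>
                <replica><host>ch-node3</host><port>9000</port></replica>
            </shard>
        </perftest_3shards_1replicas>
    </clickhouse_remote_servers>

    <zookeeper-servers>
        <node index="1"><host>ch-node1</host><port>2181</port></node>
        <node index="2"><host>ch-node2</host><port>2181</port></node>
        <node index="3"><host>ch-node3</host><port>2181</port></node>
    </zookeeper-servers>

    <!-- Unique on each node; expanded as {replica} in ReplicatedMergeTree paths -->
    <macros>
        <replica>ch-node1</replica>
    </macros>
</yandex>
```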

After restarting the three servers, the cluster is ready. The next steps are to create local tables on each node and a distributed table that references them. Example DDL statements are:

CREATE TABLE ontime_local (FlightDate Date, Year UInt16) ENGINE = MergeTree(FlightDate, (Year, FlightDate), 8192);
CREATE TABLE ontime_all AS ontime_local ENGINE = Distributed(perftest_3shards_1replicas, default, ontime_local, rand());
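Note that MergeTree(FlightDate, (Year, FlightDate), 8192) is the legacy engine syntax. On recent ClickHouse versions the same local table would be written with the clause-based syntax, roughly as follows (the PARTITION BY expression is an assumption; the legacy form partitioned by month of the date column):

```sql
CREATE TABLE ontime_local
(
    FlightDate Date,
    Year UInt16
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(FlightDate)
ORDER BY (Year, FlightDate)
SETTINGS index_granularity = 8192;
```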

Data can be inserted into the distributed table from any node, e.g.:

INSERT INTO ontime_all (FlightDate, Year) VALUES ('2001-10-12', 2001);
INSERT INTO ontime_all (FlightDate, Year) VALUES ('2002-10-12', 2002);
INSERT INTO ontime_all (FlightDate, Year) VALUES ('2003-10-12', 2003);

Querying SELECT * FROM ontime_all; returns all three rows, gathered from the three shards. The article also demonstrates that if one node is stopped, queries against the distributed table fail: with three single-replica shards, losing any node makes part of the data unreachable, so this configuration has no high availability.
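One way to see how the rows were distributed is to query each node's local table and inspect the cluster topology (a sketch; run on any node):

```sql
-- Each node stores only the rows routed to its own shard:
SELECT * FROM ontime_local;

-- The cluster layout as ClickHouse sees it:
SELECT cluster, shard_num, replica_num, host_name
FROM system.clusters
WHERE cluster = 'perftest_3shards_1replicas';
```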

To achieve HA, the author suggests configuring two shards with two replicas each instead, which tolerates the loss of one node per shard while still running distributed queries in parallel. Finally, a replicated table is created using ReplicatedMergeTree:
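For a two-shard, two-replica layout, the relevant metrika.xml section could be sketched as follows (the cluster name and hostnames are assumptions; internal_replication=true makes writes go through the replicated table engine instead of having the Distributed table duplicate each insert):

```xml
<perftest_2shards_2replicas>
    <shard>
        <internal_replication>true</internal_replication>
        <replica><host>ch-node1</host><port>9000</port></replica>
        <replica><host>ch-node2</host><port>9000</port></replica>
    </shard>
    <shard>
        <internal_replication>true</internal_replication>
        <replica><host>ch-node3</host><port>9000</port></replica>
        <replica><host>ch-node4</host><port>9000</port></replica>
    </shard>
</perftest_2shards_2replicas>
```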

CREATE TABLE ontime_replica (FlightDate Date, Year UInt16) ENGINE = ReplicatedMergeTree('/clickhouse_perftest/tables/ontime_replica', '{replica}', FlightDate, (Year, FlightDate), 8192);

Data insertion into the replicated table is demonstrated with:

INSERT INTO ontime_replica (FlightDate, Year) VALUES ('2018-10-12', 2018);
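Because replication is coordinated asynchronously through ZooKeeper, the row should appear on the peer replica shortly after the insert. A sketch of how to verify this from the other node:

```sql
-- On the peer replica:
SELECT count() FROM ontime_replica WHERE Year = 2018;

-- Replication health for the table:
SELECT table, is_leader, total_replicas, active_replicas
FROM system.replicas
WHERE table = 'ontime_replica';
```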

References to the official ClickHouse documentation are provided for further reading.

Tags: SQL, ZooKeeper, ClickHouse, Replication, Cluster, Distributed Table
Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise-grade MySQL open-source tools and services, releases a premium open-source component each year on 1024 (Programmers' Day), and continuously operates and maintains them.
