
ClickHouse Data Recovery Procedure for a Failed Disk in a 4‑Shard 3‑Replica Cluster

This article details a step‑by‑step recovery of a ClickHouse 4‑shard, 3‑replica cluster after a node’s disks failed, covering verification of residual data, ZooKeeper metadata cleanup, table reconstruction, distributed table restoration, and validation of synchronization across replicas.

Aikesheng Open Source Community

The case describes a ClickHouse cluster with four shards and three replicas per shard, each node holding about 5.6 TB of data. One node became unusable after five disks failed, forcing a replacement of the data directory with slower mechanical disks.

Because the other two replicas of the same shard remained online, the business continued without impact, and recovery could proceed from those replicas. The recovery principle relies on ZooKeeper storing the list of data parts registered for each replica: once the tables are recreated on the rebuilt node, any parts it is missing are fetched from the surviving replicas over port 9009, ClickHouse's interserver HTTP port used for part transfer.
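To illustrate the principle, the part list that ZooKeeper keeps for a replica can be inspected from inside ClickHouse. A minimal sketch; the ZooKeeper path and replica name below are placeholders and must be replaced with the table's actual zookeeper_path and replica:

```sql
-- Sketch: list the data parts ZooKeeper has registered for one replica.
-- '/clickhouse/tables/01/my_table' and 'replica_1' are placeholders.
SELECT name
FROM system.zookeeper
WHERE path = '/clickhouse/tables/01/my_table/replicas/replica_1/parts'
LIMIT 10;
```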

Step 1: Verify no residual data on the failed node. Connecting with clickhouse-client --password and running show databases; confirmed that no user databases or table data remained on the node.
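A minimal sketch of the check, assuming the node is reachable via clickhouse-client; besides listing databases, counting active parts confirms that no table data survived:

```sql
-- On the rebuilt node: only system databases should remain,
-- and no active data parts should be reported.
SHOW DATABASES;
SELECT count() FROM system.parts WHERE active;
```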

Step 2: Identify the replica on the current node. Querying system.clusters and checking the is_local flag determines the active replica.
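The lookup can be sketched as a query on system.clusters; the is_local flag marks the rows corresponding to the node the query runs on:

```sql
-- Which shard/replica does this node serve? is_local = 1 marks it.
SELECT cluster, shard_num, replica_num, host_name
FROM system.clusters
WHERE is_local = 1;
```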

Step 3: Clean ZooKeeper metadata. A SQL script (clear_zk.sql) is created and executed with clickhouse-client --password --port 9000 --multiquery < clear_zk.sql to remove stale metadata that would otherwise cause table‑creation conflicts.
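The article does not show the contents of clear_zk.sql. One common way to build such a script, sketched here under that assumption, is to run a query on a surviving replica that generates one SYSTEM DROP REPLICA statement per replicated table; 'replica_3' is a placeholder for the failed replica's name in ZooKeeper:

```sql
-- Generate one cleanup statement per replicated table.
-- Run on a healthy replica; redirect the output into clear_zk.sql.
SELECT concat('SYSTEM DROP REPLICA ''replica_3'' FROM TABLE ',
              database, '.', table, ';')
FROM system.replicas;
```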

Step 4: Export table structures. Using a query on system.tables joined with system.replicas, the create_table_query for each table is extracted and saved.
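A sketch of the export query; joining against system.replicas restricts the output to replicated tables, and redirecting clickhouse-client's output to a file yields the DDL to replay later:

```sql
-- Collect the CREATE statements of all replicated tables.
SELECT t.create_table_query
FROM system.tables AS t
INNER JOIN system.replicas AS r
    ON t.database = r.database AND t.name = r.table;
```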

Step 5: Recreate databases and tables. Databases are recreated (e.g., create database xxx;), followed by generating an init_table.sql file containing the extracted create_table_query statements. The tables are then created with clickhouse-client --password --port 9000 --multiquery < init_table.sql.
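The article does not reproduce init_table.sql itself. A hypothetical excerpt, with every database, table, column, and ZooKeeper path as a placeholder, might look like:

```sql
-- Hypothetical excerpt of init_table.sql; all names are placeholders.
CREATE DATABASE IF NOT EXISTS mydb;

CREATE TABLE mydb.events
(
    event_date Date,
    id         UInt64
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
ORDER BY id;
```

The {shard} and {replica} substitutions expand from the macros section of each node's configuration, which is what lets the same DDL run unchanged on every replica.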

Step 6: Verify data synchronization. After reconstruction, queries such as select * from xxxx limit 5; and df -h confirm that data is being synchronized at roughly 10 GB per minute.
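Progress can also be watched from system.replicas: queue_size drops toward zero as the missing parts are fetched over port 9009.

```sql
-- Tables with the most outstanding replication work first.
SELECT database, table, queue_size, absolute_delay
FROM system.replicas
ORDER BY queue_size DESC;
```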

Step 7: Restore distributed tables. The create_table_query for tables with the Distributed engine is exported, saved to init_d_table.sql, and executed to rebuild the distributed layer.
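A sketch of the export; Distributed tables hold no data of their own, so recreating them from their DDL completes the layer:

```sql
-- Collect the CREATE statements of all Distributed tables.
SELECT create_table_query
FROM system.tables
WHERE engine = 'Distributed';
```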

Step 8: Validate replica synchronization. Running select database, table, replica_is_active from system.replicas; on each replica confirms that all tables are fully synchronized.
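Since replica_is_active is a map from replica name to 0/1 (available in reasonably recent ClickHouse releases), a stricter variant of the check flags only the tables that still have an inactive replica:

```sql
-- Any table listed here still has at least one inactive replica.
SELECT database, table, replica_is_active
FROM system.replicas
WHERE countEqual(mapValues(replica_is_active), 0) > 0;
```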

Conclusion. The recovery highlights the importance of multi‑replica redundancy, understanding ClickHouse‑ZooKeeper synchronization (including the use of port 9009 for parts transfer), cleaning stale ZooKeeper metadata, and following a systematic procedure to rebuild tables and verify data integrity.

Tags: Distributed Systems, ZooKeeper, ClickHouse, Data Recovery, Database Operations
Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.
