
ClickHouse Data Recovery Procedure for a Failed Disk in a 4‑Shard 3‑Replica Cluster

This article details a step‑by‑step recovery of a ClickHouse 4‑shard, 3‑replica cluster after a node’s disks failed, covering verification of residual data, ZooKeeper metadata cleanup, table reconstruction, distributed table restoration, and validation of synchronization across replicas.

Aikesheng Open Source Community

The case describes a ClickHouse cluster with four shards and three replicas per shard, each node holding about 5.6 TB of data. One node became unusable after five disks failed, forcing a replacement of the data directory with slower mechanical disks.

Because the other two replicas of the same shard remained online, the business continued without impact, and recovery could proceed from those replicas. The recovery principle relies on ZooKeeper storing the list of data parts registered for each replica: once the tables are recreated on the rebuilt node, any parts it is missing are fetched from the surviving replicas over port 9009, ClickHouse's interserver HTTP port used for part transfer.
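To illustrate the principle, the part list that ZooKeeper keeps for a replica can be inspected from inside ClickHouse. A minimal sketch; the ZooKeeper path and replica name below are placeholders and must be replaced with the table's actual zookeeper_path and replica:

```sql
-- Sketch: list the data parts ZooKeeper has registered for one replica.
-- '/clickhouse/tables/01/my_table' and 'replica_1' are placeholders.
SELECT name
FROM system.zookeeper
WHERE path = '/clickhouse/tables/01/my_table/replicas/replica_1/parts'
LIMIT 10;
```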

Step 1: Verify no residual data on the failed node. Connecting with clickhouse-client --password and running show databases; confirmed that no user databases or table data remained on the node.
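A minimal sketch of the check, assuming the node is reachable via clickhouse-client; besides listing databases, counting active parts confirms that no table data survived:

```sql
-- On the rebuilt node: only system databases should remain,
-- and no active data parts should be reported.
SHOW DATABASES;
SELECT count() FROM system.parts WHERE active;
```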

Step 2: Identify the replica on the current node. Querying system.clusters and checking the is_local flag determines the active replica.
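The lookup can be sketched as a query on system.clusters; the is_local flag marks the rows corresponding to the node the query runs on:

```sql
-- Which shard/replica does this node serve? is_local = 1 marks it.
SELECT cluster, shard_num, replica_num, host_name
FROM system.clusters
WHERE is_local = 1;
```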

Step 3: Clean ZooKeeper metadata. A SQL script (clear_zk.sql) is created and executed with clickhouse-client --password --port 9000 --multiquery < clear_zk.sql to remove stale metadata that would otherwise cause table‑creation conflicts.
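The article does not show the contents of clear_zk.sql. One common way to build such a script, sketched here under that assumption, is to run a query on a surviving replica that generates one SYSTEM DROP REPLICA statement per replicated table; 'replica_3' is a placeholder for the failed replica's name in ZooKeeper:

```sql
-- Generate one cleanup statement per replicated table.
-- Run on a healthy replica; redirect the output into clear_zk.sql.
SELECT concat('SYSTEM DROP REPLICA ''replica_3'' FROM TABLE ',
              database, '.', table, ';')
FROM system.replicas;
```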

Step 4: Export table structures. Using a query on system.tables joined with system.replicas, the create_table_query for each table is extracted and saved.
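A sketch of the export query; joining against system.replicas restricts the output to replicated tables, and redirecting clickhouse-client's output to a file yields the DDL to replay later:

```sql
-- Collect the CREATE statements of all replicated tables.
SELECT t.create_table_query
FROM system.tables AS t
INNER JOIN system.replicas AS r
    ON t.database = r.database AND t.name = r.table;
```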

Step 5: Recreate databases and tables. Databases are recreated (e.g., create database xxx;), followed by generating an init_table.sql file containing the extracted create_table_query statements. The tables are then created with clickhouse-client --password --port 9000 --multiquery < init_table.sql.
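The article does not reproduce init_table.sql itself. A hypothetical excerpt, with every database, table, column, and ZooKeeper path as a placeholder, might look like:

```sql
-- Hypothetical excerpt of init_table.sql; all names are placeholders.
CREATE DATABASE IF NOT EXISTS mydb;

CREATE TABLE mydb.events
(
    event_date Date,
    id         UInt64
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
ORDER BY id;
```

The {shard} and {replica} substitutions expand from the macros section of each node's configuration, which is what lets the same DDL run unchanged on every replica.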

Step 6: Verify data synchronization. After reconstruction, queries such as select * from xxxx limit 5; and df -h confirm that data is being synchronized at roughly 10 GB per minute.
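Progress can also be watched from system.replicas: queue_size drops toward zero as the missing parts are fetched over port 9009.

```sql
-- Tables with the most outstanding replication work first.
SELECT database, table, queue_size, absolute_delay
FROM system.replicas
ORDER BY queue_size DESC;
```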

Step 7: Restore distributed tables. The create_table_query for tables with the Distributed engine is exported, saved to init_d_table.sql, and executed to rebuild the distributed layer.
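A sketch of the export; Distributed tables hold no data of their own, so recreating them from their DDL completes the layer:

```sql
-- Collect the CREATE statements of all Distributed tables.
SELECT create_table_query
FROM system.tables
WHERE engine = 'Distributed';
```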

Step 8: Validate replica synchronization. Running select database, table, replica_is_active from system.replicas; on each replica confirms that all tables are fully synchronized.
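Since replica_is_active is a map from replica name to 0/1 (available in reasonably recent ClickHouse releases), a stricter variant of the check flags only the tables that still have an inactive replica:

```sql
-- Any table listed here still has at least one inactive replica.
SELECT database, table, replica_is_active
FROM system.replicas
WHERE countEqual(mapValues(replica_is_active), 0) > 0;
```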

Conclusion. The recovery highlights the importance of multi‑replica redundancy, understanding ClickHouse‑ZooKeeper synchronization (including the use of port 9009 for parts transfer), cleaning stale ZooKeeper metadata, and following a systematic procedure to rebuild tables and verify data integrity.

Tags: Distributed Systems, ZooKeeper, ClickHouse, Data Recovery, Database Operations
Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.
