Mastering Multi‑AZ Replication in HDFS with AZ Mover
This article introduces AZ Mover, a lightweight HDFS client‑side tool that intelligently scans, schedules, and migrates block replicas across multiple availability zones, detailing its design goals, core workflow, command‑line options, concurrency controls, and future enhancements for robust big‑data disaster recovery.
AZ Mover Overview
Following the previous article on HDFS multi‑AZ disaster recovery, this piece presents AZ Mover, a self‑developed tool that optimizes replica distribution across multiple availability zones (AZs) without interrupting service.
Design Goals
Intelligent replica distribution detection: precisely identify blocks needing governance.
Minimize business impact: rate‑limited, asynchronous, fault‑tolerant operations that do not affect normal reads and writes.
Flexible configuration and granularity: support governance by path, directory, user, or time range.
Continuous execution: can run periodically for progressive optimization.
AZ Mover functions as a “data health check + governance executor” with low intrusion cost.
Core Workflow
1. Scan and Identify Phase
The tool traverses the specified paths, retrieves each block’s replica list, reads the DataNode’s network topology, extracts the AZ information, and checks whether the block meets the multi‑AZ requirement (typically at least two different AZs). Blocks that lack sufficient replicas, are in recovery, or are temporary files are skipped. This phase requires the topology script defined by topology.script.file.name to be enabled.
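The scan-phase decision can be sketched as a small predicate. This is a minimal illustration, not the tool's actual code; the function name, parameters, and defaults are assumptions made for the example:

```python
def needs_governance(replica_azs, required_azs=2, min_replicas=2):
    """Decide whether a block's replica placement violates the multi-AZ policy.

    replica_azs: list of AZ names, one entry per live replica of the block.
    Returns None when the block should be skipped (too few replicas, i.e.
    under-replicated or still in recovery -- normal HDFS repair handles it),
    True when the replicas span fewer than required_azs distinct AZs,
    and False when the placement already satisfies the policy.
    """
    if len(replica_azs) < min_replicas:
        return None  # skipped: under-replicated or in recovery
    return len(set(replica_azs)) < required_azs
```

For example, a block with replicas in `["az1", "az1", "az1"]` needs governance, while `["az1", "az2", "az1"]` already satisfies a two-AZ requirement.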
2. Target Scheduling and Migration Strategy
For blocks that need governance, AZ Mover selects a target AZ not currently covered, picks a low‑load DataNode in that AZ, and chooses a source replica from an over‑represented AZ with relatively low load. It then calls the existing DataNode replaceBlock() interface to copy the replica to the target node and delete it from the source node. The migration is lock‑free, asynchronous, and runs in the background, leaving the original file untouched.
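The target/source pairing described above can be sketched as follows. This is an illustrative simplification (ignoring per-node load, which the next section covers); the function name and return shape are assumptions:

```python
from collections import Counter

def plan_move(replica_azs, all_azs):
    """Pick a target AZ the block does not yet cover and a source AZ that
    holds surplus replicas, mirroring the strategy described in the text.

    replica_azs: list of AZ names, one per current replica.
    all_azs: every AZ the cluster spans.
    Returns (source_az, target_az), or None if the policy is satisfied.
    """
    counts = Counter(replica_azs)
    uncovered = [az for az in all_azs if counts[az] == 0]
    if not uncovered:
        return None  # every AZ already holds a replica
    target_az = uncovered[0]
    # the most over-represented AZ donates one of its replicas
    source_az = counts.most_common(1)[0][0]
    return source_az, target_az
```

With replicas `["az1", "az1", "az2"]` in a three-AZ cluster, the plan would move one replica from `az1` to `az3`.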
2.1 Block Scheduling Policy
The tool compares the AZ policy with actual replica distribution to compute misMatch (replicas that violate the policy) and target (desired AZ for each replica). It maintains a load coefficient for each DataNode (5‑minute average load per CPU core) and applies upper/lower thresholds to avoid overloading nodes.
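The load-threshold filter can be sketched like this; the threshold value and function name are illustrative assumptions, not the tool's real defaults:

```python
def eligible_datanodes(nodes, upper=0.8):
    """Filter candidate target DataNodes by their load coefficient
    (the 5-minute load average per CPU core described above).

    nodes: dict mapping DataNode name -> load coefficient.
    Returns eligible nodes sorted by ascending load, excluding any node
    whose coefficient exceeds the upper threshold, so migrations prefer
    the least-loaded nodes and never pile work onto a hot one.
    """
    candidates = [(load, name) for name, load in nodes.items() if load <= upper]
    return [name for load, name in sorted(candidates)]
```

A node at 0.9 would be excluded, and among the remainder the least-loaded node is tried first.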
2.2 Block Scheduling Execution
AZ Mover reuses the Balancer’s dispatcher module and invokes replaceBlock() to perform the actual block move. The replaceBlock() API is the core mechanism for block migration in HDFS and can be combined with a delHint to indicate which replica should be removed.
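The essential property of a replaceBlock-style move with a delHint is its ordering: the copy completes on the target before the hinted source replica is removed, so the live replica count never dips below the original. A minimal simulation of that ordering (not the actual HDFS protocol code):

```python
def replace_block(replicas, source, target):
    """Simulate the copy-then-delete ordering of a block replacement.

    replicas: list of DataNode names currently holding the block.
    source: the replica to retire (the delHint).
    target: the node receiving the new copy.
    Returns the replica list after the move; the append happens before
    the remove, reflecting that the source is only deleted once the
    target copy is confirmed.
    """
    new = list(replicas)
    new.append(target)  # copy lands on the target first
    new.remove(source)  # only then is the delHint replica invalidated
    return new
```

For example, replacing `dn1` with `dn9` for a block on `["dn1", "dn2"]` yields `["dn2", "dn9"]`, with three replicas briefly alive mid-move.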
Concurrency Control and Fault Tolerance
Migration rate control: limit the number of block replacements per minute.
Dry‑run mode: preview the plan without executing it.
Blacklist mechanism: exclude specific nodes or directories.
Task state recording: log governance actions and support retries.
Interruptible execution: tasks can be safely paused or stopped.
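The per-minute rate control above can be implemented as a rolling-window limiter. This sketch is illustrative (class and method names are assumptions), but the windowing logic is the standard approach:

```python
import time

class RateLimiter:
    """Cap the number of block moves started per rolling 60-second window."""

    def __init__(self, moves_per_minute, clock=time.monotonic):
        self.limit = moves_per_minute
        self.clock = clock      # injectable clock makes this testable
        self.starts = []        # timestamps of moves started recently

    def try_acquire(self):
        """Return True if another move may start now, else False."""
        now = self.clock()
        # drop timestamps that have aged out of the 60-second window
        self.starts = [t for t in self.starts if now - t < 60.0]
        if len(self.starts) >= self.limit:
            return False        # caller should back off and retry later
        self.starts.append(now)
        return True
```

A worker thread would call `try_acquire()` before each replacement and sleep briefly when it returns False, keeping cluster impact bounded regardless of how many blocks need governance.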
Usage and Configuration
AZ Mover can be run from the command line, scheduled as a background job, or integrated into cluster scripts. Key parameters include:
--path: target directory (recursive).
--targetAZCount: desired number of AZs (recommended 3).
--threads: number of concurrent threads.
--rateLimit: maximum blocks processed per minute.
--dryRun: preview only.
--excludeNodes: list of DataNodes to exclude.
--retryFailed: retry previously failed tasks.
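The flag set above maps naturally onto a standard argument parser. The option names come from the article; the defaults, program name, and help strings here are illustrative assumptions:

```python
import argparse

def build_parser():
    """Build a parser mirroring the command-line flags listed above."""
    p = argparse.ArgumentParser(prog="az-mover")
    p.add_argument("--path", required=True,
                   help="target directory, scanned recursively")
    p.add_argument("--targetAZCount", type=int, default=3,
                   help="desired number of AZs per block")
    p.add_argument("--threads", type=int, default=4,
                   help="number of concurrent worker threads")
    p.add_argument("--rateLimit", type=int, default=100,
                   help="maximum blocks processed per minute")
    p.add_argument("--dryRun", action="store_true",
                   help="preview the governance plan without executing it")
    p.add_argument("--excludeNodes", default="",
                   help="comma-separated DataNodes to exclude")
    p.add_argument("--retryFailed", action="store_true",
                   help="retry previously failed tasks")
    return p
```

A typical invocation would then be `az-mover --path /warehouse --dryRun` to preview the plan before a real run.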
Recommended practices are periodic execution (daily/weekly), integration before high‑traffic events, inclusion in cold‑data lifecycle management, and combined use with the HDFS Balancer for both capacity and distribution governance.
Future Outlook
Smarter replica selection and migration strategies.
Block hotness awareness for hot‑cold data separation.
Visual governance dashboard for real‑time monitoring.
Integration with Yarn, Hive, and other ecosystem components.
Directory‑level policy discovery and automation.
Figure: illustration of the block migration process.
360 Zhihui Cloud Developer
360 Zhihui Cloud is an enterprise open service platform that aims to "aggregate data value and empower an intelligent future," leveraging 360's extensive product and technology resources to deliver platform services to customers.