HDFS DataNode Volume Choosing Policies: Round‑Robin and Available‑Space Strategies
This article explains how an HDFS DataNode stores data blocks on local disks. It covers how storage directories are configured, how the two volume‑choosing policies (round‑robin and available‑space) implement the VolumeChoosingPolicy interface, and the logic they use to balance disk usage.
In HDFS, a DataNode stores each block in a local file‑system directory taken from the dfs.datanode.data.dir parameter in hdfs-site.xml, which accepts a comma‑separated list of directories. Typical deployments list one directory per physical device (HDD or SSD) to spread I/O across disks.
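To make the shape of that setting concrete, the sketch below splits a dfs.datanode.data.dir‑style value into individual directories. Entries may carry a storage type tag such as [SSD] or [DISK], and untagged entries default to DISK. DataDirParser and DataDir are hypothetical names for illustration; this is not Hadoop's own parsing code (which also accepts file:// URIs).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper that mimics how a comma-separated dfs.datanode.data.dir
// value maps to storage directories; it is not Hadoop's StorageLocation class.
final class DataDirParser {

    /** One configured directory and its storage type tag. */
    static final class DataDir {
        final String storageType;
        final String path;

        DataDir(String storageType, String path) {
            this.storageType = storageType;
            this.path = path;
        }
    }

    /** Splits on commas; an optional "[TYPE]" prefix sets the type, otherwise DISK. */
    static List<DataDir> parse(String value) {
        List<DataDir> dirs = new ArrayList<>();
        for (String raw : value.split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue; // tolerate stray commas
            }
            String type = "DISK"; // untagged directories default to DISK
            if (entry.startsWith("[")) {
                int close = entry.indexOf(']');
                if (close < 0) {
                    throw new IllegalArgumentException("Unterminated storage type: " + entry);
                }
                type = entry.substring(1, close).trim().toUpperCase(Locale.ROOT);
                entry = entry.substring(close + 1).trim();
            }
            dirs.add(new DataDir(type, entry));
        }
        return dirs;
    }

    public static void main(String[] args) {
        String value = "[SSD]/data/ssd1,[DISK]/data/hdd1, /data/hdd2";
        for (DataDir d : parse(value)) {
            System.out.println(d.storageType + " -> " + d.path);
        }
    }
}
```

Running main prints one line per directory: SSD -> /data/ssd1, DISK -> /data/hdd1 and DISK -> /data/hdd2. Each resulting directory becomes one volume that the choosing policies described below can select.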
When a new block is written, the DataNode selects a storage volume using the policy class named by dfs.datanode.fsdataset.volume.choosing.policy. Hadoop provides two built‑in policies: round‑robin (RoundRobinVolumeChoosingPolicy, the default) and available‑space (AvailableSpaceVolumeChoosingPolicy).
The selection logic is defined by the VolumeChoosingPolicy interface, which declares a single method:
```java
package org.apache.hadoop.hdfs.server.datanode.fsdataset;

import java.io.IOException;
import java.util.List;

public interface VolumeChoosingPolicy<V extends FsVolumeSpi> {
  /**
   * Choose a volume to place a replica, given a list of volumes and the replica size.
   * @param volumes - a list of available volumes.
   * @param replicaSize - the size of the replica for which a volume is sought.
   * @return the chosen volume.
   * @throws IOException when disks are unavailable or are full.
   */
  V chooseVolume(List<V> volumes, long replicaSize) throws IOException;
}
```

Round‑Robin Policy
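The class RoundRobinVolumeChoosingPolicy implements the interface by cycling through the list of volumes. It keeps a curVolume index that persists across calls, starts scanning from that index, and returns the first volume whose available space exceeds the block size. If a full cycle finds no such volume, it throws DiskOutOfSpaceException. Every disk therefore receives roughly the same number of blocks, but the policy ignores how much free space each disk has: disks of different capacities, disks that lost a lot of data to deletions, and newly added disks all end up with uneven utilization.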
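```java
package org.apache.hadoop.hdfs.server.datanode.fsdataset;

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.util.DiskChecker.DiskOutOfSpaceException;

public class RoundRobinVolumeChoosingPolicy<V extends FsVolumeSpi>
    implements VolumeChoosingPolicy<V> {

  // Index of the next volume to try; persists across calls.
  private int curVolume = 0;

  @Override
  public synchronized V chooseVolume(final List<V> volumes, final long blockSize)
      throws IOException {
    if (volumes.size() < 1) {
      throw new DiskOutOfSpaceException("No more available volumes");
    }
    // Volumes can be removed (e.g. after a disk failure), so make sure the
    // saved index is still within bounds before using it.
    if (curVolume >= volumes.size()) {
      curVolume = 0;
    }
    int startVolume = curVolume;
    long maxAvailable = 0;
    while (true) {
      final V volume = volumes.get(curVolume);
      // Advance first, so the next call starts after this volume.
      curVolume = (curVolume + 1) % volumes.size();
      long available = volume.getAvailable();
      if (available > blockSize) {
        return volume;
      }
      // Track the largest free space seen, for the error message.
      if (available > maxAvailable) {
        maxAvailable = available;
      }
      // A full cycle without a fit: no volume can hold the block.
      if (curVolume == startVolume) {
        throw new DiskOutOfSpaceException("Out of space: the volume with the most available space ("
            + maxAvailable + " B) is smaller than the block size (" + blockSize + " B).");
      }
    }
  }
}
```

Available‑Space Policy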
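Round‑robin's indifference to free space is the main motivation for this policy. The standalone simulation below writes 150 blocks to two disks of different sizes using the same selection loop as RoundRobinVolumeChoosingPolicy.chooseVolume. SimVolume, SimRoundRobin and the capacities are hypothetical, and the exceptions are simplified to IllegalStateException.

```java
import java.util.List;

// Standalone simulation: SimVolume and SimRoundRobin are hypothetical stand-ins
// for FsVolumeSpi and RoundRobinVolumeChoosingPolicy; capacities are made up.
final class RoundRobinSkewDemo {
    static final long BLOCK = 128L * 1024 * 1024; // 128 MiB per block

    public static void main(String[] args) {
        List<SimVolume> vols = List.of(
            new SimVolume("small", 100 * BLOCK),
            new SimVolume("large", 400 * BLOCK));
        SimRoundRobin policy = new SimRoundRobin();
        for (int i = 0; i < 150; i++) {
            SimVolume v = policy.choose(vols, BLOCK);
            v.used += BLOCK; // "write" one block to the chosen volume
        }
        for (SimVolume v : vols) {
            System.out.printf("%s: %d blocks, %.0f%% full%n",
                v.name, v.used / BLOCK, 100.0 * v.used / v.capacity);
        }
    }
}

final class SimVolume {
    final String name;
    final long capacity;
    long used;

    SimVolume(String name, long capacity) {
        this.name = name;
        this.capacity = capacity;
    }

    long getAvailable() {
        return capacity - used;
    }
}

final class SimRoundRobin {
    private int curVolume = 0;

    // Same loop as RoundRobinVolumeChoosingPolicy.chooseVolume, minus the error details.
    SimVolume choose(List<SimVolume> volumes, long blockSize) {
        if (volumes.isEmpty()) {
            throw new IllegalStateException("No more available volumes");
        }
        if (curVolume >= volumes.size()) {
            curVolume = 0;
        }
        int startVolume = curVolume;
        while (true) {
            SimVolume v = volumes.get(curVolume);
            curVolume = (curVolume + 1) % volumes.size();
            if (v.getAvailable() > blockSize) {
                return v;
            }
            if (curVolume == startVolume) {
                throw new IllegalStateException("Out of space");
            }
        }
    }
}
```

Running it prints small: 75 blocks, 75% full and large: 75 blocks, 19% full. Both disks receive the same number of blocks, so the smaller one fills four times faster in relative terms and reaches its limit long before the larger one.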
Introduced in Hadoop 2.1.0, the AvailableSpaceVolumeChoosingPolicy class biases placement toward volumes with more free space instead of treating all volumes equally. It does not always pick the emptiest volume: within each group of volumes it delegates to separate round‑robin policy instances. It works as follows:
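Pair each volume with its current free space (the AvailableSpaceVolumeList).
If the gap between the most and the least free space across all volumes is smaller than balancedSpaceThreshold (default 10 GB, set by dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold), fall back to plain round‑robin.
Otherwise, split the volumes into highAvailableVolumes (free space more than balancedSpaceThreshold above the smallest free space of any volume) and lowAvailableVolumes (all the others).
If the replica is larger than the most free space among the low‑available volumes, choose from the high‑available volumes.
Otherwise, choose the high‑available group with a probability derived from the preference fraction (default 0.75, set by dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction) and scaled by the sizes of the two groups, and the low‑available group the rest of the time; then apply round‑robin within the chosen group.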
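The steps above can be sketched as a standalone Java program. AvailableSpaceSketch, Vol and RoundRobin are hypothetical names rather than Hadoop classes, and the threshold and preference fraction are passed to the constructor instead of being read from hdfs-site.xml. The group‑size scaling assumes the formula used in the upstream Hadoop source, h·p / (h·p + l·(1 - p)) for h high and l low volumes; treat it as an assumption if your Hadoop version differs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Standalone sketch of the available-space steps. AvailableSpaceSketch, Vol and
// RoundRobin are hypothetical stand-ins, not Hadoop classes; the threshold and
// preference fraction are constructor arguments instead of configuration keys.
final class AvailableSpaceSketch {
    private final long threshold;   // balancedSpaceThreshold, in bytes
    private final float preference; // preference fraction, e.g. 0.75f
    private final Random random;
    private final RoundRobin balanced = new RoundRobin();
    private final RoundRobin high = new RoundRobin();
    private final RoundRobin low = new RoundRobin();

    AvailableSpaceSketch(long threshold, float preference, Random random) {
        this.threshold = threshold;
        this.preference = preference;
        this.random = random;
    }

    Vol choose(List<Vol> vols, long replicaSize) {
        if (vols.isEmpty()) {
            throw new IllegalStateException("No more available volumes");
        }
        long least = Long.MAX_VALUE, most = 0;
        for (Vol v : vols) {
            least = Math.min(least, v.available);
            most = Math.max(most, v.available);
        }
        // Free space is roughly even: plain round-robin over all volumes.
        if (most - least < threshold) {
            return balanced.choose(vols, replicaSize);
        }
        // Split at (least free space + threshold).
        List<Vol> highVols = new ArrayList<>();
        List<Vol> lowVols = new ArrayList<>();
        long mostAmongLow = 0;
        for (Vol v : vols) {
            if (v.available > least + threshold) {
                highVols.add(v);
            } else {
                lowVols.add(v);
                mostAmongLow = Math.max(mostAmongLow, v.available);
            }
        }
        // Preference fraction scaled by the sizes of the two groups.
        float scaled = highVols.size() * preference
            / (highVols.size() * preference + lowVols.size() * (1 - preference));
        // High group if no low volume can fit the replica, else a weighted coin flip.
        if (mostAmongLow < replicaSize || random.nextFloat() < scaled) {
            return high.choose(highVols, replicaSize);
        }
        return low.choose(lowVols, replicaSize);
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        List<Vol> vols = List.of(new Vol("new", 500 * gib), new Vol("old1", 100 * gib),
            new Vol("old2", 100 * gib), new Vol("old3", 100 * gib));
        AvailableSpaceSketch policy = new AvailableSpaceSketch(10 * gib, 0.75f, new Random(42));
        int[] counts = new int[vols.size()];
        for (int i = 0; i < 100_000; i++) {
            counts[vols.indexOf(policy.choose(vols, 128L << 20))]++;
        }
        for (int i = 0; i < vols.size(); i++) {
            System.out.println(vols.get(i).name + ": " + counts[i]);
        }
    }
}

/** Hypothetical volume: a name and a fixed amount of free space in bytes. */
final class Vol {
    final String name;
    final long available;
    Vol(String name, long available) { this.name = name; this.available = available; }
}

/** Round-robin over whatever list it is given; the cursor persists across calls. */
final class RoundRobin {
    private int cur = 0;

    Vol choose(List<Vol> vols, long size) {
        if (vols.isEmpty()) {
            throw new IllegalStateException("No more available volumes");
        }
        if (cur >= vols.size()) {
            cur = 0; // the group may have shrunk since the last call
        }
        int start = cur;
        while (true) {
            Vol v = vols.get(cur);
            cur = (cur + 1) % vols.size();
            if (v.available > size) {
                return v;
            }
            if (cur == start) {
                throw new IllegalStateException("Out of space");
            }
        }
    }
}
```

With the volumes in main (one 500 GiB volume and three 100 GiB volumes), the new volume should receive roughly half of the 100,000 choices and each old volume roughly one sixth; the exact counts depend on the seeded Random. Free space is kept fixed here to isolate the choice logic; in a real DataNode it shrinks as blocks are written.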
```java
// Excerpt from AvailableSpaceVolumeChoosingPolicy; the three round-robin policy
// fields, RAND and the helper methods are members of that class.
public synchronized V chooseVolume(List<V> volumes, long replicaSize) throws IOException {
  if (volumes.size() < 1) {
    throw new DiskOutOfSpaceException("No more available volumes");
  }
  // Pair each volume with its current free space.
  AvailableSpaceVolumeList volumesWithSpaces = new AvailableSpaceVolumeList(volumes);
  if (volumesWithSpaces.areAllVolumesWithinFreeSpaceThreshold()) {
    // Free space is roughly balanced: use round-robin directly.
    return roundRobinPolicyBalanced.chooseVolume(volumes, replicaSize);
  }
  List<V> highAvailableVolumes = extractVolumesFromPairs(
      volumesWithSpaces.getVolumesWithHighAvailableSpace());
  List<V> lowAvailableVolumes = extractVolumesFromPairs(
      volumesWithSpaces.getVolumesWithLowAvailableSpace());
  long mostAvailableAmongLow =
      volumesWithSpaces.getMostAvailableSpaceAmongVolumesWithLowAvailableSpace();
  // Preference fraction scaled by the sizes of the two groups.
  float scaledPreference = computeScaledPreference(
      highAvailableVolumes.size(), lowAvailableVolumes.size());
  V volume;
  // Use the high group if no low volume can fit the replica,
  // otherwise with probability scaledPreference.
  if (mostAvailableAmongLow < replicaSize || RAND.nextFloat() < scaledPreference) {
    volume = roundRobinPolicyHighAvailable.chooseVolume(highAvailableVolumes, replicaSize);
  } else {
    volume = roundRobinPolicyLowAvailable.chooseVolume(lowAvailableVolumes, replicaSize);
  }
  return volume;
}
```

The method areAllVolumesWithinFreeSpaceThreshold computes the difference between the largest and the smallest free space across all volumes and compares it with balancedSpaceThreshold (default 10 GB). If the difference is below the threshold, the disks are treated as balanced and plain round‑robin is used.
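The threshold test described above and the scaledPreference value used in chooseVolume can be written out explicitly. Let $a_i$ be the free space of volume $i$, $T$ the balancedSpaceThreshold, $p$ the preference fraction, and $P_H$ the probability of choosing the high group when the replica fits on at least one low volume. The derivation assumes the comparisons used in the upstream Hadoop source (a strict $<$ for the balanced check, a split at $a_{\min} + T$) and assumes computeScaledPreference implements the group‑size scaling shown on the fourth line; both are assumptions about the version you run.

```latex
\begin{aligned}
a_{\min} &= \min_i a_i, \qquad a_{\max} = \max_i a_i \\
\text{balanced} &\iff a_{\max} - a_{\min} < T \\
H &= \{\, i : a_i > a_{\min} + T \,\}, \qquad L = \{\, i : a_i \le a_{\min} + T \,\} \\
P_H &= \frac{|H|\,p}{|H|\,p + |L|\,(1 - p)} \\
\frac{P_H / |H|}{(1 - P_H) / |L|} &= \frac{p}{1 - p}
\end{aligned}
```

The last line shows why the scaling exists: whatever the group sizes, each high volume is $p/(1-p)$ times as likely to be chosen as each low volume, which is 3 for the default $p = 0.75$. For example, with $T = 10$ GB and free space of 500, 100, 95 and 90 GB, $a_{\min} + T = 100$ GB, so $H$ contains only the 500 GB volume (the 100 GB volume sits exactly on the boundary and falls into $L$). Then $P_H = 0.75 / (0.75 + 3 \times 0.25) = 0.5$: the emptiest disk receives half of the writes and each of the other three about one sixth.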
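Neither policy moves existing data; both only decide where new replicas are written. Once free space is badly skewed, for example after large deletions on some disks or after an empty disk is added, new writes alone can take a long time to even out utilization. Under the available‑space policy, a newly added disk also attracts a disproportionate share of writes and can become an I/O hot spot. Hadoop 3.0 addresses this with the intra‑DataNode disk balancer, which moves existing blocks between the volumes of a single DataNode.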
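Rebalancing starts from a measure of how far each volume is from a fair share. The sketch below compares each volume's used ratio with the node‑wide ratio (total used divided by total capacity). It is modeled loosely on the "volume data density" idea described in the HDFS Disk Balancer documentation, but VolumeDensity and its methods are hypothetical and are not the disk balancer's code.

```java
// Hypothetical helper: measures how far each volume's used ratio is from the
// node-wide ratio. Modeled loosely on the "volume data density" idea in the
// HDFS Disk Balancer documentation; it is not the disk balancer's code.
final class VolumeDensity {

    /** Node-wide ideal used ratio: total used bytes / total capacity. */
    static double idealRatio(long[] used, long[] capacity) {
        long totalUsed = 0;
        long totalCapacity = 0;
        for (int i = 0; i < used.length; i++) {
            totalUsed += used[i];
            totalCapacity += capacity[i];
        }
        return (double) totalUsed / totalCapacity;
    }

    /** Positive: the volume holds less than its fair share; negative: more. */
    static double[] density(long[] used, long[] capacity) {
        double ideal = idealRatio(used, capacity);
        double[] result = new double[used.length];
        for (int i = 0; i < used.length; i++) {
            result[i] = ideal - (double) used[i] / capacity[i];
        }
        return result;
    }

    public static void main(String[] args) {
        // Three 1000 GB disks at 80% plus a newly added, empty 1000 GB disk.
        long[] capacity = {1000, 1000, 1000, 1000};
        long[] used = {800, 800, 800, 0};
        double[] d = density(used, capacity);
        for (int i = 0; i < d.length; i++) {
            System.out.printf("disk%d: %+.2f%n", i, d[i]);
        }
    }
}
```

The node‑wide ratio here is 0.6, so main prints -0.20 for each of the three old disks and +0.60 for the new one. A rebalancer would move data from the negative volumes to the positive one until all values approach zero, which no amount of new‑write placement can achieve quickly.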
Overall, the two policies provide a trade‑off between uniform disk utilization (round‑robin) and space‑aware placement (available‑space), with configurable thresholds and preference fractions to adapt to cluster characteristics.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
