Understanding HDFS Read and Write Mechanisms
This article explains how HDFS handles file reading and writing, detailing the roles of DFSClient, block selection, hedged reads, packet construction, checksum handling, and the interaction with NameNode and DataNode pipelines to ensure reliability and performance.
HDFS (Hadoop Distributed File System) differs from traditional single‑node file systems in its consistency model: a file has at most one writer at a time, and a lease mechanism grants and enforces that exclusive write access. The article walks through the core read and write processes, illustrating each step with code snippets and diagrams.
Read Mechanism: DistributedFileSystem.open(path) creates an FSDataInputStream that wraps a DFSInputStream. During construction, the client asks the NameNode for the file's block metadata, and the replica locations of each block are sorted by network distance from the client. If the last block is still under construction, the client retries a limited number of times and then retrieves the replica's visible length via getReplicaVisibleLength(ExtendedBlock). To read, the client selects a DataNode, creates a Peer connection, and builds a BlockReader, which uses a Sender to issue the read request. Reading can be sequential (via ByteArrayStrategy or ByteBufferStrategy) or random, and random reads may be hedged: hedged reads use an ExecutorCompletionService to mitigate slow DataNode responses.
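The hedged-read idea can be sketched without any HDFS dependency. The class, replica names, and delays below are invented for illustration: the read is submitted to one replica, and if no result arrives within a threshold, the same range is requested from a second replica, with whichever finishes first winning.

```java
import java.util.concurrent.*;

// Hypothetical sketch of the hedged-read pattern (not DFSInputStream itself):
// submit the read to one replica; if it misses a deadline, hedge by reading
// the same range from another replica and take the first completion.
public class HedgedReadSketch {
    static String readFrom(String replica, long delayMs) throws InterruptedException {
        Thread.sleep(delayMs);            // simulate network + disk latency
        return "data-from-" + replica;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        ExecutorCompletionService<String> ecs = new ExecutorCompletionService<>(pool);

        // First attempt against a (simulated) slow DataNode.
        ecs.submit(() -> readFrom("dn-slow", 500));
        Future<String> first = ecs.poll(100, TimeUnit.MILLISECONDS);
        if (first == null) {
            // Deadline elapsed: hedge with a second replica.
            ecs.submit(() -> readFrom("dn-fast", 10));
            first = ecs.take();           // whichever replica answers first wins
        }
        System.out.println(first.get());  // the fast replica's data
        pool.shutdownNow();               // cancel the straggling read
    }
}
```

ExecutorCompletionService hands back futures in completion order rather than submission order, which is exactly what a hedged read needs.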
The read path then repeats three steps:
1. Locate the block containing the current file position, connect to one of its DataNodes, and instantiate a BlockReader on the client, paired with a BlockSender on the DataNode side.
2. The BlockReader fetches data in chunks of up to chunkSize bytes and validates each chunk's checksum.
3. If an IOException occurs (e.g., an invalid block token or access key), the offending DataNode is added to deadNodes and the read is retried against another replica.
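The per-chunk checksum check and the deadNodes fallback in the steps above can be sketched as follows. This is a simplified stand-in, not DFSInputStream's code: CRC32 stands in for the configured chunk checksum, and the replica names and the injected corruption are invented for illustration.

```java
import java.util.*;
import java.util.zip.CRC32;

// Hypothetical sketch: each 512-byte chunk travels with a checksum; a mismatch
// (surfaced as an IOException) marks the DataNode dead and the client retries
// the same chunk against the next replica.
public class ChecksumRetrySketch {
    static final Set<String> deadNodes = new HashSet<>();

    static long crcOf(byte[] chunk) {
        CRC32 crc = new CRC32();
        crc.update(chunk);
        return crc.getValue();
    }

    // Returns the chunk if its checksum matches, otherwise signals corruption.
    static byte[] readChunk(String node, byte[] chunk, long expectedCrc)
            throws java.io.IOException {
        if (crcOf(chunk) != expectedCrc) {
            throw new java.io.IOException("checksum error on " + node);
        }
        return chunk;
    }

    public static void main(String[] args) {
        byte[] chunk = new byte[512];                 // one data chunk
        long goodCrc = crcOf(chunk);
        List<String> replicas = Arrays.asList("dn1", "dn2");
        long[] crcSeen = { goodCrc + 1, goodCrc };    // dn1 delivers a corrupt chunk

        for (int i = 0; i < replicas.size(); i++) {
            String node = replicas.get(i);
            if (deadNodes.contains(node)) continue;   // skip known-bad replicas
            try {
                readChunk(node, chunk, crcSeen[i]);
                System.out.println("read ok from " + node);
                break;
            } catch (java.io.IOException e) {
                deadNodes.add(node);                  // exclude the failing replica
            }
        }
    }
}
```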
Write Mechanism: Writing uses a DFSOutputStream wrapped by an FSDataOutputStream. Data is buffered into packets (64 KB by default), each composed of chunks of 512 data bytes plus a 4‑byte checksum. The method computePacketChunkSize(int psize, int csize) derives the chunk size, the number of chunks per packet, and the resulting packet size (see code below).
private void computePacketChunkSize(int psize, int csize) {
  // one chunk = csize data bytes + the checksum bytes for that chunk
  int chunkSize = csize + checksum.getChecksumSize();
  chunksPerPacket = Math.max(psize / chunkSize, 1);
  packetSize = chunkSize * chunksPerPacket;
  if (DFSClient.LOG.isDebugEnabled()) {
    DFSClient.LOG.debug("computePacketChunkSize: src=" + src
        + ", chunkSize=" + chunkSize
        + ", chunksPerPacket=" + chunksPerPacket
        + ", packetSize=" + packetSize);
  }
}

The client maintains two internal queues: dataQueue for outgoing packets and ackQueue for acknowledgments, following a producer‑consumer pattern. When a block fills, the client requests a new block from the NameNode via addBlock, passing the previous block's information to preserve ordering and optionally suggesting favored nodes.
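The two-queue producer-consumer pattern can be sketched as below. This is a minimal stand-in, not the real DFSOutputStream: the writer thread enqueues packets on dataQueue, a streamer thread "sends" each packet and parks it on ackQueue, and an ack handler removes packets once acknowledged. The class name and the string "packets" are illustrative.

```java
import java.util.concurrent.*;

// Hypothetical sketch of the dataQueue/ackQueue hand-off: packets move from
// dataQueue (waiting to be sent) to ackQueue (sent, awaiting pipeline ack).
public class TwoQueueSketch {
    public static void main(String[] args) throws Exception {
        BlockingQueue<String> dataQueue = new LinkedBlockingQueue<>();
        BlockingQueue<String> ackQueue = new LinkedBlockingQueue<>();

        // Streamer: drain dataQueue, "send" the packet, park it on ackQueue.
        Thread streamer = new Thread(() -> {
            try {
                for (int i = 0; i < 3; i++) {
                    String pkt = dataQueue.take();   // blocks until writer produces
                    ackQueue.put(pkt);               // sent, awaiting downstream ack
                }
            } catch (InterruptedException ignored) { }
        });
        streamer.start();

        // Writer (producer): buffer three packets.
        for (int i = 0; i < 3; i++) dataQueue.put("packet-" + i);

        // Ack handler: remove packets from ackQueue as acks arrive, in order.
        for (int i = 0; i < 3; i++) System.out.println("acked " + ackQueue.take());
        streamer.join();
    }
}
```

Keeping sent-but-unacked packets on ackQueue is what lets the client re-queue them onto dataQueue during pipeline recovery instead of losing them.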
Summary: The article outlines HDFS's design for failure tolerance, showing how the read/write pipelines, lease recovery, block recovery, and pipeline recovery work together to guarantee durability and consistency even under network or node failures. Future posts will explore these recovery mechanisms in more depth.
Tongcheng Travel Technology Center