Big Data · 7 min read

Understanding HDFS Read and Write Mechanisms

This article explains how HDFS handles file reading and writing, detailing the roles of DFSClient, block selection, hedged reads, packet construction, checksum handling, and the interaction with NameNode and DataNode pipelines to ensure reliability and performance.

Tongcheng Travel Technology Center

HDFS (Hadoop Distributed File System) differs from traditional single-node file systems: it forbids concurrent writes to the same file, using a lease mechanism to grant a single writer exclusive access at a time. This article walks through the core read and write processes, illustrating each step with code snippets and diagrams.

Read Mechanism: DistributedFileSystem.open(path) creates an FSDataInputStream that wraps a DFSInputStream. During construction, the client contacts the NameNode to obtain block metadata, which the NameNode sorts by the client's physical (network) location. If the last block is not properly closed, the client performs a limited number of retries; otherwise it retrieves the visible length of each replica via getReplicaVisibleLength(ExtendedBlock). The client then selects a DataNode, creates a Peer, and builds a BlockReader containing a Sender to fetch the data. Reading can be sequential (using ByteArrayStrategy or ByteBufferStrategy) or random (hedged reads). Hedged reads employ an ExecutorCompletionService to mitigate slow DataNode responses: if one replica does not answer within a threshold, a second read is launched against another replica and whichever finishes first wins.
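The hedging idea can be sketched with a plain ExecutorCompletionService. This is a minimal illustration, not HDFS code: the DataNode names, delays, and readFrom helper are hypothetical stand-ins for the client's real block-fetch tasks.

```java
import java.util.concurrent.*;

// Hedged-read sketch: read from a primary replica; if it does not answer
// within a threshold, race a second replica and take whichever completes
// first. Replica names and latencies below are illustrative only.
public class HedgedReadSketch {
  static String readFrom(String dataNode, long delayMs) throws InterruptedException {
    Thread.sleep(delayMs); // simulate network/disk latency
    return "block-data-from-" + dataNode;
  }

  public static String hedgedRead(ExecutorService pool, long thresholdMs) throws Exception {
    ExecutorCompletionService<String> ecs = new ExecutorCompletionService<>(pool);
    ecs.submit(() -> readFrom("dn1", 500));            // slow primary replica
    Future<String> first = ecs.poll(thresholdMs, TimeUnit.MILLISECONDS);
    if (first == null) {                               // primary too slow: hedge
      ecs.submit(() -> readFrom("dn2", 10));           // second replica
      first = ecs.take();                              // first completion wins
    }
    return first.get();
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    System.out.println(hedgedRead(pool, 50));          // the fast hedge wins
    pool.shutdownNow();
  }
}
```

The real client additionally remembers which nodes were hedged against and cancels the losing future; the sketch keeps only the racing core.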

For each read, the client:

1. Locates the block containing the current file position, connects to one of its DataNodes, and instantiates a BlockReader, paired with a BlockSender on the DataNode side.

2. Fetches data through the BlockReader in chunks of up to chunkSize bytes and validates each chunk's checksum.

3. On an IOException (e.g., token or key errors), adds the offending DataNode to deadNodes and tries another node.
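The per-chunk validation in the steps above can be sketched with a CRC32 check. The helper names here are hypothetical; HDFS's own checksum machinery lives in its DataChecksum classes, but the principle is the same: a mismatch means the replica is suspect and another DataNode should be tried.

```java
import java.util.Arrays;
import java.util.zip.CRC32;

// Sketch of per-chunk checksum validation: each 512-byte chunk travels with
// a 4-byte CRC; a mismatch marks the replica as bad so the client can fall
// back to another DataNode. Method names are illustrative, not HDFS API.
public class ChunkChecksumSketch {
  static long crcOf(byte[] chunk) {
    CRC32 crc = new CRC32();
    crc.update(chunk);
    return crc.getValue();
  }

  /** Returns true if the received chunk matches the checksum sent with it. */
  static boolean validateChunk(byte[] chunk, long expectedCrc) {
    return crcOf(chunk) == expectedCrc;
  }

  public static void main(String[] args) {
    byte[] chunk = new byte[512];
    Arrays.fill(chunk, (byte) 7);
    long crc = crcOf(chunk);
    System.out.println(validateChunk(chunk, crc));  // intact chunk: true
    chunk[0] = 0;                                   // simulate corruption
    System.out.println(validateChunk(chunk, crc));  // false: try another replica
  }
}
```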

Write Mechanism: Writing uses a DFSOutputStream wrapped by FSDataOutputStream. Data is buffered into packets (64 KB by default), each composed of chunks of 512 bytes of data plus a 4-byte checksum. The method computePacketChunkSize(int psize, int csize) calculates the chunk size, the number of chunks per packet, and the resulting packet size (see the code below).

private void computePacketChunkSize(int psize, int csize) {
  // A chunk on the wire is the user data plus its checksum bytes.
  int chunkSize = csize + checksum.getChecksumSize();
  // Fit as many whole chunks as possible into the requested packet size,
  // but always at least one.
  chunksPerPacket = Math.max(psize / chunkSize, 1);
  // The effective packet size is a whole number of chunks.
  packetSize = chunkSize * chunksPerPacket;
  if (DFSClient.LOG.isDebugEnabled()) {
    DFSClient.LOG.debug("computePacketChunkSize: src=" + src
        + ", chunkSize=" + chunkSize
        + ", chunksPerPacket=" + chunksPerPacket
        + ", packetSize=" + packetSize);
  }
}
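Plugging in the defaults quoted earlier (64 KB packets, 512-byte chunks, 4-byte CRC32 checksum) shows how the math works out; the constant names below are illustrative:

```java
// Worked example of the packet-size math with the article's default values.
public class PacketMath {
  public static void main(String[] args) {
    int psize = 64 * 1024;  // default write-packet size: 64 KB
    int csize = 512;        // bytes of user data per chunk
    int checksumSize = 4;   // CRC32 checksum bytes per chunk

    int chunkSize = csize + checksumSize;                 // 516
    int chunksPerPacket = Math.max(psize / chunkSize, 1); // 127
    int packetSize = chunkSize * chunksPerPacket;         // 65532, just under 64 KB

    System.out.println(chunkSize + " " + chunksPerPacket + " " + packetSize);
  }
}
```

So a "64 KB" packet actually carries 127 chunks, or 65,532 bytes including checksums: the packet is rounded down to a whole number of chunks.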

The client maintains two internal queues: dataQueue for outgoing packets and ackQueue for packets awaiting acknowledgment, following a producer-consumer pattern. When a block fills up, the client requests a new block from the NameNode via addBlock, passing the previous block's information to preserve ordering and optionally suggesting favored nodes.
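The two-queue handoff can be sketched with BlockingQueues. This is a simplified model, assuming a toy Packet type and a single in-flight packet; the real DFSOutputStream streamer keeps many packets on ackQueue concurrently and requeues them on pipeline failure.

```java
import java.util.concurrent.*;

// Producer-consumer sketch of the client's two queues: the writer enqueues
// packets on dataQueue; the streamer moves each packet to ackQueue while it
// is in flight and removes it once the pipeline acknowledges it.
public class QueueSketch {
  static final class Packet {
    final long seqno;
    Packet(long seqno) { this.seqno = seqno; }
  }

  public static void main(String[] args) throws InterruptedException {
    BlockingQueue<Packet> dataQueue = new LinkedBlockingQueue<>();
    BlockingQueue<Packet> ackQueue = new LinkedBlockingQueue<>();

    // Producer: the client write path buffers three packets.
    for (long s = 0; s < 3; s++) dataQueue.put(new Packet(s));

    // Consumer (streamer): send each packet, park it on ackQueue until acked.
    while (!dataQueue.isEmpty()) {
      Packet p = dataQueue.take();
      ackQueue.put(p);                 // in flight, awaiting pipeline ack
      Packet acked = ackQueue.take();  // ack received in order
      System.out.println("acked seqno=" + acked.seqno);
    }
  }
}
```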

Summary: The article outlines HDFS's design for failure tolerance, showing how read/write pipelines, lease recovery, block recovery, and pipeline recovery work together to guarantee data durability and consistency even under network or node failures. Future posts will explore these recovery mechanisms in more depth.

Tags: Big Data, packet, distributed file system, HDFS, checksum, DFSClient, Read, Write