Introduction to HDFS: Architecture, Features, Replication, Rack Awareness, and Metadata Management
This article provides a comprehensive overview of Hadoop Distributed File System (HDFS), covering its streaming data access model, key characteristics, master‑slave architecture, block storage and replication mechanisms, rack‑aware placement strategy, and how the NameNode manages metadata and checkpoints.
HDFS (Hadoop Distributed File System) is a highly fault‑tolerant system designed for high‑throughput data access on large‑scale datasets, storing files as streams of blocks across inexpensive commodity hardware.
Key characteristics include streaming data access, suitability for massive files (GB‑TB), write‑once‑read‑many semantics, single‑writer restriction, and a focus on high throughput rather than low‑latency access.
The architecture follows a Master/Slave model with a NameNode managing the namespace and client operations, and multiple DataNodes storing the actual data blocks. The NameNode also tracks block‑to‑DataNode mappings.
Files are split into fixed‑size blocks (except the last block) and each block is replicated for fault tolerance. Replication factor is configurable per file, and all writes are append‑only, simplifying consistency.
HDFS employs a rack‑aware placement policy: typically one replica on the local rack, another on a different node within the same rack, and a third on a node in a different rack. This improves reliability, balances load, and reduces cross‑rack traffic during reads.
Metadata is stored on the NameNode using an EditLog for transactional changes and an FsImage for the complete namespace snapshot. Periodic checkpoints merge the EditLog into a new FsImage, after which the old log is discarded.
In summary, the article introduces HDFS fundamentals, its architecture, replication strategy, rack awareness, and metadata management, providing a solid foundation for further exploration of Hadoop’s design.
Big Data Technology Architecture
Exploring Open Source Big Data and AI Technologies
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
