Understanding HDFS Architecture and Its Integration with NFS and Various Storage Solutions
This article reviews the fundamental concepts of HDFS, explains its master‑slave architecture with NameNode and DataNode, describes block replication, and discusses various implementations—including native HDFS, NetApp/Lustre, GPFS/Ceph, and Isilon—as well as HDFS‑to‑NFS gateway integration.
HDFS is a distributed file system that scales easily, making it ideal for storing large files and unstructured data; it offers high fault tolerance, runs on inexpensive hardware, and follows a write‑once, read‑many access pattern with high read performance.
The article first reviews the basic concepts of HDFS and then discusses how HDFS and NFS can interoperate.
The native HDFS architecture follows a classic master‑slave model: a single NameNode (master) and multiple DataNodes (slaves), where each node runs a service that exposes an access interface to applications.
Data in HDFS is stored in blocks. Each file is split into fixed‑size blocks, and each block is replicated to several different DataNodes (three by default). Because the blocks of a file are spread across many nodes, clients can read from multiple DataNodes concurrently, which improves read throughput.
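To make the block and replica layout concrete, here is a minimal sketch that splits a file into blocks and assigns each block to three distinct DataNodes. The 128 MB block size and replication factor of 3 are the usual Hadoop 2.x+ defaults (dfs.blocksize, dfs.replication); the function and class names are hypothetical, and the round-robin placement is a deliberate simplification of HDFS's real rack-aware placement policy.

```python
from dataclasses import dataclass
from typing import List

BLOCK_SIZE = 128 * 1024 * 1024  # default dfs.blocksize in Hadoop 2.x and later
REPLICATION = 3                 # default dfs.replication


@dataclass
class Block:
    block_id: int
    offset: int          # byte offset of this block within the file
    length: int          # the last block may be shorter than BLOCK_SIZE
    replicas: List[str]  # DataNodes holding a copy of this block


def split_into_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Split a file into blocks and give each block `replication` distinct DataNodes.

    Placement is simple round-robin (hypothetical), NOT HDFS's rack-aware policy.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    blocks = []
    offset = 0
    while offset < file_size:
        block_id = len(blocks)
        length = min(block_size, file_size - offset)
        # Start each block on a different node so replicas of one block never collide
        start = block_id % len(datanodes)
        replicas = [datanodes[(start + k) % len(datanodes)] for k in range(replication)]
        blocks.append(Block(block_id, offset, length, replicas))
        offset += length
    return blocks


if __name__ == "__main__":
    MB = 1024 * 1024
    for b in split_into_blocks(300 * MB, ["dn1", "dn2", "dn3", "dn4"]):
        print(b.block_id, b.length // MB, "MB on", b.replicas)
```

A 300 MB file becomes two 128 MB blocks and one 44 MB block. Because consecutive blocks start on different DataNodes, a reader can fetch block 0 from dn1 and block 1 from dn2 at the same time, which is the concurrency described above.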
The NameNode manages the file system namespace, handles client file operations, and coordinates storage: it holds the metadata, including the mapping of blocks to DataNodes, and tracks DataNode health through periodic heartbeats. DataNodes store the actual file data as blocks and report the blocks they hold to the NameNode.
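The division of labor between NameNode and DataNodes can be sketched as a small in-memory model: the NameNode keeps the namespace and the block-to-DataNode map (rebuilt from DataNode block reports) and uses heartbeat timestamps to decide which DataNodes are alive. Everything here (NameNodeSketch, its methods, the dead_after_s parameter) is hypothetical; the 630-second default only mirrors the commonly cited HDFS dead-node interval of about 10.5 minutes and should be treated as illustrative.

```python
import time


class NameNodeSketch:
    """Hypothetical in-memory model of NameNode metadata; not Hadoop's actual classes."""

    def __init__(self, dead_after_s=630.0):
        self.namespace = {}        # file path -> ordered list of block IDs
        self.block_locations = {}  # block ID -> set of DataNodes reporting that block
        self.last_heartbeat = {}   # DataNode -> time of its last heartbeat
        self.dead_after_s = dead_after_s  # illustrative dead-node timeout, in seconds

    def heartbeat(self, datanode, now=None):
        self.last_heartbeat[datanode] = time.time() if now is None else now

    def block_report(self, datanode, block_ids):
        # DataNodes tell the NameNode which blocks they store; the map is built from these
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set()).add(datanode)

    def add_file(self, path, block_ids):
        self.namespace[path] = list(block_ids)

    def live_datanodes(self, now=None):
        now = time.time() if now is None else now
        return {dn for dn, ts in self.last_heartbeat.items() if now - ts <= self.dead_after_s}

    def locate(self, path, now=None):
        # What a client receives before reading: each block with its live replica holders
        live = self.live_datanodes(now)
        return [(b, sorted(self.block_locations.get(b, set()) & live))
                for b in self.namespace[path]]
```

A DataNode whose heartbeats stop drops out of the locate() results; in a real cluster that is also the point at which the NameNode would schedule re-replication of the blocks the node held.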
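The read/write workflow of HDFS is illustrated in the accompanying diagram. In short, a client reading a file first asks the NameNode for the locations of its blocks and then streams the data directly from DataNodes; a client writing a file obtains target DataNodes for each block from the NameNode and sends the data through a pipeline in which each DataNode forwards it to the next replica.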
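The read and write paths can be simulated with a toy cluster in which a dictionary plays the NameNode's role and DataNode objects forward blocks along a replication pipeline, returning acknowledgements upstream. This is a sketch under simplifying assumptions: class names are hypothetical, whole blocks move at once (real HDFS streams them in packets), and the reader always uses the first replica rather than choosing the closest one.

```python
class DataNode:
    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block key -> bytes stored on this node

    def receive(self, key, data, downstream):
        """Store a block, forward it down the pipeline, and return acks in arrival order."""
        self.blocks[key] = data
        acks = downstream[0].receive(key, data, downstream[1:]) if downstream else []
        return acks + [self.name]  # the last node in the pipeline acknowledges first


class MiniCluster:
    """Toy cluster: dict-based 'NameNode' metadata plus DataNode objects (hypothetical)."""

    def __init__(self, names, replication=3):
        if replication > len(names):
            raise ValueError("need at least as many DataNodes as replicas")
        self.datanodes = {n: DataNode(n) for n in names}
        self.replication = replication
        self.block_map = {}   # NameNode metadata: (path, index) -> pipeline of DataNodes
        self.num_blocks = {}  # NameNode metadata: path -> number of blocks

    def write(self, path, chunks):
        names = list(self.datanodes)
        for i, data in enumerate(chunks):
            # 1. The NameNode picks target DataNodes for the new block.
            pipeline = [names[(i + k) % len(names)] for k in range(self.replication)]
            nodes = [self.datanodes[n] for n in pipeline]
            # 2. The client sends the block to the first node, which forwards it along.
            acks = nodes[0].receive((path, i), data, nodes[1:])
            if len(acks) != self.replication:
                raise RuntimeError("pipeline did not acknowledge every replica")
            self.block_map[(path, i)] = pipeline
        self.num_blocks[path] = len(chunks)

    def read(self, path):
        parts = []
        for i in range(self.num_blocks[path]):
            # 1. Ask the NameNode where the block lives; 2. read it directly from a DataNode.
            holder = self.block_map[(path, i)][0]
            parts.append(self.datanodes[holder].blocks[(path, i)])
        return b"".join(parts)
```

Note that the NameNode never touches file contents here: it only hands out locations, and the bytes travel between the client and DataNodes.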
Several implementation approaches exist. Native HDFS uses local disks, which gives good data locality but low capacity utilization, because every block is stored three times by default. The alternatives are professional storage solutions, HDFS connectors, and DFS client interfaces. An HDFS connector translates HDFS operations into requests for the underlying storage system, such as NFS.
NetApp and Lustre solutions: professional storage (e.g., NetApp FAS or E series) serves as external storage for the NameNode and DataNodes. The NameNode uses the FAS series to store its metadata, while the actual data resides on the E series. Because the standard three‑replica HDFS redundancy is kept on top of the storage array's own RAID, capacity utilization is lower, but reliability is higher.
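The capacity trade-off between native HDFS and external-array deployments is simple arithmetic: usable capacity is raw capacity divided by the replica count, multiplied again by the RAID efficiency when an array sits underneath. The sketch below assumes a RAID 6 group of 8 data disks plus 2 parity disks purely as an example layout; actual arrays may be configured differently.

```python
from fractions import Fraction


def replication_efficiency(replicas: int) -> Fraction:
    """Usable fraction of raw capacity when every block is stored `replicas` times."""
    return Fraction(1, replicas)


def raid_efficiency(data_disks: int, parity_disks: int) -> Fraction:
    """Usable fraction of a RAID group with the given data and parity disk counts."""
    return Fraction(data_disks, data_disks + parity_disks)


# Native HDFS on local disks: three replicas, no RAID underneath.
native = replication_efficiency(3)

# External array: HDFS still keeps three replicas AND the array adds RAID.
# RAID 6 with 8 data + 2 parity disks is an assumed example layout.
external = replication_efficiency(3) * raid_efficiency(8, 2)

if __name__ == "__main__":
    print(f"native HDFS:          {float(native):.1%}")
    print(f"3 replicas + RAID 6:  {float(external):.1%}")
```

With these assumptions the script prints 33.3% for native HDFS and 26.7% for three replicas on RAID 6, which is the "lower utilization, higher reliability" trade-off in numbers.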
GPFS and Ceph DFS solutions: these provide an HDFS connector that builds on Hadoop's generic libraries and re‑implements the HDFS interface in Java. The connector runs on the application host and translates calls into the underlying DFS protocol (typically NFS or GPFS). Capacity utilization and reliability depend on the underlying DFS, which often uses erasure coding for higher efficiency.
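The connector idea, Hadoop-style file calls on one side and a different storage protocol on the other, can be sketched as a thin adapter over a mounted directory. Real connectors are Java classes implementing Hadoop's org.apache.hadoop.fs.FileSystem; the Python class below is hypothetical, and its method names only loosely echo FileSystem methods such as mkdirs, create, open, listStatus, and delete. A temporary directory stands in for the NFS or GPFS mount point.

```python
import os
import shutil
import tempfile


class MountedDfsConnector:
    """Hypothetical connector: Hadoop-style file calls served from a mounted DFS directory."""

    def __init__(self, mount_root):
        self.mount_root = os.path.abspath(mount_root)

    def _resolve(self, path):
        # Translate an absolute Hadoop-style path (/user/alice/x) to a path under the mount
        full = os.path.abspath(os.path.join(self.mount_root, path.lstrip("/")))
        if os.path.commonpath([full, self.mount_root]) != self.mount_root:
            raise ValueError(f"path escapes the mount root: {path}")
        return full

    def mkdirs(self, path):
        os.makedirs(self._resolve(path), exist_ok=True)
        return True

    def create(self, path, data):
        full = self._resolve(path)
        os.makedirs(os.path.dirname(full), exist_ok=True)  # create missing parent dirs
        with open(full, "wb") as f:
            f.write(data)

    def read_all(self, path):
        with open(self._resolve(path), "rb") as f:
            return f.read()

    def list_status(self, path):
        return sorted(os.listdir(self._resolve(path)))

    def delete(self, path, recursive=False):
        full = self._resolve(path)
        if os.path.isdir(full):
            if recursive:
                shutil.rmtree(full)
            else:
                os.rmdir(full)  # raises OSError if the directory is not empty
        elif os.path.exists(full):
            os.remove(full)
        else:
            return False
        return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as mount:  # stands in for an NFS/GPFS mount point
        fs = MountedDfsConnector(mount)
        fs.create("/user/alice/part-00000", b"hello")
        print(fs.list_status("/user/alice"), fs.read_all("/user/alice/part-00000"))
```

Because the connector only translates calls, where the bytes end up and how they are protected (replication, RAID, or erasure coding) is decided entirely by the storage behind the mount, as the paragraph above notes.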
Isilon DFS client solution: Isilon implements full HDFS semantics, providing both NameNode and DataNode functionality within the Isilon system itself, which gives far better compatibility than the simpler HDFS connector approach.
HDFS can also be exposed through NFS, so that Linux or Windows clients can mount it and treat it like a local file system. Since Hadoop 2.2.0, an NFS Gateway role can be added to the HDFS service. The gateway has limited random read/write performance (HDFS itself does not support random writes), so it is best suited to uploading, downloading, and browsing files.
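The restriction on random writes follows from HDFS files being append-only, while NFS clients send WRITE requests with arbitrary offsets and may deliver sequential writes out of order. Hadoop's documentation describes the gateway temporarily saving out-of-order writes before writing them to HDFS; the class below is only a simplified, hypothetical model of that idea: writes at the current end of file are appended, writes past the end are held until the gap is filled, and writes into already-committed bytes are rejected.

```python
class AppendOnlyWriteBuffer:
    """Hypothetical model of turning offset-based NFS writes into an append-only stream."""

    def __init__(self):
        self.committed = bytearray()  # bytes already appended to the HDFS file
        self.pending = {}             # offset -> data that arrived ahead of its turn

    def write(self, offset, data):
        if offset < len(self.committed):
            # Overwriting committed bytes would be a random write, which HDFS cannot do
            raise OSError("random write not supported")
        if offset > len(self.committed):
            self.pending[offset] = bytes(data)  # hold until the gap before it is filled
            return
        self.committed += data
        # Drain buffered writes that are now contiguous with the end of the file
        while len(self.committed) in self.pending:
            self.committed += self.pending.pop(len(self.committed))
```

This model ignores overlapping or partially overlapping writes and never spills to disk; it only shows why sequential uploads work well through the gateway while in-place edits do not.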
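The NFS gateway receives requests from NFS clients and acts as an HDFS client to the cluster; it can be deployed on a cluster node (a NameNode or DataNode machine) or on a separate proxy server. Hadoop provides command‑line tools to start the gateway and check its processes: for example, the hdfs portmap and hdfs nfs3 commands start the portmap and NFS gateway daemons, and rpcinfo -p and showmount -e can confirm that they are running and exporting the file system.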
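The gateway's role as a translator can be sketched as a dispatch table that maps NFSv3 procedure names (GETATTR, READ, WRITE, MKDIR, READDIR) onto calls against an HDFS client. Both classes below are hypothetical: FakeHdfsClient is an in-memory stand-in rather than a real HDFS client library, and the request dictionaries are simplified placeholders for real NFS RPC messages. Writes are accepted only at the current end of the file, reflecting HDFS's append-only model.

```python
class FakeHdfsClient:
    """In-memory stand-in for an HDFS client (hypothetical, for illustration only)."""

    def __init__(self):
        self.files = {}    # path -> file contents
        self.dirs = {"/"}  # known directories

    def get_file_status(self, path):
        if path in self.dirs:
            return {"type": "dir"}
        if path in self.files:
            return {"type": "file", "length": len(self.files[path])}
        raise FileNotFoundError(path)

    def open_read(self, path, offset, count):
        return self.files[path][offset:offset + count]

    def append(self, path, data):
        self.files[path] = self.files.get(path, b"") + data

    def mkdirs(self, path):
        self.dirs.add(path)

    def list_status(self, path):
        prefix = path.rstrip("/") + "/"
        children = {p[len(prefix):].split("/")[0]
                    for p in (self.files.keys() | self.dirs)
                    if p.startswith(prefix) and p != prefix}
        return sorted(children)


class NfsGatewaySketch:
    """Translates simplified NFS procedure calls into HDFS client calls (hypothetical)."""

    def __init__(self, hdfs):
        self.hdfs = hdfs
        self.handlers = {
            "GETATTR": lambda a: self.hdfs.get_file_status(a["path"]),
            "READ": lambda a: self.hdfs.open_read(a["path"], a["offset"], a["count"]),
            "WRITE": self._write,
            "MKDIR": lambda a: self.hdfs.mkdirs(a["path"]),
            "READDIR": lambda a: self.hdfs.list_status(a["path"]),
        }

    def _write(self, args):
        try:
            length = self.hdfs.get_file_status(args["path"])["length"]
        except FileNotFoundError:
            length = 0
        if args["offset"] != length:
            raise OSError("only sequential (append) writes are supported")
        self.hdfs.append(args["path"], args["data"])
        return len(args["data"])

    def handle(self, procedure, args):
        handler = self.handlers.get(procedure)
        if handler is None:
            raise NotImplementedError(procedure)
        return handler(args)
```

Because the gateway is itself just an HDFS client, it can run on any host that can reach the NameNode and DataNodes, which is why deploying it on a cluster node or a separate proxy server both work.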
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.