Understanding HDFS Architecture: Key Components, Protocols, and Limitations

This article explains HDFS’s master‑slave architecture, detailing the roles of NameNode and DataNode, namespace management, communication protocols, client functions, common configuration parameters, maintenance commands, and the inherent limitations of a single‑NameNode design.

Programmer DD

Apr 14, 2021

HDFS Architecture

Overview

HDFS uses a master/slave model: a cluster consists of a single NameNode and multiple DataNodes. The NameNode manages the file system namespace and regulates client access to files. Each DataNode runs a process that serves read and write requests and, on instruction from the NameNode, creates, deletes, and replicates data blocks. The blocks themselves are stored as files on the DataNode's local Linux file system.
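The division of labor described above can be illustrated with a toy in-memory model. This is a minimal sketch, not Hadoop code: the names `ToyNameNode` and `ToyDataNode`, the 4-byte block size, the replication factor of 2, and the round-robin placement are all assumptions made for illustration (real HDFS uses far larger blocks and its own placement policy).

```python
from itertools import cycle

BLOCK_SIZE = 4       # illustrative only; real HDFS blocks are far larger
REPLICATION = 2      # illustrative replication factor


class ToyDataNode:
    """Stores block contents; knows nothing about file names."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}           # block_id -> bytes

    def write_block(self, block_id, data):
        self.blocks[block_id] = data

    def read_block(self, block_id):
        return self.blocks[block_id]


class ToyNameNode:
    """Holds metadata only: which blocks form a file and where they live."""

    def __init__(self, datanodes):
        self.file_blocks = {}      # path -> [block_id, ...]
        self.block_locations = {}  # block_id -> [ToyDataNode, ...]
        self._next_id = 0
        self._placement = cycle(datanodes)  # hypothetical round-robin policy

    def allocate_block(self, path):
        block_id = self._next_id
        self._next_id += 1
        targets = [next(self._placement) for _ in range(REPLICATION)]
        self.file_blocks.setdefault(path, []).append(block_id)
        self.block_locations[block_id] = targets
        return block_id, targets


def write_file(namenode, path, data):
    # Metadata goes to the NameNode, the bytes go to the DataNodes.
    for offset in range(0, len(data), BLOCK_SIZE):
        block_id, targets = namenode.allocate_block(path)
        for dn in targets:
            dn.write_block(block_id, data[offset:offset + BLOCK_SIZE])


def read_file(namenode, path):
    # Look up block locations, then read each block from its first replica.
    chunks = []
    for block_id in namenode.file_blocks[path]:
        replica = namenode.block_locations[block_id][0]
        chunks.append(replica.read_block(block_id))
    return b"".join(chunks)


datanodes = [ToyDataNode(f"dn{i}") for i in range(3)]
namenode = ToyNameNode(datanodes)
write_file(namenode, "/logs/a.txt", b"hello hdfs")
print(read_file(namenode, "/logs/a.txt"))
```

Note that the NameNode object never holds file contents, only identifiers and locations, which is the property the rest of the article relies on.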

Namespace Management

The HDFS namespace comprises directories, files, and blocks.

In HDFS 1.0 there is only one namespace and one NameNode that manages it.

HDFS uses a traditional hierarchical file system structure, so users can create, delete, move, and rename directories and files just as they would in a regular file system.
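The namespace operations listed above can be sketched with a small tree structure. This is only an illustration of a hierarchical namespace, assuming a hypothetical `ToyNamespace` class that stores directories as nested dictionaries and files as `None`; it says nothing about how the NameNode actually represents its metadata.

```python
class ToyNamespace:
    """Hierarchical namespace: directories are dicts, files are None."""

    def __init__(self):
        self.root = {}

    @staticmethod
    def _split(path):
        return [part for part in path.split("/") if part]

    def _parent(self, path):
        *dirs, name = self._split(path)
        node = self.root
        for d in dirs:
            node = node[d]      # KeyError if a parent directory is missing
        return node, name

    def mkdir(self, path):
        node = self.root
        for part in self._split(path):
            node = node.setdefault(part, {})

    def create(self, path):
        parent, name = self._parent(path)
        parent[name] = None

    def rename(self, src, dst):
        # Moving and renaming are the same operation on the tree.
        src_parent, src_name = self._parent(src)
        dst_parent, dst_name = self._parent(dst)
        dst_parent[dst_name] = src_parent.pop(src_name)

    def delete(self, path):
        parent, name = self._parent(path)
        del parent[name]

    def listdir(self, path):
        node = self.root
        for part in self._split(path):
            node = node[part]
        return sorted(node)


ns = ToyNamespace()
ns.mkdir("/user/alice")
ns.mkdir("/archive")
ns.create("/user/alice/data.csv")
ns.rename("/user/alice/data.csv", "/archive/data-2021.csv")
print(ns.listdir("/archive"))
```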

Communication Protocol

Because HDFS is a distributed file system deployed across a cluster, all data is transferred over the network.

Protocols are built on top of TCP/IP.

Clients initiate TCP connections to the NameNode on a configurable port and interact via the client protocol.

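The NameNode and the DataNodes communicate using the DataNode protocol.

Both the client protocol and the DataNode protocol are wrapped in an RPC abstraction. By design, the NameNode never initiates an RPC; it only responds to RPC requests issued by clients and DataNodes.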
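The rule that the NameNode only responds to requests can be sketched as follows. This is a simplified model under stated assumptions: the classes, the `schedule` method, and the command string are hypothetical, and plain method calls stand in for RPC. The point it shows is that instructions for a DataNode travel back in the reply to a call the DataNode itself made.

```python
from collections import defaultdict, deque


class ToyNameNode:
    """Never calls a DataNode; it only answers requests it receives."""

    def __init__(self):
        self.live_nodes = set()
        self._pending = defaultdict(deque)   # datanode id -> queued commands

    def schedule(self, datanode_id, command):
        # An internal decision (e.g. re-replication); nothing is sent yet.
        self._pending[datanode_id].append(command)

    def handle_heartbeat(self, datanode_id):
        # Request handler: the reply carries commands queued for the caller.
        self.live_nodes.add(datanode_id)
        commands = list(self._pending[datanode_id])
        self._pending[datanode_id].clear()
        return commands


class ToyDataNode:
    def __init__(self, datanode_id, namenode):
        self.datanode_id = datanode_id
        self.namenode = namenode
        self.executed = []

    def heartbeat(self):
        # The DataNode initiates the call and acts on whatever comes back.
        for command in self.namenode.handle_heartbeat(self.datanode_id):
            self.executed.append(command)


nn = ToyNameNode()
dn = ToyDataNode("dn1", nn)
nn.schedule("dn1", "replicate block 42")   # hypothetical command string
dn.heartbeat()
print(dn.executed)
```

A consequence of this design is that a command waits until the target DataNode next contacts the NameNode.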

Client

The client is the most common way users interact with HDFS; a client library is provided with the deployment.

The HDFS client exposes a file system interface that abstracts most implementation complexities.

Strictly speaking, the client is not part of HDFS itself.

The client supports common file operations such as open, read, and write, and HDFS provides a shell-like command-line interface for accessing data.

HDFS also offers a Java API for programmatic access.
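To give a feel for what a file system interface hides, here is a minimal sketch of a file-like reader over fixed-size blocks. It is not the HDFS client: `ToyInputStream` and its `fetch_block` callable are hypothetical, and a Python list stands in for remote DataNodes. The caller uses only `seek` and `read`, while the mapping from a byte position to a block and an offset within it stays internal.

```python
class ToyInputStream:
    """File-like view over a sequence of fixed-size blocks (hypothetical)."""

    def __init__(self, fetch_block, block_size, length):
        self._fetch_block = fetch_block   # callable: block index -> bytes
        self._block_size = block_size
        self._length = length
        self._pos = 0

    def seek(self, pos):
        # Clamp the position to the valid range of the file.
        self._pos = max(0, min(pos, self._length))

    def read(self, n):
        out = bytearray()
        end = min(self._pos + n, self._length)
        while self._pos < end:
            # Translate the file position into (block index, offset in block).
            index, offset = divmod(self._pos, self._block_size)
            block = self._fetch_block(index)
            take = min(end - self._pos, self._block_size - offset)
            out += block[offset:offset + take]
            self._pos += take
        return bytes(out)


# Stand-in for remote DataNodes: the three blocks of a 10-byte file.
blocks = [b"hell", b"o hd", b"fs"]
stream = ToyInputStream(blocks.__getitem__, block_size=4, length=10)
stream.seek(2)
print(stream.read(5))   # this read spans the first and second blocks
```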

Limitations of HDFS Architecture

Having a single NameNode simplifies design but introduces several clear limitations:

Namespace limitation: the NameNode stores metadata in memory, so the number of objects it can manage is bounded by available RAM.

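Performance bottleneck: every metadata operation passes through the single NameNode, so the throughput of the whole file system is bounded by what that one node can handle.

Isolation issue: with only one namespace, the applications sharing a cluster cannot be isolated from one another.

Cluster availability: the NameNode is a single point of failure, so if it goes down the entire cluster becomes unavailable.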
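The namespace limitation can be made concrete with a back-of-the-envelope calculation. The sketch below assumes a commonly quoted rule of thumb of roughly 150 bytes of NameNode heap per namespace object (file, directory, or block). That figure is an assumption for illustration, not a measured value, and the function name is hypothetical.

```python
BYTES_PER_OBJECT = 150   # assumed rule of thumb, not a measured value


def estimate_namenode_heap(num_files, blocks_per_file=1, num_dirs=0):
    """Rough heap estimate in bytes for files, their blocks, and directories."""
    objects = num_files + num_files * blocks_per_file + num_dirs
    return objects * BYTES_PER_OBJECT


# 100 million single-block files -> 200 million objects.
heap = estimate_namenode_heap(num_files=100_000_000, blocks_per_file=1)
print(f"{heap / 1e9:.0f} GB")   # prints "30 GB"
```

Under this assumption the estimate depends on the number of objects rather than on their size, which is why many small files strain the NameNode more than a few large ones holding the same amount of data.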

Common HDFS Configuration Parameters
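As an illustration of what such parameters look like, the sketch below generates a small `hdfs-site.xml` fragment. The four property names exist in stock Hadoop, but the selection is only an example, and the values and directory paths are assumptions that must be adapted to a real cluster.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

properties = {
    "dfs.replication": "3",                       # replicas per block
    "dfs.blocksize": "134217728",                 # 128 MiB, in bytes
    "dfs.namenode.name.dir": "/data/hdfs/name",   # hypothetical path
    "dfs.datanode.data.dir": "/data/hdfs/data",   # hypothetical path
}


def build_hdfs_site(props):
    """Build the <configuration> element used by Hadoop site files."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


root = build_hdfs_site(properties)
pretty = minidom.parseString(ET.tostring(root)).toprettyxml(indent="  ")
print(pretty)
```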

Common HDFS Maintenance Commands
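A few widely used maintenance commands are collected in the sketch below. Each entry is an existing `hdfs` subcommand, but the selection and the task labels are illustrative assumptions, and the helper `run_maintenance` is hypothetical. It runs a command only when the `hdfs` CLI is found on the PATH.

```python
import shutil
import subprocess

# Each entry is an existing hdfs subcommand; the selection is illustrative.
MAINTENANCE_COMMANDS = {
    "cluster report": ["hdfs", "dfsadmin", "-report"],
    "safe mode status": ["hdfs", "dfsadmin", "-safemode", "get"],
    "file system check": ["hdfs", "fsck", "/"],
    "list root directory": ["hdfs", "dfs", "-ls", "/"],
}


def run_maintenance(task):
    """Run one command if the hdfs CLI is installed; otherwise return None."""
    argv = MAINTENANCE_COMMANDS[task]
    if shutil.which(argv[0]) is None:
        return None   # no Hadoop client on this machine
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return result.stdout


# Print the command lines without executing them.
for task, argv in MAINTENANCE_COMMANDS.items():
    print(f"{task}: {' '.join(argv)}")
```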
