Memory Architecture and Analysis of Hadoop HDFS NameNode

The article dissects Hadoop 2.4.1’s HDFS NameNode memory architecture, detailing how the Namespace, BlockManager, NetworkTopology, and LeaseManager consume the heap, exposing scaling problems when metadata reaches hundreds of millions of inodes and blocks, and recommending file merging, block‑size tuning, federation, or external KV stores to mitigate heap pressure.

Meituan Technology Team

Overview: The NameNode is the most critical component of HDFS. It manages all file system metadata, which also makes it a single point of failure. This article examines its internal structure and operation, based on Hadoop 2.4.1.

Memory Panorama: NameNode memory divides into four major parts: Namespace, BlocksMap (maintained by BlockManager), NetworkTopology, and other structures. Namespace and BlockManager dominate, each consuming roughly 50% of the heap.

Namespace: Stores the directory tree and the file-to-block mappings. The whole tree lives in memory and is periodically persisted to the FsImage file. Two concrete node types, INodeDirectory and INodeFile, inherit from INode, with extensible feature fields for optional attributes.
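
The tree described above can be sketched in a few lines of Python. This is a minimal illustration, not the Hadoop classes: the class names mirror the ones in the text, but the fields (`block_ids`, `children`) and the `resolve` helper are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class INode:
    """Common base: every namespace entry has a name."""
    name: str


@dataclass
class INodeFile(INode):
    # Ordered block IDs that make up the file (the file-to-block mapping).
    block_ids: List[int] = field(default_factory=list)


@dataclass
class INodeDirectory(INode):
    # Children keyed by name (assumption: a dict is used here for brevity).
    children: Dict[str, INode] = field(default_factory=dict)

    def add(self, child: INode) -> INode:
        self.children[child.name] = child
        return child


def resolve(root: INodeDirectory, path: str) -> Optional[INode]:
    """Walk the tree one path component at a time; None if the path is absent."""
    node: Optional[INode] = root
    for part in (p for p in path.split("/") if p):
        if not isinstance(node, INodeDirectory):
            return None
        node = node.children.get(part)
        if node is None:
            return None
    return node


root = INodeDirectory("")
logs = root.add(INodeDirectory("logs"))
logs.add(INodeFile("app.log", block_ids=[1001, 1002]))
found = resolve(root, "/logs/app.log")
```

Every file and directory costs one object of this kind on the heap, which is why the number of inodes matters so much for NameNode sizing.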

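BlockManager: Manages BlocksMap, which maps each block to its BlockInfo object. BlockInfo records the replica locations in a triplets array: for every replica it holds the storage that owns the replica plus links to the previous and next block on that same storage. BlocksMap is implemented as a LightWeightGSet hash table, and together with its BlockInfo entries it occupies a large portion of the heap.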
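
A rough sketch of the triplets layout may help. It assumes one flat array with three slots per replica (storage, previous block, next block); the Python class, its method names, and the use of a plain dict in place of LightWeightGSet are simplifications for illustration only.

```python
from typing import Dict, List, Optional


class BlockInfo:
    """Illustrative stand-in for a BlocksMap entry; not the Hadoop class."""

    def __init__(self, block_id: int, replication: int) -> None:
        self.block_id = block_id
        # One flat array, three slots per replica:
        # [storage, previous block on that storage, next block on that storage]
        self.triplets: List[Optional[object]] = [None] * (3 * replication)

    def add_storage(self, storage: str) -> int:
        """Record a replica location in the first free triplet; return its index."""
        for i in range(0, len(self.triplets), 3):
            if self.triplets[i] is None:
                self.triplets[i] = storage
                return i // 3
        raise ValueError("no free replica slot")

    def storages(self) -> List[object]:
        """Storages that currently hold a replica of this block."""
        return [self.triplets[i] for i in range(0, len(self.triplets), 3)
                if self.triplets[i] is not None]


# BlocksMap reduced to a dict from block ID to BlockInfo (assumption: the
# real structure is a LightWeightGSet hash table, as the text says).
blocks_map: Dict[int, BlockInfo] = {}
info = BlockInfo(block_id=1001, replication=3)
blocks_map[info.block_id] = info
info.add_storage("dn1:storage0")
info.add_storage("dn2:storage0")
```

Packing the three references per replica into one array avoids a separate list-node object for every replica, which matters when there are hundreds of millions of blocks.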

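Dynamic structures: excessReplicateMap (surplus replicas awaiting removal), neededReplications (under-replicated blocks), invalidateBlocks (replicas scheduled for deletion), and corruptReplicas (replicas known to be corrupt) track blocks whose replica state needs attention. The ReplicationMonitor thread works through them to schedule replication and deletion on DataNodes.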
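
To make the bookkeeping concrete, here is a deliberately simplified sketch of how blocks could be sorted into an under-replicated set and a per-DataNode excess map. The `scan` function, its inputs, and the rule for picking surplus replicas are all hypothetical; HDFS makes a placement-aware choice and updates these structures incrementally rather than in one pass.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def scan(expected: Dict[int, int],
         live: Dict[int, List[str]]) -> Tuple[Set[int], Dict[str, Set[int]]]:
    """Compare live replicas with the target replication factor per block."""
    needed: Set[int] = set()                        # blocks with too few replicas
    excess: Dict[str, Set[int]] = defaultdict(set)  # datanode -> surplus blocks
    for block_id, target in expected.items():
        holders = live.get(block_id, [])
        if len(holders) < target:
            needed.add(block_id)
        elif len(holders) > target:
            # Assumption: treat the last-listed replicas as surplus.
            for node in holders[target:]:
                excess[node].add(block_id)
    return needed, dict(excess)


needed, excess = scan(
    expected={1: 3, 2: 2, 3: 1},
    live={1: ["dnA", "dnB"], 2: ["dnA", "dnB"], 3: ["dnA", "dnB", "dnC"]},
)
```

In this example block 1 ends up in `needed`, block 2 is healthy, and the extra replicas of block 3 on `dnB` and `dnC` land in `excess`.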

NetworkTopology: Maintains rack‑aware topology and DataNode descriptors. Each DatanodeStorageInfo keeps a doubly‑linked list of its blocks to support fast insert/delete and sequential traversal during BlockReport.
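
The per-storage block list can be sketched as an ordinary doubly linked list. This is a simplification under stated assumptions: the `Block` and `StorageBlockList` classes are invented for the example, and each block here belongs to a single list, whereas in HDFS a block sits in one list per replica.

```python
from typing import Iterator, Optional


class Block:
    def __init__(self, block_id: int) -> None:
        self.block_id = block_id
        self.prev: Optional["Block"] = None
        self.next: Optional["Block"] = None


class StorageBlockList:
    """Blocks held by one storage, chained as a doubly linked list."""

    def __init__(self) -> None:
        self.head: Optional[Block] = None

    def insert(self, block: Block) -> None:
        # O(1): new blocks go to the head of the list.
        block.prev = None
        block.next = self.head
        if self.head is not None:
            self.head.prev = block
        self.head = block

    def remove(self, block: Block) -> None:
        # O(1): unlink using the block's own neighbour pointers.
        # Assumes the block is currently a member of this list.
        if block.prev is not None:
            block.prev.next = block.next
        else:
            self.head = block.next
        if block.next is not None:
            block.next.prev = block.prev
        block.prev = block.next = None

    def __iter__(self) -> Iterator[int]:
        # Sequential traversal, as needed when processing a BlockReport.
        node = self.head
        while node is not None:
            yield node.block_id
            node = node.next
```

Because removal only touches the neighbours of the block being unlinked, no search through the list is needed, which is the property the text attributes to this structure.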

LeaseManager: Implements the lease protocol that guarantees a single writer per file under the write-once-read-many model of HDFS. It keeps leases indexed by client (leases), ordered by last renewal time (sortedLeases), and indexed by path, and it handles soft and hard time-outs.
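
A small sketch of the lease bookkeeping follows. The `LeaseTable` class and its methods are hypothetical names for illustration; the limits use the HDFS defaults of one minute (soft) and one hour (hard), and time is passed in explicitly so the example stays deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

SOFT_LIMIT_S = 60        # HDFS default soft limit: 1 minute
HARD_LIMIT_S = 60 * 60   # HDFS default hard limit: 1 hour


@dataclass
class Lease:
    holder: str
    last_renewed: float
    paths: Set[str] = field(default_factory=set)


class LeaseTable:
    """Leases indexed both by client (holder) and by path."""

    def __init__(self) -> None:
        self.by_holder: Dict[str, Lease] = {}
        self.by_path: Dict[str, Lease] = {}

    def add(self, holder: str, path: str, now: float) -> Lease:
        # Only one writer may hold the lease on a path at a time.
        current = self.by_path.get(path)
        if current is not None and current.holder != holder:
            raise PermissionError(f"{path} is already leased by {current.holder}")
        lease = self.by_holder.setdefault(holder, Lease(holder, now))
        lease.paths.add(path)
        lease.last_renewed = now
        self.by_path[path] = lease
        return lease

    def renew(self, holder: str, now: float) -> None:
        self.by_holder[holder].last_renewed = now

    def expired(self, now: float, limit_s: float) -> List[str]:
        """Holders whose last renewal is older than the given limit."""
        return sorted(h for h, lease in self.by_holder.items()
                      if now - lease.last_renewed > limit_s)


table = LeaseTable()
table.add("client-1", "/logs/a", now=0)
table.add("client-2", "/logs/b", now=30)
soft_expired = table.expired(now=70, limit_s=SOFT_LIMIT_S)
```

Once the soft limit has passed, another client may take over the file; once the hard limit has passed, the NameNode recovers the lease itself.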

Problems: As metadata scales (e.g., 2 × 10⁸ inodes, 3 × 10⁸ blocks), NameNode heap usage exceeds 90 GB, leading to long startup times, performance degradation, frequent Full GC pauses, and difficulty debugging large heap dumps.
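
A back-of-the-envelope calculation puts these figures in perspective. It takes the example counts above at face value and treats 90 GB as 90 GiB of heap; since the heap also holds other structures and free space, the result is only a rough upper bound on the average cost per inode or block, not a measured object size.

```python
GIB = 1024 ** 3

inodes = 2 * 10**8          # example figure from the text
blocks = 3 * 10**8          # example figure from the text
heap_bytes = 90 * GIB       # assumption: "90 GB" read as 90 GiB

objects = inodes + blocks
avg_bytes_per_object = heap_bytes / objects
print(f"{objects:,} objects, about {avg_bytes_per_object:.0f} bytes of heap each")
```

A budget of a couple of hundred bytes per object looks small, but multiplied by half a billion objects it produces a heap that is slow to load, slow to collect, and hard to dump.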

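Solutions & Recommendations: Merge small files and adjust the block size to reduce the number of inodes and blocks, adopt HDFS Federation to spread the namespace across several NameNodes, or move metadata out of the heap, either into an external KV store such as LevelDB or through redesigned systems such as Baidu HDFS2 and Taobao HDFS2.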
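
The effect of merging files and enlarging blocks can be estimated with a simple count. The sketch below assumes one inode per file and one block object per started block, ignores directory inodes, and uses a hypothetical helper `namespace_objects`; the file sizes are made up for the example.

```python
from typing import Iterable

MIB = 1024 ** 2


def namespace_objects(file_sizes: Iterable[int], block_size: int) -> int:
    """Inodes plus blocks for a set of non-empty files (directories ignored)."""
    sizes = list(file_sizes)
    # Ceiling division: a partially filled last block still counts as a block.
    blocks = sum(-(-size // block_size) for size in sizes)
    return len(sizes) + blocks


small_files = [1 * MIB] * 1000                                  # 1000 files of 1 MiB
before = namespace_objects(small_files, 128 * MIB)              # 1000 inodes + 1000 blocks
merged = namespace_objects([sum(small_files)], 128 * MIB)       # 1 inode + 8 blocks
merged_big = namespace_objects([sum(small_files)], 256 * MIB)   # 1 inode + 4 blocks
```

Under these assumptions, merging the small files shrinks the object count from 2000 to 9, and doubling the block size brings it down to 5. Federation and external stores address the same pressure from the other side, by adding NameNode capacity or moving metadata off the heap.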

Conclusion: The NameNode’s memory layout is complex but essential for HDFS reliability. Understanding its core structures helps diagnose scaling bottlenecks and choose appropriate mitigation strategies.

References: Apache Hadoop documentation, source code, HDFS Federation design, LevelDB integration, Baidu and Taobao HDFS2 projects.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
