Memory Usage Analysis of HDFS NameNode Core Data Structures

The article quantitatively breaks down HDFS NameNode memory consumption. It shows that the Namespace tree and BlocksMap together dominate heap usage (≈53 GB in large clusters), gives detailed per‑object size estimates for NetworkTopology, INode and block structures, and proposes a simple formula for predicting total heap requirements, along with tuning recommendations.

Meituan Technology Team

Dec 9, 2016

This article extends the previous "HDFS NameNode Memory Panorama" by providing a detailed quantitative analysis of the memory consumption of the NameNode’s core data structures and proposing a memory estimation model.

Until a NameNode can be scaled out horizontally, administrators must cope with steadily growing resident memory and decide when, and by how much, to enlarge the JVM heap. A larger heap in turn means longer restart times and a higher risk of full GC pauses.

NetworkTopology

NameNode maintains a tree‑like cluster topology through NetworkTopology. Its leaf nodes are DatanodeDescriptor objects, whose inheritance hierarchy is shown in Figure 1. Figure 2 illustrates the memory usage of a DatanodeDescriptor on a 64‑bit JVM, and Figure 3 details its storageMap, a collection of DatanodeStorageInfo objects.

Assuming a cluster of 2,000 DataNodes, the total memory for these descriptors is roughly (64 + 114 + 56 + 109 × 16) × 2000 ≈ 4 MB.

The internal nodes (InnerNode), which represent racks, consume about (44 + 48) × 80 + 8 × 2000 = 23,360 bytes, roughly 25 KB, for a 2,000‑node cluster with 80 inner nodes. Even at 10,000 DataNodes, the whole NetworkTopology stays below 25 MB.
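The two topology estimates above can be reproduced with a few lines of Python. This is a minimal sketch that only re-evaluates the article's formulas. The constant and function names are hypothetical, and reading the 16 in the first formula as the number of DatanodeStorageInfo entries per DataNode is our assumption.

```python
# Byte counts taken from the formulas above (64-bit JVM).
DESCRIPTOR_FIXED_BYTES = 64 + 114 + 56  # fixed per-DataNode terms of the formula
STORAGE_INFO_BYTES = 109                # one DatanodeStorageInfo entry
INNER_NODE_BYTES = 44 + 48              # one InnerNode object
CHILD_REF_BYTES = 8                     # one reference from an InnerNode to a DataNode


def datanode_bytes(num_datanodes: int, storages_per_node: int = 16) -> int:
    """Memory of all DatanodeDescriptors including their storageMap.

    Assumption: the multiplier 16 is the number of storage entries per node.
    """
    per_node = DESCRIPTOR_FIXED_BYTES + STORAGE_INFO_BYTES * storages_per_node
    return per_node * num_datanodes


def inner_node_bytes(num_inner_nodes: int, num_datanodes: int) -> int:
    """Memory of the InnerNode objects plus one child reference per DataNode."""
    return INNER_NODE_BYTES * num_inner_nodes + CHILD_REF_BYTES * num_datanodes


print(datanode_bytes(2000))        # 3956000 bytes, about 4 MB
print(inner_node_bytes(80, 2000))  # 23360 bytes, about 25 KB
print(datanode_bytes(10_000))      # 19780000 bytes, below 25 MB
```

The descriptors, not the rack nodes, dominate the topology's footprint. Because of that, the total grows almost linearly with the number of DataNodes.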

Namespace

The file system namespace is stored as a tree of INode objects. Figure 5 shows the class hierarchy of INodeFile and INodeDirectory. Memory consumption per directory and file is visualized in Figure 6.

Take 100 million directories, 100 million files and 100 million blocks. For that namespace, the estimated JVM heap usage is approximately 38 GB, calculated as:

Total(Directory) = (24 + 96 + 44 + 48) × 100M + 8 × num(total children)
Total(Files) = (24 + 96 + 48) × 100M + 8 × num(total blocks)
Total = Total(Directory) + Total(Files) ≈ 38 GB
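The arithmetic behind the ~38 GB figure can be checked with a short sketch. It assumes that every directory and file is the child of some directory, so num(total children) ≈ 200M, and that each file's 8‑byte block reference is counted once per block. The helper name is hypothetical.

```python
DIRECTORY_BYTES = 24 + 96 + 44 + 48  # per INodeDirectory, 212 bytes
FILE_BYTES = 24 + 96 + 48            # per INodeFile, 168 bytes
REF_BYTES = 8                        # one child reference or one block reference


def namespace_bytes(num_dirs: int, num_files: int,
                    num_children: int, num_blocks: int) -> int:
    """Total(Directory) + Total(Files) from the formulas above."""
    directories = DIRECTORY_BYTES * num_dirs + REF_BYTES * num_children
    files = FILE_BYTES * num_files + REF_BYTES * num_blocks
    return directories + files


M = 100_000_000
# Assumption: every directory and file hangs off some parent, so ~2 * M children.
total = namespace_bytes(M, M, num_children=2 * M, num_blocks=M)
print(total)                       # 40400000000
print(f"{total / 2**30:.1f} GiB")  # 37.6 GiB
```

The result is 40.4 × 10⁹ bytes, about 37.6 GiB, which lines up with the ≈38 GB quoted above when GB is read as GiB.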

BlocksMap

BlocksMap maps each block to the list of DataNodes storing its replicas. It originally used a HashMap, which was later replaced by LightWeightGSet for better memory efficiency (see Figure 7). The map preallocates about 2% of the JVM heap for its index array.
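LightWeightGSet saves memory in two ways. The collision-chain link is stored inside each element, so no per-entry wrapper object such as a HashMap entry is needed. The whole index is also a single power-of-two array sized once, up front, from a fixed share of the heap. The following is a minimal Python sketch of that idea, not Hadoop's Java class. `Block`, `IntrusiveHashSet` and `compute_capacity` are hypothetical names. The capacity rule (round down to a power of two, 8‑byte references) is our simplification of "2% of the heap".

```python
class Block:
    """Hypothetical element that carries its own chain link (intrusive list)."""
    __slots__ = ("block_id", "next")

    def __init__(self, block_id: int):
        self.block_id = block_id
        self.next = None  # link to the next element in the same bucket


def compute_capacity(heap_bytes: int, percentage: float = 2.0,
                     ref_bytes: int = 8) -> int:
    """Largest power of two whose reference array fits in `percentage`% of heap.

    Simplification for illustration: rounds down and assumes 8-byte references.
    """
    slots = int(heap_bytes * percentage / 100 / ref_bytes)
    return 1 << max(0, slots.bit_length() - 1)


class IntrusiveHashSet:
    """Fixed-size hash set; elements are chained through their own `next` field."""

    def __init__(self, capacity: int):
        assert capacity > 0 and capacity & (capacity - 1) == 0, "power of two"
        self.entries = [None] * capacity  # the only index storage, never resized
        self.mask = capacity - 1
        self.size = 0

    def _index(self, block_id: int) -> int:
        return hash(block_id) & self.mask

    def get(self, block_id: int):
        e = self.entries[self._index(block_id)]
        while e is not None:
            if e.block_id == block_id:
                return e
            e = e.next
        return None

    def remove(self, block_id: int):
        i = self._index(block_id)
        prev, e = None, self.entries[i]
        while e is not None:
            if e.block_id == block_id:
                if prev is None:
                    self.entries[i] = e.next  # unlink the bucket head
                else:
                    prev.next = e.next        # unlink from the middle of the chain
                e.next = None
                self.size -= 1
                return e
            prev, e = e, e.next
        return None

    def put(self, block: Block):
        old = self.remove(block.block_id)  # replace any element with the same id
        i = self._index(block.block_id)
        block.next = self.entries[i]       # push the new element at the bucket head
        self.entries[i] = block
        self.size += 1
        return old


s = IntrusiveHashSet(16)
s.put(Block(1))
s.put(Block(17))  # same bucket as 1 when capacity is 16
print(s.get(17).block_id, s.size)          # 17 2
print(compute_capacity(128 * 2**30) == 2**28)  # True
```

With a 128 GiB heap, this rule yields 2²⁸ slots of 8 bytes, a 2 GiB array. The array is allocated whether or not it is ever filled.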

For 100 million blocks on a 128 GB heap, BlocksMap consumes roughly 20 GB: 16 + 24 + 2% × 128 GB + (40 + 128) × 100M. The index array accounts for about 2.6 GB of this and the per‑block objects for about 16.8 GB.
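The same formula as a small function, sketched under the assumption that "128 GB" means a 128 GiB heap (e.g. -Xmx128g). The constant terms are copied from the formula above, and the function name is hypothetical.

```python
def blocksmap_bytes(num_blocks: int, heap_bytes: int,
                    index_fraction: float = 0.02) -> float:
    """16 + 24 + 2% x heap + (40 + 128) x num_blocks, as in the formula above."""
    fixed = 16 + 24                     # constant overhead terms
    index = index_fraction * heap_bytes  # preallocated index array
    per_block = 40 + 128                 # per-block terms
    return fixed + index + per_block * num_blocks


total = blocksmap_bytes(100_000_000, 128 * 2**30)
print(f"{total / 1e9:.1f} GB")  # 19.5 GB
```

The index term grows with heap size, not block count. Raising -Xmx therefore also enlarges BlocksMap, even when the namespace stays unchanged.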

Summary

The two largest memory consumers in the NameNode are Namespace and BlocksMap, together accounting for ~53 GB in a typical production cluster, which matches observed JVM heap usage.

Optimization suggestions include merging small files and increasing the block size to reduce the number of blocks. JVM tuning is also recommended to mitigate the risk of full GC, for example adjusting the Young/Old generation ratio (-XX:NewRatio) and the CMS initiating occupancy (-XX:CMSInitiatingOccupancyFraction).
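To see why merging small files and larger blocks help, the sketch below counts blocks with ceiling division. It prices them using the per‑object constants 198 and 176 from the Memory Estimation Model. The workload (10,000 files of 1 MiB, 128 MiB blocks) is a hypothetical example, and parent directories are ignored.

```python
MIB = 2**20
INODE_BYTES = 198  # per file or directory, from the estimation model
BLOCK_BYTES = 176  # per block, from the estimation model


def num_blocks(file_size: int, block_size: int) -> int:
    """Blocks needed for a file of `file_size` bytes (ceiling division)."""
    return -(-file_size // block_size)


def heap_cost(file_sizes: list, block_size: int) -> int:
    """Estimated NameNode heap for the given files (parent directories ignored)."""
    blocks = sum(num_blocks(size, block_size) for size in file_sizes)
    return INODE_BYTES * len(file_sizes) + BLOCK_BYTES * blocks


small = [1 * MIB] * 10_000  # hypothetical: 10,000 files of 1 MiB
merged = [sum(small)]       # the same data stored as one file
print(heap_cost(small, 128 * MIB))   # 3740000
print(heap_cost(merged, 128 * MIB))  # 14102
print(num_blocks(1024 * MIB, 128 * MIB), num_blocks(1024 * MIB, 256 * MIB))  # 8 4
```

Each small file costs a full inode plus a full block entry regardless of its size, so merging cuts the estimate here by more than 250×. Doubling the block size halves the block count of large files.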

Memory Estimation Model

Total = 198 × num(Directory + Files) + 176 × num(Blocks) + 2% × JVM‑Memory‑Size
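One way to recover the two constants is to fold the earlier per‑object terms together. This is our reading; the source does not spell it out. 198 is the average of a directory (212 bytes) and a file (168 bytes), assuming roughly equal numbers of each, plus one 8‑byte child reference per inode. 176 is the 8‑byte file‑to‑block reference plus BlocksMap's 40 + 128 bytes per block. The sketch encodes that reading and evaluates the model; constant and function names are hypothetical, and NetworkTopology (a few MB) is ignored.

```python
DIRECTORY_BYTES = 24 + 96 + 44 + 48  # 212
FILE_BYTES = 24 + 96 + 48            # 168
REF_BYTES = 8                        # child reference or file-to-block reference

# Assumption: equal numbers of directories and files, one child ref per inode.
PER_INODE = (DIRECTORY_BYTES + FILE_BYTES) // 2 + REF_BYTES  # 198
PER_BLOCK = REF_BYTES + 40 + 128                              # 176


def estimate_heap_bytes(num_inodes: int, num_blocks: int, heap_bytes: int) -> float:
    """Total = 198 x inodes + 176 x blocks + 2% x JVM heap size."""
    return PER_INODE * num_inodes + PER_BLOCK * num_blocks + 0.02 * heap_bytes


# 100M directories + 100M files, 100M blocks, 128 GiB heap.
total = estimate_heap_bytes(200_000_000, 100_000_000, 128 * 2**30)
print(PER_INODE, PER_BLOCK)        # 198 176
print(f"{total / 2**30:.1f} GiB")  # 55.8 GiB
```

The 2% term depends on -Xmx itself, so a sizing exercise may need a second pass: pick a heap, estimate, then re-check with the chosen heap size.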

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
