Understanding HDFS SecondaryNameNode and the Checkpoint Process

This article explains the role of HDFS SecondaryNameNode, the structure of fsimage and edits files, how checkpointing works—including configuration parameters and steps—and how the process changes when NameNode high availability is enabled.

Big Data Technology & Architecture

Preface

What does HDFS SecondaryNameNode do?

This is a classic entry-level interview question that many candidates get wrong, often assuming that SecondaryNameNode is a hot standby for NameNode. It is not: it never serves client requests and cannot take over when NameNode fails. This article briefly explains what it actually does, abbreviating SecondaryNameNode as SNN and NameNode as NN.

NN and the fsimage and edits files

NN manages all HDFS metadata: the file and directory tree, permissions, block IDs and sizes, replication factors, and so on. Clients query NN for metadata before reading or writing data. While NN is running, the metadata is kept in memory so that requests can be answered quickly.

In-memory metadata would be lost if the NN process crashed or its host restarted, so it must also be persisted to disk. NN uses two types of files for this:

fsimage file: named with the prefix fsimage_, it stores a serialized snapshot of the entire namespace metadata.

edits file (also called the edit log): named with the prefix edits_, it records incremental metadata changes (the namespace modifications caused by client operations) in order.

Both kinds of files are stored under ${dfs.namenode.name.dir}/current/.

The edits file currently being written carries an "inprogress" marker in its name, and the seen_txid file records the transaction ID at which that in-progress edits segment starts.
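A concrete listing may help. The directory contents below are hypothetical (the txids are invented for illustration), and the small Python sketch only classifies file names by three patterns: fsimage_<txid>, finalized edits_<start>-<end>, and edits_inprogress_<start>, with txids zero-padded to 19 digits as HDFS usually names them. It is an illustration, not Hadoop code; checksum files, VERSION, and seen_txid are simply reported as "other".

```python
import re

# Hypothetical listing of ${dfs.namenode.name.dir}/current/ (txids invented).
LISTING = [
    "VERSION",
    "seen_txid",
    "fsimage_0000000000000000042",
    "fsimage_0000000000000000042.md5",
    "edits_0000000000000000001-0000000000000000042",
    "edits_0000000000000000043-0000000000000000050",
    "edits_inprogress_0000000000000000051",
]

FSIMAGE = re.compile(r"^fsimage_(\d{19})$")
EDITS_FINALIZED = re.compile(r"^edits_(\d{19})-(\d{19})$")
EDITS_INPROGRESS = re.compile(r"^edits_inprogress_(\d{19})$")


def classify(name):
    """Return a (kind, txids) tuple describing one file name."""
    if m := FSIMAGE.match(name):
        # fsimage_<N>: snapshot that includes every transaction up to N.
        return ("fsimage", (int(m.group(1)),))
    if m := EDITS_FINALIZED.match(name):
        # edits_<start>-<end>: a closed segment covering start..end.
        return ("edits", (int(m.group(1)), int(m.group(2))))
    if m := EDITS_INPROGRESS.match(name):
        # edits_inprogress_<start>: the segment currently being written.
        return ("edits_inprogress", (int(m.group(1)),))
    return ("other", ())


if __name__ == "__main__":
    for name in LISTING:
        print(f"{name:50} -> {classify(name)}")
```

Reading the names this way shows why the files fit together: the fsimage ends at some txid, and the edits segments pick up from the next one.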

At any moment, the latest fsimage plus the edits written after it represent the complete metadata. When NN starts, it loads the latest fsimage into memory and replays the subsequent edits to reconstruct the current metadata state.
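The startup recovery rule can be sketched with a toy model, assuming the namespace is just a dict from path to permission string and each edit is a (txid, op, args) tuple. The op names ("create", "chmod", "delete", "rename") and data structures are hypothetical; real HDFS uses its own binary fsimage and edit-log formats. The point is the rule itself: start from the snapshot, skip edits it already contains, and replay the rest in txid order without gaps.

```python
from dataclasses import dataclass, field


@dataclass
class FsImage:
    """Toy snapshot: the last txid it includes plus a path -> permission map."""
    last_txid: int
    namespace: dict = field(default_factory=dict)


def apply_edit(namespace, op, args):
    """Apply one toy edit record to an in-memory namespace."""
    if op in ("create", "chmod"):
        path, perm = args
        namespace[path] = perm
    elif op == "delete":
        (path,) = args
        namespace.pop(path, None)
    elif op == "rename":
        src, dst = args
        namespace[dst] = namespace.pop(src)
    else:
        raise ValueError(f"unknown op {op!r}")


def load_namespace(image, edits):
    """Load the fsimage, then replay only newer edits, in txid order."""
    namespace = dict(image.namespace)  # copy so the snapshot stays untouched
    expected = image.last_txid + 1
    for txid, op, args in sorted(edits):
        if txid <= image.last_txid:
            continue  # already contained in the fsimage
        if txid != expected:
            raise RuntimeError(f"gap in edit log: expected {expected}, got {txid}")
        apply_edit(namespace, op, args)
        expected += 1
    return namespace


if __name__ == "__main__":
    image = FsImage(last_txid=2, namespace={"/data": "rwx", "/data/a.txt": "rw-"})
    edits = [
        (1, "create", ("/data", "rwx")),
        (2, "create", ("/data/a.txt", "rw-")),
        (3, "rename", ("/data/a.txt", "/data/b.txt")),
        (4, "chmod", ("/data/b.txt", "r--")),
    ]
    print(load_namespace(image, edits))
```

Edits 1 and 2 are skipped because the snapshot already reflects them; only 3 and 4 are replayed. The longer the tail of unmerged edits, the more work this loop does at startup, which is exactly what checkpointing limits.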

SNN and the checkpoint process

If edits were never merged, the edit log would keep growing and NN startup would take longer and longer, because every edit has to be replayed. The fix is to periodically merge the edits into a new fsimage; this operation is called a checkpoint.

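NN is already busy serving clients, and a merge requires loading the entire namespace into memory and rewriting it to disk, so Hadoop delegates this work to SNN, which usually runs on a separate machine with memory comparable to NN. In other words, SNN assists NN by performing checkpoint operations.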

Checkpoint triggering is controlled by two parameters in hdfs-site.xml:

dfs.namenode.checkpoint.period: the maximum interval between two consecutive checkpoints, in seconds (default 3600, i.e. 1 hour).

dfs.namenode.checkpoint.txns: the number of uncheckpointed transactions that forces a checkpoint even if the period has not yet elapsed (default 1,000,000).
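The trigger rule is a logical OR of the two thresholds. The Python sketch below encodes it and adds a simple polling simulation: SNN checks the NameNode's uncheckpointed transaction count periodically (dfs.namenode.checkpoint.check.period, 60 seconds by default), so a count-based checkpoint starts at the first poll after the threshold is crossed. The function names and the constant write-rate model are assumptions made for illustration, not Hadoop APIs.

```python
# Defaults of the hdfs-site.xml parameters discussed above.
DEFAULT_PERIOD_SECONDS = 3600   # dfs.namenode.checkpoint.period
DEFAULT_TXNS = 1_000_000        # dfs.namenode.checkpoint.txns
DEFAULT_POLL_SECONDS = 60       # dfs.namenode.checkpoint.check.period


def should_checkpoint(seconds_since_last, uncheckpointed_txns,
                      period=DEFAULT_PERIOD_SECONDS, txns=DEFAULT_TXNS):
    """True when either condition holds: enough time or enough transactions."""
    return seconds_since_last >= period or uncheckpointed_txns >= txns


def first_checkpoint_time(txns_per_second, poll=DEFAULT_POLL_SECONDS,
                          period=DEFAULT_PERIOD_SECONDS, txns=DEFAULT_TXNS):
    """Assume a constant write rate; return elapsed seconds at the first triggering poll."""
    elapsed = 0
    while True:
        elapsed += poll
        if should_checkpoint(elapsed, elapsed * txns_per_second, period, txns):
            return elapsed


if __name__ == "__main__":
    for rate in (100, 1000):
        print(rate, "txns/s ->", first_checkpoint_time(rate), "s")
```

At 100 transactions per second the one-hour period wins (3600 s). At 1,000 transactions per second the transaction threshold is crossed at 1,000 s and noticed at the next poll, 1,020 s, so busy clusters checkpoint well before the hour is up.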

If either condition is met, the checkpoint process proceeds as follows:

SNN asks NN to roll its edit log: NN closes the current edits_inprogress file and opens a new one. Subsequent modifications go to the new file, while the closed segment becomes pending for merging.

SNN downloads the pending edits and the current fsimage from NN over HTTP.

SNN loads the fsimage into memory, replays the edits, and produces a merged fsimage.chkpoint file.

SNN uploads fsimage.chkpoint back to NN, which renames it to become the new official fsimage file.
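The four steps can be walked through with a toy, in-process simulation. Everything here is hypothetical: the classes stand in for the two daemons, "download" and "upload" are plain copies instead of HTTP transfers, each edit simply creates a path, and old segments are never purged. It only shows the ordering that matters: roll first, so that new writes land in a fresh in-progress segment while the closed segments are merged.

```python
class NameNode:
    """Toy NN holding an fsimage plus finalized and in-progress edit segments."""

    def __init__(self):
        self.fsimage = {"txid": 0, "namespace": set()}
        self.finalized = []    # closed segments, each a list of (txid, path)
        self.inprogress = []   # segment currently receiving new edits
        self.next_txid = 1

    def create(self, path):
        """A client operation: append one record to the in-progress segment."""
        self.inprogress.append((self.next_txid, path))
        self.next_txid += 1

    def roll_edits(self):
        """Step 1: close the in-progress segment and start an empty one."""
        if self.inprogress:
            self.finalized.append(self.inprogress)
        self.inprogress = []

    def install_checkpoint(self, image):
        """Step 4 (NN side): the uploaded checkpoint replaces the current fsimage."""
        self.fsimage = image


class SecondaryNameNode:
    """Toy SNN that performs the four checkpoint steps against a NameNode."""

    def do_checkpoint(self, nn):
        nn.roll_edits()                                       # step 1: roll edits
        image = {"txid": nn.fsimage["txid"],                  # step 2: "download"
                 "namespace": set(nn.fsimage["namespace"])}   #   (copies, not references)
        segments = [list(seg) for seg in nn.finalized]
        for seg in segments:                                  # step 3: replay and merge
            for txid, path in seg:
                if txid > image["txid"]:
                    image["namespace"].add(path)
                    image["txid"] = txid
        nn.install_checkpoint(image)                          # step 4: "upload"
        return image["txid"]


if __name__ == "__main__":
    nn = NameNode()
    nn.create("/a")
    nn.create("/b")
    snn = SecondaryNameNode()
    print(snn.do_checkpoint(nn), sorted(nn.fsimage["namespace"]))
    nn.create("/c")  # lands in the new in-progress segment, not in the fsimage
    print(snn.do_checkpoint(nn), sorted(nn.fsimage["namespace"]))
```

The second checkpoint picks up /c only after that segment is rolled, mirroring how the real process never merges a segment that is still being written.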

Hadoop's official documentation includes a diagram of the same process; only the file names differ.

If NameNode High Availability is enabled

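The description above assumes a cluster with a single NN. In an HA setup with two NNs, SNN is not needed (Hadoop's documentation notes that running one in an HA cluster is an error), because checkpointing is handled by the Standby NN. The Active NN writes edits both to its local storage and to shared storage, such as a JournalNode quorum. The Standby NN tails the edits from the JournalNodes and applies them to its own in-memory namespace, which keeps its state in sync with the Active NN; when the checkpoint conditions are met, it saves a new fsimage and uploads it to the Active NN.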
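The HA data flow can be sketched as a toy Python model, with loud simplifications: an edit counts as committed once a majority of JournalNodes acknowledge it; the Standby reads from the first reachable JournalNode (real QJM reads are quorum-aware); an edit that reaches only a minority is left in place (real QJM runs a recovery protocol to reconcile such segments); and the fsimage upload to the Active NN is not modeled. All class and method names are hypothetical.

```python
class JournalNode:
    """Toy JournalNode: an append-only list of (txid, path) records."""

    def __init__(self, up=True):
        self.up = up
        self.log = []

    def write(self, txid, record):
        """Append a record; an unreachable node does not acknowledge."""
        if not self.up:
            return False
        self.log.append((txid, record))
        return True


class ActiveNN:
    def __init__(self, journals):
        self.journals = journals
        self.next_txid = 1

    def log_edit(self, record):
        """Send the edit to every JournalNode; it is committed once a majority ack."""
        acks = sum(jn.write(self.next_txid, record) for jn in self.journals)
        if acks <= len(self.journals) // 2:
            raise RuntimeError("no quorum: edit not committed")
        self.next_txid += 1


class StandbyNN:
    def __init__(self):
        self.namespace = set()
        self.last_applied = 0

    def tail_edits(self, journals):
        """Apply edits newer than last_applied, in txid order (first reachable JN only)."""
        for jn in journals:
            if jn.up:
                for txid, path in sorted(jn.log):
                    if txid > self.last_applied:
                        self.namespace.add(path)
                        self.last_applied = txid
                return

    def checkpoint(self):
        """Save the in-memory namespace as a new fsimage (upload not modeled)."""
        return self.last_applied, frozenset(self.namespace)


if __name__ == "__main__":
    journals = [JournalNode(), JournalNode(), JournalNode(up=False)]
    active = ActiveNN(journals)
    active.log_edit("/a")
    active.log_edit("/b")  # 2 of 3 acks: still a majority
    standby = StandbyNN()
    standby.tail_edits(journals)
    print(standby.checkpoint())
```

Because the Standby already holds the up-to-date namespace in memory, its checkpoint needs no separate download-and-merge step, which is why a dedicated SNN becomes redundant in this setup.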

Source: https://www.jianshu.com/p/5b4dd843b29d

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by Wang Zhiwu (Big Data Technology & Architecture), a big data expert dedicated to sharing big data technology.
