
Why Did My Hadoop Node’s Memory Spike at 3 AM? A Step‑by‑Step Debug Guide

This article details a systematic investigation of a Hadoop NameNode/DataNode that showed high memory usage at 3 AM, identifies zombie crond/sendmail/postdrop processes caused by a failed Postfix service, and provides cleanup commands and preventive measures for memory, disk, and inode issues.


2. Investigation Process

Root access was used to fully diagnose the issue.

2.1 Check component startup status

The listed processes belong to Hadoop, Yarn, and Zookeeper components; no abnormal Java processes were observed.
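The article does not show the exact command used for this step; a typical way to list the running Java components is a sketch like the following (assumes a JDK on the PATH for <code>jps</code>):

```shell
# List running Java processes. `jps` ships with the JDK and prints
# each JVM's pid and main class (NameNode, DataNode, QuorumPeerMain, …).
jps -l 2>/dev/null
# Fallback without a JDK on PATH; the bracket trick stops grep from
# matching its own command line.
ps -ef | grep '[j]ava' || echo "no java processes found"
```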

2.2 Examine resource consumption of Hadoop processes

Resource usage for processes under the hadoop user appeared normal, with no high‑consumption entries.
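A quick way to confirm this is to sort the process table by memory and filter to the <code>hadoop</code> user (a sketch; the article does not show the exact command, and <code>--sort=-%mem</code> is GNU procps syntax):

```shell
# Show the top memory consumers, keeping the header row plus any
# processes owned by the hadoop user.
ps aux --sort=-%mem | awk 'NR==1 || $1 == "hadoop"' | head -10
```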

2.3 Investigate abnormal processes

Using <code>top</code> revealed many <code>crond</code> processes, each consuming minimal resources.

Command to inspect a specific process (replace <code>pid</code> with the target process ID):

<code>ps -ef | grep pid</code>

The <code>sendmail</code> process is started by <code>crond</code>, and <code>postdrop</code> is started by <code>sendmail</code>. When the Postfix service is not running, these processes cannot exit, accumulating as numerous zombie processes.
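Zombies can be confirmed directly from the process state codes (a sketch, assuming GNU procps):

```shell
# Count processes in zombie state (STAT beginning with Z, shown as
# <defunct> in ps output). The `=` after each column suppresses headers.
ps -eo stat=,pid=,ppid=,comm= | awk '$1 ~ /^Z/ {print; n++} END {print "zombies:", n+0}'
```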

3. Resolution

Terminate the related <code>sendmail</code> and <code>postdrop</code> processes:

<code>ps -ef | egrep "sendmail|postdrop" | grep -v grep | awk '{print $2}' | xargs kill</code>

To prevent recurrence, disable crond email notifications by setting <code>MAILTO=""</code> in <code>/etc/crontab</code> and <code>/etc/cron.d/0hourly</code>, or by adding <code>MAILTO=""</code> as the first line via <code>crontab -e</code>.
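For reference, the relevant lines in <code>/etc/crontab</code> would look like this (the <code>SHELL</code> and <code>PATH</code> entries are the usual defaults, shown only for context):

```
# /etc/crontab — disable mail delivery of cron job output
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=""
```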

4. Outcome

Memory usage returned to normal levels.

5. Derived Issues

5.1 System disk full

Running <code>du -h --max-depth=1</code> on the root directory timed out; the culprit was a massive 32 GB <code>/var/log/maillog</code> file. The log was cleared to free space.
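When <code>du</code> over a whole filesystem is too slow, <code>find</code> can locate the largest files directly (a sketch; the 1 GB threshold is an arbitrary choice, not from the article):

```shell
# Locate files over 1 GB under /var without crossing filesystem
# boundaries; errors from unreadable paths are discarded.
find /var -xdev -type f -size +1G -exec ls -lh {} \; 2>/dev/null
```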

5.2 Inode exhaustion on system disk

Although disk usage decreased, inode usage remained high: <code>/var/spool/postfix/maildrop/</code> contained over 640,000 small files generated by the failed sendmail deliveries.
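Inode usage itself can be confirmed with <code>df -i</code>, and the file count checked without a sorted listing (a sketch; the path is the one from the article):

```shell
# Inode usage per filesystem; IUse% near 100% means exhaustion even
# when df -h still shows free space.
df -i
# Count directory entries quickly; -f skips sorting, which matters
# with hundreds of thousands of files.
ls -f /var/spool/postfix/maildrop/ 2>/dev/null | wc -l
```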

Cleanup command (run from inside the directory; <code>ls -f</code> avoids sorting the huge listing):

<code>ls -f | xargs -n 1 rm -rf</code>

After removal, inode metrics returned to normal.
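For directories this large, <code>find … -delete</code> is usually faster than spawning one <code>rm</code> per file (a sketch; run it from inside the target directory only after double-checking where you are):

```shell
# Delete regular files in the current directory; -delete avoids a
# process per file and -maxdepth 1 keeps it non-recursive.
find . -maxdepth 1 -type f -delete
```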

6. Optimization Recommendations

(1) Disable crond email notifications by setting <code>MAILTO=""</code> in the relevant cron configuration files.

(2) Add disk usage monitoring and alerts.

(3) For systems that generate many files (e.g., databases, caches, file storage), implement inode usage monitoring and alerts.
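Recommendations (2) and (3) can start as simply as a cron-driven threshold check (a hypothetical sketch; the 90% threshold and the mount point are assumptions, and in production the <code>echo</code> would be replaced by your alerting hook):

```shell
#!/bin/sh
# Alert when disk or inode usage on / crosses a threshold.
# df -P / prints Capacity in field 5; df -Pi / prints IUse% in field 5.
THRESHOLD=90
disk_use=$(df -P /  | awk 'NR==2 {gsub(/%/,""); print $5}')
inode_use=$(df -Pi / | awk 'NR==2 {gsub(/%/,""); print $5}')
[ "$disk_use"  -gt "$THRESHOLD" ] && echo "ALERT: disk usage ${disk_use}% on /"
[ "$inode_use" -gt "$THRESHOLD" ] && echo "ALERT: inode usage ${inode_use}% on /"
exit 0
```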

Tags: memory leak, System Administration, Inode, Hadoop, Disk Usage, Zombie Process
Written by Data Thinking Notes

Sharing insights on data architecture, governance, and middle platforms, exploring AI in data, and linking data with business scenarios.
