
Why Did My Hadoop Node’s Memory Spike at 3 AM? A Step‑by‑Step Debug Guide

This article details a systematic investigation of a Hadoop NameNode/DataNode that showed high memory usage at 3 AM, identifies zombie crond/sendmail/postdrop processes caused by a failed Postfix service, and provides cleanup commands and preventive measures for memory, disk, and inode issues.


2. Investigation Process

Root access was used to fully diagnose the issue.

2.1 Check component startup status

The listed processes belong to Hadoop, Yarn, and Zookeeper components; no abnormal Java processes were observed.
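The article does not show the exact command used for this step; a typical way to list the running Java components is a sketch like the following (assumes a JDK on the PATH for <code>jps</code>):

```shell
# List running Java processes. `jps` ships with the JDK and prints
# each JVM's pid and main class (NameNode, DataNode, QuorumPeerMain, …).
jps -l 2>/dev/null
# Fallback without a JDK on PATH; the bracket trick stops grep from
# matching its own command line.
ps -ef | grep '[j]ava' || echo "no java processes found"
```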

2.2 Examine resource consumption of Hadoop processes

Resource usage for processes under the hadoop user appeared normal, with no high‑consumption entries.
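A quick way to confirm this is to sort the process table by memory and filter to the <code>hadoop</code> user (a sketch; the article does not show the exact command, and <code>--sort=-%mem</code> is GNU procps syntax):

```shell
# Show the top memory consumers, keeping the header row plus any
# processes owned by the hadoop user.
ps aux --sort=-%mem | awk 'NR==1 || $1 == "hadoop"' | head -10
```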

2.3 Investigate abnormal processes

Using <code>top</code> revealed many <code>crond</code> processes, each consuming minimal resources.

Command to inspect a specific process (replace <code>pid</code> with the target process ID):

<code>ps -ef | grep pid</code>

The <code>sendmail</code> process is started by <code>crond</code>, and <code>postdrop</code> is started by <code>sendmail</code>. When the Postfix service is not running, these processes cannot exit, accumulating as numerous zombie processes.
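Zombies can be confirmed directly from the process state codes (a sketch, assuming GNU procps):

```shell
# Count processes in zombie state (STAT beginning with Z, shown as
# <defunct> in ps output). The `=` after each column suppresses headers.
ps -eo stat=,pid=,ppid=,comm= | awk '$1 ~ /^Z/ {print; n++} END {print "zombies:", n+0}'
```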

3. Resolution

Terminate the related <code>sendmail</code> and <code>postdrop</code> processes:

<code>ps -ef | egrep "sendmail|postdrop" | grep -v grep | awk '{print $2}' | xargs kill</code>

To prevent recurrence, disable crond email notifications by setting <code>MAILTO=""</code> in <code>/etc/crontab</code> and <code>/etc/cron.d/0hourly</code>, or by adding <code>MAILTO=""</code> as the first line via <code>crontab -e</code>.
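For reference, the relevant lines in <code>/etc/crontab</code> would look like this (the <code>SHELL</code> and <code>PATH</code> entries are the usual defaults, shown only for context):

```
# /etc/crontab — disable mail delivery of cron job output
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=""
```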

4. Outcome

Memory usage returned to normal levels.

5. Derived Issues

5.1 System disk full

Running <code>du -h --max-depth=1</code> on the root directory timed out; the culprit was a massive 32 GB <code>/var/log/maillog</code> file. The log was cleared to free space.
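When <code>du</code> over a whole filesystem is too slow, <code>find</code> can locate the largest files directly (a sketch; the 1 GB threshold is an arbitrary choice, not from the article):

```shell
# Locate files over 1 GB under /var without crossing filesystem
# boundaries; errors from unreadable paths are discarded.
find /var -xdev -type f -size +1G -exec ls -lh {} \; 2>/dev/null
```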

5.2 Inode exhaustion on system disk

Although disk usage decreased, inode usage remained high: <code>/var/spool/postfix/maildrop/</code> contained over 640,000 small files generated by the failed sendmail deliveries.
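Inode usage itself can be confirmed with <code>df -i</code>, and the file count checked without a sorted listing (a sketch; the path is the one from the article):

```shell
# Inode usage per filesystem; IUse% near 100% means exhaustion even
# when df -h still shows free space.
df -i
# Count directory entries quickly; -f skips sorting, which matters
# with hundreds of thousands of files.
ls -f /var/spool/postfix/maildrop/ 2>/dev/null | wc -l
```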

Cleanup command (run from inside the directory; <code>ls -f</code> avoids sorting the huge listing):

<code>ls -f | xargs -n 1 rm -rf</code>

After removal, inode metrics returned to normal.
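For directories this large, <code>find … -delete</code> is usually faster than spawning one <code>rm</code> per file (a sketch; run it from inside the target directory only after double-checking where you are):

```shell
# Delete regular files in the current directory; -delete avoids a
# process per file and -maxdepth 1 keeps it non-recursive.
find . -maxdepth 1 -type f -delete
```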

6. Optimization Recommendations

(1) Disable crond email notifications by setting <code>MAILTO=""</code> in the relevant cron configuration files.

(2) Add disk usage monitoring and alerts.

(3) For systems that generate many files (e.g., databases, caches, file storage), implement inode usage monitoring and alerts.
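Recommendations (2) and (3) can start as simply as a cron-driven threshold check (a hypothetical sketch; the 90% threshold and the mount point are assumptions, and in production the <code>echo</code> would be replaced by your alerting hook):

```shell
#!/bin/sh
# Alert when disk or inode usage on / crosses a threshold.
# df -P / prints Capacity in field 5; df -Pi / prints IUse% in field 5.
THRESHOLD=90
disk_use=$(df -P /  | awk 'NR==2 {gsub(/%/,""); print $5}')
inode_use=$(df -Pi / | awk 'NR==2 {gsub(/%/,""); print $5}')
[ "$disk_use"  -gt "$THRESHOLD" ] && echo "ALERT: disk usage ${disk_use}% on /"
[ "$inode_use" -gt "$THRESHOLD" ] && echo "ALERT: inode usage ${inode_use}% on /"
exit 0
```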

Tags: memory leak, System Administration, Inode, Hadoop, Disk Usage, Zombie Process
Written by Data Thinking Notes

Sharing insights on data architecture, governance, and middle platforms, exploring AI in data, and linking data with business scenarios.
