Why Did My Hadoop Node’s Memory Spike at 3 AM? A Step‑by‑Step Debug Guide
This article details a systematic investigation of a Hadoop NameNode/DataNode that showed high memory usage at 3 AM, identifies zombie crond/sendmail/postdrop processes caused by a failed Postfix service, and provides cleanup commands and preventive measures for memory, disk, and inode issues.
2. Investigation Process
Root access was used to fully diagnose the issue.
2.1 Check component startup status
The listed processes belong to Hadoop, Yarn, and Zookeeper components; no abnormal Java processes were observed.
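The article does not show the command used for this check. A minimal sketch, assuming a JDK is installed so that jps is available (the function name list_jvms is hypothetical), could look like this:

```shell
# List running JVMs so Hadoop, Yarn, and Zookeeper daemons can be checked by name.
# jps ships with the JDK; fall back to ps when it is not on PATH.
list_jvms() {
    if command -v jps >/dev/null 2>&1; then
        # -l prints the full main class or jar path next to each PID.
        jps -l
    else
        # Keep the header row plus any process whose executable ends in "java".
        ps -eo pid,user,args | awk 'NR == 1 || $3 ~ /java$/'
    fi
}
```

Run as root, jps reports JVMs for all users; as an ordinary user it typically shows only that user's JVMs.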
2.2 Examine resource consumption of Hadoop processes
Resource usage for processes under the hadoop user appeared normal, with no high‑consumption entries.
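One way to reproduce this check is to rank a user's processes by resident memory. The sketch below assumes the procps version of ps found on most Linux distributions; the function name top_mem_by_user is hypothetical.

```shell
# Show the ten largest processes (by resident memory) owned by a user.
# "hadoop" is the default because the article's services run under that user.
top_mem_by_user() {
    user="${1:-hadoop}"
    # rss is reported in KiB; --sort=-rss puts the largest consumers first.
    # head keeps the header row plus ten processes.
    ps -u "$user" -o pid,ppid,rss,pmem,etime,args --sort=-rss | head -n 11
}
```

Calling top_mem_by_user with no argument inspects the hadoop user; pass another user name to compare.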
2.3 Investigate abnormal processes
Using top revealed many crond processes, each consuming minimal resources.
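Command to view a specific process (replace pid with the process ID):
ps -ef | grep pid
The output shows that sendmail is started by crond, and postdrop is started by sendmail. When Postfix is not running, these processes cannot exit, so each cron job that tries to mail its output leaves more of them behind, creating numerous zombie processes.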
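When a process name shows up many times in top, counting processes by name and then listing the suspects with their parent PIDs makes the chain easier to see. This is a sketch assuming procps ps; both function names are hypothetical.

```shell
# Count running processes by command name, most numerous first.
count_by_name() {
    ps -eo comm= | sort | uniq -c | sort -rn | head -n "${1:-10}"
}

# Print PID, parent PID, state, and start time for the three suspect commands.
# A STAT beginning with "Z" marks a defunct (zombie) process; "S" or "D"
# means the process is still alive but sleeping or blocked.
show_mail_chain() {
    ps -eo pid,ppid,stat,lstart,comm |
        awk 'NR == 1 || $NF ~ /^(crond|sendmail|postdrop)$/'
}
```

Matching the PPID column of a postdrop row against the PID column of a sendmail row confirms which process started which.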
3. Resolution
Terminate the related sendmail and postdrop processes:
ps -ef | grep -E "sendmail|postdrop" | grep -v grep | awk '{print $2}' | xargs kill
To prevent recurrence, disable crond email notifications by setting MAILTO="" in /etc/crontab and /etc/cron.d/0hourly, or by adding MAILTO="" as the first line in crontab -e.
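An alternative to the ps pipeline is pgrep and pkill from procps, which match on the exact process name and so do not need the grep -v grep step. The sketch below is a variation on the article's command, not the author's method, and the function names are hypothetical. It defaults to a dry run so the matches can be reviewed before anything is killed.

```shell
# List (default) or terminate stuck sendmail/postdrop processes.
#   cleanup_mail_procs          dry run: prints matching PIDs and names
#   cleanup_mail_procs --kill   sends SIGTERM to the matches
cleanup_mail_procs() {
    if [ "$1" = "--kill" ]; then
        pkill -x 'sendmail|postdrop'
    else
        # Exit status 1 simply means nothing matched.
        pgrep -x -l 'sendmail|postdrop'
    fi
}

# Show the MAILTO setting in the cron files named in the article.
# After the change, each file should report MAILTO="".
show_mailto() {
    grep -H '^MAILTO' /etc/crontab /etc/cron.d/0hourly 2>/dev/null
}
```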
4. Outcome
Memory usage returned to normal levels.
5. Derived Issues
5.1 System disk full
Running du -h --max-depth=1 on the root directory timed out due to a massive /var/log/maillog file (32 GB). The log was cleared to free space.
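When du over the whole root directory is too slow, listing only the largest files in a suspect directory is often quicker. This is a sketch assuming GNU find (for -printf); the function name largest_files and the /var/log default are illustrative choices.

```shell
# Print the N largest regular files under a directory as "size-in-bytes path".
largest_files() {
    dir="${1:-/var/log}"
    # -xdev keeps find on a single filesystem so other mounts are not scanned.
    find "$dir" -xdev -type f -printf '%s %p\n' 2>/dev/null |
        sort -rn | head -n "${2:-10}"
}
```

The article does not say how the log was cleared. One common approach is to truncate the file in place with : > /var/log/maillog rather than deleting it, because the space of a deleted file is not released while a daemon still holds it open.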
5.2 Inode exhaustion on system disk
Although disk usage decreased, inode usage remained high because /var/spool/postfix/maildrop/ contained over 640 k small files generated by failed sendmail services.
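Cleanup command, run from inside /var/spool/postfix/maildrop/ (ls -f skips sorting, which matters with this many entries; rm refuses to remove the . and .. entries that ls -f also lists):
ls -f | xargs -n 1 rm -rf
After removal, inode metrics returned to normal.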
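A variant that takes the directory as an explicit argument avoids depending on the current working directory. This is a sketch assuming GNU find, not the article's original command; both function names are hypothetical.

```shell
# Count regular files directly inside a directory (find does not sort them).
count_files() {
    find "$1" -maxdepth 1 -type f | wc -l
}

# Delete the regular files directly inside a directory, leaving the
# directory itself and any subdirectories in place.
purge_files() {
    find "$1" -maxdepth 1 -type f -delete
}
```

Running df -i on the affected mount before and after purge_files /var/spool/postfix/maildrop shows whether the inode count has actually dropped.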
6. Optimization Recommendations
(1) Disable crond email notifications by setting MAILTO="" in relevant cron configuration files.
(2) Add disk usage monitoring and alerts.
(3) For systems that generate many files (e.g., databases, caches, file storage), implement inode usage monitoring and alerts.
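Recommendations (2) and (3) can be covered by one small check that a cron job or monitoring agent runs periodically. The sketch below is an illustration only: the function name check_usage, the 80 percent default, and the message format are assumptions, and a real deployment would forward the warning to whatever alerting system is in use.

```shell
# Warn when disk or inode usage on a mount point reaches a threshold.
# Returns 1 if any warning was printed, so the caller can trigger an alert.
check_usage() {
    mount="${1:-/}"
    limit="${2:-80}"   # percent; 80 is an arbitrary example threshold
    status=0
    for mode in disk inode; do
        if [ "$mode" = inode ]; then flag="-Pi"; else flag="-P"; fi
        # Column 5 of POSIX-format df output is the usage percentage.
        pct=$(df "$flag" "$mount" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
        # Some filesystems report "-" for inode usage; skip non-numeric values.
        case "$pct" in
            ''|*[!0-9]*) continue ;;
        esac
        if [ "$pct" -ge "$limit" ]; then
            echo "WARN: $mount $mode usage at ${pct}% (limit ${limit}%)"
            status=1
        fi
    done
    return "$status"
}
```

For example, check_usage / 90 prints nothing and returns 0 while both disk and inode usage on the root filesystem are below 90 percent.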
Data Thinking Notes
Sharing insights on data architecture, governance, and middle platforms, exploring AI in data, and linking data with business scenarios.
