
Diagnosing High CPU Load Caused by Frequent Short‑Lived Processes in a MongoDB Environment Using execsnoop

This article describes how a MongoDB test environment on a single VM suffered persistent high CPU load despite near-zero visible QPS, how the root cause was traced to thousands of short-lived processes spawned by Zabbix monitoring, and how execsnoop was used to identify and eliminate the offending processes.

Aikesheng Open Source Community

Background

A test environment on a single virtual machine runs one MongoDB cluster consisting of a mongos, three config nodes, and a single shard with three replica set members (total 7 MongoDB instances) on CentOS 7.9, MongoDB 4.2.19.

After the test finished, CPU load stayed around 50% even though MongoDB QPS had dropped to zero. Stopping all MongoDB instances immediately restored normal CPU usage; restarting them caused the load to rise again, confirming the issue was tied to MongoDB.

Diagnosis

Running top showed user CPU at about 40%, yet the %CPU values of the processes it listed summed to far less than the total load.
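One quick way to quantify that gap is to sum %CPU across all processes in a single snapshot; a minimal sketch:

```shell
# Sum %CPU over every process in one ps snapshot. If this total is
# far below the user-CPU figure reported by top, the CPU time is
# being burned by processes too short-lived to appear in a snapshot.
total=$(ps -eo pcpu --no-headers | awk '{s += $1} END {printf "%.1f", s}')
echo "sum of per-process %CPU: $total"
```

A large, persistent discrepancy between this sum and the system-wide CPU figure is the classic signature of short-lived processes.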

Checking mongos QPS confirmed no user commands were being executed.

dstat showed normal values for most indicators, and perf record -ag -- sleep 10 && perf report revealed no obvious hot spots; suspicion shifted to many short-lived processes that top could not capture.

Running sar -w 1 revealed that more than 80 new processes were being created each second.
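The proc/s figure that sar -w reports can also be derived directly from the kernel's cumulative fork counter in /proc/stat, which is handy on hosts where sysstat is not installed; a minimal sketch:

```shell
# /proc/stat's "processes" line is a cumulative count of processes
# created since boot; sampling it twice, one second apart, yields
# roughly the same number as the proc/s column of `sar -w 1`.
p1=$(awk '/^processes/ {print $2}' /proc/stat)
sleep 1
p2=$(awk '/^processes/ {print $2}' /proc/stat)
echo "processes created in the last second: $((p2 - p1))"
```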

To capture these fleeting processes, execsnoop was used: a tool based on ftrace that logs each exec() call with its PID, PPID, and command-line arguments.

# Download execsnoop from Brendan Gregg's perf-tools repository
cd /usr/bin
wget https://raw.githubusercontent.com/brendangregg/perf-tools/master/execsnoop
chmod 755 execsnoop

Running execsnoop for 10 seconds produced over 400 records, each representing a short-lived process that connected to MongoDB and then ran grep on the output.
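With hundreds of records, tallying which executables dominate the exec() stream makes the pattern obvious. The sample lines below are fabricated for illustration (the column layout PID PPID ARGS and the PPID value 901 are assumptions, not taken from the incident's actual capture):

```shell
# Hypothetical execsnoop capture; a real 10-second run in this
# incident produced 400+ such lines.
cat <<'EOF' > /tmp/execsnoop.sample
20001 901 /usr/bin/mongo --port 27017 --eval db.serverStatus()
20002 901 /bin/grep -c connections
20003 901 /usr/bin/mongo --port 27018 --eval db.serverStatus()
20004 901 /bin/grep -c connections
EOF

# Count occurrences of each executable, most frequent first.
awk '{print $3}' /tmp/execsnoop.sample | sort | uniq -c | sort -rn
```

In the real capture, the same two commands repeated hundreds of times, pointing at an external process polling every MongoDB instance in a tight loop.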

Stopping the Zabbix process instantly normalized CPU usage, identifying it as the culprit. Because the VM runs seven MongoDB instances, Zabbix monitored each of them separately, spawning the same monitoring tasks seven times over on a 4-core VM and amplifying the overhead.
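Since execsnoop records the PPID of every exec()'d process, resolving a recurring PPID names the daemon responsible before anything is stopped. A minimal sketch (this shell's own PID stands in so the command runs as-is; substitute the PPID from the execsnoop output):

```shell
# Resolve a PPID seen repeatedly in execsnoop output to a process
# name. $$ (the current shell) is used here only as a placeholder.
ppid=$$
parent=$(ps -o comm= -p "$ppid")
echo "parent of PID $ppid: $parent"
```

In this incident, the recurring parent resolved to the Zabbix agent, which matched the effect of stopping it.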

The issue was resolved by disabling Zabbix monitoring for this node and planning to optimise monitoring logic to reduce database connection frequency and grep call chains.

Conclusion

When CPU load remains high but top cannot reveal the responsible processes, tools like execsnoop (or iosnoop, opensnoop) can capture short‑lived processes and help pinpoint the root cause.

Tags: MongoDB, performance troubleshooting, Linux monitoring, Zabbix, CPU load, execsnoop
Written by Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.