Why a Default Kubernetes Setting Can Spike CPU Usage and How to Fix It
A Node.js service that was migrated to containers began to show intermittent timeouts and high CPU usage. The cause was the default enableServiceLinks setting, which injected thousands of Service environment variables into each Pod. This analysis shows how to identify, reproduce, and resolve the issue through Kubernetes configuration and code changes.
Problem Timeline
[xx:xx] Business reported API response timeouts.
[xx:xx] Development, SRE and middleware teams investigated code, gateway and network without success.
[xx:xx] Issue reproduced in a test environment.
[xx:xx] Quick fix deployed using differential analysis and experience.
[xx:xx] Root cause identified.
Observed Symptoms
Intermittent API timeouts.
Consistently high CPU usage after containerisation.
Normal CPU usage and response times on the VM before migration.
Investigation Steps
Network and application code were ruled out using elimination tests.
Differential analysis showed the problem only on containers, not on a VM or a Serverless test cluster.
Service environment variables are automatically injected into Pods.
Serverless clusters run fewer instances, so fewer Service variables are injected.
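To make the scale of the injection concrete, the sketch below lists the variable names Kubernetes derives for a single-port Service and estimates how many entries in an environment look like Service links. The helper names (serviceEnvNames, isServiceLinkVar, countServiceLinkVars) are hypothetical, and the matching is a heuristic built on the documented naming pattern, not an official API.

```javascript
// Names Kubernetes derives for a Service `name` exposing `port`/`protocol`.
// Example: "redis-primary" on 6379/TCP yields 7 variables.
function serviceEnvNames(name, port, protocol = 'TCP') {
  const prefix = name.toUpperCase().replace(/-/g, '_');
  const portKey = `${prefix}_PORT_${port}_${protocol}`;
  return [
    `${prefix}_SERVICE_HOST`,
    `${prefix}_SERVICE_PORT`,
    `${prefix}_PORT`,
    portKey,
    `${portKey}_PROTO`,
    `${portKey}_PORT`,
    `${portKey}_ADDR`,
  ];
}

// Heuristic: a key counts as a Service link if the text before "_SERVICE_"
// or "_PORT" is the prefix of some *_SERVICE_HOST variable.
function isServiceLinkVar(key, prefixes) {
  for (const marker of ['_SERVICE_', '_PORT']) {
    for (let at = key.indexOf(marker); at !== -1; at = key.indexOf(marker, at + 1)) {
      const rest = key.slice(at + marker.length);
      const boundaryOk = marker === '_SERVICE_' || rest === '' || rest.startsWith('_');
      if (boundaryOk && prefixes.has(key.slice(0, at))) return true;
    }
  }
  return false;
}

function countServiceLinkVars(env = process.env) {
  const keys = Object.keys(env);
  const suffix = '_SERVICE_HOST';
  const prefixes = new Set(
    keys.filter((k) => k.endsWith(suffix)).map((k) => k.slice(0, -suffix.length))
  );
  const injected = keys.filter((k) => isServiceLinkVar(k, prefixes)).length;
  return { total: keys.length, injected, services: prefixes.size };
}

console.log(countServiceLinkVars());
```

Run inside a Pod, this gives a rough split between Service-link variables and everything else. With seven variables per single-port Service, a namespace with a couple of thousand Services is enough to reach the order of magnitude reported here.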
Verification:
Inside a Pod in the regular cluster, count the environment variables:
env | wc -l
Result: ~16,000 variables, mostly auto‑injected Service variables.
Disable automatic injection by setting enableServiceLinks: false in the Pod spec. CPU usage and API latency returned to normal.
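As a configuration sketch, the field sits at the top level of the Pod spec. The Pod name, container name and image below are placeholders, not values from the incident; only the work namespace appears in the commands used in this article.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: node-api              # placeholder name
  namespace: work
spec:
  enableServiceLinks: false   # defaults to true when omitted
  containers:
    - name: app               # placeholder container name
      image: registry.example.com/node-api:latest   # placeholder image
```

For a Deployment, the same field goes under spec.template.spec.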
Root Cause Analysis
Prioritise fixing the issue, then investigate the root cause.
General Performance‑Analysis Steps
System resource bottlenecks are measured using the USE method (Utilisation, Saturation, Errors). Resources include hardware (CPU, memory, disk, network) and software (file descriptors, connection tracking, socket buffers).
Application bottlenecks manifest as reduced throughput, higher error rates and increased latency, caused by resource limits, dependent services or inefficient code.
Specific Investigation Process
Real‑time CPU monitoring via a container dashboard, followed by command‑line latency testing:
for i in `seq 1 100`; do time curl -I ${API}; done
Use strace and perf to trace system calls and hot functions:
# Identify the node where the Pod runs
kubectl -n work describe pod ${PodName} | grep 'Node:' | awk -F/ '{print $2}'
# Get the container ID
kubectl -n work get pod ${PodName} -o template --template='{{range .status.containerStatuses}}{{.containerID}}{{end}}' | sed 's/docker:\/\/\(.*\)$/\1/'
# Find the PID on the host
docker inspect -f '{{.State.Pid}}' ${ContainerID}
# Trace system calls of the main process
strace -f -T -tt -p ${PID} -o trace.log
# Sample API latency
date +"%H:%M:%S"; time curl -I ${API}; date +"%H:%M:%S"
Analysis of trace.log revealed child processes executing free and df, causing futex resumed events that blocked the main Node.js event loop.
Further profiling with perf and FlameGraph did not pinpoint a hot function, so node --prof together with flamebearer was used:
# Enter the container
kubectl -n work exec -it ${PodName} -- /bin/sh
# Run Node.js with profiling flags
node /data/node_modules/.bin/cross-env NODE_ENV=work node --prof --jitless --no-lazy src/main
# Install flamebearer (registry mirror shown for completeness)
npm install -g flamebearer --registry=https://registry.npmmirror.com
# Process the profile and generate a flamegraph
node --prof-process --preprocess -j isolate*.log | flamebearer
The flamegraph identified child_process.js (specifically execSync → spawnSync → normalizeSpawnArguments) as the hotspot. Disabling the offending code restored normal performance.
Source‑code inspection of Node.js (v14) showed the environment‑variable handling loop:
const env = options.env || process.env;
const envPairs = [];
for (const key in env) {
  const value = env[key];
  if (value !== undefined) {
    envPairs.push(`${key}=${value}`);
  }
}
Running this loop inside the Pod took ~7.5 seconds, confirming the overhead.
Two main factors contributed to the slowdown:
Processing a process.env object with ~16 k entries (the loop itself is slow).
JavaScript for‑in loops perform poorly on objects with a very large number of keys.
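The measurement can be repeated with a small script that wraps the same loop in a timer. This is a minimal sketch: buildEnvPairs and timeEnvLoop are hypothetical names, and the numbers depend on the Node.js version and on the size of the environment. Running it against the real process.env inside the affected Pod is likely to be more representative than a synthetic plain object, because process.env is backed by the actual process environment.

```javascript
const { performance } = require('node:perf_hooks');

// Same shape as the loop in child_process.js: build KEY=value pairs.
function buildEnvPairs(env = process.env) {
  const envPairs = [];
  for (const key in env) {
    const value = env[key];
    if (value !== undefined) {
      envPairs.push(`${key}=${value}`);
    }
  }
  return envPairs;
}

// Times the loop and reports the average cost of one pass.
function timeEnvLoop(env = process.env, iterations = 10) {
  let pairs = 0;
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    pairs = buildEnvPairs(env).length;
  }
  const elapsedMs = performance.now() - start;
  return { pairs, msPerIteration: elapsedMs / iterations };
}

console.log(timeEnvLoop());
```

Comparing the output with enableServiceLinks enabled and disabled shows how much of the cost comes from the injected variables.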
Remediation Options
Set enableServiceLinks: false in the Pod spec to stop automatic injection of Service environment variables.
Replace synchronous child_process.execSync calls with asynchronous child_process.exec to avoid blocking the event loop.
When using execSync, explicitly provide a minimal env object instead of inheriting the full system environment (see Node.js source line 586).
Conclusion
The default enableServiceLinks setting in Kubernetes injects thousands of Service variables into each Pod, dramatically increasing the cost of iterating over process.env in Node.js applications. This leads to high CPU usage and intermittent timeouts. Disabling this feature or limiting environment‑variable usage resolves the issue.
If Service environment variables are not required, set enableServiceLinks: false in the Pod spec to disable the injection.
Relevant issues and references:
https://github.com/nodejs/node/issues/3104
https://github.com/kubernetes/kubernetes/issues/60099
https://github.com/kubernetes/kubernetes/issues/121787
Kubernetes documentation: https://kubernetes.io/docs/tutorials/services/connect-applications-service/
Flamebearer project: https://github.com/mapbox/flamebearer
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)
