Master Kubernetes Capacity Planning: Detect & Optimize Unused Resources
This guide explains Kubernetes capacity planning, showing how to detect idle CPU and memory, identify wasteful namespaces, use open‑source tools like kube‑state‑metrics and cAdvisor, and apply PromQL queries to optimize resource requests and measure the impact of your improvements.
Kubernetes capacity planning is a major challenge for infrastructure engineers because understanding resource requirements and limits is not easy.
You may over‑provision resources to ensure containers don’t run out of memory or hit CPU limits, which can lead to unnecessary cloud costs and harder scheduling. Balancing cluster stability, reliability, and efficient resource use is why capacity planning matters.
This article shows how to identify unused resources and allocate cluster capacity wisely.
Don’t Be a Greedy Developer
Sometimes containers request more resources than they actually need. A single over-sized container has little impact, but when many containers over-request, the waste adds up to real cost in large clusters.
Oversized Pods also make scheduling harder.
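To make the scheduling cost concrete, here is a toy calculation (the node size and request values are invented for illustration):

```python
def pods_per_node(allocatable_cores: float, cpu_request_cores: float) -> int:
    """How many identical Pods fit on one node, looking only at CPU requests.

    The scheduler reserves the full request, not the actual usage."""
    return int(allocatable_cores // cpu_request_cores)

# Hypothetical 4-core node: a container that asks for a full core
# "just in case" packs 4x fewer Pods than one sized at its real 250m usage.
assert pods_per_node(4.0, 1.0) == 4
assert pods_per_node(4.0, 0.25) == 16
```

Every core requested but never used is a core the scheduler can no longer offer to other workloads.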
Two open‑source tools can help with Kubernetes capacity planning:
kube‑state‑metrics – an add‑on exporter that generates and exposes cluster‑level metrics.
cAdvisor – a resource usage analyzer for containers.
Running these tools in your cluster lets you avoid under‑utilization and adjust resource allocations.
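All the PromQL queries in the sections below can be run interactively in the Prometheus web UI, or pulled programmatically through the Prometheus HTTP API. A minimal Python sketch, assuming a reachable Prometheus endpoint (the URL is a placeholder; adjust it for your cluster):

```python
import json
import urllib.parse
import urllib.request

# Assumption: replace with your cluster's Prometheus endpoint.
PROM_URL = "http://prometheus.example.com:9090"

# One of the idle-CPU queries used in this article.
IDLE_CPU_QUERY = (
    'sum((rate(container_cpu_usage_seconds_total{container!="POD",container!=""}[30m])'
    ' - on (namespace,pod,container) group_left'
    ' avg by (namespace,pod,container)(kube_pod_container_resource_requests{resource="cpu"}))'
    ' * -1 > 0)'
)

def build_query_url(base_url: str, promql: str) -> str:
    """URL-encode a PromQL expression for the instant-query endpoint."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})

def instant_query(base_url: str, promql: str) -> list:
    """Run an instant query against the Prometheus HTTP API and return the result vector."""
    with urllib.request.urlopen(build_query_url(base_url, promql)) as resp:
        body = json.load(resp)
    return body["data"]["result"]

if __name__ == "__main__":
    for series in instant_query(PROM_URL, IDLE_CPU_QUERY):
        print(series["metric"], series["value"])
```

The same queries also make good Grafana panels once you have settled on thresholds.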
How to Detect Under‑Utilized Resources
CPU
CPU is one of the hardest resources to tune: set requests and limits too low and you starve the service of compute; set them too high and cores sit idle.
Detect Idle CPU
Using the metrics container_cpu_usage_seconds_total and kube_pod_container_resource_requests, you can see how many requested cores go unused.

```promql
sum(
  (
    rate(container_cpu_usage_seconds_total{container!="POD",container!=""}[30m])
    - on (namespace,pod,container) group_left
      avg by (namespace,pod,container) (kube_pod_container_resource_requests{resource="cpu"})
  ) * -1 > 0
)
```

Identify Namespaces Wasting CPU
Aggregating the previous query by namespace gives finer-grained insight and lets you hold teams accountable for over-provisioned workloads.
```promql
sum by (namespace) (
  (
    rate(container_cpu_usage_seconds_total{container!="POD",container!=""}[30m])
    - on (namespace,pod,container) group_left
      avg by (namespace,pod,container) (kube_pod_container_resource_requests{resource="cpu"})
  ) * -1 > 0
)
```

Top 10 CPU-Hungry Containers
Use the topk function to list the containers with the highest CPU waste.
```promql
topk(10,
  sum by (namespace,pod,container) (
    (
      rate(container_cpu_usage_seconds_total{container!="POD",container!=""}[30m])
      - on (namespace,pod,container) group_left
        avg by (namespace,pod,container) (kube_pod_container_resource_requests{resource="cpu"})
    ) * -1 > 0
  )
)
```

Memory
Proper memory planning is crucial: set requests too low and containers risk being OOM-killed or evicted under memory pressure; set them too high and fewer Pods fit on each node.
Detect Unused Memory
The metrics container_memory_usage_bytes and kube_pod_container_resource_requests reveal wasted memory.

```promql
sum(
  (
    container_memory_usage_bytes{container!="POD",container!=""}
    - on (namespace,pod,container)
      avg by (namespace,pod,container) (kube_pod_container_resource_requests{resource="memory"})
  ) * -1 > 0
) / (1024*1024*1024)
```

In this example, right-sizing would reclaim about 0.8 GiB across the cluster.
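The `* -1 > 0` filter keeps only containers whose request exceeds their actual usage, and the final division converts bytes to GiB. The same arithmetic in plain Python, with made-up numbers:

```python
def wasted_memory_gib(usage_bytes: dict, request_bytes: dict) -> float:
    """Sum (request - usage) over containers whose request exceeds usage,
    then convert the total from bytes to GiB."""
    wasted = 0
    for container, request in request_bytes.items():
        usage = usage_bytes.get(container, 0)
        if request > usage:
            wasted += request - usage
    return wasted / (1024 ** 3)

# Invented numbers: two over-provisioned containers, one right-sized.
MIB = 2 ** 20
usage = {"api": 200 * MIB, "worker": 300 * MIB, "cache": 512 * MIB}
requests = {"api": 512 * MIB, "worker": 512 * MIB, "cache": 512 * MIB}
# (512 - 200) + (512 - 300) = 524 MiB wasted, about 0.51 GiB
```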
Identify Namespaces Wasting Memory
Aggregate by namespace, just as for CPU.

```promql
sum by (namespace) (
  (
    container_memory_usage_bytes{container!="POD",container!=""}
    - on (namespace,pod,container)
      avg by (namespace,pod,container) (kube_pod_container_resource_requests{resource="memory"})
  ) * -1 > 0
) / (1024*1024*1024)
```

Top 10 Memory-Heavy Containers
Again, topk highlights the containers that waste the most memory.

```promql
topk(10,
  sum by (namespace,pod,container) (
    (
      container_memory_usage_bytes{container!="POD",container!=""}
      - on (namespace,pod,container)
        avg by (namespace,pod,container) (kube_pod_container_resource_requests{resource="memory"})
    ) * -1 > 0
  ) / (1024*1024*1024)
)
```

Optimizing Container Resource Utilization
To keep enough compute capacity, analyze current usage. The following PromQL query calculates the average CPU utilization of all containers belonging to the same workload (Deployment, StatefulSet, or DaemonSet).

```promql
avg by (namespace,owner_name,container) (
  rate(container_cpu_usage_seconds_total{container!="POD",container!=""}[5m])
  * on (namespace,pod) group_left(owner_name)
    avg by (namespace,pod,owner_name) (kube_pod_owner{owner_kind=~"DaemonSet|StatefulSet|Deployment"})
)
```

Based on experience, set container requests to 85%–115% of the average CPU or memory usage.
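The 85%–115% rule of thumb can be captured in a small helper (the function and its validation are my own sketch; only the band comes from the text above):

```python
def recommended_request(avg_usage: float, headroom_factor: float = 1.0) -> float:
    """Size a container request as a multiple of its observed average usage.

    Per the rule of thumb above, keep the factor between 0.85 (burstable,
    cost-sensitive workloads) and 1.15 (latency-sensitive workloads that
    need extra headroom)."""
    if not 0.85 <= headroom_factor <= 1.15:
        raise ValueError("factor should stay within the 85%-115% band")
    return avg_usage * headroom_factor

# A container averaging 200m CPU gets roughly a 230m request with 15% headroom.
request = recommended_request(0.2, 1.15)
assert abs(request - 0.23) < 1e-9
```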
Measuring the Impact of Optimization
After applying your capacity-planning changes, compare the number of unused CPU cores now against a week ago to assess the effect.

```promql
sum(
  (
    rate(container_cpu_usage_seconds_total{container!="POD",container!=""}[30m])
    - on (namespace,pod,container) group_left
      avg by (namespace,pod,container) (kube_pod_container_resource_requests{resource="cpu"})
  ) * -1 > 0
)
-
sum(
  (
    rate(container_cpu_usage_seconds_total{container!="POD",container!=""}[30m] offset 1w)
    - on (namespace,pod,container) group_left
      avg by (namespace,pod,container) (kube_pod_container_resource_requests{resource="cpu"} offset 1w)
  ) * -1 > 0
)
```

Charted over time, the result shows fewer unused CPU cores after optimization.
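To read that comparison correctly: the outer subtraction is "waste now minus waste a week ago", so a negative result means the optimization worked. A minimal sketch with invented values:

```python
def waste_delta(unused_cores_now: float, unused_cores_week_ago: float) -> float:
    """Mirror of the PromQL difference: negative means less idle CPU than last week."""
    return unused_cores_now - unused_cores_week_ago

# Invented values: 5 idle cores a week ago, 3.2 now -> a negative delta,
# so the right-sizing work paid off.
assert waste_delta(3.2, 5.0) < 0
```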
Conclusion
You now understand the consequences of over-provisioning and know how to detect excessive resource allocation, set appropriate container requests, and measure the impact of your optimizations.
These techniques provide a solid foundation for building a comprehensive Kubernetes capacity‑planning dashboard.
Open Source Linux
Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.