How We Cut Kubernetes Costs by 40% Without Switching Platforms
By rethinking resource requests, eliminating unused workloads, downsizing node types, fine‑tuning autoscaling, and trimming log storage, a team cut its Kubernetes bill by 40% while staying on the same cloud provider. Most cost overruns, it turns out, stem from misconfiguration rather than the platform itself.
Many assume rising Kubernetes bills are an unavoidable cost of running workloads in the cloud, but the real issue is often how the cluster is configured and used.
Step 1: Right‑size resource requests
Using Prometheus metrics and the Vertical Pod Autoscaler (VPA) in recommendation mode, the team discovered that about 70% of Pods requested 2‑3× more CPU and memory than they actually used. By lowering those requests (and the corresponding limits) to realistic values, the cluster autoscaler was able to remove several nodes overnight, saving roughly 15% of the cost.
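A minimal sketch of running the VPA in recommendation‑only mode (`updateMode: Off` records suggestions without evicting Pods); the target Deployment name `web-api` is illustrative:

```shell
# Create a VPA that only records recommendations, never evicts Pods.
# The Deployment name "web-api" is a placeholder for your workload.
kubectl apply -f - <<'EOF'
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  updatePolicy:
    updateMode: "Off"
EOF

# After the VPA has gathered usage data, inspect its suggested
# requests and compare them with what the Pods currently ask for:
kubectl describe vpa web-api-vpa
```

Comparing the VPA's `Target` recommendation against the Deployment's declared requests is what surfaces the 2‑3× overprovisioning described above.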
Step 2: Remove ghost workloads
Unused or forgotten workloads—such as test clusters, old batch jobs, and PR preview applications—were identified with the following command:
kubectl get pods --all-namespaces --sort-by=.metadata.creationTimestamp

After deleting these resources and adding a TTL controller for preview environments, the bill dropped an additional 10%.
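The article doesn't show the TTL mechanism itself; one lightweight approach combines Kubernetes' built‑in TTL‑after‑finished field for batch Jobs with a label‑driven cleanup for preview namespaces (the Job name, namespace label, and label value below are illustrative):

```shell
# Finished batch Jobs can be garbage-collected automatically via the
# built-in TTL controller: here, delete 24 hours after completion.
# "nightly-report" is a placeholder Job name.
kubectl patch job nightly-report --type merge \
  -p '{"spec":{"ttlSecondsAfterFinished":86400}}'

# For PR preview environments, a CI job triggered on PR close can
# delete every namespace carrying the preview label (label is
# illustrative; scope it to the PR's namespace in practice):
kubectl get ns -l env=preview \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' \
  | xargs -r -n1 kubectl delete ns
```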
Step 3: Use smaller nodes instead of large ones
Large instances (e.g., >32 vCPU) often leave half a node idle, wasting money. Switching to smaller instance types such as m6i.large and letting the autoscaler adjust capacity increased utilization and reduced costs by about 8%.
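As a sketch of that switch, assuming an EKS cluster managed with eksctl (cluster and node‑group names are illustrative; other providers expose the same idea as node‑pool definitions):

```shell
# Create a managed node group of smaller instances that the cluster
# autoscaler can grow and shrink in fine-grained increments.
eksctl create nodegroup \
  --cluster prod-cluster \
  --name m6i-large-pool \
  --node-type m6i.large \
  --nodes-min 2 --nodes-max 20 \
  --managed

# Then drain and remove the oversized node group so workloads
# reschedule onto the smaller nodes:
eksctl delete nodegroup --cluster prod-cluster --name big-node-pool
```

The autoscaler can now match capacity to demand in ~2‑vCPU steps instead of stranding half of a 32‑vCPU machine.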
Step 4: Make autoscaling actually work
Enabling the Horizontal Pod Autoscaler (HPA) alone is insufficient: the team had initially set a CPU threshold so high that it never triggered. They retuned the target based on 90th‑percentile usage and added custom metrics (e.g., request rate via Prometheus Adapter). During low‑traffic periods Pods and nodes now scaled down, saving another 5‑7%.
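A sketch of an `autoscaling/v2` HPA combining a CPU target with a custom request‑rate metric served through Prometheus Adapter; the Deployment name, metric name, and target values are illustrative, and the metric must already be exposed by the adapter:

```shell
# HPA scaling on both CPU utilization and a per-Pod request-rate
# metric ("http_requests_per_second" is a placeholder metric name
# that Prometheus Adapter would need to be configured to serve).
kubectl apply -f - <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
EOF
```

With two metrics, the HPA scales to whichever demands more replicas, so quiet periods let both signals fall and the replica count drop.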
Step 5: Trim logs, storage, and data
Audit, debug, and system logs were stored on expensive block storage. By moving these logs to S3 Glacier, shortening the retention of non‑critical logs to seven days, and stopping debug‑log generation in production, the team saved roughly 6% of the overall spend.
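One way to implement the Glacier transition and the seven‑day retention, assuming logs are shipped to an S3 bucket (the bucket name and prefixes below are hypothetical), is an S3 lifecycle configuration:

```shell
# Transition audit logs to Glacier after 7 days and expire
# non-critical debug logs entirely after 7 days.
aws s3api put-bucket-lifecycle-configuration \
  --bucket example-cluster-logs \
  --lifecycle-configuration '{
    "Rules": [
      {
        "ID": "audit-to-glacier",
        "Filter": {"Prefix": "audit/"},
        "Status": "Enabled",
        "Transitions": [{"Days": 7, "StorageClass": "GLACIER"}]
      },
      {
        "ID": "expire-debug-logs",
        "Filter": {"Prefix": "debug/"},
        "Status": "Enabled",
        "Expiration": {"Days": 7}
      }
    ]
  }'
```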
After weeks of cleanup, the Kubernetes bill fell by about 40% without changing cloud providers. The cluster became smaller, faster, easier to maintain, and the team now reviews resource usage weekly to prevent waste.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
