How I Cut My Kubernetes Cloud Bill by 60% in 3 Months – Proven Strategies
Facing a 350,000‑yuan monthly Kubernetes bill, the author analyzed hidden cost components, implemented five optimization campaigns—including resource request tuning, autoscaling, spot instances, storage tiering, and network consolidation—and reduced monthly expenses by 60% while boosting performance, delivering a detailed, reproducible methodology.
Introduction
"Why did the cloud bill increase by 20% this month?" The CFO’s question spurred a deep dive into a Kubernetes cluster on Alibaba Cloud that was costing 350,000 CNY per month. Over three months of systematic optimization, the monthly cost dropped to 140,000 CNY—a 60% reduction—while performance improved by 30%.
Technical Background: The Hidden Truth of Kubernetes Costs
Cost Composition
Many assume cloud cost equals server cost, but it’s far more complex. For Alibaba Cloud the breakdown is roughly:
Compute (60‑70%): ECS instance fees (pay‑as‑you‑go vs. subscription), reserved instance coupons, spot instances.
Storage (15‑20%): Cloud disks (SSD, ESSD), NAS, OSS object storage.
Network (10‑15%): Public bandwidth (fixed vs. usage‑based), intra‑zone traffic, SLB load balancer fees.
Other (5‑10%): Snapshots, monitoring, image registry.
Common Causes of Resource Waste
According to CNCF surveys, average Kubernetes utilization is only 25‑35%, meaning 65‑75% of resources sit idle.
Over‑provisioned resources: Developers request excess CPU/memory, many pods lack limits, request‑limit gaps are large.
Lack of autoscaling: Fixed pod counts, HPA not enabled, Cluster Autoscaler missing.
Fragmented resources: Uneven node utilization, missing node/pod affinity, many small nodes.
Poor storage usage: PersistentVolumeClaims never deleted, oversized PVCs, high‑performance disks used for low‑value data.
Initial State
Cluster details before optimization:
Kubernetes version: 1.24
45 ECS nodes (8 vCPU, 16 GiB)
~300 Pods, 35 micro‑services
Monthly cost composition (350,000 CNY):
ECS instances: 240,000 CNY (68.5%)
Cloud disks: 45,000 CNY (12.8%)
Network bandwidth: 30,000 CNY (8.6%)
SLB load balancer: 20,000 CNY (5.7%)
Other (monitoring, logs, etc.): 15,000 CNY (4.4%)
CPU avg. utilization 28%, memory 35%, storage 42% – clear signs of waste.
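As a rough sanity check, the utilization figures above translate into idle spend as follows. This is a simplified linear model (cost is assumed to scale with utilization), not the article's own accounting:

```python
# Rough idle-spend estimate from the pre-optimization numbers above.
# All figures come from the article; the formula is a simplification.
ecs_cost = 240_000      # CNY/month, ECS instances
disk_cost = 45_000      # CNY/month, cloud disks
cpu_util = 0.28         # average CPU utilization
storage_util = 0.42     # average storage utilization

idle_compute = ecs_cost * (1 - cpu_util)
idle_storage = disk_cost * (1 - storage_util)

print(f"idle compute spend ≈ {idle_compute:,.0f} CNY/month")  # ≈ 172,800
print(f"idle storage spend ≈ {idle_storage:,.0f} CNY/month")  # ≈ 26,100
```

Roughly 170,000 CNY of compute spend per month was backing idle capacity, which is why Battles 1–3 target compute first.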
Core Content: Five Battles of Kubernetes Cost Optimization
Battle 1 – Optimize Resource Requests & Limits
Diagnosis
Collect pod usage with kubectl top pods and inspect requests/limits.
# 1. View all pod resource configs and usage
kubectl top pods --all-namespaces
# 2. Extract requests & limits
kubectl get pods --all-namespaces -o json | jq ...
# 3. List low‑usage pods
kubectl top pods --all-namespaces --sort-by='cpu' | tail -n 50
Findings
67% of Pods have no Request/Limit.
Many Pods request 2 CPU/4 GiB but actually use 200 mCPU/512 MiB.
Limits far exceed requests in many Pods (e.g., request 1 CPU, limit 4 CPU), so scheduling is based on misleading numbers.
Solution
1️⃣ Set realistic requests based on real usage and define limits.
2️⃣ Create a resource‑configuration matrix per service type.
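A right‑sizing rule along these lines can be sketched in a few lines of Python. The headroom factor, burst multiplier, and rounding units below are illustrative assumptions, not the author's exact matrix:

```python
import math

# Illustrative right-sizing rule (assumed, not the article's exact matrix):
# request = observed P95 usage + ~25% headroom, CPU limit = 4x request,
# memory limit = 2x request, rounded up to scheduler-friendly units.
def recommend(p95_cpu_millicores: float, p95_mem_mib: float,
              headroom: float = 1.25, cpu_burst: float = 4.0):
    req_cpu = math.ceil(p95_cpu_millicores * headroom / 50) * 50   # round to 50m
    req_mem = math.ceil(p95_mem_mib * headroom / 128) * 128        # round to 128Mi
    return {
        "requests": {"cpu": f"{req_cpu}m", "memory": f"{req_mem}Mi"},
        "limits": {"cpu": f"{int(req_cpu * cpu_burst)}m",
                   "memory": f"{int(req_mem * 2)}Mi"},
    }

# The article's example pod: ~200m CPU / 512Mi actually used.
print(recommend(200, 512))
```

Fed the article's observed usage (200 mCPU / 512 MiB), this yields a 250m request and 1000m limit, close to the optimized manifest below; memory comes out slightly higher because of the headroom rounding.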
# Before (no resources)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: user-service
          image: user-service:v1.0
# After (optimized)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: user-service
          image: user-service:v1.0
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
            limits:
              cpu: 1000m
              memory: 1Gi
          startupProbe: {...}
          readinessProbe: {...}
          livenessProbe: {...}
3️⃣ Use VPA (Vertical Pod Autoscaler) for ongoing recommendations.
# Install VPA
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
# Create VPA object
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: user-service-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: user-service
  updatePolicy:
    updateMode: "Off"  # recommend only, do not auto-apply
Result: CPU avg. utilization rose from 28% to 52%, memory from 35% to 58%; 12 nodes could be shut down, saving ~64,000 CNY per month.
Battle 2 – Implement Elastic Scaling
HPA (Horizontal Pod Autoscaler)
# Install Metrics Server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Verify
kubectl top nodes
kubectl top pods
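Once metrics are flowing, the HPA controller computes its target as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to the min/max bounds (per the Kubernetes documentation). A minimal sketch:

```python
import math

# Core of the HPA scaling decision (simplified; the real controller also
# applies tolerance bands and stabilization windows).
def desired_replicas(current: int, current_util: float, target_util: float,
                     min_r: int = 2, max_r: int = 10) -> int:
    desired = math.ceil(current * current_util / target_util)
    return max(min_r, min(max_r, desired))

# 3 replicas averaging 98% CPU against the 70% target -> scale out
print(desired_replicas(3, 98, 70))
```

With the manifest below (70% CPU target, 2–10 replicas), 3 replicas at 98% average CPU would scale out to 5.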
# HPA manifest (example)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: user-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: user-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
Time‑based scaling with CronHPA for predictable traffic patterns.
# CronHPA example
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: CronHorizontalPodAutoscaler
metadata:
  name: user-service-cron-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: user-service
  jobs:
    - name: scale-up-workday
      schedule: "0 8 * * 1-5"
      targetSize: 10
    - name: scale-down-workday
      schedule: "0 22 * * 1-5"
      targetSize: 3
    - name: scale-weekend
      schedule: "0 0 * * 6,0"
      targetSize: 2
Cluster Autoscaler (Node‑level scaling)
# Node pool configuration (Alibaba Cloud ACK)
NodePoolName: default-pool
InstanceType: ecs.c6.2xlarge
MinSize: 5
MaxSize: 30
AutoScaling: enabled
# Deploy Cluster Autoscaler
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  template:
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
        - name: cluster-autoscaler
          image: registry.cn-hangzhou.aliyuncs.com/acs/autoscaler:v1.6.0
          command:
            - ./cluster-autoscaler
            - --cloud-provider=alicloud
            - --nodes=5:30:default-pool
            - --scale-down-enabled=true
            - --scale-down-delay-after-add=10m
            - --scale-down-unneeded-time=10m
            - --scale-down-utilization-threshold=0.5
Result: Node count fell from a fixed 45 to an average of 18, saving ~144,000 CNY per month.
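The scale‑down flags above encode a simple rule: a node whose requested‑resource utilization stays below the threshold for the unneeded‑time window becomes a removal candidate. A simplified model of that rule (the real autoscaler additionally checks pod evictability, PodDisruptionBudgets, and node groups):

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    cpu_requested: float    # sum of pod CPU requests on the node (cores)
    cpu_allocatable: float  # node allocatable CPU (cores)
    low_util_minutes: int   # how long utilization has stayed below threshold

# Mirrors --scale-down-utilization-threshold=0.5 / --scale-down-unneeded-time=10m
def scale_down_candidates(nodes, threshold=0.5, unneeded_minutes=10):
    return [n.name for n in nodes
            if n.cpu_requested / n.cpu_allocatable < threshold
            and n.low_util_minutes >= unneeded_minutes]

nodes = [Node("node-a", 1.2, 8.0, 45),  # 15% requested for 45m -> candidate
         Node("node-b", 6.0, 8.0, 45),  # 75% requested -> keep
         Node("node-c", 1.0, 8.0, 3)]   # low util, but only for 3m -> keep
print(scale_down_candidates(nodes))     # ['node-a']
```

Note that the threshold applies to *requests*, not live usage, which is another reason Battle 1's request tuning had to come first.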
Battle 3 – Use Spot (Preemptible) Instances
Spot instances cost 10‑20% of pay‑as‑you‑go rates but can be reclaimed.
Annual subscription: ~4,500 CNY/month
Pay‑as‑you‑go: ~584 CNY/month
Spot: ~73 CNY/month (1.6% of subscription price)
Suitable for stateless services, batch jobs, dev/test environments, and highly‑available workloads with multiple replicas.
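The price ratios quoted above check out against the listed figures:

```python
# Sanity check of the price comparison above (figures from the article,
# for one ecs.c6.2xlarge-class instance).
subscription = 4500   # CNY/month, annual subscription
on_demand = 584       # CNY/month, pay-as-you-go
spot = 73             # CNY/month

print(f"spot vs subscription:  {spot / subscription:.1%}")  # ~1.6%
print(f"spot vs pay-as-you-go: {spot / on_demand:.1%}")     # ~12.5%
```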
# Spot node pool definition (Alibaba Cloud)
NodePoolName: spot-pool
InstanceType: ecs.c6.2xlarge
ChargeType: Spot
MinSize: 0
MaxSize: 20
Labels:
  node-type: spot
Taints:
  - key: spot
    value: "true"
    effect: NoSchedule
Pods that can tolerate the spot taint are scheduled there, and a termination handler drains nodes before reclamation.
# DaemonSet handling spot termination
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: spot-termination-handler
  namespace: kube-system
spec:
  template:
    spec:
      nodeSelector:
        node-type: spot
      containers:
        - name: handler
          image: termination-handler:v1.0
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          # The handler watches the cloud interruption API and drains the node
Result: 40% of workload moved to spot, cutting node cost by ~30% and saving ~72,000 CNY per month.
Battle 4 – Storage Cost Optimization
Diagnosis
# List all PVCs and their usage
kubectl get pvc --all-namespaces
# List Bound PVCs (cross-reference with pod volume mounts to find unused ones)
kubectl get pvc --all-namespaces -o json | jq -r '.items[] | select(.status.phase == "Bound") | .metadata.namespace + "/" + .metadata.name'
Issues found:
50+ stale PVCs from test environments.
Oversized PVCs (e.g., 100 Gi requested, only 5 Gi used).
High‑performance ESSD used for logs.
Solutions
Delete unused PVCs.
Adopt tiered StorageClasses (ESSD‑PL3 for databases, SSD for general apps, efficiency disks for logs, NAS for shared config).
Use emptyDir for temporary files and logs, with side‑car log‑cleaner.
Archive cold data to OSS object storage via FluentBit.
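Flagging oversized PVCs from collected (requested, used) figures is easy to script. The helper and its 20% usage threshold below are hypothetical; the data would come from `kubectl get pvc` combined with filesystem usage from the nodes:

```python
# Hypothetical helper: flag PVCs using less than `usage_ratio_threshold`
# of their requested capacity. Sizes are in GiB.
def oversized(pvcs: dict, usage_ratio_threshold: float = 0.2) -> dict:
    return {name: (req, used) for name, (req, used) in pvcs.items()
            if used / req < usage_ratio_threshold}

pvcs = {
    "test/report-data": (100, 5),  # the article's example: 100Gi requested, 5Gi used
    "prod/db-data": (200, 150),
}
print(oversized(pvcs))  # flags test/report-data only
```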
# Example tiered StorageClass definitions
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: alicloud-disk-essd-pl3
provisioner: diskplugin.csi.alibabacloud.com
parameters:
  type: cloud_essd
  performanceLevel: PL3
reclaimPolicy: Retain
allowVolumeExpansion: true
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: alicloud-disk-ssd
provisioner: diskplugin.csi.alibabacloud.com
parameters:
  type: cloud_ssd
reclaimPolicy: Delete
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: alicloud-disk-efficiency
provisioner: diskplugin.csi.alibabacloud.com
parameters:
  type: cloud_efficiency
reclaimPolicy: Delete
Result: Disk usage dropped from 8 TB to 3.2 TB, storage cost fell from 45,000 CNY to 18,000 CNY per month (saving 27,000 CNY).
Battle 5 – Network & Load‑Balancer Optimization
Diagnosis
# List all LoadBalancer services
kubectl get svc --all-namespaces -o wide | grep LoadBalancer
# Found 15 SLB instances, each ~60 CNY/day → ~27,000 CNY/month
Solution
Replace many LoadBalancer services with a single Ingress backed by one SLB.
Switch to pay‑by‑traffic bandwidth billing and use shared bandwidth packages.
Enable topology‑aware hints to keep intra‑zone traffic local.
# Ingress example sharing one SLB
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /user
            pathType: Prefix
            backend:
              service:
                name: user-service
                port:
                  number: 8080
          - path: /order
            pathType: Prefix
            backend:
              service:
                name: order-service
                port:
                  number: 8080
Result: SLB count reduced from 15 to 2, network cost cut from 30,000 CNY to 8,000 CNY per month (saving 22,000 CNY).
Cost‑Optimization Summary
Item | Before (CNY/month) | After (CNY/month) | Savings (CNY) | Savings %
--- | --- | --- | --- | ---
ECS instances | 240,000 | 96,000 | 144,000 | 60%
Cloud disks | 45,000 | 18,000 | 27,000 | 60%
Network bandwidth | 30,000 | 8,000 | 22,000 | 73%
SLB load balancer | 20,000 | 5,000 | 15,000 | 75%
Other | 15,000 | 13,000 | 2,000 | 13%
Total | 350,000 | 140,000 | 210,000 | 60%

Annual savings: 2.52 million CNY.
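The table's arithmetic is internally consistent, which is worth verifying before presenting numbers like these to a CFO:

```python
# Cross-check of the summary table above (all figures from the article).
before = {"ECS": 240_000, "disks": 45_000, "network": 30_000,
          "SLB": 20_000, "other": 15_000}
after = {"ECS": 96_000, "disks": 18_000, "network": 8_000,
         "SLB": 5_000, "other": 13_000}

total_before = sum(before.values())
total_after = sum(after.values())
savings = total_before - total_after

print(total_before, total_after, savings,
      f"{savings / total_before:.0%}", savings * 12)
# 350000 140000 210000 60% 2520000
```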
Best Practices & Pitfalls
Kubernetes Cost‑Optimization Golden Rules
Visibility first: You cannot optimize what you cannot measure.
Iterate fast: Tackle one issue at a time, validate, then proceed.
Safety first: Cost cuts must not compromise stability.
Automate: Manual tweaks are unsustainable; use VPA, HPA, CA, and CI pipelines.
Toolchain
Kubecost – cost analysis dashboard.
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer --namespace kubecost --create-namespace
kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090:9090
Goldilocks – VPA recommendation UI.
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install goldilocks fairwinds-stable/goldilocks --namespace goldilocks --create-namespace
kubectl label namespace default goldilocks.fairwinds.com/enabled=true
Prometheus alerts for cost anomalies.
# Example alert: flag namespaces whose memory footprint (a cost proxy) exceeds 100 GiB
- alert: CostIncreaseAnomaly
  expr: sum(container_memory_working_set_bytes{container!=""}) by (namespace) / 1024 / 1024 / 1024 > 100
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: "Namespace {{ $labels.namespace }} memory usage exceeds 100GB"
Common Traps
Over‑optimizing: Setting requests too low harms stability – keep 20‑30% headroom for critical services.
Blind spot usage: Never run stateful services on spot instances without replication and graceful termination handling.
Ignoring hidden costs: Snapshots, logs, monitoring, and image storage can erode savings – clean them regularly.
Lack of monitoring: Without dashboards and alerts, costs creep back up.
Optimization Focus by Cluster Size
Small (<10 nodes): Focus on request/limit tuning, delete unused resources, consolidate LoadBalancers.
Medium (10‑100 nodes): Deploy HPA & Cluster Autoscaler, mix spot instances, tier storage, establish cost monitoring.
Large (>100 nodes): Purchase reserved instance coupons, manage multiple clusters, build internal FinOps platform, foster cost‑ownership culture.
Conclusion & Outlook
Systematic Kubernetes cost optimization cut monthly spend from 350,000 CNY to 140,000 CNY (60% reduction) while boosting performance by 30%. The five‑battle framework proves that cost savings and performance gains are not contradictory.
Key Takeaways
Resource configuration is fundamental: Proper requests/limits can save >50% of compute cost.
Elastic scaling is critical: HPA + Cluster Autoscaler align resources with demand.
Spot instances are a powerful lever: When used appropriately, they slash compute cost by up to 80%.
Storage optimization is often overlooked: Tiered storage and cleanup yield large savings.
Continuous monitoring sustains gains: Without observability, optimizations fade.
FinOps Culture
Cost optimization is an ongoing practice, not a one‑off project. Promote cost awareness, transparency (chargeback), incentive mechanisms, and regular reviews to embed financial responsibility into engineering teams.
Future Trends
Maturing FinOps tools (Kubecost, CloudHealth).
AI‑driven automatic resource right‑sizing and cost prediction.
More stable spot instance offerings.
Serverless containers that charge strictly by actual usage.
Mastering Kubernetes’s resource model and applying a systematic optimization methodology remains essential for any operations or cloud‑native engineer.
Ops Community
A leading IT operations community where professionals share and grow together.