Mastering Kubernetes Vertical Pod Autoscaling: How VPA Optimizes Resources

This article explains the fundamentals, components, workflow, configuration, best practices, and comparison with HPA for Kubernetes Vertical Pod Autoscaler (VPA), helping readers efficiently tune pod resources and improve cluster utilization.

Ops Development & AI Practice

Mar 7, 2025

Introduction

In a Kubernetes (K8s) cluster, Vertical Pod Autoscaler (VPA) complements Horizontal Pod Autoscaler (HPA) by automatically adjusting pod resource requests and limits, improving resource utilization while maintaining application performance.

1. What is Vertical Pod Autoscaler (VPA)?

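VPA is a Kubernetes autoscaler, installed as an add-on, that observes the historical resource usage of a workload's pods and automatically sets their CPU and memory requests. It does not take current cluster capacity into account, so a recommendation can exceed what any node can offer and leave a pod pending. Unlike HPA, which scales the number of pod replicas, VPA scales individual pods vertically.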

2. Core components of VPA

2.1 VPA Recommender

Monitor resource usage : Reads pod CPU and memory usage from the Kubernetes Metrics API (typically served by Metrics Server) and keeps a usage history for each container.

Generate recommendation values : Based on that history, it proposes CPU and memory request values for each container.

Store recommendations : Writes the recommendations to the status of the VPA object, where the Updater and the Admission Controller read them.
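To make the idea of turning usage history into a request value concrete, here is a minimal Python sketch. It is not the Recommender's actual algorithm, which is more involved; the function names, the 0.9 percentile, the 15% safety margin, and the sample data are all assumptions chosen for illustration.

```python
import math

def percentile(samples, q):
    """Nearest-rank percentile: the smallest sample with at least
    a fraction q of all samples at or below it."""
    if not samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

def recommend_request(samples, q=0.9, safety_margin=0.15):
    """Hypothetical rule: a high percentile of observed usage plus headroom."""
    return percentile(samples, q) * (1 + safety_margin)

# CPU usage samples in millicores (made-up illustration data).
cpu_millicores = [120, 150, 110, 400, 130, 140, 160, 125, 135, 145]
cpu_request = round(recommend_request(cpu_millicores))
print(f"recommended CPU request: {cpu_request}m")
```

Using a percentile rather than the maximum keeps a single spike (the 400m sample here) from inflating the request, while the margin leaves some headroom above typical usage.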

2.2 VPA Updater

Check pod status : Determines whether a VPA‑controlled pod needs a resource update.

Evict pod : If the current resources differ from the recommendation and the update mode is "Auto" or "Recreate", the pod is evicted.

Trigger recreation : The evicted pod is recreated by its controller (e.g., Deployment) with the new resource requests.
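The Updater's decision described above can be sketched in a few lines of Python. This is a simplified illustration based only on the rule stated here (evict when resources differ and the mode allows it); a real Updater applies further checks before evicting that this sketch leaves out. The function names and the dictionary shapes are hypothetical.

```python
# Update modes in which the Updater is allowed to evict pods.
EVICTING_MODES = {"Auto", "Recreate"}

def needs_update(current, recommended):
    """True if any recommended request differs from what the pod has now."""
    return any(current.get(name) != value for name, value in recommended.items())

def should_evict(update_mode, current, recommended):
    """Evict only in an evicting mode, and only when resources actually differ."""
    return update_mode in EVICTING_MODES and needs_update(current, recommended)

current = {"cpu": "100m", "memory": "128Mi"}
recommended = {"cpu": "250m", "memory": "128Mi"}

for mode in ("Off", "Initial", "Recreate", "Auto"):
    print(mode, should_evict(mode, current, recommended))
```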

2.3 VPA Admission Controller

Intercept pod creation requests : Captures pod creation and checks for a matching VPA object.

Apply recommendation values : When a matching VPA is found and its update mode is anything other than "Off" (that is, "Initial", "Recreate", or "Auto"), the recommended resources are injected into the pod spec.

Prevent conflicts : The injected requests replace the values written in the workload's pod template, so new pods start with the VPA recommendation rather than the manually set values.
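The injection step can be pictured as a function that rewrites the requests of each container in an incoming pod. The Python sketch below works on plain dictionaries; the `vpa` structure (`updateMode` plus a per-container `recommendations` map) is a hypothetical simplification, not the real API object. It assumes recommendations are applied in every mode except "Off", which is my understanding of the admission controller's behaviour.

```python
import copy

def apply_recommendation(pod, vpa):
    """Return a copy of `pod` with recommended requests injected.
    In "Off" mode the pod is returned unchanged."""
    if vpa["updateMode"] == "Off":
        return pod
    patched = copy.deepcopy(pod)
    for container in patched["spec"]["containers"]:
        target = vpa["recommendations"].get(container["name"])
        if target:
            requests = container.setdefault("resources", {}).setdefault("requests", {})
            requests.update(target)  # recommendation overrides template values
    return patched

pod = {"spec": {"containers": [
    {"name": "app", "resources": {"requests": {"cpu": "100m", "memory": "64Mi"}}},
]}}
vpa = {"updateMode": "Initial",
       "recommendations": {"app": {"cpu": "250m", "memory": "256Mi"}}}

patched = apply_recommendation(pod, vpa)
print(patched["spec"]["containers"][0]["resources"]["requests"])
```

Working on a deep copy keeps the original pod untouched, which mirrors the fact that the workload's template itself is not edited; only the pod being admitted is changed.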

3. VPA workflow

User creates a VPA object, specifying the target workload (e.g., Deployment) and update policy.

VPA Recommender continuously monitors the target pods and generates recommended resource requests.

VPA Updater periodically checks controlled pods; if they differ from recommendations, it acts according to the update mode.

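If the update mode is "Auto" or "Recreate", the Updater evicts the pod; its controller recreates it, and the Admission Controller injects the recommended resources into the new pod.

If the update mode is "Initial", nothing is evicted; the Admission Controller applies recommendations only when pods are created for other reasons, such as a rollout or a restart.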

If the update mode is "Off", VPA only provides recommendations without taking action.

4. VPA configuration

Example VPA YAML configuration:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      minAllowed:
        cpu: 100m
        memory: 50Mi
      maxAllowed:
        cpu: 1
        memory: 500Mi
      controlledResources: ["cpu", "memory"]

4.1 Configuration interpretation

targetRef : Specifies the workload (Deployment, StatefulSet, DaemonSet) that VPA will control.

updatePolicy :

updateMode options:

"Off": VPA only provides recommendations.

"Initial": Apply recommendations only at pod creation.

"Recreate": Update resources by evicting and recreating pods when needed.

"Auto": Currently behaves the same as "Recreate": recommendations are applied at pod creation, and running pods are evicted when their resources need updating.

resourcePolicy :

containerPolicies defines per‑container bounds on the recommendations:

containerName : "*" means all containers.

minAllowed : Lower bound for the recommended CPU/memory requests.

maxAllowed : Upper bound for the recommended CPU/memory requests.

controlledResources : Resources VPA manages, e.g., ["cpu", "memory"].

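controlledValues (optional): "RequestsOnly" or "RequestsAndLimits" (the default). With "RequestsAndLimits", limits are scaled together with requests so that the original limit-to-request ratio is preserved.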
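The effect of minAllowed and maxAllowed is easiest to see as a clamp applied to a raw recommendation. The Python sketch below uses the bounds from the example manifest (100m to 1 CPU, 50Mi to 500Mi memory). The helper names and the raw recommendation values are made up, the quantity parsers handle only the few suffixes shown, and `scaled_limit` illustrates the ratio-preserving idea behind "RequestsAndLimits" as I understand it rather than the actual implementation.

```python
def parse_cpu(quantity):
    """Convert a CPU quantity such as "100m" or 1 to millicores."""
    text = str(quantity)
    if text.endswith("m"):
        return int(text[:-1])
    return round(float(text) * 1000)

MEMORY_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}

def parse_memory(quantity):
    """Convert a memory quantity such as "50Mi" to bytes (binary suffixes only)."""
    text = str(quantity)
    for suffix, factor in MEMORY_UNITS.items():
        if text.endswith(suffix):
            return int(text[:-len(suffix)]) * factor
    return int(text)

def clamp(value, low, high):
    """Keep value within [low, high]."""
    return max(low, min(value, high))

def scaled_limit(old_request, old_limit, new_request):
    """Keep the original limit-to-request ratio when the request changes."""
    return new_request * old_limit // old_request

# Hypothetical raw recommendations, bounded by the manifest's policy.
bounded_cpu = clamp(parse_cpu("1500m"), parse_cpu("100m"), parse_cpu(1))
bounded_memory = clamp(parse_memory("20Mi"), parse_memory("50Mi"), parse_memory("500Mi"))
new_cpu_limit = scaled_limit(old_request=100, old_limit=200, new_request=250)

print(bounded_cpu, bounded_memory, new_cpu_limit)
```

Here the CPU recommendation is capped at maxAllowed and the memory recommendation is raised to minAllowed, while a request moving from 100m to 250m carries its 2:1 limit ratio along.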

5. Best practices for VPA

Start with small scale : Test VPA on non‑critical workloads first to observe impact.

Set reasonable resource bounds : Use resourcePolicy with minAllowed and maxAllowed to keep recommendations within a range you consider safe, avoiding over‑ or under‑provisioning.

Monitor VPA status : Run kubectl describe vpa <vpa-name> to view status, events, and recommendations.

Choose appropriate update mode : For stateless apps, "Auto" or "Recreate" works well; for stateful apps, consider "Initial" or manual updates.

Combine with HPA : Use VPA for vertical scaling and HPA for horizontal scaling, but avoid configuring both to adjust the same resource simultaneously.

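Be aware of limitations : VPA is a poor fit for short‑lived batch workloads (Jobs), whose pods may finish before enough usage history accumulates, and applying a recommendation by eviction may cause brief service interruptions while pods restart.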
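For the monitoring practice above, recommendations can also be read programmatically from the VPA object's status. The Python sketch below parses a JSON document shaped like the output of `kubectl get vpa my-app-vpa -o json`; the field names follow the autoscaling.k8s.io/v1 status layout as I understand it, and the sample values and the `summarize` helper are made up for illustration.

```python
import json

# Sample shaped like a VPA object's JSON; values are illustrative only.
SAMPLE = """
{
  "metadata": {"name": "my-app-vpa"},
  "status": {
    "recommendation": {
      "containerRecommendations": [
        {
          "containerName": "my-app",
          "lowerBound": {"cpu": "100m", "memory": "128Mi"},
          "target": {"cpu": "250m", "memory": "256Mi"},
          "upperBound": {"cpu": "1", "memory": "500Mi"}
        }
      ]
    }
  }
}
"""

def summarize(vpa):
    """Return (container, target cpu, target memory) for each recommendation.
    A VPA with no recommendation yet yields an empty list."""
    recs = (vpa.get("status", {})
               .get("recommendation", {})
               .get("containerRecommendations", []))
    return [(r["containerName"], r["target"]["cpu"], r["target"]["memory"])
            for r in recs]

rows = summarize(json.loads(SAMPLE))
for name, cpu, memory in rows:
    print(f"{name}: cpu={cpu} memory={memory}")
```

Against a live cluster, the same function could be fed the kubectl output through `json.load(sys.stdin)` instead of the embedded sample.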

6. Comparison of VPA and HPA

Scaling method : VPA adjusts resources of individual pods vertically; HPA changes the number of pod replicas horizontally.

Suitable scenarios : VPA fits workloads with large resource demand fluctuations and high per‑pod performance needs; HPA suits stateless services that can scale out.

Resource utilization : VPA improves cluster efficiency by fine‑tuning per‑pod requests; HPA leaves per‑pod requests unchanged, so any over‑ or under‑sizing is repeated in every replica it adds.

Service interruption : VPA can cause short outages depending on updateMode; HPA typically does not interrupt services, though scaling takes time.

Co‑operation : Both can be used together, but avoid simultaneous adjustments of the same CPU or memory metric to prevent conflicts.
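The rule about not letting both autoscalers act on the same resource can be checked mechanically. The Python sketch below compares the resource metrics of an autoscaling/v2 HPA with the resources a VPA controls. The helper names are hypothetical, and treating a missing controlledResources as CPU plus memory, and an "Off" VPA as non-conflicting, are assumptions of this sketch.

```python
def hpa_resource_metrics(hpa):
    """Names of resources (cpu, memory) that the HPA scales on."""
    return {m["resource"]["name"]
            for m in hpa["spec"].get("metrics", [])
            if m.get("type") == "Resource"}

def vpa_controlled_resources(vpa):
    """Resources the VPA actively manages across its container policies."""
    spec = vpa["spec"]
    if spec.get("updatePolicy", {}).get("updateMode") == "Off":
        return set()  # recommendations only, nothing is changed
    policies = spec.get("resourcePolicy", {}).get("containerPolicies", [])
    if not policies:
        return {"cpu", "memory"}  # assumed default when no policy is given
    controlled = set()
    for policy in policies:
        controlled.update(policy.get("controlledResources", ["cpu", "memory"]))
    return controlled

def conflicts(hpa, vpa):
    """Resources that both autoscalers would try to act on."""
    return sorted(hpa_resource_metrics(hpa) & vpa_controlled_resources(vpa))

hpa = {"spec": {"metrics": [{"type": "Resource", "resource": {
    "name": "cpu", "target": {"type": "Utilization", "averageUtilization": 70}}}]}}

def make_vpa(resources):
    return {"spec": {"updatePolicy": {"updateMode": "Auto"},
                     "resourcePolicy": {"containerPolicies": [
                         {"containerName": "*", "controlledResources": resources}]}}}

print(conflicts(hpa, make_vpa(["cpu", "memory"])))
print(conflicts(hpa, make_vpa(["memory"])))
```

Restricting the VPA to memory while the HPA scales on CPU utilization is one way to let the two coexist.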

7. Summary

VPA is a key Kubernetes component for vertical pod autoscaling. By automatically adjusting pod resource requests, it enhances cluster utilization while preserving application performance. Proper configuration and combined use with HPA enable smarter, more efficient Kubernetes clusters that reliably support business workloads.
