
Master Kubernetes Autoscaling: HPA, VPA, and Cluster Autoscaler for Cost Savings

This article explains how Kubernetes' three autoscaling mechanisms (Horizontal Pod Autoscaler, Vertical Pod Autoscaler, and Cluster Autoscaler) work, when to use each, and best-practice tips for reducing cloud costs while maintaining application performance.

MaGe Linux Operations

Jul 11, 2021


Containerized applications can offer cost advantages, but Kubernetes is full of cost traps that can push spending over budget. Autoscaling is one strategy for controlling cloud costs. Kubernetes offers three autoscaling mechanisms (the Horizontal Pod Autoscaler is built into Kubernetes, while the Vertical Pod Autoscaler and Cluster Autoscaler are add-ons from the Kubernetes autoscaler project) that, when combined effectively, lower the cost of running applications.

1. Horizontal Pod Autoscaler (HPA)

The Horizontal Pod Autoscaler (HPA) automatically scales the number of pods in a ReplicationController, Deployment, ReplicaSet, or StatefulSet based on observed CPU utilization or custom metrics.

In production, workloads often fluctuate, and adding or removing pod replicas in real time yields better cost efficiency. HPA monitors pod metrics, calculates the average value, and adjusts the replica count to approach the target.

When to use HPA?

HPA is ideal for scaling stateless applications, and it can also scale stateful ones. Combined with Cluster Autoscaler, it maximizes cost savings for variable workloads: when the pod count drops, Cluster Autoscaler can remove the nodes that are no longer needed.

How HPA works

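HPA periodically (every 15 seconds by default) queries pod metrics, computes the average across the target's pods, and adds or removes replicas so that the average approaches the target (for example, a target CPU utilization of 50%). The core rule is desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)].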
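To make the scaling rule concrete, here is a minimal Python sketch of the replica calculation for a single metric. It assumes the upstream default tolerance of 0.1 (a ratio within 10% of the target causes no change) and clamps the result to the HPA's minReplicas and maxReplicas. The function name and parameters are illustrative; the real controller additionally handles missing metrics, pods that are not yet ready, and a scale-down stabilization window.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Approximate the core HPA rule for one metric (illustrative sketch).

    desired = ceil(current_replicas * current_metric / target_metric),
    skipped when the ratio is within the tolerance band, then clamped
    to [min_replicas, max_replicas].
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Respect the bounds configured on the HPA object.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas averaging 80% CPU against a 50% target give ceil(4 * 1.6) = 7 replicas, while 52% against a 50% target falls inside the tolerance band and leaves the count at 4.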

HPA best practices

Install metrics-server in the cluster to provide pod resource metrics.

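Set resource requests on every container. HPA computes utilization as a percentage of the request, so missing requests leave it unable to make accurate scaling decisions on that resource.

When scaling on custom metrics, pick the matching metric type (Pods for per-pod metrics, Object for a metric describing a single Kubernetes object) and a target type that metric supports (Pods metrics accept only AverageValue; Object metrics accept Value or AverageValue). External metrics from third-party monitoring systems are also supported.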
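These practices come together in the HPA object itself. The sketch below builds an autoscaling/v2 HorizontalPodAutoscaler as a Python dict and prints it as JSON, which kubectl accepts just as it accepts YAML. It assumes a cluster that serves autoscaling/v2 (Kubernetes 1.23 or later) with metrics-server installed. The Deployment name web and the Pods metric http_requests_per_second are hypothetical; the latter only works if a custom metrics adapter exposes it.

```python
import json


def hpa_manifest(deployment: str, min_replicas: int, max_replicas: int,
                 cpu_utilization: int, rps_per_pod: str) -> dict:
    """Build an autoscaling/v2 HorizontalPodAutoscaler as a plain dict."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{deployment}-hpa"},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                {
                    # Resource metric from metrics-server; utilization is
                    # measured against each container's CPU request.
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        "target": {"type": "Utilization",
                                   "averageUtilization": cpu_utilization},
                    },
                },
                {
                    # Pods metric (hypothetical name); needs a custom metrics
                    # adapter. Pods metrics only accept an AverageValue target.
                    "type": "Pods",
                    "pods": {
                        "metric": {"name": "http_requests_per_second"},
                        "target": {"type": "AverageValue",
                                   "averageValue": rps_per_pod},
                    },
                },
            ],
        },
    }


if __name__ == "__main__":
    # Pipe the output to kubectl, e.g.: python hpa.py | kubectl apply -f -
    print(json.dumps(hpa_manifest("web", 2, 10, 50, "100"), indent=2))
```

When several metrics are listed, HPA computes a replica count for each and uses the largest, so the CPU target and the per-pod request rate act as independent scale-up triggers.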

2. Vertical Pod Autoscaler (VPA)

The Vertical Pod Autoscaler (VPA) automatically adjusts containers' CPU and memory requests based on observed usage, so that pods are scheduled onto nodes with the resources they actually need.

VPA increases or decreases pod resource requests to align allocated cluster resources with actual consumption. It requires access to the Kubernetes metrics server.

Use VPA together with HPA only if the HPA does not scale on CPU or memory; otherwise both controllers react to the same signal and work against each other.
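This compatibility rule can be expressed as a small check. The helpers below are hypothetical, not part of any Kubernetes tooling: they inspect an autoscaling/v2 HPA spec (as a dict) for Resource or ContainerResource metrics on cpu or memory, and treat an empty metrics list as CPU, because autoscaling/v2 defaults to 80% average CPU utilization when no metrics are set. A VPA in updateMode "Off" only publishes recommendations, so it is not flagged.

```python
def hpa_scales_on_cpu_or_memory(hpa: dict) -> bool:
    """Return True if an autoscaling/v2 HPA spec scales on CPU or memory."""
    metrics = hpa.get("spec", {}).get("metrics") or []
    if not metrics:
        # No metrics set: autoscaling/v2 defaults to 80% average CPU.
        return True
    for metric in metrics:
        kind = metric.get("type")
        if kind == "Resource":
            name = metric["resource"]["name"]
        elif kind == "ContainerResource":
            name = metric["containerResource"]["name"]
        else:
            continue  # Pods, Object, and External metrics do not overlap with VPA
        if name in ("cpu", "memory"):
            return True
    return False


def vpa_conflicts_with_hpa(hpa: dict, vpa_update_mode: str) -> bool:
    """Flag an HPA/VPA pair that would both act on CPU or memory."""
    # In "Off" mode VPA only recommends and never changes pods.
    return vpa_update_mode != "Off" and hpa_scales_on_cpu_or_memory(hpa)
```

A CI step could run such a check over the HPA and VPA manifests of each workload before they are applied.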

When to use VPA?

VPA suits workloads that occasionally reach high utilization, where permanently raising resource requests to cover those peaks would waste resources. It recommends requests that match actual usage instead of over-provisioning.

How VPA works

VPA consists of three components:

Recommender: monitors usage, calculates target values, and recommends ideal resource requests.

Updater: checks which pods have resource requests that deviate from the recommendation and evicts them so they can be recreated.

Admission Controller: sets the recommended resource requests on pods when they are created.

Because Kubernetes cannot change the resource requests of a running pod, VPA evicts the old pod, and the admission controller injects the updated values into the replacement pod's specification.
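The division of labor between the Recommender and the Updater can be sketched in a few lines. This is a deliberately simplified model, not the VPA source: it works on CPU samples only and uses nearest-rank percentiles instead of the recommender's decaying histograms. The 50th/90th/95th percentiles and the 15% safety margin are assumptions loosely modeled on VPA's defaults. The Updater side evicts a pod when its current request falls outside the recommended lower and upper bounds.

```python
import math
from dataclasses import dataclass


@dataclass
class Recommendation:
    lower_bound: float  # millicores
    target: float       # millicores
    upper_bound: float  # millicores


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile of a non-empty list, with p in (0, 1]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered)))
    return ordered[rank - 1]


def recommend(cpu_samples: list[float], margin: float = 0.15) -> Recommendation:
    """Recommender sketch: percentile-based bounds inflated by a safety margin.

    The percentiles and margin are assumptions, not values read from VPA.
    """
    scale = 1.0 + margin
    return Recommendation(
        lower_bound=percentile(cpu_samples, 0.50) * scale,
        target=percentile(cpu_samples, 0.90) * scale,
        upper_bound=percentile(cpu_samples, 0.95) * scale,
    )


def needs_eviction(current_request: float, rec: Recommendation) -> bool:
    """Updater sketch: evict when the current request is outside [lower, upper]."""
    return not (rec.lower_bound <= current_request <= rec.upper_bound)


if __name__ == "__main__":
    samples = [float(m) for m in range(100, 1001, 100)]  # 100m .. 1000m
    rec = recommend(samples)
    print(rec)
    print(needs_eviction(100.0, rec), needs_eviction(800.0, rec))
```

With samples from 100m to 1000m in 100m steps, a pod requesting 100m falls below the lower bound and would be evicted, while one requesting 800m sits inside the range and keeps running.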

VPA best practices

Avoid using VPA on Kubernetes versions prior to 1.11.

Run VPA with updateMode: Off initially to observe resource usage and obtain recommendations.

If a workload's usage frequently swings between high and low, HPA is usually a better fit than VPA, since VPA applies new requests by recreating pods.
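A VPA object in recommendation-only mode might look like the following, again generated as JSON from Python. It assumes the VPA components and their CRD (apiVersion autoscaling.k8s.io/v1) are installed in the cluster; the Deployment name web and the minAllowed/maxAllowed values are illustrative placeholders.

```python
import json


def vpa_manifest(deployment: str, update_mode: str = "Off") -> dict:
    """Build a VerticalPodAutoscaler (autoscaling.k8s.io/v1) as a plain dict."""
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": f"{deployment}-vpa"},
        "spec": {
            "targetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            # "Off": recommendations are published in the VPA status,
            # but no pods are evicted or modified.
            "updatePolicy": {"updateMode": update_mode},
            "resourcePolicy": {
                "containerPolicies": [
                    {
                        "containerName": "*",
                        # Illustrative guard rails; tune per workload.
                        "minAllowed": {"cpu": "100m", "memory": "128Mi"},
                        "maxAllowed": {"cpu": "2", "memory": "2Gi"},
                    }
                ]
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(vpa_manifest("web"), indent=2))
```

Once applied, kubectl describe vpa web-vpa shows the recommended requests in the object's status. Emitting JSON also sidesteps a YAML pitfall: YAML 1.1 parsers read an unquoted Off as the boolean false, which is why updateMode values are quoted in YAML manifests.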

3. Cluster Autoscaler (CA)

Cluster Autoscaler adjusts the number of nodes in a Kubernetes cluster by adding or removing nodes, optimizing cluster utilization and cost.

When to use Cluster Autoscaler?

When you want to dynamically scale node count to maximize utilization and meet fluctuating workload demands, CA is an effective tool.

How Cluster Autoscaler works

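Cluster Autoscaler works in both directions. When pods cannot be scheduled because no node has enough free resources, it adds nodes to a node group. When nodes are underutilized, it checks whether their pods can be consolidated onto other nodes; if so, it evicts those pods and deletes the emptied nodes.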
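The scale-down half of this logic can be approximated as follows. The sketch is hypothetical and CPU-only: it treats a node as a candidate when the sum of its pods' requests divided by its allocatable CPU is below 0.5 (Cluster Autoscaler's default scale-down utilization threshold) and every pod on it fits on another node with enough free capacity. The real autoscaler also considers memory, PodDisruptionBudgets, pods that cannot be moved, and how long a node has been unneeded (10 minutes by default).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    allocatable_cpu: float  # millicores
    pod_requests: list[float] = field(default_factory=list)  # CPU requests, millicores

    def requested(self) -> float:
        return sum(self.pod_requests)

    def free(self) -> float:
        return self.allocatable_cpu - self.requested()


def is_scale_down_candidate(node: Node, others: list[Node],
                            threshold: float = 0.5) -> bool:
    """Simplified, CPU-only version of the scale-down check."""
    # 1. Request-based utilization must be below the threshold.
    if node.requested() / node.allocatable_cpu >= threshold:
        return False
    # 2. Every pod must fit elsewhere (first fit, largest pod first).
    free = {n.name: n.free() for n in others}
    for request in sorted(node.pod_requests, reverse=True):
        target = next((name for name, cap in free.items() if cap >= request), None)
        if target is None:
            return False           # this pod has nowhere to go; keep the node
        free[target] -= request    # reserve room for the rescheduled pod
    return True
```

For instance, a 4000m node running pods that request 500m and 700m (30% utilization) is a candidate only if the remaining nodes have room for both pods.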

Cluster Autoscaler best practices

Run a Cluster Autoscaler release that matches your cluster's Kubernetes minor version.

Ensure that all nodes within the same node group have identical CPU and memory capacity.

Make sure all autoscaled pods have defined resource requests.
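These requirements follow from how scale-up is estimated: Cluster Autoscaler simulates placing pending pods onto a template node for each node group, using the pods' resource requests. The sketch below imitates that idea with first-fit-decreasing bin packing on CPU only; it is a simplification, and the function and its inputs are hypothetical. If nodes in a group differed in size, or pods omitted requests, a single template and request-based packing could not predict what a new node would hold.

```python
def nodes_needed(pending_cpu_requests: list[int], template_cpu: int) -> int:
    """Estimate how many new nodes one node group must add for pending pods.

    First-fit-decreasing bin packing (CPU only, millicores) against a single
    template node; only meaningful when all nodes in the group are identical
    and every pod declares a request.
    """
    bins: list[int] = []  # remaining CPU on each simulated new node
    for request in sorted(pending_cpu_requests, reverse=True):
        if request > template_cpu:
            raise ValueError(f"a pod requesting {request}m never fits this node group")
        for i, free in enumerate(bins):
            if free >= request:
                bins[i] -= request  # place the pod on an existing simulated node
                break
        else:
            bins.append(template_cpu - request)  # open a new simulated node
    return len(bins)
```

For example, pending pods requesting 1500m, 1500m, 1000m, 500m, and 500m need three new 2000m nodes under this packing.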

Conclusion

Autoscaling mechanisms are valuable for controlling cloud costs but require substantial manual configuration.

Prevent HPA and VPA conflicts by reviewing overlapping policies; in particular, do not let both act on CPU or memory for the same workload.

Balance the three mechanisms to support peak loads while minimizing cost during low usage.

Reference: https://www.kubernetes.org.cn/9443.html


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


