How Does Kubernetes HPA Really Scale Pods? Deep Dive into Principles and Evolution

This article explains the core principles of Kubernetes Horizontal Pod Autoscaler, walks through a concrete scaling example, discusses noise handling, cooldown periods, boundary calculations, and traces the evolution of HPA across API versions with practical YAML snippets.

Alibaba Cloud Native

Jul 24, 2019

HPA Basic Principles

HPA (Horizontal Pod Autoscaler) adjusts the number of pod replicas based on actual workload metrics, primarily CPU utilization. The scaling decision follows a simple formula that compares the average pod utilization against a target percentage.
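The Kubernetes documentation states the core rule roughly as follows. Here currentMetricValue is the average over the target's pods, and the result is rounded up. Multiplying currentReplicas by the average gives the total, so for the simple case where every pod reports a utilization u_i, the rule can also be written using the summed utilization that the example below works with:

```latex
\[
\text{desiredReplicas}
  = \left\lceil \text{currentReplicas} \times
      \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
  = \left\lceil \frac{\sum_{i=1}^{n} u_i}{u_{\text{target}}} \right\rceil,
\qquad n = \text{currentReplicas}
\]
```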

Example: a Deployment A with three pods, each requesting 1 CPU core. The pods report CPU utilizations of 60%, 70%, and 80%. The HPA is configured with a target CPU utilization of 50%, a minimum of 3 replicas, and a maximum of 10.

Total pod utilization = 60% + 70% + 80% = 210%.

Current replica count = 3.

Average utilization = 210% / 3 = 70%, which exceeds the 50% target, so more replicas are needed.

With 5 replicas, the same total load averages 210% / 5 = 42%, which is below the 50% target. With 4 replicas it would still average 210% / 4 = 52.5%. The smallest count that meets the target is therefore 5, so two additional pods are required.

Thus HPA sets Replicas to 5 and performs a horizontal pod scale‑out.
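The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration of the proportional rule, assuming every pod reports a utilization and ignoring the adjustments discussed next. The function name desired_replicas is hypothetical and is not part of any Kubernetes API.

```python
import math
from typing import Sequence


def desired_replicas(utilizations: Sequence[float], target: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Proportional HPA rule (simplified sketch, hypothetical helper).

    utilizations: per-pod CPU usage as a percentage of each pod's request.
    target: target average utilization in percent, e.g. 50.
    """
    if not utilizations:
        raise ValueError("need at least one pod sample")
    # currentReplicas * (average / target) equals total / target.
    total = sum(utilizations)
    # Round up so the new average does not exceed the target.
    raw = math.ceil(total / target)
    # Respect the HPA's minReplicas/maxReplicas bounds.
    return max(min_replicas, min(max_replicas, raw))


print(desired_replicas([60, 70, 80], target=50, min_replicas=3, max_replicas=10))  # 5
```

The clamp at the end is why a very busy Deployment stops at maxReplicas, and why a nearly idle one never drops below minReplicas.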

In practice the final replica count can differ from this simple calculation, because HPA applies additional adjustments: noise handling, cooldown periods, and boundary value calculations.

1. Noise Handling

During pod startup (Starting) or termination (Stopping), a pod's metrics can spike and distort the average. HPA sets pods in these states aside instead of trusting their readings directly. It counts their measured usage only once they are Running and Ready.
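The filtering described above can be sketched as follows. PodSample and stable_average are hypothetical names. The sketch follows this article's simplified description. The upstream controller treats not-yet-Ready pods more conservatively (for example, assuming zero usage for them when scaling up) rather than simply dropping them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PodSample:
    # Hypothetical, simplified view of a pod as seen by the autoscaler.
    name: str
    phase: str          # e.g. "Pending", "Running"
    ready: bool         # whether the pod's Ready condition is true
    deleting: bool      # whether the pod is being terminated
    cpu_percent: float  # measured CPU usage as % of the pod's request


def stable_average(pods: List[PodSample]) -> Optional[float]:
    """Average utilization over pods that are Running, Ready and not terminating.

    Returns None when no pod qualifies, meaning there is nothing reliable to act on.
    """
    stable = [p.cpu_percent for p in pods
              if p.phase == "Running" and p.ready and not p.deleting]
    if not stable:
        return None
    return sum(stable) / len(stable)


pods = [
    PodSample("a", "Running", True, False, 60),
    PodSample("b", "Running", True, False, 70),
    PodSample("c", "Running", True, False, 80),
    PodSample("d", "Running", False, False, 190),  # still starting: spike ignored
    PodSample("e", "Running", True, True, 5),      # terminating: ignored
]
print(stable_average(pods))  # 70.0
```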

2. Cooldown Period

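To avoid rapid oscillation (thrashing) in the replica count, HPA enforces a cooldown between scaling actions. The defaults are 3 minutes for scale‑out and 5 minutes for scale‑in. Newer releases drop the scale‑out delay and keep a 5‑minute stabilization window for scale‑in.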
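The cooldown check can be sketched as a small gate in front of any replica change. This assumes the 3‑minute scale‑out and 5‑minute scale‑in defaults described in this article. The constant names and the function may_scale are hypothetical and are not the controller's actual code.

```python
from datetime import datetime, timedelta
from typing import Optional

# Illustrative cooldowns taken from the defaults described in this article.
SCALE_UP_COOLDOWN = timedelta(minutes=3)
SCALE_DOWN_COOLDOWN = timedelta(minutes=5)


def may_scale(now: datetime, last_scale: Optional[datetime],
              current: int, desired: int) -> bool:
    """Return True if changing from `current` to `desired` replicas is allowed now."""
    if desired == current:
        return False  # nothing to do
    if last_scale is None:
        return True   # never scaled before, so no cooldown applies
    cooldown = SCALE_UP_COOLDOWN if desired > current else SCALE_DOWN_COOLDOWN
    return now - last_scale >= cooldown


t0 = datetime(2019, 7, 24, 12, 0)
print(may_scale(t0 + timedelta(minutes=4), t0, current=3, desired=5))  # True
print(may_scale(t0 + timedelta(minutes=4), t0, current=5, desired=3))  # False
```

The asymmetry is deliberate: adding capacity late hurts latency, while removing it too early risks immediately scaling back out.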

3. Boundary Value Calculation

HPA also applies a boundary tolerance (△) of 10%. When the ratio of the current average utilization to the target lies within 10% of 1.0, HPA does not scale at all. This keeps small fluctuations around the target from triggering replica changes.
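In the upstream controller, this 10% value acts as a tolerance band. The kube-controller-manager flag --horizontal-pod-autoscaler-tolerance defaults to 0.1, and scaling is skipped when the usage ratio is within that distance of 1.0. The sketch below assumes that behavior. replicas_with_tolerance is a hypothetical name.

```python
import math

TOLERANCE = 0.1  # default of --horizontal-pod-autoscaler-tolerance


def replicas_with_tolerance(current: int, average: float, target: float,
                            tolerance: float = TOLERANCE) -> int:
    """Keep the current replica count while average/target stays within 1 +/- tolerance."""
    ratio = average / target
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to the target: do not scale
    return math.ceil(current * ratio)


print(replicas_with_tolerance(3, average=53, target=50))  # 3 (ratio 1.06)
print(replicas_with_tolerance(3, average=70, target=50))  # 5 (ratio 1.4)
```

With the article's numbers, the ratio of 1.4 is well outside the band, so the tolerance leaves the result at 5 replicas.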

HPA Evolution

HPA has progressed through three major API versions:

autoscaling/v1 – supports only CPU‑based scaling.

autoscaling/v2beta1 – adds support for multiple metrics, including Custom metrics alongside Resource metrics.

autoscaling/v2beta2 – supports Resource, Custom, and External metrics through a more uniform metric specification.

Typical YAML for the v1 API:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50

The v2beta1 and v2beta2 APIs replace targetCPUUtilizationPercentage with a richer metrics list. That list can combine Resource metrics, pod‑level Custom metrics, and External metrics.
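For illustration, a v2beta2 spec with one metric of each kind might look like the following. It is built as a Python dict and printed as JSON, which kubectl accepts as readily as YAML. The field layout follows the autoscaling/v2beta2 schema. The metric names packets-per-second and queue_messages_ready, and the queue label, are illustrative assumptions that only work if a custom or external metrics adapter actually serves them.

```python
import json

# Sketch of an autoscaling/v2beta2 HorizontalPodAutoscaler (illustrative names).
hpa = {
    "apiVersion": "autoscaling/v2beta2",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {"name": "php-apache", "namespace": "default"},
    "spec": {
        "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment",
                           "name": "php-apache"},
        "minReplicas": 1,
        "maxReplicas": 10,
        "metrics": [
            # Resource metric served by metrics.k8s.io (e.g. Metrics Server).
            {"type": "Resource",
             "resource": {"name": "cpu",
                          "target": {"type": "Utilization", "averageUtilization": 50}}},
            # Per-pod custom metric served by custom.metrics.k8s.io (hypothetical name).
            {"type": "Pods",
             "pods": {"metric": {"name": "packets-per-second"},
                      "target": {"type": "AverageValue", "averageValue": "1k"}}},
            # External metric served by external.metrics.k8s.io (hypothetical name).
            {"type": "External",
             "external": {"metric": {"name": "queue_messages_ready",
                                     "selector": {"matchLabels": {"queue": "worker_tasks"}}},
                          "target": {"type": "AverageValue", "averageValue": "30"}}},
        ],
    },
}

print(json.dumps(hpa, indent=2))
```

Save the printed JSON to a file such as hpa.json and run kubectl apply -f hpa.json. That should create the object on a cluster that serves autoscaling/v2beta2. Current releases serve the same structure under autoscaling/v2.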

Metrics Types and APIs

The three metric categories are:

Resource – accessed via metrics.k8s.io, e.g., CPU or memory per pod.

Custom – accessed via custom.metrics.k8s.io, e.g., application‑specific counters.

External – accessed via external.metrics.k8s.io, e.g., cloud provider metrics.
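The three categories map onto three aggregated API groups with slightly different URL shapes. The hypothetical helper below builds raw paths for each group. It assumes the v1beta1 version of each API, which should be checked against kubectl api-versions on a given cluster.

```python
# Hypothetical helper: map each metric category to the API group serving it.
METRIC_APIS = {
    "Resource": "metrics.k8s.io",
    "Custom": "custom.metrics.k8s.io",
    "External": "external.metrics.k8s.io",
}


def raw_path(kind: str, namespace: str, metric: str = "",
             version: str = "v1beta1") -> str:
    """Build a raw API path for one metric category (version is an assumption)."""
    if kind != "Resource" and not metric:
        raise ValueError(f"{kind} metrics need a metric name")
    base = f"/apis/{METRIC_APIS[kind]}/{version}/namespaces/{namespace}"
    if kind == "Resource":
        return f"{base}/pods"             # CPU/memory for every pod in the namespace
    if kind == "Custom":
        return f"{base}/pods/*/{metric}"  # one custom metric across all pods
    return f"{base}/{metric}"             # one external metric


print(raw_path("Resource", "default"))
# /apis/metrics.k8s.io/v1beta1/namespaces/default/pods
```

Passing such a path to kubectl get --raw is a quick way to check whether a group is registered. An error suggests that no Metrics Server or adapter is serving it.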

Metrics Server vs. Heapster

Early Kubernetes used Heapster as the sole monitoring component. It collected metrics from the kubelet and archived them offline through pluggable sinks. Limitations such as fragmented sink maintenance, lack of custom metrics support, and competition from Prometheus led to Heapster's deprecation. The community introduced Metrics Server, a lightweight component focused on Resource metrics. Its simplified architecture drops the sink mechanism and exposes the standard metrics.k8s.io API through the Kubernetes API aggregation layer.

HPA is now in GA (General Availability). Future work in the community focuses on fine‑tuning configuration parameters and expanding adapter implementations for custom and external metrics.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.