How Does Kubernetes HPA Really Scale Pods? Deep Dive into Principles and Evolution
This article explains the core principles of Kubernetes Horizontal Pod Autoscaler, walks through a concrete scaling example, discusses noise handling, cooldown periods, boundary calculations, and traces the evolution of HPA across API versions with practical YAML snippets.
HPA Basic Principles
HPA (Horizontal Pod Autoscaler) adjusts the number of pod replicas based on actual workload metrics, primarily CPU utilization. The scaling decision follows a simple formula that compares the average pod utilization against a target percentage.
Example: a Deployment A with three pods, each requesting 1 CPU core. The pods report CPU utilizations of 60%, 70%, and 80%. The HPA is configured with a target CPU utilization of 50%, a minimum of 3 replicas, and a maximum of 10.
Total pod utilization = 60% + 70% + 80% = 210%.
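Current replica count = 3.
Average utilization = 210% / 3 = 70%, which exceeds the 50% target, so more replicas are needed. Equivalently, the ratio of current to target utilization is 210% / (3 × 50%) = 1.4.
Spreading the same load over 5 replicas yields an average of 210% / 5 = 42%, which is below the 50% target, whereas 4 replicas would average 52.5% and still exceed it. Two additional pods are therefore required.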
Thus HPA sets Replicas to 5 and performs a horizontal pod scale‑out.
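The arithmetic above agrees with the replica formula given in the upstream Kubernetes documentation, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). As a worked equation for Deployment A (the symbols N and U are shorthand introduced here, not names from the source):

```latex
\begin{aligned}
U_{\text{avg}} &= \frac{60\% + 70\% + 80\%}{3} = 70\% \\
N_{\text{desired}} &= \left\lceil N_{\text{current}} \times \frac{U_{\text{avg}}}{U_{\text{target}}} \right\rceil
  = \left\lceil 3 \times \frac{70\%}{50\%} \right\rceil
  = \lceil 4.2 \rceil = 5 \\
N_{\text{final}} &= \min\bigl(\max(N_{\text{desired}},\, 3),\, 10\bigr) = 5
\end{aligned}
```

The last line clamps the result to the configured minimum of 3 and maximum of 10 replicas, which leaves the value unchanged in this example.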
In practice the final replica count may be 6 instead of 5 because HPA applies additional adjustments such as noise handling, cooldown periods, and boundary value calculations.
1. Noise Handling
During pod creation (Starting) or termination (Stopping), the pod’s metrics can introduce large spikes. HPA skips the calculation for pods in these states and waits until they reach Running before evaluating the scaling formula.
2. Cooldown Period
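To avoid rapid oscillations, HPA enforces a default scaling cooldown: 3 minutes for scale‑out and 5 minutes for scale‑in. These fixed delays describe older Kubernetes releases; more recent releases use a stabilization window instead, which defaults to 5 minutes for scale‑in and applies no delay to scale‑out.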
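In newer API versions, this kind of damping can be tuned per HPA through the behavior field rather than through cluster-wide controller flags. The following is a minimal sketch, assuming a cluster that serves autoscaling/v2 and reusing the php-apache Deployment name from the YAML example in this article. The window values simply mirror the 3-minute and 5-minute figures above and are not recommendations. Note that a stabilization window is not a strict cooldown: the controller considers the recommendations made during the window and picks the most conservative one.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache          # assumed name, matching the v1 example
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
  behavior:
    scaleUp:
      # Illustrative value: wait for 3 minutes of consistent
      # recommendations before adding replicas.
      stabilizationWindowSeconds: 180
    scaleDown:
      # Illustrative value: 300 seconds is also the upstream default
      # for scale-in.
      stabilizationWindowSeconds: 300
```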
3. Boundary Value Calculation
HPA adds a 10% buffer (△) to the target calculation to account for the resource consumption of newly started pods. This buffer explains why the example ultimately yields 6 replicas instead of 5.
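For comparison, the upstream Kubernetes documentation describes a 10% tolerance (the kube-controller-manager flag --horizontal-pod-autoscaler-tolerance, default 0.1). It acts as a dead band rather than as extra headroom: no scaling happens while the ratio of current to target utilization stays within 0.9 to 1.1. This is related to, but not the same as, the buffer described above, and the exact replica count a given cluster reaches depends on its version and configuration. A worked check, where r and ε are symbols introduced here and the 53% case is a hypothetical value added for contrast:

```latex
\begin{aligned}
r &= \frac{U_{\text{avg}}}{U_{\text{target}}}, \qquad \varepsilon = 0.1 \\
\text{Deployment A:}\quad r &= \frac{70\%}{50\%} = 1.4, \quad |r - 1| = 0.4 > \varepsilon
  \;\Rightarrow\; \text{scaling proceeds} \\
\text{Hypothetical 53\% average:}\quad r &= \frac{53\%}{50\%} = 1.06, \quad |r - 1| = 0.06 \le \varepsilon
  \;\Rightarrow\; \text{replica count unchanged}
\end{aligned}
```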
HPA Evolution
HPA has progressed through three major API versions:
autoscaling/v1 – supports only CPU‑based scaling.
autoscaling/v2beta1 – introduces additional metric types and a more complex specification, adding support for Resource and Custom metrics.
autoscaling/v2beta2 – further adds External metrics.
Typical YAML for the v1 API:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50

YAML for v2beta1/v2beta2 demonstrates the richer metrics block, allowing resource, pod‑level custom, and external metrics.
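As an illustration of that richer metrics block, here is a sketch in the autoscaling/v2beta2 schema that combines one metric of each type. The metric names http_requests_per_second and queue_messages_ready, and their target values, are hypothetical placeholders; real names depend on which custom and external metrics adapters are installed. On recent clusters where v2beta2 is no longer served, the same spec should apply under autoscaling/v2.

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
    # Resource metric: same 50% CPU target as the v1 example.
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
    # Pod-level custom metric, averaged across pods.
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # hypothetical metric name
        target:
          type: AverageValue
          averageValue: "100"
    # External metric that originates outside the cluster.
    - type: External
      external:
        metric:
          name: queue_messages_ready       # hypothetical metric name
        target:
          type: AverageValue
          averageValue: "30"
```

When several metrics are listed, the controller computes a desired replica count for each one and uses the largest.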
Metrics Types and APIs
The three metric categories are:
Resource – accessed via metrics.k8s.io, e.g., CPU or memory per pod.
Custom – accessed via custom.metrics.k8s.io, e.g., application‑specific counters.
External – accessed via external.metrics.k8s.io, e.g., cloud provider metrics.
Metrics Server vs. Heapster
Early Kubernetes used Heapster as the sole monitoring component, which collected metrics from the kubelet and provided offline archiving. Limitations such as fragmented sink maintenance, lack of custom metrics, and competition from Prometheus led to the deprecation of Heapster. The community introduced Metrics Server, a lightweight component focused on Resource metrics, with a simplified architecture that removes the sink mechanism and registers standard APIs.
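Registering a standard API in this sense goes through the Kubernetes aggregation layer. The following is a minimal sketch of the kind of APIService object a Metrics Server installation typically includes. The Service name and namespace are assumptions based on a common default install and may differ in a given cluster.

```yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  version: v1beta1
  service:
    name: metrics-server      # assumed Service name
    namespace: kube-system    # assumed namespace
  groupPriorityMinimum: 100
  versionPriority: 100
  # Skips TLS verification between the API server and the backend.
  # A production setup may prefer to supply a caBundle instead.
  insecureSkipTLSVerify: true
```

Once the object is registered, requests to the metrics.k8s.io group are proxied to the backing Service, which is how HPA reads Resource metrics.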
HPA is now in GA (General Availability). Future work in the community focuses on fine‑tuning configuration parameters and expanding adapter implementations for custom and external metrics.
Alibaba Cloud Native
