
Understanding Kubernetes Horizontal Pod Autoscaler (HPA): Mechanism, Core Source Code, and Practical Insights

This article explains how the Kubernetes Horizontal Pod Autoscaler (HPA) balances resource supply against workload demand by automatically scaling Pod replicas. It describes the metric types HPA supports, walks through the core controller code (Run, worker, reconcile, and replica calculation), highlights current limitations, and shares observations from real-world usage.


Application resource usage often exhibits peaks and troughs; to improve overall cluster utilization, the number of Pods can be adjusted automatically using Horizontal Pod Autoscaler (HPA), which performs horizontal scaling based on defined target metrics.

HPA (Horizontal Pod Autoscaler) is the most widely used autoscaling mechanism in Kubernetes. It solves the supply‑demand imbalance between resources and business load by monitoring metrics and adjusting the replica count of the target Deployment.

Metrics supported by HPA fall into four categories:

Resource: system resources such as CPU (the most common) and memory.

Pods: custom metrics exposed by the Pods themselves (e.g., a Prometheus QPS metric).

Object: metrics obtained from another Kubernetes object, such as an Ingress.

External: metrics sourced from systems outside the cluster.
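The four metric sources can be pictured as a tagged type the controller switches over when deciding where to fetch a value from. A minimal sketch, with illustrative names only (the real definitions live in the k8s.io/api autoscaling API group, not here):

```go
package main

import "fmt"

// MetricSourceType mirrors the four HPA metric categories described above.
// These names are illustrative, not the actual autoscaling/v2 API types.
type MetricSourceType string

const (
	ResourceMetric MetricSourceType = "Resource" // CPU/memory via metrics-server
	PodsMetric     MetricSourceType = "Pods"     // custom per-pod metrics (e.g. QPS)
	ObjectMetric   MetricSourceType = "Object"   // metrics on another object (e.g. Ingress)
	ExternalMetric MetricSourceType = "External" // metrics from outside the cluster
)

// describe reports which API the controller would query for each source.
func describe(t MetricSourceType) string {
	switch t {
	case ResourceMetric:
		return "resource metrics API"
	case PodsMetric:
		return "custom metrics API, averaged across pods"
	case ObjectMetric:
		return "custom metrics API, read from a single object"
	case ExternalMetric:
		return "external metrics API"
	default:
		return "unknown"
	}
}

func main() {
	for _, t := range []MetricSourceType{ResourceMetric, PodsMetric, ObjectMetric, ExternalMetric} {
		fmt.Printf("%s -> %s\n", t, describe(t))
	}
}
```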

The core HPA controller logic (Kubernetes 1.13) can be illustrated with the following snippets:

func (a *HorizontalController) Run(stopCh <-chan struct{}) {
    defer utilruntime.HandleCrash()
    defer a.queue.ShutDown()
    // start a single worker goroutine (not currently configurable)
    go wait.Until(a.worker, time.Second, stopCh)
    <-stopCh
}

func (a *HorizontalController) worker() {
    for a.processNextWorkItem() {
    }
    klog.Infof("horizontal pod autoscaler controller worker shutting down")
}

func (a *HorizontalController) processNextWorkItem() bool {
    key, quit := a.queue.Get()
    if quit {
        return false
    }
    defer a.queue.Done(key)
    deleted, err := a.reconcileKey(key.(string))
    if err != nil {
        utilruntime.HandleError(err)
    }
    // ...
    return true
}

func (a *HorizontalController) reconcileKey(key string) (deleted bool, err error) {
    // split the queue key into namespace and name, then fetch the HPA object
    namespace, name, err := cache.SplitMetaNamespaceKey(key)
    if err != nil {
        return true, err
    }
    hpa, err := a.hpaLister.HorizontalPodAutoscalers(namespace).Get(name)
    // ...
    return false, a.reconcileAutoscaler(hpa, key)
}

func (a *HorizontalController) reconcileAutoscaler(hpav1Shared *autoscalingv1.HorizontalPodAutoscaler, key string) error {
    // convert to v2, obtain Scale, compare replicas, compute desired replicas
    // ...
    if rescale {
        scale.Spec.Replicas = desiredReplicas
        _, err = a.scaleNamespacer.Scales(hpa.Namespace).Update(targetGR, scale)
        // ...
    } else {
        klog.V(4).Infof("decided not to scale %s to %v (last scale time was %s)", reference, desiredReplicas, hpa.Status.LastScaleTime)
        desiredReplicas = currentReplicas
    }
    a.setStatus(hpa, currentReplicas, desiredReplicas, metricStatuses, rescale)
    return a.updateStatusIfNeeded(hpaStatusOriginal, hpa)
}

// Core replica calculation (simplified)
// desiredReplicas = ceil[currentReplicas * (currentMetric / targetMetric)]

The controller fetches the HPA object, reads the current replica count via the Scale API, evaluates each configured metric, and computes the desired replica count: it multiplies the current replica count by the ratio of the observed metric to the target metric, then rounds up (ceil) to an integer. If that ratio is sufficiently close to 1.0 (within a configurable tolerance, 0.1 by default), scaling is skipped to avoid thrashing on small fluctuations.
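The core formula above can be sketched as a small Go function; the controller also skips scaling when the usage ratio sits inside a tolerance band around 1.0 (0.1 by default). The function name and signature here are illustrative, not the controller's actual code:

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies the HPA formula:
//   desired = ceil(current * currentMetric / targetMetric)
// If the usage ratio is within the tolerance band around 1.0,
// the current replica count is kept unchanged.
func desiredReplicas(current int32, currentMetric, targetMetric, tolerance float64) int32 {
	ratio := currentMetric / targetMetric
	if math.Abs(1.0-ratio) <= tolerance {
		return current // close enough to target: no scaling
	}
	return int32(math.Ceil(float64(current) * ratio))
}

func main() {
	// 4 replicas at 90% CPU against a 50% target -> ceil(4 * 1.8) = 8
	fmt.Println(desiredReplicas(4, 90, 50, 0.1))
	// 4 replicas at 52% CPU against a 50% target: within tolerance -> stays 4
	fmt.Println(desiredReplicas(4, 52, 50, 0.1))
}
```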

Known issues include:

Only a single worker goroutine is started, which can become a performance bottleneck in large clusters with frequent scaling events.

HPA currently supports only request‑based resource metrics; using limit‑based metrics is not yet possible.
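Because utilization for resource metrics is computed against requests, the percentage HPA sees is actual usage divided by the container's resource request, regardless of its limit. A quick sketch (the helper function is illustrative, not part of the controller):

```go
package main

import "fmt"

// utilizationPercent computes what HPA reports for a resource metric:
// observed usage divided by the container's *request*, not its limit.
func utilizationPercent(usageMilliCPU, requestMilliCPU int64) int64 {
	return usageMilliCPU * 100 / requestMilliCPU
}

func main() {
	// A pod requesting 500m CPU and using 400m reports 80% utilization,
	// even if its limit is 2000m (relative to the limit it would be 20%).
	fmt.Println(utilizationPercent(400, 500))
}
```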

In practice at the "Home Cloud" platform, over 100 applications have HPA enabled, but many still use sub‑optimal replica settings. Ongoing work records each scaling event to evaluate resource savings and guide further optimization.

In summary, HPA provides a native Kubernetes solution for automatic horizontal scaling that satisfies most scenarios, though extreme workloads may require additional tuning or architectural changes.

References:

https://kubernetes.io/zh/docs/tasks/run-application/horizontal-pod-autoscale/

https://github.com/kubernetes/kubernetes/tree/master/pkg/controller/podautoscaler

Written by HomeTech (HomeTech tech sharing).