
Mastering Kubernetes Horizontal Pod Autoscaler: Setup, Metrics Server, and Multi‑Metric Scaling

This guide walks through the fundamentals of Kubernetes Horizontal Pod Autoscaler (HPA), explains custom and external metrics, shows how to deploy and configure the metrics‑server, and provides step‑by‑step examples for scaling a PHP‑Apache deployment and an Nginx pod using CPU, memory, and custom metrics.

Full-Stack DevOps & Kubernetes

The Horizontal Pod Autoscaler (HPA) automatically adjusts the number of pod replicas in a ReplicationController, Deployment, or ReplicaSet based on observed metrics such as CPU utilization, custom metrics, object metrics, or external metrics. HPA operates as a control loop driven by the --horizontal-pod-autoscaler-sync-period flag (default 15 s) in the controller manager.

Custom Metrics

Custom metrics can be used in addition to resource metrics. The design proposal is available at https://github.com/kubernetes/community/blob/master/contributors/design-proposals/instrumentation/custom-metrics-api.md, and the official walkthrough is at https://v1-17.docs.kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/.

Metrics Server Deployment

The metrics‑server aggregates resource usage data (CPU and memory) from the kubelets and exposes it via the metrics.k8s.io API, which HPA consumes. Deploy it on the master node using the following images:

k8s.gcr.io/metrics-server-amd64:v0.3.6
k8s.gcr.io/addon-resizer:1.8.4

If external network access is unavailable, load the images manually:

docker load -i metrics-server-amd64_0_3_1.tar.gz
docker load -i addon.tar.gz

Apply the metrics‑server manifest (metrics.yaml) and verify the pods are running:

kubectl apply -f metrics.yaml
kubectl get pods -n kube-system
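The contents of metrics.yaml are not reproduced here. On clusters whose kubelets use self-signed certificates, the metrics-server container typically needs extra flags; a sketch of the relevant fragment (the flag names are real metrics-server options, but verify them against your metrics-server version):

```yaml
# Fragment of the metrics-server Deployment spec (illustrative only).
containers:
- name: metrics-server
  image: k8s.gcr.io/metrics-server-amd64:v0.3.6
  args:
  - --kubelet-insecure-tls                        # skip kubelet certificate verification
  - --kubelet-preferred-address-types=InternalIP  # reach kubelets by node IP
```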

After a successful deployment, kubectl top nodes and kubectl top pods -n kube-system will display resource usage.

HPA Workflow

HPA queries the resource metrics API (for CPU, memory, etc.) and, where configured, the custom metrics API. For each pod it calculates the average utilization or uses raw values, then determines the desired replica count. DaemonSets are excluded because they run exactly one pod per node and therefore cannot be scaled.
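The core calculation can be sketched in a few lines of Python. The formula matches the one in the Kubernetes documentation; the function name is ours:

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float) -> int:
    """HPA scaling rule: desired = ceil(current * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_metric / target_metric)

# 3 replicas averaging 90% CPU against a 50% target scale up to 6.
print(desired_replicas(3, 90, 50))
```

The result is then clamped to the HPA's minReplicas/maxReplicas bounds before being applied.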

Example 1: CPU‑Based Autoscaling of a PHP‑Apache Service

1. Build a Docker image:

FROM php:5-apache
COPY index.php /var/www/html/index.php
RUN chmod a+rx index.php

2. Push the image to the cluster (or load it manually) and create php-apache.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  replicas: 1
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: k8s.gcr.io/hpa-example:v1
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache

Deploy and verify:

kubectl apply -f php-apache.yaml
kubectl get pods

3. Create an HPA that keeps CPU usage around 50 % and replica count between 1 and 10:

kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
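The same HPA can also be written declaratively. In the autoscaling/v1 API, the one‑liner above is equivalent to this manifest:

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
```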

4. Generate load with a busybox pod:

kubectl run load -it --image=busybox -- /bin/sh
# then, at the shell prompt inside the container:
while true; do wget -q -O- http://php-apache.default.svc.cluster.local; done

Observe scaling with kubectl get hpa and kubectl get deployment php-apache. When the load stops, the HPA scales the replica count back down to 1 after the downscale stabilization window (five minutes by default).

Example 2: Memory‑Based Autoscaling of an Nginx Pod (autoscaling/v2beta1)

Create nginx.yaml with resource requests and limits for CPU and memory, then apply it with kubectl apply -f nginx.yaml. Next, define an HPA that targets 60 % average memory utilization:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  minReplicas: 1
  maxReplicas: 10
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-hpa
  metrics:
  - type: Resource
    resource:
      name: memory
      targetAverageUtilization: 60

Apply the HPA and generate memory pressure (e.g., dd if=/dev/zero of=/tmp/a inside the pod). The HPA scales the replica count up; removing the file causes the replica count to shrink back.
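The nginx.yaml manifest referenced above is not shown. A minimal version carrying the requests and limits the HPA needs might look like this (the labels, image tag, and resource values are assumptions; only the Deployment name must match the HPA's scaleTargetRef):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-hpa            # must match scaleTargetRef.name in the HPA
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.17
        resources:
          requests:
            cpu: 100m
            memory: 64Mi     # utilization is computed against this request
          limits:
            cpu: 200m
            memory: 128Mi
```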

Multi‑Metric and Custom Metric Autoscaling (autoscaling/v2beta2)

Using the autoscaling/v2beta2 API, you can combine resource, pod, object, and external metrics. Example YAML (simplified):

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  minReplicas: 1
  maxReplicas: 10
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Pods
    pods:
      metric:
        name: packets-per-second
      target:
        type: AverageValue
        averageValue: 1k
  - type: Object
    object:
      metric:
        name: requests-per-second
      describedObject:
        apiVersion: networking.k8s.io/v1beta1
        kind: Ingress
        name: main-route
      target:
        type: Value
        value: 10k

The controller evaluates each metric, computes a replica suggestion, and selects the highest value.
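A sketch of that selection step, assuming each metric has already been converted into a per-metric replica proposal:

```python
def choose_replicas(proposals, min_replicas, max_replicas):
    """Pick the largest per-metric replica proposal, clamped to [min, max]."""
    return max(min_replicas, min(max_replicas, max(proposals)))

# CPU suggests 4 replicas, packets-per-second suggests 7,
# requests-per-second suggests 2: the HPA scales to 7.
print(choose_replicas([4, 7, 2], 1, 10))
```

Taking the maximum ensures no single metric is ever left above its target just because another metric is satisfied.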

External Metrics

External metrics allow scaling based on data outside the cluster. Example configuration:

- type: External
  external:
    metric:
      name: queue_messages_ready
      selector:
        matchLabels:
          queue: worker_tasks
    target:
      type: AverageValue
      averageValue: 30

External metrics behave like custom metrics, with one caveat: a compromised external metrics provider can drive arbitrary scaling decisions, so access to the external metrics API should be secured.

Verification

After each scaling experiment, use kubectl get hpa, kubectl get deployment, and kubectl get pods to confirm the replica count matches the observed metric values. Scaling may take a few minutes to stabilize.

Figures (not reproduced): HPA control loop diagram; metrics server pod list; HPA status after the PHP load test; HPA status after load removal; final PHP deployment replica count; Nginx HPA scaling result.

This comprehensive walkthrough demonstrates how to install the metrics‑server, configure HPA with various metric types, and validate automatic scaling behavior in a Kubernetes cluster.

Tags: Kubernetes, Horizontal Pod Autoscaler, metrics-server, custom-metrics
Written by

Full-Stack DevOps & Kubernetes

Focused on sharing DevOps, Kubernetes, Linux, Docker, Istio, microservices, Spring Cloud, Python, Go, databases, Nginx, Tomcat, cloud computing, and related technologies.
