Mastering Kubernetes Horizontal Pod Autoscaler: Setup, Metrics Server, and Multi‑Metric Scaling
This guide walks through the fundamentals of Kubernetes Horizontal Pod Autoscaler (HPA), explains custom and external metrics, shows how to deploy and configure the metrics‑server, and provides step‑by‑step examples for scaling a PHP‑Apache deployment and an Nginx pod using CPU, memory, and custom metrics.
The Horizontal Pod Autoscaler (HPA) automatically adjusts the number of pod replicas in a ReplicationController, Deployment, or ReplicaSet based on observed metrics such as CPU utilization, custom metrics, object metrics, or external metrics. HPA operates as a control loop driven by the --horizontal-pod-autoscaler-sync-period flag (default 15 s) in the controller manager.
Custom Metrics
Custom metrics can be used in addition to resource metrics. The design proposal is available at
https://github.com/kubernetes/community/blob/master/contributors/design-proposals/instrumentation/custom-metrics-api.md. The official walkthrough is at
https://v1-17.docs.kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/.
Metrics Server Deployment
The metrics‑server aggregates resource usage data (CPU, memory, file descriptors, etc.) and exposes it via the metrics.k8s.io API, which HPA consumes. Deploy it on the master node using the following images:
k8s.gcr.io/metrics-server-amd64:v0.3.6
k8s.gcr.io/addon-resizer:1.8.4

If external network access is unavailable, load the images manually:

docker load -i metrics-server-amd64_0_3_1.tar.gz
docker load -i addon.tar.gz

Apply the manifest metrics.yaml (see below) and verify the pods are running:

kubectl apply -f metrics.yaml
kubectl get pods -n kube-system

After a successful deployment, kubectl top nodes and kubectl top pods -n kube-system will display resource usage.
HPA Workflow
HPA queries the resource metrics API (for CPU, memory, etc.) and the custom metrics API. For each pod it fetches the metric, computes the average utilization (or uses the raw value), and from that determines the desired replica count. DaemonSets are excluded because they run exactly one pod per node and cannot be scaled.
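The replica calculation the controller performs is documented in the Kubernetes docs as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). A quick sanity check of that arithmetic in shell (the metric values are illustrative):

```shell
# desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)
# Example: 3 replicas averaging 90% CPU against a 50% target.
current_replicas=3
current_value=90
desired_value=50
# Integer ceiling division: (a + b - 1) / b
echo $(( (current_replicas * current_value + desired_value - 1) / desired_value ))
```

With these numbers the controller would scale the deployment to 6 replicas.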
Example 1: CPU‑Based Autoscaling of a PHP‑Apache Service
1. Build a Docker image:
FROM php:5-apache
ADD index.php /var/www/html/index.php
RUN chmod a+rx index.php

2. Push the image to the cluster (or load it manually) and create php-apache.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  replicas: 1
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: k8s.gcr.io/hpa-example:v1
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache

Deploy and verify:
kubectl apply -f php-apache.yaml
kubectl get pods3. Create an HPA that keeps CPU usage around 50 % and replica count between 1 and 10:
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=104. Generate load with a busybox pod:
kubectl run load -it --image=busybox -- /bin/sh
while true; do wget -q -O- http://php-apache.default.svc.cluster.local; doneObserve scaling with kubectl get hpa and kubectl get deployment php-apache. When the load stops, the HPA scales the replica count back down to 1.
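The kubectl autoscale one-liner above is equivalent to a declarative manifest, which is easier to version-control. A sketch using the stable autoscaling/v1 API, with the field values taken from the command:

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
```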
Example 2: Memory‑Based Autoscaling of an Nginx Pod (autoscaling/v2beta1)
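Before the HPA can act on memory utilization, the target Deployment must declare memory requests, since utilization is computed as a percentage of the request. A minimal sketch of what nginx.yaml might contain; the image tag and resource values here are illustrative assumptions:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-hpa
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.17
        resources:
          requests:
            cpu: 100m
            memory: 64Mi
          limits:
            cpu: 200m
            memory: 128Mi
```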
Create nginx.yaml with resource requests and limits for CPU and memory, then apply it:

kubectl apply -f nginx.yaml

Define an HPA that targets 60 % memory utilization:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  minReplicas: 1
  maxReplicas: 10
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-hpa
  metrics:
  - type: Resource
    resource:
      name: memory
      targetAverageUtilization: 60

Apply the HPA and generate memory pressure (e.g., run dd if=/dev/zero of=/tmp/a inside the pod). The HPA scales the replica count up; removing the file causes the replica count to shrink back.
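For comparison, the same memory target can be expressed in the newer autoscaling/v2beta2 syntax (used in the next section), which replaces targetAverageUtilization with a nested target block; a sketch of the metrics portion:

```yaml
metrics:
- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 60
```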
Multi‑Metric and Custom Metric Autoscaling (autoscaling/v2beta2)
Using the autoscaling/v2beta2 API, you can combine resource, pod, object, and external metrics. Example YAML (simplified):
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  minReplicas: 1
  maxReplicas: 10
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Pods
    pods:
      metric:
        name: packets-per-second
      target:
        type: AverageValue
        averageValue: 1k
  - type: Object
    object:
      metric:
        name: requests-per-second
      describedObject:
        apiVersion: networking.k8s.io/v1beta1
        kind: Ingress
        name: main-route
      target:
        type: Value
        value: 10k

The controller evaluates each metric, computes a replica suggestion, and selects the highest value.
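"Selects the highest value" means each metric independently proposes a replica count and the controller takes the maximum, so the most demanding metric wins. Illustrated in shell (the per-metric suggestions are made up):

```shell
# Hypothetical replica suggestions: cpu=4, packets-per-second=6, requests-per-second=3.
# The controller scales to the maximum of the suggestions.
printf '%s\n' 4 6 3 | sort -n | tail -n 1
```

Here the packets-per-second suggestion of 6 determines the replica count.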
External Metrics
External metrics allow scaling based on data outside the cluster. Example configuration:
- type: External
  external:
    metric:
      name: queue_messages_ready
      selector:
        matchLabels:
          queue: worker_tasks
    target:
      type: AverageValue
      averageValue: 30

External metrics behave like custom metrics but require careful security considerations, since the external metrics API is served by third-party adapters with access to data outside the cluster.
Verification
After each scaling experiment, use kubectl get hpa, kubectl get deployment, and kubectl get pods to confirm the replica count matches the observed metric values. Scaling may take a few minutes to stabilize.
This comprehensive walkthrough demonstrates how to install the metrics‑server, configure HPA with various metric types, and validate automatic scaling behavior in a Kubernetes cluster.