
Master Kubernetes HPA: Hands‑On autoscaling/v1 and autoscaling/v2beta1 Practices

This guide walks you through configuring Kubernetes Horizontal Pod Autoscaler using both autoscaling/v1 (CPU‑only) and autoscaling/v2beta1 (custom metrics), covering template creation, deployment, Metrics Server migration, custom metrics adapter setup, load testing, and verification of scaling behavior.


The article continues the series on Kubernetes elastic scaling by showing how to use Horizontal Pod Autoscaler (HPA) with the autoscaling/v1 and autoscaling/v2beta1 API versions.

Why autoscaling/v1 only supports CPU

Early HPA designs intended to support both CPU and memory metrics, but memory proved unreliable for scaling because most applications rely on the language runtime’s garbage collector, which can delay memory reclamation and cause oscillations. Therefore, autoscaling/v1 limits scaling to CPU utilization.

autoscaling/v1 example

A minimal autoscaling/v1 HPA template looks like this:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50

The scaleTargetRef points to the Deployment to be scaled, and targetCPUUtilizationPercentage triggers scaling when average CPU usage exceeds 50%.
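For quick experiments, the same HPA can also be created imperatively with kubectl instead of applying the YAML (assuming the php-apache Deployment already exists):

```shell
# Create an HPA equivalent to the template above
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10

# Inspect the resulting object
kubectl get hpa php-apache
```

Both approaches produce the same HorizontalPodAutoscaler object; the declarative template is easier to version-control.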

Step‑by‑step deployment

Log in to the Alibaba Cloud Container Service console and create a deployment using the above template.

Create an HPA object with the same template (shown in the code block above).

Start a load‑generator pod that continuously requests the service:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: load-generator
  labels:
    app: load-generator
spec:
  replicas: 1
  selector:
    matchLabels:
      app: load-generator
  template:
    metadata:
      labels:
        app: load-generator
    spec:
      containers:
      - name: load-generator
        image: busybox
        command:
        - "sh"
        - "-c"
        - "while true; do wget -q -O- http://php-apache.default.svc.cluster.local; done"
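The load generator's URL assumes a Service named php-apache in the default namespace. If it does not exist yet, a minimal sketch would be the following (the selector label and port 80 are assumptions; the selector must match the php-apache Deployment's pod labels):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  namespace: default
spec:
  selector:
    app: php-apache   # assumed label; must match the Deployment's pod template
  ports:
  - port: 80          # matches the URL used by the load generator
    targetPort: 80
```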

Check the scaling status via the console or kubectl get hpa.

When testing is finished, delete the load‑generator and the HPA.

Verify that the deployment scales back down.
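The cleanup and verification steps above could look like this with kubectl (resource names follow the templates in this guide):

```shell
# Remove the load generator and the HPA
kubectl delete deployment load-generator
kubectl delete hpa php-apache

# Watch the php-apache Deployment scale back toward minReplicas
kubectl get deployment php-apache -w
```

Note that scale-down is not immediate: the HPA controller applies a stabilization window before reducing replicas.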

After these steps, a fully functional autoscaling/v1 HPA is in place. On these clusters it works even without the Metrics Server, because Heapster still supplies the CPU metrics.

autoscaling/v2beta1 and custom metrics

The autoscaling/v2beta1 version adds support for Resource Metrics and Custom Metrics; autoscaling/v2beta2 further adds External Metrics, but this article focuses on v2beta1.

Metrics Server migration

Alibaba Cloud clusters originally use Heapster for metrics. To switch to Metrics Server while keeping compatibility, the article suggests copying Heapster's startup parameters into the Metrics Server manifest and updating the kube-controller-manager configuration (--horizontal-pod-autoscaler-use-rest-clients=true) so that the HPA controller sources metrics from the new server.
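On a self-managed control plane, the relevant change is a flag on the kube-controller-manager command line; a fragment of its static pod manifest might look like this (a sketch, not the full manifest):

```yaml
# /etc/kubernetes/manifests/kube-controller-manager.yaml (fragment)
spec:
  containers:
  - command:
    - kube-controller-manager
    - --horizontal-pod-autoscaler-use-rest-clients=true  # source metrics via the metrics.k8s.io API
```

On managed clusters this flag is typically set by the platform rather than edited by hand.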

Key manifest snippets for the Metrics Server deployment are provided (service account, APIService, deployment, etc.). After applying them, kubectl get apiservice should show v1beta1.metrics.k8s.io as registered.
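The registration can be checked from the command line; once Metrics Server is serving, `kubectl top` should also return data:

```shell
# The APIService should report Available=True once Metrics Server is up
kubectl get apiservice v1beta1.metrics.k8s.io

# Sanity-check that resource metrics are actually being served
kubectl top nodes
kubectl top pods -n kube-system
```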

Deploying the Custom Metrics Adapter

If Prometheus is not yet installed, the article references another guide to set it up. Then it shows the full set of YAML resources needed to deploy the Prometheus‑based Custom Metrics Adapter, including namespace, service account, RBAC bindings, deployment, service, and APIService definitions.
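The APIService that registers the adapter with the aggregation layer typically looks like the following sketch (the Service name and namespace are assumptions matching a common prometheus-adapter layout; adjust them to the actual deployment):

```yaml
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  service:
    name: custom-metrics-apiserver   # assumed adapter Service name
    namespace: custom-metrics        # assumed namespace
  group: custom.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true       # for demos; use CA bundles in production
  groupPriorityMinimum: 100
  versionPriority: 100
```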

Sample application with custom metric

A demo app exposing a Prometheus endpoint is deployed:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-metrics-app
  labels:
    app: sample-metrics-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: sample-metrics-app
  template:
    metadata:
      labels:
        app: sample-metrics-app
    spec:
      containers:
      - name: sample-metrics-app
        image: luxas/autoscale-demo:v0.1.2
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            path: "/"
            port: 8080
          initialDelaySeconds: 3
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: "/"
            port: 8080
          initialDelaySeconds: 3
          periodSeconds: 5

A corresponding Service, ServiceMonitor, and an HPA using the custom metric http_requests are also defined:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: sample-metrics-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-metrics-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Object
    object:
      target:
        kind: Service
        name: sample-metrics-app
      metricName: http_requests
      targetValue: 100
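The Service and ServiceMonitor mentioned above could be sketched as follows (label and port names are assumptions; the ServiceMonitor kind requires the Prometheus Operator CRDs to be installed):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: sample-metrics-app
  labels:
    app: sample-metrics-app
spec:
  selector:
    app: sample-metrics-app
  ports:
  - name: web          # assumed port name, referenced by the ServiceMonitor
    port: 80
    targetPort: 8080   # the app's Prometheus endpoint
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: sample-metrics-app
spec:
  selector:
    matchLabels:
      app: sample-metrics-app
  endpoints:
  - port: web
```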

The load‑generator pod repeatedly queries the app’s HTTP endpoint, causing the custom metric to increase. After a few minutes, kubectl get hpa shows the replica count growing according to the defined target.
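Before relying on the HPA, the custom metric can be queried directly through the aggregated API to confirm the adapter is wired up (namespace and object names follow the templates above):

```shell
kubectl get --raw \
  "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/services/sample-metrics-app/http_requests"
```

If this returns a MetricValueList rather than an error, the HPA controller can read the same value.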

Cleanup and next steps

The article concludes by summarizing the hands-on experience with both API versions and notes that a future post will dive deeper into using custom metrics with autoscaling/v2beta1.

Illustrations referenced in the article: autoscaling/v1 practice, autoscaling/v2beta1 practice, Metrics Server registration status, kube-controller-manager configuration, Custom Metrics Adapter deployment, Prometheus metrics exposure, and final HPA status.
Tags: cloud-native, Kubernetes, HPA, custom metrics, metrics-server
Written by

Alibaba Cloud Native

We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.
