Cloud Native · 17 min read

Understanding Horizontal Pod Autoscaler (HPA) and KEDA for Elastic Scaling in Kubernetes

This article explains pod‑level elasticity in Kubernetes by detailing the principles, metric types, and limitations of the Horizontal Pod Autoscaler (HPA) and then introduces KEDA as an event‑driven extension that adds true scale‑to‑zero capabilities, complete with configuration examples and code snippets.

Introduction

Traditional capacity planning provisions for peak load, leaving resources idle the rest of the time. In cloud-native Kubernetes, elasticity operates at two levels: node-level scaling (adding or removing machines) and pod-level scaling (adjusting replica counts). This article focuses on pod-level scaling, introducing the Horizontal Pod Autoscaler (HPA) and KEDA.

1. HPA Implementation Principles

1.1 What is HPA

HPA (Horizontal Pod Autoscaler) scales workloads that implement the scale subresource, such as Deployments and StatefulSets, but not objects that cannot be scaled, such as DaemonSets. The API has evolved through autoscaling/v1, v2beta1, and v2beta2 to the stable autoscaling/v2 (GA since Kubernetes 1.23), and supports four metric types: Resource, Object, Pods, and External.

Check supported API versions with kubectl api-versions | grep autoscal
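On a reasonably recent cluster this typically prints something like the following (v2beta1 and v2beta2 appear only on older clusters where those deprecated versions are still served):

```shell
$ kubectl api-versions | grep autoscal
autoscaling/v1
autoscaling/v2
```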

Metric type description:

Resource: CPU or memory, expressed either as utilization (a percentage of the pods' resource requests) or as an average value.

# Resource metric example
- type: Resource
  resource:
    name: cpu
  target:
    type: Utilization
    averageUtilization: 50

Object: a metric describing a single Kubernetes object in the cluster (for example, requests per second on an Ingress), served by a custom metrics adapter; supports Value and AverageValue targets.

# Object metric example
- type: Object
  object:
    metric:
      name: requests-per-second
    describedObject:
      apiVersion: networking.k8s.io/v1
      kind: Ingress
      name: main-route
  target:
    type: Value
    value: 10k

Pods: metrics averaged across the pods of the scaled workload; only AverageValue is supported.

# Pods metric example
- type: Pods
  pods:
    metric:
      name: packets-per-second
  target:
    type: AverageValue
    averageValue: 1k

External: metrics originating outside the cluster (for example, a message-queue depth), served by an external metrics adapter; supports Value and AverageValue targets.

# External metric example
- type: External
  external:
    metric:
      name: queue_messages_ready
    selector:
      matchLabels:
        env: "stage"
        app: "myapp"
  target:
    type: AverageValue
    averageValue: 30

1.2 HPA Working Principle

Prerequisites: the target pods must define resource requests, and metrics-server must be installed. Workflow: create the HPA; the controller collects CPU usage for each pod, computes the average, compares it against the target, calculates the desired replica count, clamps it to the min/max bounds, and repeats on a fixed interval (15 s by default, set by --horizontal-pod-autoscaler-sync-period on the kube-controller-manager). The scaling formula is desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)].

Algorithm intuition: if the current metric is double the target, the replica count doubles; if it is half, the count halves. A default tolerance of 10 % around a ratio of 1.0 suppresses scaling on small fluctuations.
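The formula can be checked with a minimal Go sketch (this is not the controller's actual code, which additionally applies tolerance and pod-readiness handling):

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies the HPA formula verbatim:
// desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
func desiredReplicas(currentReplicas int32, currentMetric, desiredMetric float64) int32 {
	return int32(math.Ceil(float64(currentReplicas) * (currentMetric / desiredMetric)))
}

func main() {
	fmt.Println(desiredReplicas(4, 100, 50)) // metric at 2x target: 4 -> 8
	fmt.Println(desiredReplicas(4, 25, 50))  // metric at half target: 4 -> 2
	fmt.Println(desiredReplicas(3, 40, 30))  // ceil(3 * 1.33...) -> 4
}
```

Note the ceil: the controller always rounds up, so even a slight excess over the target adds a replica.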

Source code snippets for request extraction and utilization calculation are provided.

// calculatePodRequests extracts pod resource requests
func calculatePodRequests(pods []*v1.Pod, resource v1.ResourceName) (map[string]int64, error) {
    // implementation...
}
// GetResourceUtilizationRatio computes utilization ratio
func GetResourceUtilizationRatio(metrics PodMetricsInfo, requests map[string]int64, targetUtilization int32) (float64, int32, int64, error) {
    // implementation...
}
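As a simplified sketch of what GetResourceUtilizationRatio does (not the upstream implementation, which also returns an error and handles missing metrics more carefully), the utilization ratio is the summed usage as a percentage of summed requests, divided by the target:

```go
package main

import "fmt"

// utilizationRatio computes summed usage as a percentage of summed requests,
// then that percentage's ratio to the target utilization (simplified sketch).
func utilizationRatio(usage, requests map[string]int64, targetUtilization int32) (ratio float64, currentUtilization int32) {
	var totalUsage, totalRequest int64
	for pod, request := range requests {
		if u, ok := usage[pod]; ok { // only count pods that reported a metric
			totalUsage += u
			totalRequest += request
		}
	}
	currentUtilization = int32(totalUsage * 100 / totalRequest)
	ratio = float64(currentUtilization) / float64(targetUtilization)
	return ratio, currentUtilization
}

func main() {
	usage := map[string]int64{"pod-a": 100, "pod-b": 300}    // millicores currently used
	requests := map[string]int64{"pod-a": 200, "pod-b": 200} // millicores requested
	ratio, current := utilizationRatio(usage, requests, 50)
	fmt.Println(current, ratio) // 100% utilization against a 50% target -> ratio 2
}
```

This is why utilization targets are meaningless without resource requests: the denominator of the calculation is the summed requests.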

1.3 Simple Scaling Example

Deploy an nginx service and an HPA that scales when CPU usage exceeds 30 %.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-hpa
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-hpa
  template:
    metadata:
      labels:
        app: nginx-hpa
    spec:
      containers:
      - name: nginx-hpa
        image: nginx:1.7.9
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-hpa
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx-hpa
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-hpa
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 30

Load testing can be performed with ab (ApacheBench), shipped in the httpd-tools package:

yum install httpd-tools -y

for i in {1..600}
do
    ab -c 1000 -n 100000000 http://ServiceIP/
    sleep 1
done
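While the load runs, the scaling activity can be observed from another terminal (the namespace is assumed to be the default one here):

```shell
kubectl get hpa nginx-hpa-hpa -w      # watch current vs target utilization and replica count
kubectl get pods -l app=nginx-hpa -w  # watch replicas being added and removed
```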

2. Limitations of HPA

HPA has several limitations: it cannot scale a workload to zero (minReplicas must be at least 1 unless the alpha HPAScaleToZero feature gate is enabled); utilization is computed against resource requests rather than limits or actual node capacity; multi-container pods are averaged in ways that can mask a single hot container; and the single controller loop that evaluates every HPA in the cluster can become a performance bottleneck at scale.

3. Introduction to KEDA

3.1 What is KEDA and its relation to HPA

KEDA (Kubernetes Event‑Driven Autoscaling) adds event‑driven scaling and true scale‑to‑zero capability to Kubernetes. It does not replace HPA but works with it: for one‑to‑n scaling, KEDA generates and manages an HPA fed by its external metrics, while KEDA itself handles the 0↔1 transitions that HPA cannot.

3.2 KEDA Architecture

Core components: the Metrics Adapter (exposes scaler metrics to HPA through the external metrics API), the operator/controller (reconciles ScaledObjects and performs the 0↔1 transitions), and Scalers, which fetch metrics from event sources (e.g., Prometheus, RabbitMQ).

Source code shows how KEDA scales from zero and back.

// Scale from zero when any scaler is active
if currentScale.Spec.Replicas == 0 && isActive {
    e.scaleFromZero(...)
} else if !isActive && currentScale.Spec.Replicas > 0 && (scaledObject.Spec.MinReplicaCount == nil || *scaledObject.Spec.MinReplicaCount == 0) {
    e.scaleToZero(...)
}
// ... other cases omitted

3.3 KEDA Configuration Example

Deploy KEDA operator and define a ScaledObject that references the nginx deployment.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: nginx-scaledobject
  namespace: hpa-tmp
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          policies:
          - periodSeconds: 30
            type: Pods
            value: 1
          stabilizationWindowSeconds: 30
        scaleUp:
          policies:
          - periodSeconds: 10
            type: Pods
            value: 1
          stabilizationWindowSeconds: 0
  cooldownPeriod: 30
  maxReplicaCount: 3
  minReplicaCount: 1
  pollingInterval: 15
  scaleTargetRef:
    name: nginx-hpa
  triggers:
  - type: cpu
    metadata:
      type: Utilization
      value: "30"
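Once applied, KEDA generates a backing HPA named keda-hpa-<scaledobject-name>, which can be inspected directly:

```shell
kubectl get scaledobject -n hpa-tmp
kubectl get hpa -n hpa-tmp   # shows keda-hpa-nginx-scaledobject
```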

Additional trigger examples for Prometheus, metrics‑server, and cron are shown.
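As illustrative sketches (the Prometheus address, query, and schedule below are assumptions for the nginx example above, not values from a real deployment), a Prometheus trigger and a cron trigger look roughly like this:

```yaml
triggers:
# Prometheus trigger: scale on the result of a PromQL query
- type: prometheus
  metadata:
    serverAddress: http://prometheus.monitoring.svc:9090
    query: sum(rate(http_requests_total{app="nginx-hpa"}[2m]))
    threshold: "100"
# Cron trigger: hold a fixed replica count during a daily window
- type: cron
  metadata:
    timezone: Asia/Shanghai
    start: 0 8 * * *
    end: 0 20 * * *
    desiredReplicas: "3"
```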

4. Use Cases and Further Exploration

Typical scenarios include batch data extraction with scheduled scaling, event‑driven workloads that can scale to zero, parallel processing jobs with controlled concurrency, fine‑grained scaling speed control, and cost‑optimized traffic shaping.

References

Links to official Kubernetes HPA documentation and KEDA repositories.

Tags: cloud-native, Kubernetes, DevOps, HPA, KEDA
Written by

ZCY Technology Team (政采云技术)

ZCY Technology Team (Zero), based in Hangzhou, is a growth-oriented team passionate about technology and craftsmanship. With around 500 members, we are building comprehensive engineering, project management, and talent development systems. We are committed to innovation and to creating a cloud service ecosystem for government and enterprise procurement. We look forward to you joining us.
