
Master Knative’s Request‑Based Autoscaling: KPA, Scale‑to‑Zero, and Advanced Strategies

This article explains how Knative implements request‑based autoscaling with KPA, details the scale‑to‑zero mechanism, shows how to handle burst traffic using stable and panic windows, and demonstrates advanced extensions such as resource pools, precise MPA scaling, and predictive AHPA configurations with concrete YAML examples.

Alibaba Cloud Native

Overview of Knative Autoscaling

Knative is an open‑source serverless framework built on Kubernetes. It provides request‑based autoscaling through the Knative Pod Autoscaler (KPA), supports the native Kubernetes Horizontal Pod Autoscaler (HPA), and offers advanced plugins such as Advanced HPA (AHPA) and MPA for precise scaling.

Request‑Based Autoscaling (KPA)

Knative injects a queue‑proxy sidecar into each pod to collect concurrency or RPS metrics. The Autoscaler periodically reads these metrics and adjusts the number of pods according to a formula that uses the target utilization percentage.

POD_COUNT = ceil(total_requests / (max_concurrency * target_utilization))

For example, with 100 concurrent requests, a per‑pod max concurrency of 10, and a target utilization of 0.7, the Autoscaler creates 15 pods: ⌈100 / (10 × 0.7)⌉ = ⌈14.3⌉ = 15.
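The calculation above can be sketched in a few lines (a simplified illustration; the real Autoscaler also averages metrics over observation windows and applies min/max scale bounds):

```python
import math

def desired_pods(total_requests: float, max_concurrency: float,
                 target_utilization: float) -> int:
    """Request-based pod count, rounded up so capacity is never undershot."""
    return math.ceil(total_requests / (max_concurrency * target_utilization))

# The worked example above: 100 concurrent requests, per-pod concurrency 10,
# target utilization 0.7
print(desired_pods(100, 10, 0.7))  # → 15
```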

Scale‑to‑Zero Mechanism

When traffic drops to zero, KPA automatically scales the workload down to zero pods. The transition between zero and active traffic is managed by switching between two request‑access modes:

Proxy mode: requests pass through the activator component, which buffers traffic and notifies the Autoscaler.

Serve mode: requests are routed directly to pods, bypassing the activator.

The Autoscaler toggles the mode based on current traffic, enabling instant scale‑to‑zero and rapid re‑activation.

Handling Burst Traffic

KPA uses two windows to react to spikes:

Stable window: 60 s by default, used to compute the average concurrency.

Panic window: calculated as stable_window × panic-window-percentage (default 10% → 6 s). The Autoscaler compares the pod count derived from the panic window against a panic threshold (default 200%, i.e., 2× the current ready pods). If the panic‑window demand exceeds that threshold, the Autoscaler enters panic mode and applies the panic‑window pod count, scaling up only.

This design ensures rapid scaling during sudden traffic bursts while allowing sensitivity tuning via the configurable parameters.
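The stable/panic decision described above can be sketched as follows (names and simplifications are mine; the real Autoscaler additionally applies rate limits, scale bounds, and a timer before exiting panic mode):

```python
import math

def autoscale_decision(stable_concurrency: float, panic_concurrency: float,
                       ready_pods: int, target_per_pod: float,
                       panic_threshold: float = 2.0) -> int:
    """Illustrative sketch of KPA's stable/panic window selection."""
    stable_pods = math.ceil(stable_concurrency / target_per_pod)
    panic_pods = math.ceil(panic_concurrency / target_per_pod)
    # Enter panic mode when the panic-window demand reaches
    # panic_threshold times the currently ready capacity.
    if panic_pods >= panic_threshold * ready_pods:
        # In panic mode the Autoscaler only scales up, never down.
        return max(panic_pods, ready_pods)
    return stable_pods

# Steady traffic: stable and panic windows agree, no panic triggered.
print(autoscale_decision(70, 70, ready_pods=10, target_per_pod=7))   # → 10
# Sudden burst: panic-window demand (30 pods) exceeds 2 × 10 ready pods.
print(autoscale_decision(70, 210, ready_pods=10, target_per_pod=7))  # → 30
```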

Configuration Options

container-concurrency-target-default: default concurrency target per pod (default 100).

target-utilization-percentage: desired utilization of the concurrency target (default 70%).

stable-window: length of the stable observation period (default 60 s).

panic-window-percentage: proportion of the stable window used for the panic window (default 10% → 6 s).

panic-threshold-percentage: multiplier for the panic threshold (default 200% → 2×).

scale-to-zero-grace-period: delay before removing the last pod after traffic reaches zero (default 30 s).

scale-to-zero-pod-retention-period: minimum time the last pod is retained once the Autoscaler decides to scale to zero (e.g., 1m5s).

Global vs Revision Configuration

Cluster‑wide settings are stored in the config-autoscaler ConfigMap in the knative-serving namespace. Example command to view it:

kubectl -n knative-serving get cm config-autoscaler
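The ConfigMap's data section carries the settings listed earlier as string values. An illustrative fragment (the values shown are the defaults discussed above):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  container-concurrency-target-default: "100"
  target-utilization-percentage: "70"
  stable-window: "60s"
  panic-window-percentage: "10.0"
  panic-threshold-percentage: "200.0"
  scale-to-zero-grace-period: "30s"
```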

Individual revisions can override autoscaling parameters via annotations on the Service resource:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
  annotations:
    autoscaling.knative.dev/metric: "concurrency"
    autoscaling.knative.dev/target: "50"
    autoscaling.knative.dev/scale-to-zero-pod-retention-period: "1m5s"
    autoscaling.knative.dev/target-utilization-percentage: "80"

Plugin Extensions

The autoscaling.knative.dev/class annotation selects the pod autoscaler implementation, enabling different scaling strategies:

HPA: CPU‑ or memory‑based scaling using the native Kubernetes HPA.

MPA: precise concurrency‑based scaling integrated with the Alibaba Cloud MSE gateway.

AHPA: predictive scaling based on historical metrics, supporting custom metrics such as RPS and response time.
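For instance, switching a Service to the HPA class for CPU‑based scaling looks like this (an illustrative fragment; the class and metric annotations follow the Knative autoscaling docs):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
  annotations:
    autoscaling.knative.dev/class: hpa.autoscaling.knative.dev
    autoscaling.knative.dev/metric: "cpu"
    autoscaling.knative.dev/target: "75"
```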

Example of MPA configuration (precise concurrency control):

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  annotations:
    autoscaling.knative.dev/class: mpa.autoscaling.knative.dev
    autoscaling.knative.dev/max-scale: "20"
spec:
  template:
    spec:
      containerConcurrency: 5
      containers:
      - image: registry-vpc.cn-beijing.aliyuncs.com/knative-sample/helloworld-go:73fbdd56
        env:
        - name: TARGET
          value: "Knative"

AHPA Predictive Autoscaling

AHPA uses historical request patterns to forecast capacity needs, reducing latency caused by scaling lag. It can also consume custom metrics such as message‑queue depth or response time.

Sample AHPA configuration for RPS‑based scaling:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  annotations:
    autoscaling.knative.dev/class: ahpa.autoscaling.knative.dev
    autoscaling.knative.dev/metric: "rps"
    autoscaling.knative.dev/target: "10"
    autoscaling.knative.dev/minScale: "1"
    autoscaling.knative.dev/maxScale: "30"
    autoscaling.alibabacloud.com/scaleStrategy: "observer"
spec:
  template:
    spec:
      containers:
      - image: registry.cn-hangzhou.aliyuncs.com/knative-sample/autoscale-go:0.1

Resource Pool Feature

Knative can reserve a pool of pre‑warmed pods (e.g., ECS or ECI instances) to handle baseline traffic or to pre‑heat resources for burst scenarios, reducing cold‑start latency and cost.

Summary

Knative’s KPA delivers request‑driven autoscaling, seamless scale‑to‑zero, and robust burst handling. By leveraging global ConfigMap settings or per‑revision annotations, operators can fine‑tune parameters such as target utilization, window periods, and scaling limits. Extensions like MPA, AHPA, and the resource‑pool mechanism further enhance precision, predictive capability, and cost efficiency for cloud‑native workloads.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

cloud native · Serverless · Kubernetes · autoscaling · Knative · KPA