Master Knative’s Request‑Based Autoscaling: KPA, Scale‑to‑Zero, and Advanced Strategies
This article explains how Knative implements request‑based autoscaling with KPA, details the scale‑to‑zero mechanism, shows how to handle burst traffic using stable and panic windows, and demonstrates advanced extensions such as resource pools, precise MPA scaling, and predictive AHPA configurations with concrete YAML examples.
Overview of Knative Autoscaling
Knative is an open‑source serverless framework built on Kubernetes that provides request‑based autoscaling (KPA), support for the native Kubernetes Horizontal Pod Autoscaler (HPA), and advanced plugins such as Advanced HPA (AHPA) and MPA for precise scaling.
Request‑Based Autoscaling (KPA)
Knative injects a queue‑proxy sidecar into each pod to collect concurrency or RPS metrics. The Autoscaler periodically reads these metrics and adjusts the number of pods according to a formula that uses the target utilization percentage.
POD_COUNT = total_requests / (max_concurrency * target_utilization)
For example, with 100 concurrent requests, a pod maximum concurrency of 10, and a target utilization of 0.7, the Autoscaler creates 15 pods (100 / (10 × 0.7) ≈ 14.3, rounded up to 15).
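The formula's inputs can be set per revision through annotations. A minimal sketch, reusing the helloworld-go sample that appears later in this article, with the concurrency target and utilization values from the example above:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    metadata:
      annotations:
        # scale on concurrent requests per pod
        autoscaling.knative.dev/metric: "concurrency"
        # target of 10 concurrent requests per pod
        autoscaling.knative.dev/target: "10"
        # scale out once pods reach 70% of the target
        autoscaling.knative.dev/target-utilization-percentage: "70"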
Scale‑to‑Zero Mechanism
When traffic drops to zero, KPA automatically scales the workload down to zero pods. The transition between zero and active traffic is managed by switching between two request‑access modes:
Proxy mode : requests pass through the activator component, which buffers traffic and notifies the Autoscaler.
Serve mode : requests are routed directly to pods, bypassing the activator.
The Autoscaler toggles the mode based on current traffic, enabling instant scale‑to‑zero and rapid re‑activation.
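Scale-to-zero can also be opted out of per revision. A minimal sketch, again using the helloworld-go sample, that keeps one pod warm so the revision never drops to zero:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    metadata:
      annotations:
        # min-scale of 1 keeps one pod running; the default of 0 allows scale-to-zero
        autoscaling.knative.dev/min-scale: "1"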
Handling Burst Traffic
KPA uses two windows to react to spikes:
Stable window : default 60 s, used to compute the average concurrency.
Panic window : calculated as stable_window × panic-window-percentage (default 10% of 60 s → 6 s). The Autoscaler compares the pod count computed over the panic window against a panic threshold (default 200%, i.e., 2× the currently ready pods). If the panic-window count crosses that threshold, the Autoscaler enters panic mode, scales using the panic-window figure, and does not scale down until panic mode ends.
This design ensures rapid scaling during sudden traffic bursts while allowing sensitivity tuning via the configurable parameters.
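Both windows and the panic threshold are tunable per revision. A minimal sketch using the upstream default values, so it changes no behavior and only shows where the knobs live:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    metadata:
      annotations:
        # averaging window for the stable metric
        autoscaling.knative.dev/window: "60s"
        # panic window as a percentage of the stable window (10% of 60s = 6s)
        autoscaling.knative.dev/panic-window-percentage: "10.0"
        # enter panic mode when the panic-window pod count reaches 200% of ready pods
        autoscaling.knative.dev/panic-threshold-percentage: "200.0"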
Configuration Options
container-concurrency-target-default : default concurrency target per pod (default 100). This is a soft target used for scaling decisions, unlike the hard containerConcurrency limit.
target-utilization-percentage : fraction of the concurrency target the Autoscaler tries to keep pods at, so it scales out before pods are fully loaded (default 70%, i.e., 0.7).
stable-window : length of the stable observation period (default 60 s).
panic-window-percentage : proportion of the stable window used for the panic window (default 10% → 6 s).
panic-threshold-percentage : multiplier for the panic threshold (default 200% → 2× the ready pods).
scale-to-zero-grace-period : grace period before the last pod is removed after the decision to scale to zero (default 30 s).
scale-to-zero-pod-retention-period : minimum time the last pod is retained after the Autoscaler decides to scale to zero (default 0 s; e.g., 1m5s).
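For reference, a sketch of how the settings above appear in the cluster-wide config-autoscaler ConfigMap described in the next section, with the default values filled in (only keys you want to change need to be present):
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  container-concurrency-target-default: "100"
  target-utilization-percentage: "70"
  stable-window: "60s"
  panic-window-percentage: "10.0"
  panic-threshold-percentage: "200.0"
  scale-to-zero-grace-period: "30s"
  scale-to-zero-pod-retention-period: "0s"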
Global vs Revision Configuration
Cluster-wide settings are stored in the config-autoscaler ConfigMap. Example command to view it:
kubectl -n knative-serving get cm config-autoscaler
Individual revisions can override autoscaling parameters via annotations on the Service's revision template (spec.template.metadata.annotations):
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/metric: "concurrency"
        autoscaling.knative.dev/target: "50"
        autoscaling.knative.dev/scale-to-zero-pod-retention-period: "1m5s"
        autoscaling.knative.dev/target-utilization-percentage: "80"
Plugin Extensions
Knative's pod-autoscaler-class setting (the autoscaling.knative.dev/class annotation on a revision, or the pod-autoscaler-class key in config-autoscaler) selects the scaling implementation:
HPA : CPU or memory-based scaling using the native Kubernetes HPA (see the sketch after this list).
MPA : Precise concurrency‑based scaling integrated with Alibaba Cloud MSE gateway.
AHPA : Predictive scaling based on historical metrics, supporting custom metrics such as RPS and response time.
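A minimal sketch of selecting the HPA class for CPU-based scaling; the class and metric annotations are standard Knative, while the target value of 75 is purely illustrative:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    metadata:
      annotations:
        # hand this revision to the Kubernetes HPA instead of KPA
        autoscaling.knative.dev/class: hpa.autoscaling.knative.dev
        autoscaling.knative.dev/metric: "cpu"
        # illustrative target value for the cpu metric
        autoscaling.knative.dev/target: "75"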
Example of MPA configuration (precise concurrency control):
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/class: mpa.autoscaling.knative.dev
        autoscaling.knative.dev/max-scale: "20"
    spec:
      containerConcurrency: 5
      containers:
      - image: registry-vpc.cn-beijing.aliyuncs.com/knative-sample/helloworld-go:73fbdd56
        env:
        - name: TARGET
          value: "Knative"
AHPA Predictive Autoscaling
AHPA uses historical request patterns to forecast capacity needs, reducing latency caused by scaling lag. It can also consume custom metrics such as message‑queue depth or response time.
Sample AHPA configuration for RPS‑based scaling:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: autoscale-go   # example name
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/class: ahpa.autoscaling.knative.dev
        autoscaling.knative.dev/metric: "rps"
        autoscaling.knative.dev/target: "10"
        autoscaling.knative.dev/minScale: "1"
        autoscaling.knative.dev/maxScale: "30"
        autoscaling.alibabacloud.com/scaleStrategy: "observer"
    spec:
      containers:
      - image: registry.cn-hangzhou.aliyuncs.com/knative-sample/autoscale-go:0.1
Resource Pool Feature
Knative can reserve a pool of pre‑warmed pods (e.g., ECS or ECI instances) to handle baseline traffic or to pre‑heat resources for burst scenarios, reducing cold‑start latency and cost.
Summary
Knative’s KPA delivers request‑driven autoscaling, seamless scale‑to‑zero, and robust burst handling. By leveraging global ConfigMap settings or per‑revision annotations, operators can fine‑tune parameters such as target utilization, window periods, and scaling limits. Extensions like MPA, AHPA, and the resource‑pool mechanism further enhance precision, predictive capability, and cost efficiency for cloud‑native workloads.
