
Mastering Autoscaling: HPA, VPA, and Knative KPA in Cloud‑Native Environments

This article reviews the current state of horizontal and vertical autoscaling in Kubernetes, compares HPA, VPA, and Knative's KPA, discusses their limitations, and proposes short‑ and long‑term ideas for a more dynamic, low‑ops scheduling system.


A while ago I did some rough research into the automatic scaling capabilities built on top of Kubernetes and Knative. This article is only being published now, so parts of it may be slightly outdated. Overall: scaling based on resource metrics is relatively mature; for serverless scenarios, Knative's KPA is still fairly basic; and vertical scaling (VPA) is a complex problem that is unlikely to see broad adoption soon.

K8S/HPA

K8S/HPA can perform scaling based on two types of metrics:

Resource metrics, served by the metrics.k8s.io API and usually provided by metrics‑server.

Custom metrics, served by the custom.metrics.k8s.io API; adapters are available to plug in various metric backends, such as the Prometheus adapter.

Based on these metrics, a control loop (default period: 15 seconds) periodically evaluates and executes scaling decisions; the algorithm itself is fairly simple. HPA can also consider multiple metrics at once, in which case it picks the largest computed replica count. HPA can now even scale to zero when driven by custom metrics.
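The core of that control loop can be sketched in a few lines. This is a simplified illustration of the proportional formula HPA uses; the real controller additionally applies tolerances, stabilization windows, and readiness handling:

```python
import math

def desired_replicas(current_replicas: int, current: float, target: float) -> int:
    # Core HPA formula: scale replicas in proportion to the metric ratio.
    return math.ceil(current_replicas * current / target)

def hpa_decision(current_replicas: int, metrics: list[tuple[float, float]]) -> int:
    # With multiple metrics, HPA takes the largest proposed replica count.
    return max(desired_replicas(current_replicas, cur, tgt) for cur, tgt in metrics)

# 4 replicas, CPU at 90% against a 50% target, queue depth 30 against a target of 40:
print(hpa_decision(4, [(90, 50), (30, 40)]))  # CPU proposes 8, queue proposes 3 -> 8
```

Note how the max across metrics means any single overloaded dimension can drive a scale‑up, while scale‑down requires all metrics to agree.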

Knative/KPA

As I understand it, Knative's KPA is also a form of horizontal scaling, but because it must handle scale‑from‑zero (0→1) and sudden traffic bursts, it has extra requirements such as low latency. A dedicated Activator component handles the 0→1 case: when a service is scaled to zero, routing points at the Activator, which then triggers the 0→1 transition.

On top of a low‑latency metrics path, KPA maintains two statistical windows, a longer stable window and a shorter panic window, so it can respond quickly to traffic changes; see the official documentation for details.
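The two‑window mechanism can be sketched roughly as follows, under default‑like settings (the 2x panic factor and the exact scale‑up‑only behavior during panic are simplifications of what the real autoscaler does):

```python
import math

def kpa_desired(ready_pods: int, stable_avg: float, panic_avg: float,
                target: float, panic_factor: float = 2.0) -> int:
    # Desired pods from each window: observed average concurrency / per-pod target.
    stable_want = math.ceil(stable_avg / target)
    panic_want = math.ceil(panic_avg / target)
    # Enter panic mode when the short window alone calls for >= 2x current capacity;
    # while panicking, scale up immediately but never down.
    if panic_want >= panic_factor * ready_pods:
        return max(ready_pods, panic_want)
    return stable_want

# 2 ready pods, per-pod target concurrency 10; a burst drives the panic-window
# average to 50 while the stable window still reads 25:
print(kpa_desired(2, 25, 50, 10))  # panic mode -> 5
```

The panic window is what lets KPA react to a burst within seconds instead of waiting for the stable window's average to catch up.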

Unfortunately, each Pod's target concurrency still has to be evaluated by hand; at scale, concurrency as well as resource limits/requests all require manual assessment, which feels rather "serverful".

Additionally, the concurrency metric collected by Knative can now be fed into HPA as a custom metric alongside CPU and others, with HPA again choosing the largest proposed replica count for scaling.

K8S/VPA

K8S/VPA is still in beta and its functionality is incomplete; notably, it cannot be combined with HPA on the same resource metrics. In general, VPA recommends suitable resource requests/limits based on historical usage data (e.g., the past week). For now it seems suitable only for CPU‑bound or batch workloads; latency‑sensitive online services cannot use it yet, because applying a new recommendation requires recreating the Pod.
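The recommendation side of VPA can be illustrated with a minimal percentile sketch. This is not VPA's actual algorithm (the real recommender aggregates decaying histograms and emits lower/target/upper bounds); the 0.9 percentile and 1.15 safety margin here are assumed values for illustration:

```python
def recommend_request(usage_samples: list[float], percentile: float = 0.9,
                      margin: float = 1.15) -> float:
    # Pick a value near a high percentile of observed usage, plus a safety margin,
    # so the request covers most of the historical load without gross over-provisioning.
    ordered = sorted(usage_samples)
    idx = min(int(percentile * len(ordered)), len(ordered) - 1)
    return ordered[idx] * margin
```

The pod‑recreation problem mentioned above is exactly why such a recommendation cannot simply be applied in place today: requests/limits are immutable on a running Pod.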

Still Some Serverful Ideas

From the above, it seems that in the short term we can only target specific scenarios (e.g., Taobao's shopping‑guide business functions), configuring tailored strategies to approximate serverless behavior. In the long run, we hope for a closed‑loop scheduling system covering all scenarios that truly frees ordinary users from operational burden.

I think the core issue is the dynamic balance between performance (throughput per unit of resource) and resources (overall utilization), while also accounting for isolation and workload characteristics, which is quite complex.

From a short‑term perspective I have two immature ideas:

Users specify only concurrency; the scheduler dynamically adjusts limits/requests, similar in spirit to the current Knative‑VPA integration idea (though still early‑stage and not yet realistic).

Users specify only limits/requests; the scheduler dynamically adjusts concurrency to hold CPU utilization at a set level, perhaps with something like a PID controller, though convergence is a concern.
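The second idea can be sketched as a textbook PID loop driving the per‑pod concurrency limit toward a CPU setpoint. Everything here is hypothetical: the gains, the 0.6 CPU target, and the 15‑second tick are made‑up assumptions, and tuning them so the loop converges is exactly the open problem noted above:

```python
class ConcurrencyPID:
    """Illustrative PID controller: nudge the concurrency limit toward a CPU target."""
    def __init__(self, kp: float = 0.8, ki: float = 0.1, kd: float = 0.05,
                 setpoint: float = 0.6):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint      # target CPU utilization (0..1)
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, measured_cpu: float, dt: float = 15.0) -> float:
        # Positive error (CPU below target) -> positive delta -> admit more requests.
        error = self.setpoint - measured_cpu
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

For example, a reading of 0.3 CPU yields a positive delta (raise concurrency), while 0.9 yields a negative one; the integral term is also where windup and oscillation, i.e., the convergence worry, would show up in practice.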

In the long term, I hope for a scheduler powerful enough that neither limits/requests nor concurrency needs to be specified, one that dynamically balances performance and resources for optimal utilization. For now, a workload's performance profile remains an unavoidable prerequisite and a necessary compromise; obtaining that profile efficiently is the core of low‑ops efficiency.

Tags: cloud-native, serverless, Kubernetes, autoscaling, Knative, HPA, VPA
Written by

Node Underground

No language is immortal—Node.js isn’t either—but thoughtful reflection is priceless. This underground community for Node.js enthusiasts was started by Taobao’s Front‑End Team (FED) to share our original insights and viewpoints from working with Node.js. Follow us. BTW, we’re hiring.
