Why Traditional Autoscaling Fails in Kubernetes and How Cloud‑Native Solutions Evolve

The article examines the limitations of traditional threshold‑based autoscaling in Kubernetes, explains three core challenges—percentage fragmentation, capacity‑planning pitfalls, and resource‑utilization dilemmas—then expands the autoscaling concept across four workload types and outlines the cloud‑native components that address them.

Alibaba Cloud Native

Jul 17, 2019

Challenges of Traditional Autoscaling

Applied to Kubernetes, the classic autoscaling model keeps a static resource buffer (usually 15-30% of cluster capacity) and scales on simple utilization thresholds. This model suffers from three well-known problems:

1. Percentage Fragmentation

Clusters often contain heterogeneous node types (e.g., 4 CPU × 8 GiB and 16 CPU × 32 GiB). A uniform percentage reserve (e.g., 10%) represents very different absolute amounts on each node size. During scale-down the algorithm may remove a large node because its utilization looks low, pushing its Pods onto smaller nodes and causing resource contention, or it may retain only oversized nodes, wasting capacity.
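To make the fragmentation concrete, here is a minimal sketch that converts the same percentage reserve into absolute CPU for the two node sizes mentioned above. The node labels and the helper name `absolute_reserve` are illustrative assumptions, not part of any Kubernetes API.

```python
# Node shapes from the example in the text; the labels are illustrative.
NODE_CPUS = {"4C8G": 4.0, "16C32G": 16.0}
RESERVE_RATIO = 0.10  # uniform percentage reserve (example value from the text)


def absolute_reserve(cpus: float, ratio: float = RESERVE_RATIO) -> float:
    """Return the CPU cores held back on a node for a given percentage reserve."""
    return cpus * ratio


# Same percentage, very different absolute buffers.
reserves = {name: absolute_reserve(cpus) for name, cpus in NODE_CPUS.items()}
for name, cores in reserves.items():
    print(f"{name}: {cores:.1f} CPU reserved")
```

The large node holds back four times as much CPU as the small one under the same percentage, which is why a single threshold treats the two node sizes so differently.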

2. Capacity‑Planning Mismatch

Before containers, capacity was allocated per application (e.g., Application A receives two 4C8G machines). In Kubernetes, developers declare requests and limits for each Pod, and these declarations replace the traditional per-application capacity plan. If requests/limits are misconfigured, small nodes may lack enough reserved resources to schedule a Pod while large nodes remain underutilized.
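As a rough illustration of this mismatch, the sketch below checks whether a Pod's CPU request fits into the buffer that a percentage reserve leaves on each node size. The `Node` class, the 10% ratio, and the 1-CPU request are assumptions chosen for the example, not Kubernetes objects.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_capacity: float   # total cores on the node
    reserve_ratio: float  # fraction of capacity kept free as a buffer

    @property
    def headroom(self) -> float:
        """Absolute CPU (in cores) that the percentage reserve leaves free."""
        return self.cpu_capacity * self.reserve_ratio


def fits_in_headroom(node: Node, pod_cpu_request: float) -> bool:
    """True if the Pod's CPU request fits inside the node's reserved buffer."""
    return pod_cpu_request <= node.headroom


small = Node("small-4c", cpu_capacity=4.0, reserve_ratio=0.10)
large = Node("large-16c", cpu_capacity=16.0, reserve_ratio=0.10)
POD_CPU_REQUEST = 1.0  # hypothetical request declared by the developer

for node in (small, large):
    verdict = "fits" if fits_in_headroom(node, POD_CPU_REQUEST) else "does not fit"
    print(f"{node.name}: headroom {node.headroom:.1f} CPU, 1-CPU Pod {verdict}")
```

Under these assumed numbers the small node's buffer cannot hold the Pod at all, while the large node's buffer holds it with room to spare.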

3. Utilization‑Based Misinterpretation

Low Pod CPU/memory usage does not mean the requested resources can be reclaimed, because the scheduler places Pods according to their requests rather than their actual usage. Conversely, high overall cluster utilization can hide scheduling bottlenecks. Pods on overloaded nodes are not migrated automatically; without manual eviction they stay where they are, so the cluster can appear "full" even when spare capacity exists.
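The gap between usage and requests can be sketched as follows. This is a simplified model of a single node that only considers CPU; the `Pod` class and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pod:
    cpu_request: float  # cores reserved for the Pod at scheduling time
    cpu_usage: float    # cores the Pod actually consumes


def utilization(pods: List[Pod], allocatable: float) -> float:
    """Fraction of the node's allocatable CPU that is actually in use."""
    return sum(p.cpu_usage for p in pods) / allocatable


def can_schedule(pods: List[Pod], allocatable: float, new_request: float) -> bool:
    """Scheduling is decided by summed requests, not by measured usage."""
    return sum(p.cpu_request for p in pods) + new_request <= allocatable


ALLOCATABLE_CPU = 4.0
pods = [Pod(cpu_request=1.5, cpu_usage=0.2), Pod(cpu_request=2.0, cpu_usage=0.3)]

print(f"utilization: {utilization(pods, ALLOCATABLE_CPU):.1%}")
print(f"1-CPU Pod schedulable: {can_schedule(pods, ALLOCATABLE_CPU, 1.0)}")
```

In this example the node looks almost idle by usage, yet a new 1-CPU Pod cannot be placed because 3.5 of the 4 cores are already requested.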

Extending the Autoscaling Concept

Workloads can be classified into four categories, each requiring a different scaling strategy:

Online tasks: latency‑sensitive services (web, API, micro‑services) that need high CPU, memory, and I/O and cannot tolerate downtime.

Offline tasks: batch or edge‑computing jobs where reliability and latency are less critical; the primary goal is cost reduction.

Scheduled tasks: periodic batch jobs where predictable timing and cost efficiency are the main concerns.

Special tasks: idle‑time computation, IoT, grid or high‑performance computing that aim for maximal resource utilization.

Pure utilization‑based autoscaling works well for online tasks but is unsuitable for the other three categories, which need cost‑oriented or time‑oriented scaling policies.

Kubernetes Autoscaling Architecture

Kubernetes separates autoscaling into two orthogonal layers:

Ensuring the application load stays within the capacity defined by requests/limits (load‑capacity planning).

Adjusting the size of the underlying resource pool when the cluster cannot satisfy the defined capacity (resource‑pool scaling).

These layers are implemented by a set of decoupled, composable components:

cluster‑autoscaler – node‑level horizontal scaling; GA (General Availability).

Horizontal Pod Autoscaler (HPA) – scales Pods horizontally based on CPU (v1) or custom/external metrics (v2beta1, v2beta2).

cluster‑proportional‑autoscaler – adjusts the number of Pods proportionally to the node count; GA.

vertical‑pod‑autoscaler (VPA) – updates Pod requests based on observed utilization and historical trends; beta.

addon‑resizer – vertically scales workload requests according to the total node count; beta.
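As one example of how these components derive a target size, the sketch below is loosely modeled on the linear mode described in the cluster-proportional-autoscaler documentation, where the replica count follows the larger of a per-core and a per-node ratio and is then clamped. The function name, parameter names, and default bounds are assumptions made for this illustration, not the project's configuration keys.

```python
import math


def linear_replicas(
    cores: int,
    nodes: int,
    cores_per_replica: float,
    nodes_per_replica: float,
    min_replicas: int = 1,
    max_replicas: int = 100,
) -> int:
    """Replica count proportional to cluster size, clamped to [min, max]."""
    by_cores = math.ceil(cores / cores_per_replica)
    by_nodes = math.ceil(nodes / nodes_per_replica)
    replicas = max(by_cores, by_nodes)      # take the larger of the two ratios
    replicas = min(replicas, max_replicas)  # apply the upper bound first
    return max(replicas, min_replicas)      # then the lower bound


# 100 small 4-core nodes: the node ratio dominates.
print(linear_replicas(cores=400, nodes=100, cores_per_replica=256, nodes_per_replica=16))
# 4 large 16-core nodes: both ratios round up to a single replica.
print(linear_replicas(cores=64, nodes=4, cores_per_replica=256, nodes_per_replica=16))
```

Taking the maximum of the two ratios keeps the result sensible in clusters that mix many small nodes with a few large ones.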

For most workloads (roughly 80% of typical scenarios), combining HPA for load‑capacity planning with cluster‑autoscaler for resource‑pool adjustment provides a robust, two‑dimensional autoscaling solution. HPA reacts quickly to traffic spikes, while cluster‑autoscaler expands or contracts the node pool when the scheduler cannot place new Pods.
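The HPA side of this pairing can be sketched with the scaling rule given in the Kubernetes documentation: the desired replica count is the current count multiplied by the ratio of the observed metric to the target metric, rounded up, with no change when the ratio is close to 1. The function below is a simplified illustration; it ignores readiness handling, stabilization windows, and min/max bounds, and the 0.1 tolerance is the documented default rather than something stated in this article.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    tolerance: float = 0.1,
) -> int:
    """Simplified HPA rule: scale by the observed-to-target metric ratio."""
    ratio = current_metric / target_metric
    # Skip scaling when the ratio is within the tolerance band around 1.0.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Average CPU utilization of 90% against a 60% target, 4 replicas running.
print(desired_replicas(4, current_metric=90, target_metric=60))
# Utilization of 63% is within tolerance of the target, so nothing changes.
print(desired_replicas(4, current_metric=63, target_metric=60))
```

If the scheduler cannot place the additional replicas that HPA asks for, the resulting pending Pods are what prompts cluster-autoscaler to add nodes.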


