Why Traditional Autoscaling Fails in Kubernetes and How Cloud‑Native Solutions Evolve
The article examines the limitations of traditional threshold‑based autoscaling in Kubernetes, explains three core challenges—percentage fragmentation, capacity‑planning pitfalls, and resource‑utilization dilemmas—then expands the autoscaling concept across four workload types and outlines the cloud‑native components that address them.
Challenges of Traditional Autoscaling
In Kubernetes, the classic autoscaling model assumes a static resource buffer (usually 15‑30 % of cluster capacity) and scales on simple utilization thresholds. This model suffers from three well‑known problems:
1. Percentage Fragmentation
Clusters often contain heterogeneous node types (e.g., 4 CPU × 8 GiB and 16 CPU × 32 GiB). A uniform percentage reserve (e.g., 10 %) represents very different absolute amounts on each node size. During scale‑down the algorithm may mistakenly remove large nodes with low utilization, causing resource contention, or it may keep only oversized nodes, wasting capacity.
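The arithmetic behind this fragmentation can be sketched in a few lines. The node shapes and the 10 % figure come from the example above; the function and variable names are illustrative only.

```python
# Node shapes from the example above: (CPU cores, memory in GiB).
NODE_SHAPES = {"small": (4, 8), "large": (16, 32)}


def absolute_reserve(cpu_cores, mem_gib, reserve_fraction):
    """Return the (CPU cores, memory GiB) held back by a percentage reserve."""
    return cpu_cores * reserve_fraction, mem_gib * reserve_fraction


# Apply the same 10 % reserve to every node shape.
reserves = {
    name: absolute_reserve(cpu, mem, 0.10)
    for name, (cpu, mem) in NODE_SHAPES.items()
}

for name, (cpu, mem) in reserves.items():
    print(f"{name}: {cpu:.1f} CPU, {mem:.1f} GiB reserved")
# small: 0.4 CPU, 0.8 GiB reserved
# large: 1.6 CPU, 3.2 GiB reserved
```

The same percentage leaves less than half a core free on the small node, which may be too little for even one additional Pod, while the large node holds back four times as much.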
2. Capacity‑Planning Mismatch
Before containers, capacity was allocated per application (e.g., Application A receives two machines with 4 CPUs and 8 GiB of memory each). In Kubernetes, developers instead declare requests and limits for each Pod, and these declarations replace the traditional per‑application capacity plan. If requests or limits are misconfigured, small nodes may lack enough spare resources to schedule new Pods while large nodes remain under‑utilized.
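A minimal sketch of why requests drive placement, assuming a simplified fit check on CPU and memory only. The real scheduler evaluates many more conditions and works from a node's allocatable resources, which are lower than its raw capacity. The `Resources` type and the Pod's request values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    cpu_millicores: int
    memory_mib: int


def fits(allocatable, already_requested, pod):
    """True if the Pod's requests fit into what is still unrequested on the node."""
    return (
        already_requested.cpu_millicores + pod.cpu_millicores <= allocatable.cpu_millicores
        and already_requested.memory_mib + pod.memory_mib <= allocatable.memory_mib
    )


small_node = Resources(4000, 8192)    # 4 CPU, 8 GiB
large_node = Resources(16000, 32768)  # 16 CPU, 32 GiB
pod = Resources(3000, 6144)           # hypothetical, deliberately oversized request

# Each node already runs one such Pod; try to place a second one.
print(fits(small_node, pod, pod))  # False: 6000m requested > 4000m available
print(fits(large_node, pod, pod))  # True: the large node still has room
```

An inflated request blocks the small node after a single Pod regardless of how little CPU that Pod actually uses, which is the mismatch described above.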
3. Utilization‑Based Misinterpretation
Low Pod CPU/memory usage does not mean the requested resources can be reclaimed, and high overall cluster utilization can hide scheduling bottlenecks. Pods stuck on overloaded nodes cannot be automatically migrated without manual eviction, making the cluster appear “full” even when spare capacity exists.
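The gap between aggregate headroom and actual schedulability can be illustrated with a toy cluster. All numbers here are made up for the illustration, and only CPU requests are considered.

```python
def schedulable(free_cpu_per_node, pod_request):
    """A Pod must fit on a single node; free capacity cannot be pooled across nodes."""
    return any(free >= pod_request for free in free_cpu_per_node)


# Three identical nodes, in millicores (illustrative values).
allocatable = [4000, 4000, 4000]
requested = [3000, 3000, 3000]

free = [a - r for a, r in zip(allocatable, requested)]
total_free = sum(free)
pod_request = 2000

print(total_free >= pod_request)       # True: 3000m is free cluster-wide
print(schedulable(free, pod_request))  # False: no single node has 2000m free
```

A cluster‑wide utilization figure would report 25 % of CPU as unrequested here, yet the new Pod stays pending until existing Pods are evicted and repacked or a node is added.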
Extending the Autoscaling Concept
Workloads can be classified into four categories, each requiring a different scaling strategy:
Online tasks: latency‑sensitive services (web, API, micro‑services) that need high CPU, memory, I/O and cannot tolerate downtime.
Offline tasks: batch or edge‑computing jobs where reliability and latency are less critical; the primary goal is cost reduction.
Scheduled tasks: periodic batch jobs where predictable timing and cost efficiency are the main concerns.
Special tasks: idle‑time computation, IoT, grid or high‑performance computing that aim for maximal resource utilization.
Pure utilization‑based autoscaling works well for online tasks but is unsuitable for the other three categories, which need cost‑oriented or time‑oriented scaling policies.
Kubernetes Autoscaling Architecture
Kubernetes separates autoscaling into two orthogonal layers:
Load‑capacity planning: keeping the application load within the capacity defined by requests and limits.
Resource‑pool scaling: adjusting the size of the underlying resource pool when the cluster cannot satisfy the defined capacity.
These layers are implemented by a set of decoupled, composable components:
cluster‑autoscaler – node‑level horizontal scaling; GA (General Availability).
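Horizontal Pod Autoscaler (HPA) – scales the number of Pod replicas based on CPU (autoscaling/v1) or custom/external metrics (v2beta1 and v2beta2 at the time of writing; these were later stabilized as autoscaling/v2 in Kubernetes 1.23).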
cluster‑proportional‑autoscaler – adjusts the number of Pods proportionally to the node count; GA.
vertical‑pod‑autoscaler (VPA) – updates Pod requests based on observed utilization and historical trends; beta.
addon‑resizer – vertically scales workload requests according to the total node count; beta.
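As one concrete example from this list, the linear mode described in the cluster‑proportional‑autoscaler README derives a replica count from the cluster's node and core totals. The sketch below follows that formula; the Python parameter names mirror the README's `coresPerReplica`, `nodesPerReplica`, `min` and `max` settings, and the sample numbers are illustrative.

```python
import math


def linear_replicas(nodes, cores, nodes_per_replica, cores_per_replica,
                    min_replicas=1, max_replicas=None):
    """Replica count in linear mode: the larger of the core-based and node-based counts."""
    replicas = max(
        math.ceil(cores / cores_per_replica),
        math.ceil(nodes / nodes_per_replica),
    )
    # Clamp to the configured bounds.
    replicas = max(replicas, min_replicas)
    if max_replicas is not None:
        replicas = min(replicas, max_replicas)
    return replicas


# 100 nodes with 400 cores in total: the node-based term dominates.
print(linear_replicas(nodes=100, cores=400, nodes_per_replica=16, cores_per_replica=256))  # 7
```

Because the input is cluster size rather than load, this style of scaling suits cluster‑wide services whose work grows with the number of nodes.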
For the majority of workloads (≈ 80 % of typical scenarios) a combination of HPA for load‑capacity planning and cluster‑autoscaler for resource‑pool adjustments provides a robust, two‑dimensional autoscaling solution. HPA reacts quickly to traffic spikes, while cluster‑autoscaler expands or contracts the node pool when the scheduler cannot place new Pods.
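The two dimensions can be sketched together. The first function follows the replica formula from the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), with its default 10 % tolerance. The second is a deliberately crude stand‑in for cluster‑autoscaler, which in reality simulates scheduling against node‑group templates; the fixed pods‑per‑node figure is an assumption made only for this illustration.

```python
import math


def hpa_desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Replica count per the documented HPA formula, skipping changes within tolerance."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


def extra_nodes_needed(pending_pods, pods_per_node):
    """Simplified assumption: every new node fits a fixed number of the pending Pods."""
    return math.ceil(pending_pods / pods_per_node)


# Traffic spike: 4 replicas at 90 % average CPU against a 60 % target.
desired = hpa_desired_replicas(4, 90, 60)
print(desired)  # 6

# Suppose the scheduler can place only one of the two new Pods on existing nodes.
pending = (desired - 4) - 1
print(extra_nodes_needed(pending, pods_per_node=2))  # 1
```

HPA decides how many replicas the load needs; node scaling only comes into play once some of those replicas cannot be scheduled.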
Alibaba Cloud Native
