Boosting Cluster Resource Utilization with Alibaba Cloud Native Elastic Solutions
This article explains how Alibaba Cloud's native elastic solutions (application‑level scaling, resource‑level scaling, and the new Instant Elasticity controller) help enterprises improve Kubernetes cluster resource utilization, reduce costs, and simplify operations through advanced metrics, custom scaling policies, and event‑driven node management.
Why Elastic Solutions Matter
As cloud adoption spreads beyond traditional internet companies to manufacturing and industrial enterprises, improving cluster resource utilization has become a common goal. The gap between planned capacity and actual demand can be bridged by elastic scaling, which optimizes costs while maintaining performance.
Application‑Layer Elasticity
Application‑level scaling focuses on Pods. Horizontal scaling can be triggered by three sources: metrics (CPU, memory, or custom metrics, handled by HPA), time schedules (CronHPA), and events (e.g., message queue length via KEDA). Alibaba Cloud ACK extends HPA with a Metrics Adapter that supports custom metrics such as GPU usage or Prometheus queries, as well as built‑in metrics like Ingress QPS. Scaling applies to Deployments, StatefulSets, and custom resources (e.g., Spark or Presto CRDs), provided the resource exposes the generic scale subresource.
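The metric‑driven path can be illustrated with the replica formula from the Kubernetes HPA documentation: desired replicas = ceil(current replicas × current metric / target metric). The sketch below is a simplified, single‑metric version. The function name and the min/max defaults are illustrative assumptions, and the 0.1 tolerance mirrors the upstream default rather than anything stated for ACK.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count the HPA formula would propose for a single metric.

    Simplified sketch: ignores pod readiness, missing metrics, and
    stabilization windows that a real HPA controller also considers.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Inside the tolerance band the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    proposed = math.ceil(current_replicas * ratio)
    # Clamp to the bounds configured on the autoscaler (assumed values here).
    return max(min_replicas, min(max_replicas, proposed))


if __name__ == "__main__":
    # 4 Pods each seeing 200 QPS against a 100 QPS target.
    print(desired_replicas(4, 200, 100))
```

With 4 replicas at 200 QPS per Pod and a 100 QPS target, the sketch proposes 8 replicas; a reading of 105 QPS falls inside the tolerance band and leaves the count at 4.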
Resource‑Layer Elasticity
Resource‑level scaling ensures the cluster has enough nodes to schedule Pods and releases idle nodes to avoid waste. It is divided into two main resource types: ECS nodes and Elastic Container Instances (ECI). Five key dimensions are considered:
Cost – ECI does not support over‑commit, while ECS can achieve 1:2–1:4 over‑commit ratios.
Efficiency – cluster‑autoscaler operates on a minute‑scale loop; ECI can provision within a minute.
Scale – ECS with cluster‑autoscaler scales larger clusters; ECI follows a One‑Pod‑One‑Node model.
Compatibility – cluster‑autoscaler is fully compatible; ECI has limitations for kernel parameters or DaemonSets.
Operational complexity – cluster‑autoscaler requires more ops; ECI is designed for zero‑ops.
Challenges of Traditional Cluster‑Autoscaler
Cluster‑autoscaler runs a polling loop and models each node pool as a single virtual node when it simulates scheduling. This leads to delivery uncertainty, slower response, and complex troubleshooting, especially as cluster size and workload diversity grow.
Instant Elasticity – The Next‑Gen Solution
Instant Elasticity is an event‑driven node‑scaling controller that retains compatibility with existing node‑pool semantics while offering four improvements:
More accurate: Replaces the One‑Nodepool‑One‑Virtual‑Node model with a scaling plan that selects specific instance types, improving placement precision.
Faster: Event‑driven parallel scaling reduces latency compared to the 15‑second polling interval of cluster‑autoscaler.
Lightweight: Fewer node pools are needed because a single pool can host multiple instance specs.
More user‑controlled: Users can inject custom logic into the node lifecycle via policies for both scaling up and scaling down.
Performance Comparison
In a typical expansion scenario, cluster‑autoscaler adds a full‑size node even when a small request could be satisfied, wasting resources. Instant Elasticity selects the smallest suitable instance, improving utilization and reducing operational overhead.
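The "smallest suitable instance" idea can be sketched as a simple filter‑then‑minimize step. This is not the controller's actual algorithm, which the article does not describe. The `InstanceType` class, the instance names, and the sizes are all hypothetical, and the sketch ignores system‑reserved resources and DaemonSet overhead.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class InstanceType:
    name: str          # hypothetical label, not a real ECS spec name
    cpu_cores: float
    memory_gib: float


def smallest_fit(cpu_request: float, memory_request: float,
                 candidates: Sequence[InstanceType]) -> Optional[InstanceType]:
    """Return the smallest candidate that satisfies the request, or None."""
    fitting = [c for c in candidates
               if c.cpu_cores >= cpu_request and c.memory_gib >= memory_request]
    if not fitting:
        return None
    # "Smallest" is defined here as fewest cores, then least memory.
    return min(fitting, key=lambda c: (c.cpu_cores, c.memory_gib))


# Hypothetical instance specs available in one node pool.
POOL = [
    InstanceType("small", 2, 4),
    InstanceType("medium", 4, 8),
    InstanceType("large", 16, 32),
]

if __name__ == "__main__":
    print(smallest_fit(1, 2, POOL))
```

For a pending Pod requesting 1 core and 2 GiB, the sketch picks the hypothetical "small" spec instead of defaulting to the largest one in the pool.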
When handling bursty workloads (for example, three batches of Pods arriving 10 seconds apart), cluster‑autoscaler's batch processing took about 90 seconds in total to get all Pods scheduled, whereas Instant Elasticity handled each batch as it arrived and kept the total time to about 45 seconds.
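A toy timeline model can make the difference between the two approaches concrete. It rests on loud assumptions: the 25‑second provisioning time is invented for illustration, and the polling scaler is modeled as waiting for the previous batch to finish and then for its next poll tick, which simplifies real cluster‑autoscaler behavior. It does not reproduce the measurement above.

```python
import math


def event_driven_finish(arrivals, provision_seconds):
    """Each batch starts provisioning on arrival, in parallel with the others."""
    return max(t + provision_seconds for t in arrivals)


def polling_serial_finish(arrivals, provision_seconds, poll_interval):
    """Batches are handled one at a time, each starting on a poll tick."""
    finish = 0
    for t in sorted(arrivals):
        # A batch cannot start before it arrives or before the previous one ends.
        ready = max(t, finish)
        # Round up to the next poll tick.
        start = math.ceil(ready / poll_interval) * poll_interval
        finish = start + provision_seconds
    return finish


if __name__ == "__main__":
    arrivals = [0, 10, 20]   # three batches, 10 seconds apart
    provision = 25           # assumed provisioning time in seconds
    print(event_driven_finish(arrivals, provision))
    print(polling_serial_finish(arrivals, provision, poll_interval=15))
```

Under these assumed inputs the event‑driven path finishes at 45 seconds and the serialized polling path at 85 seconds. The absolute numbers depend entirely on the chosen provisioning time; the point is that serialized handling adds roughly one provisioning cycle per batch.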
Operationally, Instant Elasticity provides clearer pod‑event diagnostics, pre‑drain hooks, and integrated dashboards, which simplify troubleshooting and curb the complexity that grows rapidly as clusters get larger.
Extended Capabilities
Instant Elasticity allows users to specify detailed scaling preferences such as availability zones, instance‑type priority, and spot‑instance usage. For scale‑down, users can define custom policies to ensure graceful termination, data collection, and log aggregation before nodes are removed.
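A custom scale‑down policy of this kind can be pictured as an ordered list of steps that must all succeed before a node is released. The sketch below is purely illustrative: `run_pre_removal_hooks`, the hook names, and the callback signature are hypothetical and do not correspond to an actual Instant Elasticity API.

```python
from typing import Callable, List, Sequence, Tuple

# A hook is a (name, callable) pair; the callable takes a node name and
# returns True on success. Both are hypothetical conventions for this sketch.
Hook = Tuple[str, Callable[[str], bool]]


def run_pre_removal_hooks(node_name: str,
                          hooks: Sequence[Hook]) -> Tuple[bool, List[str]]:
    """Run hooks in order; stop at the first failure.

    Returns (ok_to_remove, names_of_completed_hooks).
    """
    completed: List[str] = []
    for name, hook in hooks:
        if not hook(node_name):
            # A failed step blocks removal so no data is lost.
            return False, completed
        completed.append(name)
    return True, completed


if __name__ == "__main__":
    hooks = [
        ("graceful-termination", lambda node: True),
        ("collect-data", lambda node: True),
        ("aggregate-logs", lambda node: True),
    ]
    print(run_pre_removal_hooks("node-a", hooks))
```

Stopping at the first failed step is a design choice in this sketch: it keeps the node alive whenever graceful termination or log collection has not completed.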
Real‑World Case Study
An AI workload for a game with more than 100 million monthly active users required both rapid scaling and guaranteed pod redundancy. The solution combined:
Aliyun Prometheus to collect player counts.
Metrics Adapter to expose player count as a custom HPA metric, with built‑in redundancy.
Instant Elasticity for event‑driven node scaling, supporting mixed‑instance pools and spot‑instance cost savings (up to 90% compared to on‑demand).
Custom drain‑time settings and DaemonSet‑aware scale‑down to ensure player sessions and logs are preserved before node termination.
The result was a >50% improvement in scaling efficiency, lower operational burden, and cost reductions while maintaining high availability for the game’s AI services.
Alibaba Cloud Native
We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.
