
Boosting Cluster Resource Utilization with Alibaba Cloud Native Elastic Solutions

This article explains how Alibaba Cloud's native elastic solutions—covering application‑level scaling, resource‑level scaling, and the new instant elastic controller—help enterprises improve Kubernetes cluster resource utilization, reduce costs, and simplify operations through advanced metrics, custom scaling policies, and event‑driven node management.

Alibaba Cloud Native

Why Elastic Solutions Matter

As cloud adoption spreads beyond traditional internet companies to manufacturing and industrial enterprises, improving cluster resource utilization has become a common goal. The gap between planned capacity and actual demand can be bridged by elastic scaling, which optimizes costs while maintaining performance.

Application‑Layer Elasticity

Application-level scaling focuses on Pods. Horizontal Pod Autoscaling (HPA) can be triggered by three kinds of sources: metrics (CPU, memory, custom metrics), time schedules (CronHPA), and events (e.g., message queue length via KEDA). Alibaba Cloud ACK extends HPA with a Metrics Adapter that supports custom metrics such as GPU usage or Prometheus queries, as well as built-in metrics such as Ingress QPS. Together with HPA, the adapter can drive scaling of Deployments, StatefulSets, and custom resources (e.g., Spark or Presto CRDs), provided the target exposes the generic scale subresource.
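The core HPA calculation is the same whichever source supplies the metric. The sketch below follows the rule documented for upstream Kubernetes, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), which is skipped when the ratio is within a tolerance of 1.0 (0.1 by default). It is an illustration, not ACK's implementation: the Metrics Adapter would only supply the metric value, and stabilization windows, multi-metric handling, and Pod readiness checks are omitted. The function name and the default `max_replicas` are assumptions.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 100,
                     tolerance: float = 0.1) -> int:
    """Core Kubernetes HPA rule:
    desired = ceil(current_replicas * current_metric / target_metric),
    left unchanged when the ratio is within `tolerance` of 1.0,
    then clamped to [min_replicas, max_replicas].
    (max_replicas=100 is an arbitrary default for this sketch.)"""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no change
    else:
        # multiply before dividing to keep the arithmetic exact for whole numbers
        desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas averaging 150 against a target of 100 scale to 6, while an average of 105 falls inside the tolerance band and leaves the count at 4.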

Resource‑Layer Elasticity

Resource-level scaling ensures the cluster has enough nodes to schedule Pods and releases idle nodes to avoid waste. It relies on two main resource types, ECS nodes and Elastic Container Instances (ECI), which compare along five key dimensions:

Cost – ECI does not support over‑commit, while ECS can achieve 1:2–1:4 over‑commit ratios.

Efficiency – cluster-autoscaler works through a polling loop and brings new nodes online on a scale of minutes; ECI can provision capacity within a minute.

Scale – ECS nodes managed by cluster-autoscaler suit larger clusters; ECI follows a One-Pod-One-Node model.

Compatibility – cluster-autoscaler is fully compatible; ECI has limitations around kernel parameters and DaemonSets.

Operational complexity – cluster-autoscaler requires more operational effort; ECI is designed to be zero-ops.
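The cost trade-off is easiest to see as arithmetic. The sketch assumes that an over-commit ratio of 1:N means Pods may request N times a node's physical vCPU; that interpretation, the function names, and the 8-vCPU node size used in the example are assumptions. ECI, with no over-commit, corresponds to N = 1.

```python
import math


def physical_vcpu_needed(requested_vcpu: float, overcommit_ratio: float) -> float:
    """Physical vCPU needed to back a total of Pod CPU requests, assuming
    over-commit means physical:requested = 1:overcommit_ratio."""
    if overcommit_ratio < 1:
        raise ValueError("over-commit ratio must be >= 1")
    return requested_vcpu / overcommit_ratio


def ecs_nodes_needed(requested_vcpu: float, node_vcpu: float, overcommit_ratio: float) -> int:
    """Whole nodes of `node_vcpu` cores needed to back the requested total."""
    return math.ceil(physical_vcpu_needed(requested_vcpu, overcommit_ratio) / node_vcpu)
```

With 64 vCPU of requests on 8-vCPU nodes, ratios of 1, 2, and 4 need 8, 4, and 2 nodes respectively. Over-commit only saves money when Pods actually use less than they request; otherwise the node's CPU becomes contended.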

Challenges of Traditional Cluster‑Autoscaler

Cluster‑autoscaler uses a polling loop that abstracts each node pool as a virtual node, leading to delivery uncertainty, slower response, and complex troubleshooting, especially as cluster size and workload diversity grow.
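To make the "node pool as a virtual node" abstraction concrete, here is a toy model of one polling iteration: each pool contributes a single instance template, and pending Pods are first-fit onto simulated new nodes of that shape. This is a simplification under stated assumptions, not cluster-autoscaler's code; the real autoscaler simulates scheduling and chooses among candidate pools with a configurable expander, whereas this sketch simply takes the first pool that fits. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class NodePool:
    name: str
    cpu: float  # vCPU of the pool's single instance template
    mem: float  # memory (GiB) of that template


@dataclass
class Pod:
    name: str
    cpu: float  # requested vCPU
    mem: float  # requested GiB


def plan_scale_up(pending: List[Pod], pools: List[NodePool]) -> Dict[str, int]:
    """One polling iteration: treat each pool as one 'virtual node' template
    and first-fit the pending Pods onto simulated new nodes of that shape.
    Returns {pool_name: nodes_to_add}."""
    new_nodes: Dict[str, List[Tuple[float, float]]] = {p.name: [] for p in pools}
    for pod in pending:
        for pool in pools:
            if pod.cpu > pool.cpu or pod.mem > pool.mem:
                continue  # this template can never host the Pod
            nodes = new_nodes[pool.name]
            for i, (free_cpu, free_mem) in enumerate(nodes):
                if pod.cpu <= free_cpu and pod.mem <= free_mem:
                    nodes[i] = (free_cpu - pod.cpu, free_mem - pod.mem)
                    break
            else:
                # no simulated node has room: add another node of this template
                nodes.append((pool.cpu - pod.cpu, pool.mem - pod.mem))
            break  # first fitting pool wins (real CA consults an expander here)
    return {name: len(nodes) for name, nodes in new_nodes.items() if nodes}
```

Note that a single 1-vCPU Pod still yields a whole 16-vCPU node from a pool whose only template is that size, which is the waste pattern described under Performance Comparison.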

Instant Elasticity – The Next‑Gen Solution

Instant Elasticity is an event‑driven node‑scaling controller that retains compatibility with existing node‑pool semantics while offering four improvements:

More accurate: Replaces the One‑Nodepool‑One‑Virtual‑Node model with a scaling plan that selects specific instance types, improving placement precision.

Faster: Event‑driven parallel scaling reduces latency compared to the 15‑second polling interval of cluster‑autoscaler.

Lightweight: Fewer node pools are needed because a single pool can host multiple instance specs.

More user-controlled: Users can inject custom logic into the node lifecycle through policies for both scale-up and scale-down.
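The latency gain from being event-driven can be illustrated with a simple timing model. The sketch assumes scan ticks fire every 15 seconds starting at t = 0 (the interval cited above) and measures only how long a pending batch waits before any scale-up decision is made; node provisioning time is ignored, and the event-driven side is idealized with a configurable handling delay. Function names are hypothetical.

```python
import math
from typing import List


def polling_decision_delays(arrivals_s: List[float], scan_interval_s: float = 15.0) -> List[float]:
    """Seconds each pending-Pod batch waits for the next scan tick,
    assuming ticks fire at 0, interval, 2*interval, ..."""
    delays = []
    for t in arrivals_s:
        next_tick = math.ceil(t / scan_interval_s) * scan_interval_s
        delays.append(next_tick - t)
    return delays


def event_driven_decision_delays(arrivals_s: List[float], handling_s: float = 0.0) -> List[float]:
    """Idealized event-driven controller: each batch is handled as soon as
    its Pod-pending event arrives, independently of the other batches."""
    return [handling_s for _ in arrivals_s]
```

Batches arriving at 1 s, 11 s, and 21 s wait 14 s, 4 s, and 9 s for a tick, and the first two are picked up together at 15 s, whereas the event-driven model reacts to each batch on arrival.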

Performance Comparison

In a typical scale-out scenario, cluster-autoscaler adds a node of the pool's full template size even when a much smaller instance would satisfy the pending request, wasting resources. Instant Elasticity selects the smallest suitable instance type, improving utilization and reducing operational overhead.
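Selecting "the smallest suitable instance" amounts to a filter-then-minimize step over the instance types a pool allows. This minimal sketch ranks candidates by vCPU and then memory; the type names and sizes are hypothetical, and a real controller would also weigh price, zone, and stock availability.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class InstanceType:
    name: str   # hypothetical name, not a real ECS spec
    cpu: float  # vCPU
    mem: float  # GiB


def smallest_fit(cpu_req: float, mem_req: float,
                 candidates: List[InstanceType]) -> Optional[InstanceType]:
    """Pick the smallest instance type (by vCPU, then memory) that can
    hold the pending requests; None if nothing fits."""
    fitting = [t for t in candidates if t.cpu >= cpu_req and t.mem >= mem_req]
    if not fitting:
        return None
    return min(fitting, key=lambda t: (t.cpu, t.mem))


def cpu_utilization(cpu_req: float, instance: InstanceType) -> float:
    """Fraction of the new node's vCPU consumed by the pending requests."""
    return cpu_req / instance.cpu
```

For a 3-vCPU, 6-GiB request against `small` (2 vCPU), `medium` (4 vCPU), and `large` (16 vCPU) candidates, the sketch picks `medium` at 75% CPU utilization instead of `large`, which would sit below 19%.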

When handling bursty workloads (e.g., three batches of Pods arriving 10 seconds apart), cluster‑autoscaler’s batch processing caused a total scheduling time of ~90 seconds, whereas Instant Elasticity processed each batch immediately, keeping total time around 45 seconds.

Operationally, Instant Elasticity provides clearer Pod-event diagnostics, pre-drain hooks, and integrated dashboards, which simplify troubleshooting and curb the complexity that otherwise grows rapidly with cluster size.

Extended Capabilities

Instant Elasticity allows users to specify detailed scaling preferences such as availability zones, instance‑type priority, and spot‑instance usage. For scale‑down, users can define custom policies to ensure graceful termination, data collection, and log aggregation before nodes are removed.
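The two halves of this paragraph can be sketched together: a preference object that filters and ranks candidate offers by zone, instance-type priority, and spot usage, and an ordered chain of user hooks that must all succeed before a node is removed. Everything here is hypothetical, including the field names, ranking order, and hook signature; it is not ACK's configuration schema or API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Offer:
    instance_type: str  # hypothetical type name
    zone: str
    spot: bool
    in_stock: bool = True


@dataclass
class ScalingPreferences:
    """Hypothetical shape; not the actual ACK configuration schema."""
    zones: List[str]
    type_priority: List[str]  # most preferred instance type first
    prefer_spot: bool = True


def pick_offer(offers: List[Offer], prefs: ScalingPreferences) -> Optional[Offer]:
    """Keep in-stock offers in allowed zones, then rank by type priority,
    breaking ties in favor of the preferred billing mode (spot or not)."""
    candidates = [
        o for o in offers
        if o.in_stock and o.zone in prefs.zones and o.instance_type in prefs.type_priority
    ]
    if not candidates:
        return None

    def rank(o: Offer):
        billing_rank = 0 if o.spot == prefs.prefer_spot else 1
        return (prefs.type_priority.index(o.instance_type), billing_rank)

    return min(candidates, key=rank)


def run_scale_down_hooks(node: str, hooks: List[Callable[[str], bool]]) -> bool:
    """Run user hooks (e.g. graceful termination, data collection, log
    aggregation) in order; the node may be removed only if all succeed."""
    for hook in hooks:
        if not hook(node):
            return False  # stop here and keep the node
    return True
```

Ranking by type priority first and billing mode second is one possible design; a policy that values spot savings above instance-type preference would simply swap the tuple order in `rank`.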

Real‑World Case Study

A gaming AI workload with >100 million monthly active users required both rapid scaling and guaranteed pod redundancy. The solution combined:

Aliyun Prometheus to collect player counts.

Metrics Adapter to expose player count as a custom HPA metric, with built‑in redundancy.

Instant Elasticity for event‑driven node scaling, supporting mixed‑instance pools and spot‑instance cost savings (up to 90% compared to on‑demand).

Custom drain‑time settings and DaemonSet‑aware scale‑down to ensure player sessions and logs are preserved before node termination.
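The source does not spell out how player counts map to replica counts. One plausible reading of "built-in redundancy" is a fixed headroom on top of the count implied by load; the sketch below assumes a per-Pod player capacity and a redundancy percentage, both hypothetical parameters, and uses integer arithmetic to avoid floating-point rounding.

```python
def replicas_for_players(players: int, players_per_pod: int,
                         redundancy_pct: int = 20, min_replicas: int = 2) -> int:
    """Replicas needed for the current player count plus a redundancy
    headroom (hypothetical policy), never below min_replicas."""
    if players_per_pod <= 0:
        raise ValueError("players_per_pod must be positive")
    base = -(-players // players_per_pod)  # ceil(players / players_per_pod)
    # ceil(base * (100 + redundancy_pct) / 100) using integer math only
    with_headroom = -(-base * (100 + redundancy_pct) // 100)
    return max(min_replicas, with_headroom)
```

With 10,000 players at 1,000 per Pod and 20% headroom, the sketch yields 12 replicas. In plain HPA terms, a comparable buffer could come from lowering the per-Pod target value instead of adding replicas afterwards.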

The result was a >50% improvement in scaling efficiency, lower operational burden, and cost reductions while maintaining high availability for the game’s AI services.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
