How Alibaba Cloud K8s Dynamically Scales Nodes: Adding, Expanding, and Auto‑Scaling Explained
This article explains the principles behind Alibaba Cloud Kubernetes cluster node scaling, covering manual and automatic node addition, cluster expansion with ESS, and auto‑scaling using Cluster Autoscaler, along with troubleshooting steps and key components involved.
Alibaba Cloud Kubernetes (K8s) clusters support dynamic node scaling, allowing nodes to be added or removed based on resource demand, which helps optimize costs and performance.
Node Addition Mechanisms
A cluster can gain nodes in four ways:
Manually adding existing ECS instances.
Automatically adding existing nodes.
Cluster expansion (purchasing new nodes) via Elastic Scaling Service (ESS).
Automatic scaling using Cluster Autoscaler.
1. Manual Addition of Existing Nodes
Node preparation involves converting a regular ECS instance into a K8s node with a single command that downloads attach_node.sh and runs it with an OpenAPI token:
curl http:///public/pkg/run/attach//attach_node.sh | bash -s -- --openapi-token
The token gives the script the cluster information it needs for configuration. The process has two phases, reading (collecting data) and writing (configuring the node). It also includes a kubeadm join step, which uses a bootstrap token to establish trust between the new node and the master.
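The kubeadm join step can be illustrated with a small sketch. This is not the code inside attach_node.sh. The helper names `is_bootstrap_token` and `build_join_command` are hypothetical, and the helper only prints the command without running it. It first checks the token against the kubeadm bootstrap-token format (six characters, a dot, then sixteen characters, all `[a-z0-9]`). It then composes a `kubeadm join` line that pins the cluster CA through `--discovery-token-ca-cert-hash`, which is how the joining node verifies it is talking to the real control plane.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: validate a kubeadm bootstrap token and print
# (not execute) the matching `kubeadm join` command.

# kubeadm bootstrap tokens look like <6 chars>.<16 chars>, using [a-z0-9] only.
is_bootstrap_token() {
  [[ "$1" =~ ^[a-z0-9]{6}\.[a-z0-9]{16}$ ]]
}

# Usage: build_join_command <apiserver-host:port> <token> <ca-cert-sha256-hex>
build_join_command() {
  local endpoint="$1" token="$2" ca_hash="$3"
  if ! is_bootstrap_token "$token"; then
    echo "invalid bootstrap token: $token" >&2
    return 1
  fi
  # The CA hash lets the new node authenticate the control plane before trusting it.
  printf 'kubeadm join %s --token %s --discovery-token-ca-cert-hash sha256:%s\n' \
    "$endpoint" "$token" "$ca_hash"
}
```

On a control-plane node, `kubeadm token create --print-join-command` prints an equivalent command with a fresh token. It is a convenient way to cross-check what a node script is doing.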
2. Automatic Addition of Existing Nodes
Instead of manually executing the script, the control plane injects the script into the ECS instance’s userdata. Upon reboot, the script runs automatically, using parameters that already contain the necessary cluster information, thus skipping the token‑fetch step.
#!/bin/bash
mkdir -p /var/log/acs
curl http://public/pkg/run/attach/1.12.6-aliyun.1/attach_node.sh | bash -s -- --docker-version --token --endpoint --cluster-dns > /var/log/acs/init.log
3. Cluster Expansion (Purchasing New Nodes)
When new nodes need to be provisioned, ESS creates ECS instances from scratch. After creation, the same attach_node.sh script, delivered through userdata, prepares each node, just as in the manual and automatic addition paths.
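Both the automatic-addition path and the expansion path depend on the userdata script. The sketch below assembles a script with the same shape as the one shown earlier and then Base64-encodes it. It assumes the ECS or ESS API field receiving it expects Base64 userdata. `render_userdata` and `encode_userdata` are hypothetical names. The attach_node.sh URL and the flag values are parameters supplied by the caller rather than real cluster values.

```bash
# Hypothetical helper: print a userdata script that runs attach_node.sh at boot.
# Usage: render_userdata <attach_node.sh URL> <docker-version> <token> <endpoint> <cluster-dns>
render_userdata() {
  local url="$1" docker_version="$2" token="$3" endpoint="$4" cluster_dns="$5"
  # Mirrors the userdata above: create the log dir, then pipe the script into bash.
  cat <<EOF
#!/bin/bash
mkdir -p /var/log/acs
curl $url | bash -s -- --docker-version $docker_version --token $token --endpoint $endpoint --cluster-dns $cluster_dns > /var/log/acs/init.log
EOF
}

# Read userdata on stdin and emit it as single-line Base64.
# Assumption: the target API field wants Base64 with no line wrapping.
encode_userdata() {
  base64 | tr -d '\n'
}
```

Because the cluster parameters are already filled in, the boot-time path skips the OpenAPI token fetch but otherwise runs the same attach_node.sh as the manual path.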
4. Automatic Scaling
Automatic scaling combines ESS with the Cluster Autoscaler pod. Two processes are involved:
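Configuration: node specifications and userdata are set up, and the nodes are marked as eligible for autoscaling.
Scale-up: the Autoscaler watches for pods that fail to schedule. A pod fails to schedule when no node has enough unreserved capacity. Capacity here is measured by the "reservation rate" (the share of a node's allocatable resources already claimed by pod resource requests), not by the usage rate. The Autoscaler then triggers ESS to provision new nodes.
Scale-down works in reverse: when a node's reservation rate falls low enough, the Autoscaler removes that node automatically.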
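A little arithmetic makes the difference between reservation rate and usage rate concrete. The sketch below computes a node's reservation rate from CPU requests in millicores, checks whether a new pod would still fit, and flags scale-down candidates. The function names are hypothetical. The 50% scale-down threshold is an assumption based on the upstream Cluster Autoscaler default for `--scale-down-utilization-threshold`, and an ACK deployment may use a different value.

```bash
# Hypothetical helpers for request-based ("reservation") accounting.
# All quantities are CPU millicores (500 = 0.5 cores); allocatable must be > 0.

# Usage: reservation_pct <sum-of-requests> <allocatable>  -> integer percent (floored)
reservation_pct() {
  echo $(( $1 * 100 / $2 ))
}

# Usage: pod_fits <sum-of-requests> <allocatable> <new-pod-request>
# Succeeds if the pod's CPU request still fits in the unreserved capacity.
pod_fits() {
  (( $1 + $3 <= $2 ))
}

# Usage: scale_down_candidate <sum-of-requests> <allocatable> [threshold-pct]
# Assumed default threshold of 50%, mirroring upstream Cluster Autoscaler.
scale_down_candidate() {
  local threshold="${3:-50}"
  (( $(reservation_pct "$1" "$2") < threshold ))
}
```

Consider a node with 4000m allocatable whose pods request 3000m. It is 75% reserved even if those pods are idle. A new pod requesting 1500m therefore stays Pending, and that pending pod, not CPU usage, prompts a scale-up. The real autoscaler also accounts for memory and applies further conditions before removing a node.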
Node Removal Process
Removing nodes differs based on how they were added:
For manually or automatically added existing nodes: clear userdata via ECS API, delete the node via K8s API, and run kubeadm reset on the ECS instance.
For nodes added through cluster expansion: the same steps, plus detaching the instance from its ESS scaling group via the ESS API.
For autoscaler‑added nodes: the Autoscaler automatically removes them when the reservation rate is low.
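The removal steps above can be sketched as a dry-run plan that prints each step in order and executes nothing. `plan_node_removal` and its arguments are hypothetical. The `kubectl` and `kubeadm` lines are standard commands. The ECS and ESS steps are printed as plain descriptions rather than CLI calls, because their exact API parameters should be checked against the current Alibaba Cloud API reference. As far as we know, the ESS step corresponds to its DetachInstances operation. The `kubectl drain` step is an added precaution, not part of the original description.

```bash
# Hypothetical dry-run planner for removing one node; prints steps, runs nothing.
# Usage: plan_node_removal <k8s-node-name> <ecs-instance-id> [ess-scaling-group-id]
plan_node_removal() {
  local node="$1" instance="$2" group="${3:-}"
  # ECS API: clear userdata so a reboot does not re-run attach_node.sh.
  echo "ecs-api: clear userdata of $instance"
  # Added precaution (not in the source): evict pods before deleting the Node.
  # --delete-emptydir-data needs kubectl 1.20+; older releases use --delete-local-data.
  echo "kubectl drain $node --ignore-daemonsets --delete-emptydir-data"
  # Kubernetes API: remove the Node object.
  echo "kubectl delete node $node"
  # On the instance itself: undo what kubeadm join configured.
  echo "on $instance: kubeadm reset -f"
  # Only for nodes created by cluster expansion: break the ESS-ECS link.
  if [[ -n "$group" ]]; then
    echo "ess-api: detach $instance from scaling group $group"
  fi
}
```

Passing the scaling group ID only for expansion-created nodes keeps a single planner for both removal paths. Autoscaler-added nodes do not need it, because the Autoscaler removes them itself.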
Troubleshooting Checklist
Cluster Autoscaler logs can be read like those of any other pod.
ESS configuration and logs are viewable in the ESS console.
Control plane logs are available through the platform’s log viewer.
Node preparation and cleanup scripts should be inspected for errors.
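The userdata script shown earlier redirects attach_node.sh output to /var/log/acs/init.log, so that file is a natural first stop on a node that failed to join. The minimal scan below assumes failures show up as lines containing words such as "error" or "fail". The keyword list and the name `scan_init_log` are hypothetical, because the script's actual log format is not documented here. The userdata redirects only stdout, so messages the script writes to stderr may not appear in the file.

```bash
# Hypothetical helper: print numbered lines of a node-init log that look like failures.
# Usage: scan_init_log [log-file]   (defaults to the path used by the userdata script)
# Exit status: 0 = suspicious lines found, 1 = none found, 2 = log unreadable.
scan_init_log() {
  local log="${1:-/var/log/acs/init.log}"
  if [[ ! -r "$log" ]]; then
    echo "cannot read $log" >&2
    return 2
  fi
  # -n adds line numbers, -i ignores case, -E enables the alternation.
  grep -n -i -E 'error|fail|fatal' "$log"
}
```

The exit codes follow `grep`'s own convention, so the helper can gate further checks in a larger troubleshooting script.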
Understanding these components—Cluster Autoscaler, ESS, the control plane, and node scripts—helps operators diagnose scaling issues effectively.
Alibaba Cloud Native
We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.
