Mastering Elastic Scaling for Microservices on Kubernetes: Strategies and Tools
This article explains the concepts of vertical and horizontal elastic scaling, explores how Spring Cloud, the Kubernetes Horizontal Pod Autoscaler (HPA), and the KEDA add‑on enable precise scaling, and presents EDAS‑based optimizations for rule triggering, instance scheduling, and serverless Kubernetes deployments.
Elastic Scaling Overview
Elastic scaling dynamically adjusts the number of application instances to match real‑time business demand, improving service quality while minimizing cost. In cloud environments resources are billed on demand, so releasing instances when load drops directly reduces expenses compared with statically provisioned IDC (on‑premises data center) deployments.
Vertical vs. Horizontal Scaling
Vertical scaling (Scale‑Up) changes the specifications of a single server, such as its CPU and memory. It has two main limitations:
Changing specs on a running server needs complex infrastructure support and is difficult for many cloud providers.
The physical hardware ceiling caps maximum capacity.
Horizontal scaling (Scale‑Out) adds or removes server instances. It offers higher capacity and better reliability through multi‑replica deployment, and it is the de facto method for elastic scaling in production systems.
Microservices and Spring Cloud
Horizontal scaling requires stateless services and reliable inter‑instance communication. Spring Cloud facilitates this by:
Extracting stateless components into independent services and offering centralized configuration management.
Providing service registration, discovery, and circuit‑breaker mechanisms to improve remote‑call reliability.
Native Kubernetes Scaling
Kubernetes (K8s) manages the full lifecycle of containerized applications. For stateless workloads a Deployment is used, and the Horizontal Pod Autoscaler (HPA) adjusts replica counts based on CPU or memory utilization targets.
Reference documentation: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
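As a rough guide to how the HPA arrives at a replica count, the upstream Kubernetes documentation describes the core calculation as the ratio of the observed metric to its target. The worked numbers below (4 replicas, 90% observed CPU utilization, 60% target) are illustrative assumptions, not values from this article:

```latex
\[
\text{desiredReplicas} =
  \left\lceil \text{currentReplicas} \times
  \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
\]
\[
\left\lceil 4 \times \frac{90}{60} \right\rceil = \lceil 6 \rceil = 6
\]
```

The controller skips scaling when the ratio is close to 1 (within a configurable tolerance, 10% by default), which helps avoid flapping around the target.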
Optimizing Elastic Scaling
Rule Triggering
Out of the box, the HPA scales only on CPU and memory metrics, which may be insufficient for business‑critical workloads. Application‑level metrics such as QPS (queries per second) and latency provide more direct feedback.
KEDA (Kubernetes Event‑Driven Autoscaling) extends K8s with custom scalers that feed these metrics to the HPA.
KEDA project site: https://keda.sh/
EDAS integrates KEDA with ARMS monitoring, exposing golden metrics such as QPS and average response time, and provides a UI for rule configuration.
EDAS also supports time‑based scaling, allowing users to define replica ranges for different periods of the day.
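Time‑based rules of this kind can also be approximated with open‑source KEDA through its cron scaler. The manifest below is a minimal sketch, not the EDAS implementation: the target name my-app-deployment, the time zone, the schedule, and the replica counts are all assumptions chosen for illustration.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-business-hours        # hypothetical name
spec:
  scaleTargetRef:
    name: my-app-deployment          # assumed Deployment name
  minReplicaCount: 2                 # floor outside the scheduled window
  maxReplicaCount: 20
  triggers:
    - type: cron
      metadata:
        timezone: Asia/Shanghai      # assumed time zone (IANA name)
        start: "0 8 * * *"           # window opens at 08:00
        end: "0 20 * * *"            # window closes at 20:00
        desiredReplicas: "10"        # replicas requested while the window is open
```

Combining this trigger with a metric‑based trigger in the same ScaledObject lets the schedule set a floor while load can still push the count higher, since the highest value requested by any trigger wins.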
Instance Scheduling
Scaling triggers generate pod‑creation requests that the K8s scheduler must place onto nodes. Effective scheduling strategies include:
Distribute new pods across different nodes to avoid hotspot pressure.
Spread pods across multiple availability zones for higher availability.
Co‑locate tightly coupled pods on the same node to reduce latency.
EDAS provides UI controls to set node or zone affinity directly.
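In plain Kubernetes, the three strategies above map onto pod‑level scheduling hints. The fragment below is a sketch of a Deployment, assuming hypothetical labels app: my-app and app: my-cache and a placeholder image; it is not what EDAS generates.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 4
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      topologySpreadConstraints:
        # Spread replicas evenly across nodes to avoid hotspots.
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: my-app
        # Spread replicas across availability zones.
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: my-app
      affinity:
        podAffinity:
          # Prefer nodes that already run the tightly coupled cache pod.
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchLabels:
                    app: my-cache    # hypothetical label of the coupled workload
      containers:
        - name: my-app
          image: registry.example.com/my-app:1.0.0   # placeholder image
```

ScheduleAnyway keeps the spread rules as soft preferences, so a scale‑out is not blocked when a perfect spread is impossible; DoNotSchedule would turn them into hard requirements.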
Cluster Autoscaler
The open‑source Cluster Autoscaler adds node‑level auto‑scaling, but it reacts only after pods fail to schedule for lack of resources, so new pods wait while nodes are provisioned, which adds latency and can disrupt service. Scale‑in also causes fragmentation: the pods that are terminated are not chosen with node placement in mind, which can leave a few pods scattered across otherwise underused nodes.
Serverless Kubernetes (ASK)
Alibaba Cloud’s Serverless Kubernetes (ASK) removes the node layer entirely, offering instant pod scheduling and per‑second billing. This fits elastic‑scaling goals well because there are no nodes to provision or manage.
EDAS can manage ASK clusters, allowing users to create serverless applications directly from the EDAS console.
Practical Commands (illustrative)
To enable native HPA on a Deployment:
    kubectl autoscale deployment my-app \
      --cpu-percent=70 \
      --min=2 --max=10
To install KEDA via Helm (requires Helm 3):
    helm repo add kedacore https://kedacore.github.io/charts
    helm repo update
    helm install keda kedacore/keda
After installing KEDA, a ScaledObject can be created to scale on a custom metric (e.g., QPS):
    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: my-app-scaledobject
    spec:
      scaleTargetRef:
        name: my-app-deployment        # Deployment to scale
      minReplicaCount: 1
      maxReplicaCount: 20
      triggers:
        - type: prometheus
          metadata:
            serverAddress: http://prometheus.monitoring.svc:9090
            metricName: http_requests_total
            # Cluster-wide request rate (QPS) over the last minute
            query: sum(rate(http_requests_total[1m]))
            threshold: "100"           # scale out when the query result exceeds this target
Conclusion
Effective elastic scaling for microservice applications in a cloud‑native environment hinges on two pillars: rule triggering (using application‑level golden metrics, KEDA, and EDAS‑enhanced UI) and instance scheduling (leveraging K8s scheduler policies, Cluster Autoscaler, or serverless Kubernetes). Combining these techniques yields cost‑effective, reliable scaling that matches real‑time business demand.
Alibaba Cloud Native
