How Huolala Cuts Cloud Costs with Kubernetes: Spot Instances, Smart Autoscaling, and Predictive Scaling
This presentation details Huolala's end‑to‑end cloud‑native cost‑optimization strategy, covering the company's infrastructure basics, Kubernetes‑based server cost‑saving techniques, a tailored optimization roadmap, practical Spot Instance usage, and a custom CronHPA‑driven scheduled scaling solution to boost resource utilization.
1. Huolala Overview
Huolala runs its production environment entirely on public cloud, operating multiple clusters across different cloud providers in Singapore, India, Latin America, and China, so its solutions must be vendor‑agnostic. Traffic is highly regular, with clear peak and off‑peak periods, which makes simple predictive scaling algorithms viable. Large offline data jobs consume about half of the compute resources.
2. Cost‑Optimization Methods on Kubernetes
The main public‑cloud server cost‑saving approaches are:
Annual/Monthly Reserved Instances: Fixed‑term purchases offering stability and low price but limited elasticity.
Savings Plans: Commit to a minimum hourly spend for discounts while retaining scaling flexibility.
Spot Instances: Purchase idle capacity at deep discounts (10‑20% of on‑demand price) with the risk of interruption.
Huolala focuses on public‑cloud server optimization and node‑utilization improvement, excluding private‑cloud and service‑performance tuning.
2.1 Public‑Cloud Server Optimization
Spot Instances are obtained by submitting a maximum acceptable price; the cloud provider allocates instances if inventory and price conditions are met. Interruption signals may arrive at any time, requiring rapid graceful termination handling.
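As a rough illustration of graceful termination handling, the sketch below shows a worker that stops taking new items once a stop is requested. The class and function names are hypothetical, and it assumes the interruption reaches the process as SIGTERM (which is what Kubernetes sends to containers when a pod is evicted); it is not Huolala's actual handler.

```python
import signal
import threading


class GracefulWorker:
    """Processes items until a stop is requested, then exits cleanly."""

    def __init__(self):
        self._stop = threading.Event()
        self.processed = []

    def request_stop(self, signum=None, frame=None):
        # The (signum, frame) signature matches what signal.signal() passes in.
        self._stop.set()

    def run(self, items):
        for item in items:
            if self._stop.is_set():
                break  # take no new work once termination has been requested
            self.processed.append(item)
        return self.processed


def install_handlers(worker):
    # Assumption: the interruption arrives as SIGTERM.
    # Must be called from the main thread.
    signal.signal(signal.SIGTERM, worker.request_stop)
```

In a real service, the loop body would finish the in‑flight request and release connections before the pod's termination grace period ends.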
2.2 Node‑Utilization Optimization
Key techniques include:
Reasonable request/limit settings based on application profiles and historical load.
Horizontal Pod Autoscaler (HPA) for reactive scaling, with its reaction latency taken into account.
CronHPA for scheduled scaling aligned with predictable traffic patterns.
Intelligent scheduling to distribute pods across availability zones and instance types.
Pod Disruption Budgets (PDB) to guarantee a minimum number of replicas during disruptions.
Low‑priority pause pods that reserve spare cluster capacity; they are preempted as soon as real workloads need the room.
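The first technique in the list, sizing requests and limits from historical load, could be sketched as follows. The percentile choice (90th) and the limit headroom factor (1.5) are illustrative assumptions, not values from the presentation, and the function names are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer in 1..100."""
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct * n / 100
    return ordered[max(rank, 1) - 1]


def recommend_resources(cpu_millicores, request_pct=90, limit_headroom=1.5):
    """Suggest a CPU request and limit (in millicores) from usage samples.

    Assumed policy: request = chosen percentile of usage; limit = the larger
    of request * headroom and the observed peak.
    """
    request = percentile(cpu_millicores, request_pct)
    limit = max(math.ceil(request * limit_headroom), max(cpu_millicores))
    return {"request_m": request, "limit_m": limit}
```

A percentile-based request keeps nodes densely packed, while the higher limit leaves room for the occasional burst.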
3. Huolala‑Specific Optimization Roadmap
The roadmap combines Savings Plans for baseline capacity with Spot Instances for elastic workloads, complemented by intelligent request/limit tuning and advanced scheduling to maximize utilization.
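To make the baseline/elastic split concrete, here is a minimal sketch that covers the always‑needed node count with a commitment and the remainder with Spot capacity. Using the minimum hourly demand as the baseline, and the price ratios passed in, are assumptions for illustration; the presentation does not give Huolala's actual split or Savings Plan discount.

```python
def split_capacity(hourly_nodes):
    """Split demand into an always-on baseline and a per-hour elastic remainder."""
    baseline = min(hourly_nodes)
    elastic = [n - baseline for n in hourly_nodes]
    return baseline, elastic


def blended_cost(hourly_nodes, on_demand_price, commit_ratio, spot_ratio):
    """Total cost when the baseline is committed and the rest runs on Spot.

    commit_ratio and spot_ratio are fractions of the on-demand price
    (hypothetical inputs; spot_ratio of 0.1-0.2 matches the range quoted above).
    """
    baseline, elastic = split_capacity(hourly_nodes)
    committed = baseline * len(hourly_nodes) * on_demand_price * commit_ratio
    spot = sum(elastic) * on_demand_price * spot_ratio
    return committed + spot
```

Comparing the result with `sum(hourly_nodes) * on_demand_price` gives a quick estimate of the saving for a given traffic shape.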
4. Spot Instances Deep Dive
Spot Instances are idle resources sold at steep discounts. Procurement involves bidding a maximum price; the provider allocates instances if the market price is below the bid. Instances can be reclaimed at short notice when supply dwindles, so Huolala applies the following safeguards:
Expanding the Spot pool across more zones and instance types.
Assigning higher priority to Spot node groups in the Cluster Autoscaler (priority 20 vs. 10 for on‑demand).
Pod anti‑affinity rules to spread replicas across zones and instance types, reducing the chance that all replicas are evicted at once.
Pod Disruption Budgets to maintain service availability during mass evictions.
Low‑priority pause pods to keep spare capacity for rapid pod migration.
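The priority rule in the list above (Spot node groups at 20, on‑demand at 10) can be illustrated with a simplified selection function. The `available` field is a hypothetical stand‑in for provider inventory; the real Cluster Autoscaler does not know inventory in advance and instead falls back after a failed scale‑up, so treat this only as a model of the preference order.

```python
from dataclasses import dataclass


@dataclass
class NodeGroup:
    name: str
    priority: int   # higher value is preferred
    available: int  # hypothetical: nodes the provider can still supply


def choose_group(groups, nodes_needed):
    """Pick the highest-priority group that can satisfy the request, else None."""
    candidates = [g for g in groups if g.available >= nodes_needed]
    if not candidates:
        return None
    return max(candidates, key=lambda g: g.priority)
```

With a Spot group at priority 20 and an on‑demand group at priority 10, Spot is chosen whenever it has capacity, and on‑demand serves as the fallback.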
Services unsuitable for Spot Instances include single‑replica workloads, long‑startup services, non‑graceful‑shutdown services, and stateful applications.
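These exclusion criteria lend themselves to a simple eligibility check. The sketch below assumes a hypothetical `Workload` record and an arbitrary 120‑second cutoff for "long startup"; the presentation does not state a threshold.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    replicas: int
    startup_seconds: int
    graceful_shutdown: bool
    stateful: bool


def spot_blockers(w, max_startup_seconds=120):
    """Return the reasons a workload should stay off Spot; empty means eligible."""
    reasons = []
    if w.replicas < 2:
        reasons.append("single replica")
    if w.startup_seconds > max_startup_seconds:  # assumed cutoff
        reasons.append("long startup")
    if not w.graceful_shutdown:
        reasons.append("no graceful shutdown")
    if w.stateful:
        reasons.append("stateful")
    return reasons
```

Returning the reasons rather than a bare boolean makes it easier to tell service owners what to fix before a workload can move to Spot.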
5. Scheduled Scaling (CronHPA)
Huolala observed that the default HPA suffers from scaling latency, threshold constraints, and a lack of business‑specific timing control. Combining CronHPA (pre‑scheduled replica adjustments) with HPA (reactive scaling) provides:
Predictive pre‑scaling before known traffic spikes (e.g., scaling up at 7‑8 am for a 9 am surge).
Timely down‑scaling after peak periods while preserving a safety buffer.
Automatic adjustment of CronHPA targets based on historical metrics, application profiles, and policy constraints.
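One plausible way to combine the two mechanisms is to treat the scheduled value as a replica floor and let the reactive HPA calculation scale above it. The sketch below uses the standard HPA formula, ceil(currentReplicas × currentMetric / targetMetric); the schedule format and function names are hypothetical, and how Huolala's custom CronHPA actually merges the two signals is not described in the source.

```python
import math


def hpa_desired(current_replicas, current_metric, target_metric):
    """Standard HPA calculation: scale in proportion to metric / target."""
    return math.ceil(current_replicas * current_metric / target_metric)


def scheduled_floor(hour, schedule, default_floor):
    """schedule: list of (start_hour, end_hour, min_replicas), end exclusive."""
    floors = [m for start, end, m in schedule if start <= hour < end]
    return max(floors, default=default_floor)


def effective_replicas(hour, current_replicas, current_metric, target_metric,
                       schedule, default_floor, max_replicas):
    """Assumed policy: the scheduled floor wins when HPA would go lower."""
    reactive = hpa_desired(current_replicas, current_metric, target_metric)
    floor = scheduled_floor(hour, schedule, default_floor)
    return min(max(reactive, floor), max_replicas)
```

With a schedule entry such as `(7, 11, 20)`, the service is already at 20 replicas at 7 am even though load is still low, and HPA remains free to add more once the 9 am surge arrives.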
6. Future Scaling Plans
Future goals include fully automated request/limit and replica sizing based on application profiling, predictive pre‑scaling, and on‑node vertical scaling without pod restarts, creating a closed‑loop, self‑optimizing autoscaling system.
