
How Huolala Cuts Cloud Costs with Kubernetes: Spot Instances, Smart Autoscaling, and Predictive Scaling

This presentation details Huolala's end‑to‑end cloud‑native cost‑optimization strategy, covering the company's infrastructure basics, Kubernetes‑based server cost‑saving techniques, a tailored optimization roadmap, practical Spot Instance usage, and a custom CronHPA‑driven scheduled scaling solution to boost resource utilization.

Huolala Tech

1. Huolala Overview

Huolala runs its production environment entirely on public cloud, operating multiple clusters across different cloud providers in Singapore, India, Latin America, and China, so its solutions must be vendor‑agnostic. Traffic is highly regular, with clear peak and off‑peak periods, which makes simple predictive scaling algorithms viable. Large offline data jobs consume about half of the compute resources.

2. Cost‑Optimization Methods on Kubernetes

The main public‑cloud server cost‑saving approaches are:

Annual/Monthly Reserved Instances: Fixed‑term purchases offering stability and low price but limited elasticity.

Savings Plans: Commit to a minimum hourly spend for discounts while retaining scaling flexibility.

Spot Instances: Purchase idle capacity at deep discounts (priced at roughly 10‑20% of the on‑demand rate), at the risk of interruption.

Huolala focuses on public‑cloud server optimization and node‑utilization improvement, excluding private‑cloud and service‑performance tuning.

2.1 Public‑Cloud Server Optimization

Spot Instances are obtained by submitting a maximum acceptable price; the cloud provider allocates instances if inventory and price conditions are met. Interruption signals may arrive at any time, requiring rapid graceful termination handling.
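The graceful termination requirement can be illustrated with a minimal Python sketch. It assumes the interruption reaches the application as a SIGTERM (which is what Kubernetes sends to containers when a pod is evicted); the `GracefulWorker` class and its method names are hypothetical and not taken from Huolala's implementation.

```python
import signal
import threading
import time


class GracefulWorker:
    """Hypothetical worker: refuses new work after shutdown starts, then drains."""

    def __init__(self):
        self._stopping = threading.Event()
        self._lock = threading.Lock()
        self.in_flight = 0

    def install_signal_handler(self):
        # Must be called from the main thread. SIGTERM is what a container
        # receives when its pod is evicted, e.g. from a reclaimed Spot node.
        signal.signal(signal.SIGTERM, lambda signum, frame: self.request_shutdown())

    def request_shutdown(self):
        self._stopping.set()

    def try_accept(self):
        """Return True if a new request may start, False once shutdown began."""
        with self._lock:
            if self._stopping.is_set():
                return False
            self.in_flight += 1
            return True

    def finish(self):
        """Mark one in-flight request as completed."""
        with self._lock:
            self.in_flight -= 1

    def drain(self, grace_seconds, poll_interval=0.01):
        """Wait for in-flight work; True if it finished before the deadline."""
        deadline = time.monotonic() + grace_seconds
        while time.monotonic() < deadline:
            with self._lock:
                if self.in_flight == 0:
                    return True
            time.sleep(poll_interval)
        with self._lock:
            return self.in_flight == 0
```

In practice the grace period passed to `drain` has to fit inside the provider's interruption notice window, which varies by cloud vendor.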

2.2 Node‑Utilization Optimization

Key techniques include:

Reasonable request/limit settings based on application profiles and historical load.

Horizontal Pod Autoscaler (HPA) for reactive scaling, keeping in mind that it responds only after load has already risen.

CronHPA for scheduled scaling aligned with predictable traffic patterns.

Intelligent scheduling to distribute pods across availability zones and instance types.

Pod Disruption Budgets (PDB) to guarantee a minimum number of replicas during disruptions.

Low‑priority pause pods that reserve spare cluster capacity; they are evicted first, so that higher‑priority pods can be scheduled immediately.
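Two of the techniques above lend themselves to a short worked example: sizing a request from historical load, and the HPA replica calculation. The sketch below is illustrative only. The 95th percentile and the 20% headroom are assumed values, not figures from the talk, and the HPA function shows only the core formula from the Kubernetes documentation, omitting tolerance, stabilization windows, and min/max bounds.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_cpu_request(usage_millicores, pct=95, headroom_pct=20):
    """Suggest a CPU request: a high percentile of past usage plus headroom.

    pct and headroom_pct are assumptions chosen for illustration.
    """
    base = percentile(usage_millicores, pct)
    return math.ceil(base * (100 + headroom_pct) / 100)


def hpa_desired_replicas(current_replicas, current_metric, target_metric):
    """Core HPA formula: ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_metric / target_metric)


# Example: ten CPU usage samples (millicores) for one container.
history = [100, 120, 150, 200, 400, 180, 160, 140, 130, 110]
request = suggest_cpu_request(history)

# Example: 4 replicas at 90% average utilization with a 60% target.
replicas = hpa_desired_replicas(4, 90, 60)
```

Because the HPA only sees the metric after load has risen, the computed replica count always trails the traffic that caused it, which is the latency the list refers to.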

3. Huolala‑Specific Optimization Roadmap

The roadmap combines Savings Plans for baseline capacity with Spot Instances for elastic workloads, complemented by intelligent request/limit tuning and advanced scheduling to maximize utilization.
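A rough way to reason about this split is a blended hourly cost. The sketch below is a back-of-the-envelope model, not Huolala's pricing: the Savings Plan discount of 30% is an assumed placeholder, while the Spot price ratio of 0.2 reflects the upper end of the 10‑20% range quoted earlier.

```python
def blended_hourly_cost(baseline_nodes, elastic_nodes, on_demand_price,
                        savings_plan_discount, spot_price_ratio):
    """Hourly cost with baseline nodes on a Savings Plan and elastic nodes on Spot.

    savings_plan_discount: fraction taken off the on-demand price (assumed).
    spot_price_ratio: Spot price as a fraction of the on-demand price.
    """
    baseline = baseline_nodes * on_demand_price * (1 - savings_plan_discount)
    elastic = elastic_nodes * on_demand_price * spot_price_ratio
    return baseline + elastic


# 10 baseline nodes and 10 elastic nodes at a normalized on-demand price of 1.0.
mixed = blended_hourly_cost(10, 10, 1.0, savings_plan_discount=0.3, spot_price_ratio=0.2)
all_on_demand = 20 * 1.0
savings_fraction = 1 - mixed / all_on_demand
```

The model ignores interruption overhead and the spare capacity kept for migration, so real savings would be somewhat lower.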

4. Spot Instances Deep Dive

Spot Instances are idle resources sold at steep discounts. Procurement involves bidding a maximum price; the provider allocates instances as long as the market price stays below the bid. Instances can be reclaimed on short notice when supply dwindles, so Huolala applies the following mitigations:

Expanding the Spot pool across more zones and instance types.

Assigning higher priority to Spot node groups in the Cluster Autoscaler (priority 20 vs. 10 for on‑demand).

Pod affinity and anti‑affinity rules to spread replicas across zones and instance types, reducing the chance that all replicas are evicted at once.

Pod Disruption Budgets to maintain service availability during mass evictions.

Low‑priority pause pods to keep spare capacity for rapid pod migration.
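The mitigations above can be modeled in a few lines of Python. This is a simplified illustration of the decision logic, not Cluster Autoscaler or scheduler code: the function names and the dictionary layout are hypothetical. It assumes that a higher priority number wins (consistent with the 20 vs. 10 values above) and that a Pod Disruption Budget is expressed as a minimum number of available replicas.

```python
from itertools import cycle


def pick_node_group(groups):
    """Pick the highest-priority node group that still has capacity.

    groups: list of dicts with "name", "priority", and "available" keys
    (a hypothetical layout used only for this sketch).
    """
    candidates = [g for g in groups if g["available"] > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda g: g["priority"])["name"]


def spread_replicas(replica_count, domains):
    """Round-robin replicas across failure domains (zone / instance type)."""
    placement = {d: 0 for d in domains}
    for _, domain in zip(range(replica_count), cycle(domains)):
        placement[domain] += 1
    return placement


def allowed_disruptions(healthy_replicas, min_available):
    """How many pods may be evicted voluntarily without violating the budget."""
    return max(0, healthy_replicas - min_available)


groups = [
    {"name": "spot", "priority": 20, "available": 3},
    {"name": "on-demand", "priority": 10, "available": 5},
]
chosen = pick_node_group(groups)  # Spot is preferred while it has capacity
placement = spread_replicas(5, ["zone-a/type-1", "zone-b/type-1", "zone-a/type-2"])
```

When the Spot pool is exhausted, the same selection falls back to the on‑demand group, which is why widening the pool across zones and instance types matters.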

Services unsuitable for Spot Instances include single‑replica workloads, long‑startup services, non‑graceful‑shutdown services, and stateful applications.
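These exclusion criteria could be encoded as a simple pre-check before a workload is allowed onto Spot nodes. The sketch below is one possible shape for such a check. The `Workload` fields and the 120-second startup threshold are assumptions made for illustration, since the source does not define what counts as a long startup.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    replicas: int
    startup_seconds: int
    graceful_shutdown: bool
    stateful: bool


def spot_blockers(workload, max_startup_seconds=120):
    """Return the reasons a workload should stay off Spot (empty list = eligible).

    max_startup_seconds is an assumed threshold, not a value from the source.
    """
    reasons = []
    if workload.replicas < 2:
        reasons.append("single replica")
    if workload.startup_seconds > max_startup_seconds:
        reasons.append("long startup")
    if not workload.graceful_shutdown:
        reasons.append("no graceful shutdown")
    if workload.stateful:
        reasons.append("stateful")
    return reasons


api = Workload("order-api", replicas=6, startup_seconds=20,
               graceful_shutdown=True, stateful=False)
db = Workload("queue-broker", replicas=1, startup_seconds=300,
              graceful_shutdown=True, stateful=True)
```

Returning the list of reasons rather than a boolean makes it easier to report why a service was kept on on‑demand capacity.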

5. Scheduled Scaling (CronHPA)

Huolala observed that the default HPA suffers from scaling latency, threshold constraints, and a lack of business‑specific timing control. By combining CronHPA (pre‑scheduled replica adjustments) with HPA (reactive scaling), the team achieves:

Predictive pre‑scaling before known traffic spikes (e.g., scaling up at 7‑8 am for a 9 am surge).

Timely down‑scaling after peak periods while preserving a safety buffer.

Automatic adjustment of CronHPA targets based on historical metrics, application profiles, and policy constraints.
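One way to picture the combination is to treat the scheduled value as a replica floor and let the reactive HPA scale above it. The source does not specify how Huolala's CronHPA and HPA interact, so the sketch below rests on that assumption; the function names, the schedule format, and the 20% buffer are likewise hypothetical.

```python
import math
from datetime import time


def scheduled_floor(schedule, now, default):
    """Return the replica floor for `now` from (start, end, replicas) windows."""
    for start, end, replicas in schedule:
        if start <= now < end:
            return replicas
    return default


def derive_floor(peak_load_history, per_replica_capacity, buffer_pct=20):
    """Size a window's floor from past peak loads plus a safety buffer.

    buffer_pct is an assumed value; loads and capacity share the same unit
    (for example requests per second).
    """
    expected_peak = max(peak_load_history)
    return math.ceil(expected_peak * (100 + buffer_pct) / (100 * per_replica_capacity))


def effective_replicas(cron_floor, hpa_desired, max_replicas):
    """Scheduled floor and reactive HPA combined: the larger wins, capped at max."""
    return min(max_replicas, max(cron_floor, hpa_desired))


# Pre-scale from 07:30 so capacity is ready before a 09:00 surge.
morning_floor = derive_floor([900, 1000, 950], per_replica_capacity=100)
schedule = [(time(7, 30), time(11, 0), morning_floor)]

floor_at_8 = scheduled_floor(schedule, time(8, 0), default=4)
replicas_at_8 = effective_replicas(floor_at_8, hpa_desired=8, max_replicas=50)
```

Outside the scheduled window the floor drops back to the default, so down‑scaling after the peak is governed by the HPA and the remaining buffer.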

6. Future Scaling Plans

Future goals include fully automated request/limit and replica sizing based on application profiling, predictive pre‑scaling, and on‑node vertical scaling without pod restarts, creating a closed‑loop, self‑optimizing autoscaling system.

