
How Huolala Cuts Cloud Costs: Real‑World Kubernetes Optimization Strategies

Huolala’s architecture team shares a detailed walkthrough of their cloud‑native cost‑optimization journey, covering public‑cloud server pricing models, Kubernetes request/limit tuning, HPA and CronHPA scheduling, spot instance integration, intelligent pod placement, and practical lessons learned from scaling a global on‑demand logistics platform.

dbaplus Community

Background

Huolala (货拉拉) is a global on‑demand logistics platform whose production environment runs 100% on public cloud across multiple regions (Singapore, India, Latin America, China). Its traffic is highly regular with clear peak and off‑peak periods, and roughly half of its compute resources are consumed by large‑scale offline big‑data jobs. These characteristics drive the need for cloud‑native cost‑optimization techniques that are vendor‑agnostic.

Five‑Part Outline of the Talk

Company overview

Kubernetes‑based cost‑optimization methods

Huolala‑specific optimization roadmap

Spot (bid) instance practice

Scheduled scaling practice

1. Company Overview

All production workloads run on public cloud, requiring cloud‑agnostic solutions.

Multi‑cluster deployment across different cloud providers.

Predictable traffic patterns enable simple forecasting algorithms.

Offline big‑data tasks occupy ~50% of compute, offering opportunities for mixed online/offline scheduling.

2. Kubernetes‑Based Cost‑Optimization Methods

The speaker groups cost‑saving techniques into four categories but focuses on two that are relevant to Huolala’s 100% public‑cloud setup:

Public‑cloud server cost optimization

Server utilization optimization

2.1 Public‑Cloud Server Cost Optimization

Public‑cloud providers offer three main discount models:

Reserved (monthly/annual) instances: Fixed‑term purchase, stable price, limited elasticity.

Savings plans: Commit to a minimum hourly spend in exchange for a discount; allows flexible scaling while guaranteeing a baseline spend.

Spot (bid) instances: Purchase unused capacity at 10‑20% of the on‑demand price; the price is market‑driven and instances can be reclaimed at any time.

2.2 Server Utilization Optimization

Request/Limit tuning: Set initial requests and limits based on application profiling, then periodically adjust them through a monitoring loop.

Horizontal Pod Autoscaler (HPA): Auto‑scale based on metrics (e.g., a CPU utilization target of 35%).

CronHPA: Time‑based scaling for predictable peaks and troughs.

Intelligent scheduling: Customize the default scheduler with weight calculations, GPU awareness, and stacking (bin‑packing) strategies to improve node utilization.

Offline‑online mixed deployment: Run offline batch jobs on idle capacity during off‑peak hours, co‑locating them with online services in the same cluster to increase overall utilization.
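A minimal sketch of the request/limit and stacking levers, assuming a hypothetical service named `order-api` with placeholder image and resource figures (none of these values come from Huolala). The Deployment sets requests, which the scheduler uses for placement, and limits, which cap runtime usage. The KubeSchedulerConfiguration switches the `NodeResourcesFit` plugin to the `MostAllocated` scoring strategy, which is one common way to get bin‑packing behaviour; the source does not say whether Huolala's customized scheduler works this way.

```yaml
# Hypothetical service; request/limit values are placeholders that a
# profiling-and-monitoring loop would periodically revise.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-api
spec:
  replicas: 4
  selector:
    matchLabels:
      app: order-api
  template:
    metadata:
      labels:
        app: order-api
    spec:
      containers:
        - name: order-api
          image: registry.example.com/order-api:1.0.0   # placeholder image
          resources:
            requests:          # what the scheduler reserves on a node
              cpu: 500m
              memory: 512Mi
            limits:            # hard ceiling enforced at runtime
              cpu: "1"
              memory: 1Gi
---
# Scheduler profile that scores fuller nodes higher (bin-packing), so
# lightly used nodes tend to empty out and can be removed by the autoscaler.
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1
```

The scheduler configuration is passed to kube-scheduler through its `--config` flag; unlike the Deployment, it is not an object applied with kubectl.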

3. Huolala‑Specific Cost‑Optimization Roadmap

The roadmap combines industry best practices with Huolala’s operational realities:

Use a Savings Plan to secure baseline compute at a discounted rate.

Introduce Spot instances for elastic capacity, handling their interruption risk with robust automation.

Deploy CronHPA for predictable scaling and HPA for burst handling.

Implement intelligent request/limit calculation based on historical metrics.

Adopt smart pod anti‑affinity rules to spread pods across availability zones, instance types, and nodes.

Configure Pod Disruption Budgets (PDB) to guarantee a minimum percentage of available replicas during evictions, such as when a node is drained.

Reserve space with low‑priority pause pods so that when a Spot instance is reclaimed, high‑priority pods can pre‑empt the pause pods and continue running without waiting for a new node.
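The spreading and PDB items can be sketched as follows for a hypothetical service labeled `app: order-api`; the name, the weights, and the 60% threshold are illustrative assumptions, not Huolala's settings. Soft (preferred) anti‑affinity is used so that pods can still be scheduled when a zone or instance type runs short, which matters when Spot capacity fluctuates.

```yaml
# Pod template fragment (goes under spec.template.spec of a Deployment):
# prefer to place replicas in different zones, instance types and nodes.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: order-api
          topologyKey: topology.kubernetes.io/zone
      - weight: 50
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: order-api
          topologyKey: node.kubernetes.io/instance-type
      - weight: 25
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: order-api
          topologyKey: kubernetes.io/hostname
---
# Keep at least 60% of matching replicas running during voluntary evictions.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: order-api-pdb
spec:
  minAvailable: "60%"
  selector:
    matchLabels:
      app: order-api
```

A PDB only limits evictions requested through the Eviction API (for example by `kubectl drain`, or by a termination handler that drains a node after a Spot interruption notice); it cannot stop the provider from reclaiming the instance itself.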

4. Spot (Bid) Instances

Spot instances are unused cloud capacity sold at a steep discount. Providers expose the capacity via either a fixed discount (e.g., 80% off) or a bidding mechanism where the user submits a maximum acceptable price. The cloud provider matches bids against inventory and allocates the instance if the bid meets or exceeds the market price.

Key characteristics:

Cost: 10‑20% of on‑demand price (sometimes 30‑60% compared to reserved instances).

Availability: Dependent on spare capacity; no guarantee of supply.

Interruption: Instances can be reclaimed at any moment; providers may give a short warning or protection window (e.g., AWS's 2‑minute interruption notice, Alibaba Cloud's 1‑hour protection period).

Integration with Kubernetes:

Node groups are created for both on‑demand and Spot instances; the Cluster Autoscaler scales each group independently.

Node‑group priority is set in the Cluster Autoscaler configuration (e.g., Spot priority 20, on‑demand 10); higher values win, so Spot groups are preferred when scaling up and on‑demand groups act as the fallback.

Pod anti‑affinity spreads replicas across zones, instance types, and nodes to avoid simultaneous loss.

PDB ensures that at least a configurable percentage of replicas stays available while pods are evicted, for example when nodes are drained ahead of Spot reclamation.

Low‑priority pause pods reserve cluster capacity, allowing high‑priority pods to pre‑empt them instantly when a Spot node disappears.
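The node‑group priority above maps onto the Cluster Autoscaler's `priority` expander (enabled with `--expander=priority`), which reads a ConfigMap named `cluster-autoscaler-priority-expander` from the autoscaler's own namespace. A minimal sketch follows; the node‑group name patterns are hypothetical and depend on how the groups are named in each cloud account.

```yaml
# Cluster Autoscaler priority expander: when several node groups could fit
# the pending pods, the group matching the highest number is scaled up.
# Keys are priorities; values are regexes matched against node-group names.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system   # must match the namespace the autoscaler runs in
data:
  priorities: |-
    20:
      - .*spot.*
    10:
      - .*on-demand.*
```

With this configuration, Spot groups are tried first on scale‑up, and on‑demand groups serve as the fallback when Spot capacity cannot be obtained.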

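A sample PriorityClass for the pause pods:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority-pause
# Below the default priority (0) so any ordinary pod can preempt these pods,
# but above the Cluster Autoscaler's default expendable cutoff (-10).
value: -1
globalDefault: false
# Pause pods only hold space; they should never preempt other pods.
preemptionPolicy: Never
description: "Placeholder pods that reserve capacity for preemption."
```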
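A pause‑pod Deployment that uses the `low-priority-pause` class could look like the sketch below. The Deployment name, replica count, and resource requests are hypothetical; together they determine how much headroom is held in reserve. The pods request real CPU and memory but run only the `pause` container. When higher‑priority pods need room, the scheduler preempts the pause pods, and the now‑pending pause pods in turn prompt the Cluster Autoscaler to add a node.

```yaml
# Placeholder pods that reserve headroom. Pods with a higher priority
# (including pods with no priority class, which get 0 unless a global
# default class is set) can preempt them immediately.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: capacity-reservation
spec:
  replicas: 3                      # hypothetical amount of headroom
  selector:
    matchLabels:
      app: capacity-reservation
  template:
    metadata:
      labels:
        app: capacity-reservation
    spec:
      priorityClassName: low-priority-pause
      terminationGracePeriodSeconds: 0   # free the space without delay
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "1"
              memory: 2Gi
```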

5. Scheduled Scaling (CronHPA)

Baseline CPU utilization of an unoptimized Huolala cluster shows ~35% during daytime peaks and ~2.5% at night. The target is ≥50% during peaks and ≥30% during off‑peak.

Limitations of vanilla HPA:

Scaling latency – HPA reacts only after metrics cross thresholds.

Threshold trade‑off – a low CPU target limits peak utilization; a high target delays scaling.

No ability to schedule scaling based on known business rhythms.
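The threshold trade‑off follows from the scaling rule documented for the Kubernetes HPA, which compares the observed metric value with the target. The worked numbers assume a hypothetical service running 10 replicas at 70% average CPU utilization and only illustrate the direction of the effect.

```latex
\begin{aligned}
\text{desiredReplicas} &= \left\lceil \text{currentReplicas} \times \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil \\
\text{(target 35\%)}\quad \text{desiredReplicas} &= \left\lceil 10 \times \tfrac{70}{35} \right\rceil = 20 \\
\text{(target 50\%)}\quad \text{desiredReplicas} &= \left\lceil 10 \times \tfrac{70}{50} \right\rceil = 14
\end{aligned}
```

A lower target keeps more headroom and scales out sooner but holds steady‑state utilization near that low value; a higher target packs pods more densely but leaves less margin while new pods start. In both cases the rule only acts after the metric has already moved (and, by default, skips changes when the ratio is within 10% of 1.0), which is the latency problem listed above.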

Solution: a custom CronHPA + HPA combo.

CronHPA adjusts the *minimum* replica count of an HPA at predefined times (e.g., pre‑scale at 07:00 for the 09:00 traffic surge).

HPA still handles unexpected spikes.
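A sketch of the HPA half of the combination, assuming a hypothetical `order-api` Deployment and a 50% CPU target (chosen here to echo the peak‑utilization goal above, not taken from Huolala's configuration). `minReplicas` is the field a CronHPA controller would rewrite on a schedule; HPA keeps adjusting the actual replica count between that floor and `maxReplicas`.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-api
  minReplicas: 4      # floor; raised before known peaks, lowered at night
  maxReplicas: 40     # ceiling for unexpected bursts
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # percent of the pods' CPU request
```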

Architecture:

A custom controller, hll-cronhpa-controller, watches CronHPA custom resources (defined by a CRD).

The controller fetches historical metrics from Prometheus, merges them with application profiles and scaling policies from a config center, and updates the associated HPA’s minReplicas.
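The source does not publish the schema of Huolala's `CronHPA` CRD, so the resource below is entirely hypothetical: the API group, every field name, and the schedule values are illustrative assumptions. It shows one shape such an object might take: a reference to an HPA plus a list of times at which the controller would overwrite that HPA's `minReplicas`, including a stepped ramp ahead of a 09:00 peak.

```yaml
# Hypothetical CRD instance; group, version and all fields are illustrative.
apiVersion: autoscaling.example.com/v1alpha1
kind: CronHPA
metadata:
  name: order-api
spec:
  hpaRef:
    name: order-api            # HPA whose minReplicas is managed
  timezone: Asia/Singapore     # hypothetical field
  schedules:
    - name: pre-scale-step-1
      cron: "0 7 * * *"        # 07:00, first step of the morning ramp
      minReplicas: 12
    - name: pre-scale-step-2
      cron: "30 7 * * *"       # 07:30, second step before the 09:00 surge
      minReplicas: 20
    - name: night-trough
      cron: "0 23 * * *"       # 23:00, lower the floor for off-peak hours
      minReplicas: 4
```

In the architecture described above, later stages would not hard‑code these numbers; the controller would derive them from Prometheus history combined with the application profiles and scaling policies held in the config center.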

Implementation stages:

Manual time‑based scaling with static ratios.

Metric‑driven automatic timing based on past usage.

Algorithmic calculation of optimal replica counts per service.

Automatic detection of each service’s low‑peak window.

Phased scaling that gradually ramps up/down rather than instant jumps.
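The source does not describe how Huolala's controller implements phased scaling. For comparison only, native Kubernetes offers a related mechanism: the `behavior` field of an `autoscaling/v2` HPA rate‑limits how quickly HPA itself adds or removes pods. The fragment below is a sketch with hypothetical rates and belongs under an HPA's `spec`.

```yaml
# spec.behavior fragment of an autoscaling/v2 HorizontalPodAutoscaler.
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0     # react to rising load immediately
    policies:
      - type: Percent
        value: 50                     # add at most 50% more pods...
        periodSeconds: 60             # ...per minute
  scaleDown:
    stabilizationWindowSeconds: 300   # act on the highest recommendation of the last 5 min
    policies:
      - type: Percent
        value: 10                     # remove at most 10% of pods...
        periodSeconds: 60             # ...per minute
```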

Future roadmap includes automatic request/limit inference, predictive pre‑scaling, and in‑place vertical scaling (VPA‑like) without pod restarts.

Conclusion

By combining Savings Plans, Spot instances, intelligent request/limit tuning, CronHPA‑driven scheduled scaling, and a suite of Kubernetes‑level safeguards (priority classes, anti‑affinity, PDB, pause pods), Huolala dramatically improved server utilization and reduced cloud spend while maintaining the reliability required for a global logistics service.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
