How Huolala Cuts Cloud Costs: Real‑World Kubernetes Optimization Strategies
Huolala’s architecture team shares a detailed walkthrough of their cloud‑native cost‑optimization journey, covering public‑cloud server pricing models, Kubernetes request/limit tuning, HPA and CronHPA scheduling, spot instance integration, intelligent pod placement, and practical lessons learned from scaling a global on‑demand logistics platform.
Background
Huolala (货拉拉) is a global on‑demand logistics platform whose production environment runs 100% on public cloud across multiple regions (Singapore, India, Latin America, China). Its traffic is highly regular with clear peak and off‑peak periods, and roughly half of its compute resources are consumed by large‑scale offline big‑data jobs. These characteristics drive the need for cloud‑native cost‑optimization techniques that are vendor‑agnostic.
Five‑Part Outline of the Talk
Company overview
Kubernetes‑based cost‑optimization methods
Huolala‑specific optimization roadmap
Spot (bid) instance practice
Scheduled scaling practice
1. Company Overview
All production workloads run on public cloud, requiring cloud‑agnostic solutions.
Multi‑cluster deployment across different cloud providers.
Predictable traffic patterns enable simple forecasting algorithms.
Offline big‑data tasks occupy ~50% of compute, offering opportunities for mixed online/offline scheduling.
2. Kubernetes‑Based Cost‑Optimization Methods
The speaker groups cost‑saving techniques into four categories but focuses on two that are relevant to Huolala’s 100% public‑cloud setup:
Public‑cloud server cost optimization
Server utilization optimization
2.1 Public‑Cloud Server Cost Optimization
Public‑cloud providers offer three main discount models:
Reserved (monthly/annual) instances: Fixed‑term purchase, stable price, limited elasticity.
Savings plans: Commit to a minimum hourly spend in exchange for a discount; usage can scale freely above the committed baseline.
Spot (bid) instances: Purchase unused capacity at roughly 10‑20% of the on‑demand price; the price is market‑driven and instances can be reclaimed at any time.
2.2 Server Utilization Optimization
Request/Limit tuning: Set initial requests/limits from application profiling, then adjust them periodically through a monitoring feedback loop.
Horizontal Pod Autoscaler (HPA): Scale replica counts automatically on metrics (e.g., scale out when average CPU utilization exceeds 35%).
CronHPA: Time‑based scaling for predictable peaks and troughs.
Intelligent scheduling: Customize the default scheduler with weighted scoring, GPU awareness, and bin‑packing (stacking) strategies to improve node utilization.
Offline‑online mixed deployment: Run offline batch jobs on idle capacity during off‑peak hours, co‑locating them with online services in the same cluster to raise overall utilization.
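The first two techniques meet in an ordinary Deployment manifest. The sketch below is illustrative only: the service name order-api, the image, the resource numbers, and the replica bounds are hypothetical placeholders, not Huolala's values. Note that an HPA Utilization target is measured against the container's CPU request, so request tuning and HPA thresholds interact directly.

```yaml
# Hypothetical service; all names and numbers are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-api
spec:
  # replicas is omitted on purpose: the HPA below owns the replica count
  selector:
    matchLabels:
      app: order-api
  template:
    metadata:
      labels:
        app: order-api
    spec:
      containers:
        - name: app
          image: registry.example.com/order-api:1.0  # placeholder image
          resources:
            requests:            # reserved by the scheduler; initial value from profiling
              cpu: "500m"
              memory: "1Gi"
            limits:              # runtime ceiling; revisited by the monitoring loop
              cpu: "1"
              memory: "1Gi"
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-api
  minReplicas: 4
  maxReplicas: 40
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 35  # percent of the CPU request, mirroring the "CPU > 35%" example
```

If requests are set too high, utilization relative to the request stays low and the HPA under-scales the node footprint it actually needs; this is why the request/limit loop and HPA tuning are usually revisited together.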
3. Huolala‑Specific Cost‑Optimization Roadmap
The roadmap combines industry best practices with Huolala’s operational realities:
Use a Savings Plan to secure baseline compute at a discounted rate.
Introduce Spot instances for elastic capacity, handling their interruption risk with robust automation.
Deploy CronHPA for predictable scaling and HPA for burst handling.
Implement intelligent request/limit calculation based on historical metrics.
Adopt pod anti‑affinity rules to spread replicas across nodes, instance types, and availability zones.
Configure Pod Disruption Budgets (PDB) to guarantee a minimum percentage of available replicas during voluntary evictions such as node drains.
Reserve space with low‑priority pause pods so that when a Spot instance is reclaimed, high‑priority pods can pre‑empt the pause pods and continue running without waiting for a new node.
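A minimal sketch of the placement and eviction safeguards, assuming a workload labeled app: order-api (a hypothetical name). The first document is a pod‑spec fragment that belongs under spec.template.spec of the Deployment; the second is a complete PodDisruptionBudget. The weights and the 60% floor are hypothetical, not Huolala's settings.

```yaml
# Pod-spec fragment: prefer spreading replicas across zones, instance types, and nodes.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100                    # strongest preference: different availability zones
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: order-api
          topologyKey: topology.kubernetes.io/zone
      - weight: 50                     # then different instance types (separate Spot pools)
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: order-api
          topologyKey: node.kubernetes.io/instance-type
      - weight: 50                     # then different nodes
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: order-api
          topologyKey: kubernetes.io/hostname
---
# Keep at least 60% of replicas available during voluntary evictions (e.g. drains).
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: order-api-pdb
spec:
  minAvailable: "60%"
  selector:
    matchLabels:
      app: order-api
```

Soft (preferred) anti‑affinity is used here so that pods still schedule when there are fewer zones or instance types than replicas; required anti‑affinity would leave such pods Pending.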
4. Spot (Bid) Instances
Spot instances are unused cloud capacity sold at a steep discount. Providers expose the capacity via either a fixed discount (e.g., 80% off) or a bidding mechanism where the user submits a maximum acceptable price. The cloud provider matches bids against inventory and allocates the instance if the bid meets or exceeds the market price.
Key characteristics:
Cost: 10‑20% of on‑demand price (sometimes 30‑60% compared to reserved instances).
Availability: Dependent on spare capacity; no guarantee of supply.
Interruption: Instances can be reclaimed at any moment; providers may give a short warning or protection window (e.g., AWS's 2‑minute interruption notice, Alibaba Cloud's 1‑hour protection period).
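The bidding rule and the cost figure above can be written as a short worked equation. Here b is the user's maximum bid, p_market(t) the market price at time t, r the Spot‑to‑on‑demand price ratio (10‑20% per the text), and s the share of compute‑hours moved to Spot; s = 0.5 is a hypothetical value chosen only for illustration.

```latex
\begin{aligned}
&\text{allocate} \iff b \ge p_{\text{market}}(t) \\
&\frac{C}{C_{\text{on-demand}}} = (1 - s) + s\,r, \qquad 0.1 \le r \le 0.2 \\
&s = 0.5:\quad 0.5 + 0.5 \times 0.1 = 0.55, \qquad 0.5 + 0.5 \times 0.2 = 0.60
\end{aligned}
```

With half of the compute on Spot, the bill would fall by roughly 40‑45%. The sketch ignores the overhead that makes Spot usable in practice (pause‑pod reservations, on‑demand fallback, work lost on reclaim), so realized savings will be lower.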
Integration with Kubernetes:
Node groups are created for both on‑demand and Spot instances; the Cluster Autoscaler scales each group independently.
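Node‑group priority is set via the Cluster Autoscaler's priority expander (e.g., Spot priority 20, on‑demand 10): higher numbers win, so scale‑ups go to Spot node groups first and fall back to on‑demand groups when Spot capacity cannot be obtained.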
Pod anti‑affinity spreads replicas across zones, instance types, and nodes to avoid simultaneous loss.
PDB ensures at least a configurable percentage of replicas stay available during voluntary mass evictions, such as node drains.
Low‑priority pause pods reserve cluster capacity, allowing high‑priority pods to pre‑empt them instantly when a Spot node disappears.
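A minimal sketch of two of these safeguards, assuming the Cluster Autoscaler runs in kube-system with --expander=priority and that node‑group names contain "spot" or "on-demand" (a hypothetical naming convention). The ConfigMap name cluster-autoscaler-priority-expander and its priorities key follow the upstream priority‑expander format; the reservation Deployment's name, replica count, and resource sizes are hypothetical, and it references the low-priority-pause PriorityClass shown in the sample that follows.

```yaml
# Cluster Autoscaler priority expander (requires --expander=priority).
# Higher number = preferred node groups; entries are regexes on node-group names.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    20:
      - .*spot.*
    10:
      - .*on-demand.*
---
# Capacity reservation: pause pods that hold room for displaced high-priority pods.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: capacity-reservation          # hypothetical name
  namespace: kube-system
spec:
  replicas: 2                         # hypothetical buffer size
  selector:
    matchLabels:
      app: capacity-reservation
  template:
    metadata:
      labels:
        app: capacity-reservation
    spec:
      priorityClassName: low-priority-pause
      terminationGracePeriodSeconds: 0   # release the space immediately when preempted
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:                 # capacity each replica reserves
              cpu: "2"
              memory: "4Gi"
```

Because the class value (-1) is above the Cluster Autoscaler's default expendable‑pods cutoff (-10), preempted pause pods that go Pending still trigger a scale‑up, which rebuilds the buffer on a fresh node.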
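Sample PriorityClass for the pause pods:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority-pause
value: -1                               # below the default priority (0) of ordinary pods
globalDefault: false                    # only pods that name this class receive it
preemptionPolicy: PreemptLowerPriority  # the Kubernetes default policy
description: "Placeholder pods that reserve spare capacity"
```

5. Scheduled Scaling (CronHPA)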
Baseline CPU utilization of an unoptimized Huolala cluster shows ~35% during daytime peaks and ~2.5% at night. The target is ≥50% during peaks and ≥30% during off‑peak.
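A quick worked equation shows what these targets imply, under the simplifying assumption that served demand D stays the same while provisioned capacity C changes, so that C × U = D is constant:

```latex
\begin{aligned}
D &= C_{\text{old}}\,U_{\text{old}} = C_{\text{new}}\,U_{\text{new}}
  \;\Rightarrow\; \frac{C_{\text{new}}}{C_{\text{old}}} = \frac{U_{\text{old}}}{U_{\text{new}}} \\
\text{peak:}\quad & \frac{35\%}{50\%} = 0.70 \;\Rightarrow\; \text{about } 30\% \text{ fewer cores} \\
\text{off-peak:}\quad & \frac{2.5\%}{30\%} \approx 0.083 \;\Rightarrow\; \text{about } 92\% \text{ fewer cores}
\end{aligned}
```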
Limitations of vanilla HPA:
Scaling latency – HPA reacts only after metrics cross thresholds.
Threshold trade‑off – a low CPU target limits peak utilization; a high target delays scaling.
No ability to schedule scaling based on known business rhythms.
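The threshold trade‑off follows from the replica formula documented for the Kubernetes HPA. The numbers in the second line are a hypothetical illustration:

```latex
\begin{aligned}
\text{desiredReplicas} &= \left\lceil \text{currentReplicas} \times \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil \\
\text{e.g. } 10 \text{ pods, target } 35\%, \text{ observed } 70\%:\quad &\left\lceil 10 \times \tfrac{70}{35} \right\rceil = 20
\end{aligned}
```

Because the controller keeps pushing average utilization back toward the target, a 35% target also caps steady‑state peak utilization near 35%. Raising the target to, say, 70% lets pods run hotter, but the formula only adds replicas once load is already above that level, and the new pods still need time to start and become ready.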
Solution: a custom CronHPA + HPA combo.
CronHPA adjusts the *minimum* replica count of an HPA at predefined times (e.g., pre‑scale at 07:00 for the 09:00 traffic surge).
HPA still handles unexpected spikes.
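One way to read the combination: the HPA clamps its desired replica count to [minReplicas, maxReplicas], so CronHPA only has to move the floor. In the sketch below (notation is ours, not the talk's), R_min^cron(t) is the floor written by CronHPA at time t, u(t) the observed utilization, u* the HPA target, and R_cur the current replica count; scale‑down stabilization windows are ignored.

```latex
R(t) = \min\!\Big(R_{\max},\; \max\big(R^{\text{cron}}_{\min}(t),\; \big\lceil R_{\text{cur}}\, u(t) / u^{*} \big\rceil\big)\Big)
```

At 07:00 a higher R_min^cron forces a scale‑out before any metric moves; during an unexpected spike the metric term exceeds the floor and ordinary HPA behavior takes over.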
Architecture:
A custom controller, hll‑cronhpa‑controller, watches CronHPA custom resources (defined by a CRD).
The controller fetches historical metrics from Prometheus, merges them with application profiles and scaling policies from a config center, and updates the associated HPA’s minReplicas.
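To make the flow concrete, here is a hypothetical CronHPA object and the HPA state the controller would produce from it. The API group, every CronHPA field name, the 23:00 scale‑in entry, and all numbers are invented for illustration; the talk does not publish the CRD schema, and in the real controller the target values come from Prometheus history and the config center rather than being hard‑coded.

```yaml
# HYPOTHETICAL schema: group, version, and fields are illustrative only.
apiVersion: autoscaling.example.com/v1alpha1
kind: CronHPA
metadata:
  name: order-api-cron
spec:
  hpaRef:
    name: order-api              # the HPA whose minReplicas gets rewritten
  timezone: Asia/Singapore
  schedules:
    - name: morning-prescale
      cron: "0 7 * * *"          # 07:00, ahead of the 09:00 surge
      minReplicas: 20
    - name: night-scale-in
      cron: "0 23 * * *"
      minReplicas: 4
---
# What the controller would effectively apply at 07:00: only minReplicas
# changes, so the regular HPA keeps reacting to unexpected spikes.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-api
  minReplicas: 20
  maxReplicas: 40
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 35
```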
Implementation stages:
Manual time‑based scaling with static ratios.
Metric‑driven automatic timing based on past usage.
Algorithmic calculation of optimal replica counts per service.
Automatic detection of each service’s low‑peak window.
Phased scaling that gradually ramps up/down rather than instant jumps.
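The talk does not publish the formulas behind stages 3 and 5. As an illustrative assumption only, a simple capacity estimate and a linear ramp could look like this, where U_hist(t) is the historical total CPU (in cores) used by the service in the window, q its per‑pod CPU request, u* the target utilization, R_0 the current replica count, and k the number of ramp steps; the example values are hypothetical.

```latex
\begin{aligned}
R^{*} &= \left\lceil \frac{U_{\text{hist}}(t)}{q\,u^{*}} \right\rceil
  &&\text{e.g. } \left\lceil \frac{12}{0.5 \times 0.5} \right\rceil = 48 \\
R_i &= R_0 + \left\lceil \frac{i\,(R^{*} - R_0)}{k} \right\rceil,\quad i = 1,\dots,k
  &&\text{e.g. } R_0 = 10,\ k = 4:\ 20,\ 29,\ 39,\ 48
\end{aligned}
```

Because R_k equals R* exactly, the ramp always ends at the computed target; the same expression also works for scale‑in, where R* < R_0.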
Future roadmap includes automatic request/limit inference, predictive pre‑scaling, and in‑place vertical scaling (VPA‑like) without pod restarts.
Conclusion
By combining Savings Plans, Spot instances, intelligent request/limit tuning, CronHPA‑driven scheduled scaling, and a suite of Kubernetes‑level safeguards (priority classes, anti‑affinity, PDB, pause pods), Huolala dramatically improved server utilization and reduced cloud spend while maintaining the reliability required for a global logistics service.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
