
Soul's Container Cluster Cost Governance: A Case Study on Resource Optimization

This case study describes how Soul governed the cost of its Kubernetes-based container clusters by improving resource utilization. It covers challenges such as resource fragmentation, and strategies such as SNAS for elastic node scaling and HPA+CronHPA coordination, which together delivered significant cost reductions.

Soul Technical Team

Sep 29, 2024


The governance effort had to address several obstacles: limits on how quickly HPA-driven scale-out could obtain new nodes during traffic surges, resource preemption between services that affected stability, idle capacity left in resource pools as traffic rose and fell in a tidal pattern, and the complexity of ongoing operations. The solutions fell into three groups: service governance improvements (HPA+CronHPA coordination), resource pool elasticity upgrades (the SNAS implementation), and a mechanism for observing resource usage.
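To make the HPA+CronHPA idea concrete, here is a minimal sketch of one common way to coordinate the two. It is an assumption about the general pattern, not Soul's actual implementation: a schedule raises a replica floor ahead of known peaks, and the metric-driven HPA result is honored only when it exceeds that floor. The replica formula is the standard Kubernetes HPA calculation, with the tolerance band left out for brevity. The names `CronWindow`, `cron_floor`, and `coordinated_replicas` are hypothetical.

```python
import math
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class CronWindow:
    """Hypothetical scheduled window that raises the replica floor."""
    start: time
    end: time
    min_replicas: int


def hpa_desired(current_replicas: int, current_util: float, target_util: float) -> int:
    # Standard HPA formula: ceil(currentReplicas * currentMetric / targetMetric).
    return math.ceil(current_replicas * current_util / target_util)


def cron_floor(now: time, windows: list[CronWindow], default_min: int) -> int:
    # Highest floor among the windows active at `now`, else the default minimum.
    floors = [w.min_replicas for w in windows if w.start <= now < w.end]
    return max(floors, default=default_min)


def coordinated_replicas(now: time, current_replicas: int, current_util: float,
                         target_util: float, windows: list[CronWindow],
                         default_min: int, max_replicas: int) -> int:
    # The schedule sets the floor; the HPA may scale above it, never below it.
    floor = cron_floor(now, windows, default_min)
    desired = hpa_desired(current_replicas, current_util, target_util)
    return min(max(desired, floor), max_replicas)


# Assumed evening peak window that pre-scales the service to 20 replicas.
windows = [CronWindow(time(19, 0), time(23, 0), 20)]
in_peak = coordinated_replicas(time(20, 0), 10, 30, 60, windows, 2, 50)
off_peak = coordinated_replicas(time(10, 0), 10, 90, 60, windows, 2, 50)
```

In this sketch the scheduled floor keeps capacity in place before a surge arrives, so the service does not wait on node provisioning at the moment traffic spikes, while the HPA still reacts to load the schedule did not anticipate.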

The key technical implementations were:

Service Governance: Optimized HPA+CronHPA coordination to handle traffic surges and ensure resource availability during peak periods.

Resource Pool Elasticity: Deployed SNAS (Soul Node AutoScaler) to dynamically adjust node counts based on resource pool water levels, reducing waste while maintaining service continuity.

Service Binding: Separated CPU and GPU services, optimized resource pool assignments, and implemented resource pool water level control.

Hotspot Rescheduling: Used Koord-descheduler's low-node-load strategy to migrate pods away from hotspot nodes during resource contention.

Cost Control: Established resource approval workflows, implemented cost monitoring dashboards, and created service load inspection mechanisms.
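The water-level idea behind the resource pool elasticity item can be sketched as follows. This is an illustrative assumption rather than the SNAS algorithm itself, which the summary does not describe. Here the water level is taken to be requested CPU divided by allocatable CPU in a pool, and the node count is resized toward a target level only when the level leaves a band. The function name `node_delta` and the thresholds `low`, `high`, and `target` are hypothetical.

```python
import math


def node_delta(requested_cores: float, node_count: int, cores_per_node: float,
               low: float = 0.60, high: float = 0.85, target: float = 0.75) -> int:
    """Return how many nodes to add (positive) or remove (negative).

    All thresholds are assumed values chosen for illustration.
    """
    if node_count <= 0 or cores_per_node <= 0:
        raise ValueError("node_count and cores_per_node must be positive")

    # Water level: share of the pool's allocatable CPU that is requested.
    level = requested_cores / (node_count * cores_per_node)

    # Inside the band: leave the pool alone to avoid scaling churn.
    if low <= level <= high:
        return 0

    # Outside the band: size the pool so the level lands near the target,
    # always keeping at least one node.
    wanted = max(1, math.ceil(requested_cores / (target * cores_per_node)))
    return wanted - node_count


# A pool of ten 100-core nodes under three different request loads.
scale_out = node_delta(900, 10, 100)  # level 0.90, above the band
scale_in = node_delta(300, 10, 100)   # level 0.30, below the band
steady = node_delta(700, 10, 100)     # level 0.70, inside the band
```

Using a band with a separate target, rather than a single threshold, is a design choice in this sketch: it keeps the pool from adding and removing nodes repeatedly when the level hovers near one value.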

As a result, resource utilization rose to 90% or more, overall costs fell by 20% or more, and operational stability improved through systematic monitoring and optimization.
