Optimizing Multi‑Cluster Cloud Native Costs: ZEEK’s ACK FinOps Journey
This article details how ZEEK automotive tackled rapid growth challenges by redesigning its cloud‑native infrastructure, adopting Alibaba Cloud ACK FinOps and ACK One for multi‑cluster management, and implementing cost‑visibility, intelligent allocation, and configuration checks that yielded significant resource savings and operational stability.
Preface
In 2021 ZEEK’s model 001 set a record by delivering over 10,000 units in just 110 days, and subsequent models continued to break delivery speed milestones, reaching more than 10,000 sales for the 001 model worldwide. The rapid growth put enormous pressure on the digital infrastructure team, which had to support order fulfillment, vehicle delivery, and payment settlement across many core systems.
Management Challenges
The team faced three main problems in a cloud‑native environment:
High complexity of resource management: Managing dozens of Kubernetes clusters across public and private clouds increased operational overhead, made cost allocation and permission control difficult, and caused inconsistent cluster versions.
Insufficiently intelligent resource allocation: Workloads ranged from business‑facing (B‑end) management systems to consumer‑facing (C‑end) high‑concurrency services, so operators could not reliably predict traffic or set appropriate CPU/memory requests, which led to over‑provisioning.
Ensuring long‑term sustainability: Short‑term optimization efforts risked becoming ineffective as workloads changed, potentially eroding confidence in the cost‑control investment.
Business Goals
The platform team defined four concrete objectives:
Cost insight and analysis: Build a fine‑grained cost‑sharing model and provide intelligent pod‑level utilization analysis.
Configuration baseline checks: Verify that deployment scripts comply with the baseline requirements for monitoring and self‑healing.
Cluster consolidation: Merge low‑utilization clusters to reduce management complexity and cost.
Stateless infrastructure: Use Kubernetes as a standard base so that new clouds require only minimal parameter changes.
Solution Selection
After evaluating multi‑cloud management platforms (CMP) and finding them too heavyweight, ZEEK focused on a cloud‑native‑first approach. The primary tool chosen was Alibaba Cloud ACK FinOps, which offers cost analysis at the cluster, namespace, node‑pool, and application levels.
Two cost‑allocation models were considered:
Single‑dimension model (CPU or memory): Simple calculation where pod cost = (pod request / total node resources) × node price.
Weighted mixed‑dimension model: Combines CPU and memory weights to avoid unfair cost distribution when workloads have disparate resource profiles.
Given ZEEK’s Java‑heavy workloads, memory became the bottleneck, so the single‑dimension memory model was adopted.
Cost allocation granularity will later be refined to the namespace level after cluster consolidation.
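The two allocation models can be compared with a small sketch. This is only an illustration of the formulas described above, not ACK FinOps code: the `Node` and `Pod` classes, the `cpu_weight` parameter, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    cpu_cores: float       # total CPU cores on the node
    memory_gib: float      # total memory on the node, in GiB
    price_per_day: float   # node price per day (hypothetical value)


@dataclass
class Pod:
    name: str
    cpu_request: float     # requested CPU cores
    memory_request: float  # requested memory, in GiB


def single_dimension_cost(pod: Pod, node: Node, dimension: str = "memory") -> float:
    """pod cost = (pod request / total node resources) x node price."""
    if dimension == "memory":
        share = pod.memory_request / node.memory_gib
    elif dimension == "cpu":
        share = pod.cpu_request / node.cpu_cores
    else:
        raise ValueError(f"unknown dimension: {dimension}")
    return share * node.price_per_day


def weighted_cost(pod: Pod, node: Node, cpu_weight: float = 0.5) -> float:
    """Blend the CPU share and the memory share; the two weights sum to 1."""
    if not 0.0 <= cpu_weight <= 1.0:
        raise ValueError("cpu_weight must be between 0 and 1")
    cpu_share = pod.cpu_request / node.cpu_cores
    mem_share = pod.memory_request / node.memory_gib
    share = cpu_weight * cpu_share + (1.0 - cpu_weight) * mem_share
    return share * node.price_per_day


# A memory-heavy Java pod on a 16-core / 64 GiB node priced at 100 per day.
node = Node(cpu_cores=16, memory_gib=64, price_per_day=100.0)
java_pod = Pod(name="order-service", cpu_request=1, memory_request=16)

memory_cost = single_dimension_cost(java_pod, node, "memory")
cpu_cost = single_dimension_cost(java_pod, node, "cpu")
mixed_cost = weighted_cost(java_pod, node, cpu_weight=0.5)
print(memory_cost, cpu_cost, mixed_cost)
```

For a pod like this, a CPU-only model would charge far less than the memory the pod actually pins on the node, which is the kind of distortion that motivates choosing the dimension that is the real bottleneck.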
Resource Optimization Strategies
Two main optimization directions were identified:
CPU optimization: Adjust pod QoS and request values cautiously; overcommitting (over‑selling) CPU can cause eviction under high load.
Memory optimization: The JVM reserves heap memory for a Java application, so a pod whose memory request is set too low risks running out of memory (OOM). Engineers therefore typically set requests above the heap size, which avoids OOM but can waste memory.
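ACK’s built‑in ack‑koordinator component provides a free resource‑profiling capability (the “cost portrait”). It continuously records container resource usage, applies a decay algorithm so that recent samples count more than old ones, and produces a recommended value for each container. Comparing this “Recommend” value with the configured “Request” indicates whether the request should be raised or lowered.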
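The idea of decayed usage history feeding a "Recommend" vs. "Request" comparison can be sketched as follows. The article does not describe the actual ack‑koordinator algorithm, so everything here is an assumption made for illustration: the half‑life weighting, the weighted percentile, the safety margin, the tolerance band, and all function names.

```python
def decayed_percentile(samples, percentile=0.95, half_life_hours=24.0):
    """Weighted percentile of (age_hours, usage) samples.

    Assumption: each sample's weight halves every `half_life_hours`,
    so recent usage dominates older usage.
    """
    if not samples:
        raise ValueError("at least one usage sample is required")
    weighted = sorted(
        (usage, 0.5 ** (age / half_life_hours)) for age, usage in samples
    )
    threshold = percentile * sum(weight for _, weight in weighted)
    running = 0.0
    for usage, weight in weighted:
        running += weight
        if running >= threshold:
            return usage
    return weighted[-1][0]  # guard against floating-point shortfall


def suggest(request, recommend, tolerance=0.1):
    """Compare the recommended value with the configured request."""
    if recommend > request * (1 + tolerance):
        return "increase"
    if recommend < request * (1 - tolerance):
        return "decrease"
    return "keep"


# Memory usage in GiB: (hours ago, observed usage). Hypothetical data.
samples = [(0, 2.0), (24, 2.0), (48, 6.0)]

# Add a 15% safety margin on top of the decayed 80th percentile.
recommend = decayed_percentile(samples, percentile=0.8) * 1.15
print(recommend, suggest(request=4.0, recommend=recommend))
```

With these samples the old 6 GiB spike carries little weight, so the sketch recommends lowering a 4 GiB request. A higher percentile would let the spike dominate, which is the trade-off between savings and safety.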
Example command to set a custom price label on a node (replace <node-name> with the name of the target node):
kubectl label nodes <node-name> node.kubernetes.io/price-per-day="100"
Configuration Checks and Security
Beyond cost, ACK One offers configuration inspection based on Alibaba Cloud container security best practices. Checks include:
Privileged parameters, high‑risk capabilities, root user, insecure Ingress, anonymous RBAC bindings.
Missing CPU/memory limits.
Absent liveness/readiness probes or single‑replica deployments.
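A few of the checks listed above can be approximated with a short script that inspects a Deployment manifest loaded as a dictionary. This is a minimal sketch, not the ACK One inspection engine: the function name and the sample manifest are hypothetical, and only the standard Kubernetes fields `replicas`, `resources.limits`, `livenessProbe`, `readinessProbe`, and `securityContext.privileged` are examined.

```python
from typing import List


def check_deployment(manifest: dict) -> List[str]:
    """Return a list of baseline findings for a Deployment manifest."""
    findings = []
    spec = manifest.get("spec", {})

    # Kubernetes defaults to one replica when the field is omitted.
    if spec.get("replicas", 1) < 2:
        findings.append("single-replica deployment")

    pod_spec = spec.get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")

        limits = container.get("resources", {}).get("limits", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                findings.append(f"{name}: missing {resource} limit")

        for probe in ("livenessProbe", "readinessProbe"):
            if probe not in container:
                findings.append(f"{name}: missing {probe}")

        if container.get("securityContext", {}).get("privileged", False):
            findings.append(f"{name}: privileged container")

    return findings


# Hypothetical manifest that violates several baseline rules.
manifest = {
    "spec": {
        "replicas": 1,
        "template": {"spec": {"containers": [{
            "name": "app",
            "resources": {"limits": {"memory": "2Gi"}},
            "livenessProbe": {"httpGet": {"path": "/healthz", "port": 8080}},
            "securityContext": {"privileged": True},
        }]}},
    }
}

for finding in check_deployment(manifest):
    print(finding)
```

Running a check like this before deployment surfaces missing limits and probes early, rather than after a workload is already running in the cluster.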
Achievements
Efficient resource utilization: Analyzing thousands of pods reduced overall resource consumption by ~25%, saving millions of RMB annually.
System stability and business continuity: Implemented backup strategies on ACK One, improving data safety and uptime.
Centralized multi‑cloud management: Unified control of ~30 K8s clusters across public, private, and edge environments lowered operational complexity.
Agile business expansion: Faster resource scaling enabled rapid response to market changes.
Optimized release workflow: Integrated cost‑portrait recommendations into the CI/CD pipeline, reducing failure rates.
Team skill growth: Cross‑functional collaboration on ACK One increased Kubernetes expertise.
Future Outlook
Cloud computing is now the backbone of the digital economy, and cloud‑native technologies continue to reshape how enterprises adopt and use the cloud. ZEEK will keep refining its FinOps practices—inform, optimize, and operate—to maintain sustainable infrastructure, lower costs, and unlock further technical dividends.
Alibaba Cloud Native
We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.
