
Optimizing Multi‑Cluster Cloud Native Costs: ZEEK’s ACK FinOps Journey

This article details how the automaker ZEEK handled the challenges of rapid growth by redesigning its cloud‑native infrastructure. It adopted Alibaba Cloud ACK FinOps and ACK One for multi‑cluster management and introduced cost visibility, intelligent resource allocation, and configuration checks, which yielded significant resource savings and improved operational stability.

Alibaba Cloud Native

Preface

In 2021, ZEEK's 001 model set a record by delivering more than 10,000 units in just 110 days, and subsequent models continued to break delivery‑speed milestones. This rapid growth put enormous pressure on the digital infrastructure team, which had to support order fulfillment, vehicle delivery, and payment settlement across many core systems.

Management Challenges

The team faced three main problems in a cloud‑native environment:

High complexity of resource management: Managing dozens of Kubernetes clusters across public and private clouds increased operational overhead, complicated cost allocation and permission control, and led to inconsistent cluster versions.

Insufficiently intelligent resource allocation: With diverse workloads, from business‑facing (B‑end) management systems to high‑concurrency consumer‑facing (C‑end) services, operators found it hard to predict traffic and set appropriate CPU/memory requests, which led to over‑provisioning.

Ensuring long‑term sustainability: Short‑term optimization efforts risked becoming ineffective as workloads changed, potentially eroding confidence in the cost‑control investment.

Business Goals

The platform team defined four concrete objectives:

Cost insight and analysis: Build a fine‑grained cost‑sharing model and provide intelligent pod‑level utilization analysis.

Configuration baseline checks: Verify that deployment scripts meet compliance requirements for monitoring and self‑healing.

Cluster consolidation: Merge low‑utilization clusters to reduce management complexity and cost.

Stateless infrastructure: Use Kubernetes as a standard foundation so that onboarding a new cloud requires only minimal parameter changes.

Solution Selection

After evaluating multi‑cloud management platforms (CMPs) and finding them too heavyweight, ZEEK adopted a cloud‑native‑first approach. The primary tool chosen was Alibaba Cloud ACK FinOps, which provides cost analysis at the cluster, namespace, node‑pool, and application levels.

Two cost‑allocation models were considered:

Single‑dimension model (CPU or memory): Simple calculation where pod cost = (pod request / total node resources) × node price.

Weighted mixed‑dimension model: Combines CPU and memory weights to avoid unfair cost distribution when workloads have disparate resource profiles.

Given ZEEK’s Java‑heavy workloads, memory became the bottleneck, so the single‑dimension memory model was adopted.
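The difference between the two models is easiest to see with numbers. The Python sketch below implements the single‑dimension formula given above and, as an assumption, a weighted mixed model that blends the CPU and memory shares with weights summing to 1; this weighted form is a common construction, not necessarily ACK FinOps' exact formula. The `Node` class, function names, and node figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    cpu_cores: float   # total CPU cores on the node
    memory_gib: float  # total memory on the node, in GiB
    price: float       # node price for the billing period (e.g. per day)


def single_dimension_cost(request: float, total: float, node_price: float) -> float:
    """Single-dimension model: (pod request / total node resource) x node price."""
    return request / total * node_price


def weighted_cost(cpu_request: float, mem_request_gib: float, node: Node,
                  cpu_weight: float = 0.5) -> float:
    """Weighted mixed model (assumed form): blend the CPU and memory shares.

    The memory weight is 1 - cpu_weight, so the two weights always sum to 1.
    """
    mem_weight = 1.0 - cpu_weight
    share = (cpu_weight * cpu_request / node.cpu_cores
             + mem_weight * mem_request_gib / node.memory_gib)
    return share * node.price


# Hypothetical 16-core / 64 GiB node priced at 100 per day.
node = Node(cpu_cores=16, memory_gib=64, price=100.0)

# A Java-style pod: modest CPU request (1 core), large memory request (16 GiB).
cpu_based = single_dimension_cost(1, node.cpu_cores, node.price)    # CPU-only model
mem_based = single_dimension_cost(16, node.memory_gib, node.price)  # memory-only model
mixed = weighted_cost(1, 16, node)                                  # 50/50 weighted model
print(f"CPU-only: {cpu_based:.2f}  memory-only: {mem_based:.2f}  weighted: {mixed:.2f}")
```

With these figures the CPU‑only model charges the pod 6.25 per day and the memory‑only model 25.00, a fourfold gap. For a fleet dominated by memory‑bound Java services, splitting costs by CPU alone would therefore under‑charge the pods that actually occupy most of each node.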

After cluster consolidation, cost allocation will be refined to namespace‑level granularity.

Figure: ACK cost insight

Resource Optimization Strategies

Two main optimization directions were identified:

CPU optimization: Adjust pod QoS classes and CPU request values cautiously; over‑committing ("over‑selling") CPU can cause throttling or eviction under high load.

Memory optimization: Java applications reserve heap memory for the JVM, and the JVM's total footprint also includes non‑heap memory such as metaspace and thread stacks, so memory requests or limits set too low can lead to OOM kills. Engineers therefore typically set requests well above the heap size, which can waste memory.
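The memory trade‑off can be expressed as a small check that flags a Java container's memory request as risky when it does not cover the heap plus an allowance for non‑heap memory, and as wasteful when it far exceeds what the JVM can use. This is a hypothetical sketch: it assumes the memory limit equals the request, and the 25% non‑heap allowance and 2× waste threshold are illustrative values, not figures from ZEEK or ACK.

```python
def assess_memory_request(request_mib: float, max_heap_mib: float,
                          observed_peak_mib: float,
                          non_heap_ratio: float = 0.25,
                          waste_factor: float = 2.0) -> str:
    """Classify a JVM container's memory request (limit assumed equal to request).

    The expected footprint is the max heap (-Xmx) plus a non-heap allowance
    (metaspace, thread stacks, direct buffers) expressed as a fraction of the heap.
    """
    expected_footprint = max_heap_mib * (1 + non_heap_ratio)
    if request_mib < expected_footprint:
        # The JVM can legitimately grow past the request: OOM-kill risk.
        return "oom-risk"
    if request_mib > waste_factor * max(observed_peak_mib, expected_footprint):
        # Far more than the JVM can use or has been seen to use: likely waste.
        return "over-provisioned"
    return "ok"


# Hypothetical containers, all running with a 4 GiB heap (-Xmx4g).
for request in (4096, 6144, 16384):
    print(request, assess_memory_request(request, max_heap_mib=4096, observed_peak_mib=3500))
```

A request equal to the heap is flagged as an OOM risk, 6 GiB passes, and 16 GiB is flagged as over‑provisioned, mirroring the two failure modes described above.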

ACK's ack‑koordinator component provides a free cost‑portrait (resource profiling) capability: it continuously records container usage, weights that history with a decay algorithm so that recent usage counts more than old usage, and suggests raising or lowering resources by comparing its "Recommend" value with the container's current "Request".
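To illustrate the idea, the sketch below computes an exponentially decayed, weighted percentile over usage samples and compares the resulting Recommend value with the Request. It is a simplified stand‑in written for this article, not ack‑koordinator's actual algorithm; the half‑life, the 95th percentile, the safety margin, the tolerance band, and all function names are assumptions.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in hours, CPU cores used)


def decayed_percentile(samples: List[Sample], now: float,
                       half_life_h: float = 24.0, pct: float = 0.95) -> float:
    """Weighted percentile of usage; each sample's weight halves every half_life_h hours."""
    if not samples:
        raise ValueError("no usage samples")
    weighted = sorted((usage, 0.5 ** ((now - ts) / half_life_h)) for ts, usage in samples)
    total = sum(w for _, w in weighted)
    cumulative = 0.0
    for usage, w in weighted:
        cumulative += w
        if cumulative >= pct * total:
            return usage
    return weighted[-1][0]  # guard against floating-point shortfall


def suggest(request: float, samples: List[Sample], now: float,
            margin: float = 1.15, tolerance: float = 0.10) -> Tuple[str, float]:
    """Compare Recommend (decayed p95 x safety margin) with the current Request."""
    recommend = decayed_percentile(samples, now) * margin
    if recommend > request * (1 + tolerance):
        return "scale-up", recommend
    if recommend < request * (1 - tolerance):
        return "scale-down", recommend
    return "keep", recommend


# Hypothetical history: 48 hourly samples at a steady 2 cores, current request 4 cores.
history = [(float(t), 2.0) for t in range(48)]
print(suggest(4.0, history, now=48.0))
```

Because old samples decay, a spike from several days ago carries little weight and the recommendation follows recent behavior; a very long half‑life makes the result approach an ordinary unweighted percentile.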

Example command to set a custom daily price label on a node (replace <node-name> with the target node's name):

kubectl label nodes <node-name> node.kubernetes.io/price-per-day="100"

Configuration Checks and Security

Beyond cost, ACK One offers configuration inspection based on Alibaba Cloud container security best practices. Checks include:

Privileged containers, high‑risk Linux capabilities, containers running as root, insecure Ingress configurations, and anonymous RBAC bindings.

Missing CPU/memory limits.

Missing liveness/readiness probes and single‑replica Deployments.
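Several of these checks can be expressed as a function over a Deployment manifest that has been parsed into a Python dict (for example by a YAML loader). The checker below is a hypothetical, simplified sketch that uses standard Kubernetes field names; it is not ACK One's inspection engine, it only looks at container‑level security contexts, the set of high‑risk capabilities is an illustrative assumption, and the Ingress and RBAC checks are omitted because they inspect other resource kinds.

```python
from typing import Any, Dict, List

# Illustrative set of capabilities treated as high risk (assumption).
HIGH_RISK_CAPABILITIES = {"SYS_ADMIN", "NET_ADMIN", "SYS_PTRACE", "ALL"}


def check_deployment(manifest: Dict[str, Any]) -> List[str]:
    """Return human-readable findings for a Deployment manifest."""
    findings: List[str] = []
    spec = manifest.get("spec", {})
    if spec.get("replicas", 1) < 2:  # Kubernetes defaults replicas to 1
        findings.append("single replica")
    pod_spec = spec.get("template", {}).get("spec", {})
    for c in pod_spec.get("containers", []):
        name = c.get("name", "<unnamed>")
        limits = c.get("resources", {}).get("limits", {})
        for res in ("cpu", "memory"):
            if res not in limits:
                findings.append(f"{name}: missing {res} limit")
        for probe in ("livenessProbe", "readinessProbe"):
            if probe not in c:
                findings.append(f"{name}: missing {probe}")
        sc = c.get("securityContext", {})
        if sc.get("privileged"):
            findings.append(f"{name}: privileged container")
        if sc.get("runAsUser") == 0:
            findings.append(f"{name}: runs as root")
        risky = sorted(set(sc.get("capabilities", {}).get("add", [])) & HIGH_RISK_CAPABILITIES)
        if risky:
            findings.append(f"{name}: high-risk capabilities {risky}")
    return findings


# Hypothetical manifest with several problems.
deployment = {
    "spec": {
        "replicas": 1,
        "template": {"spec": {"containers": [{
            "name": "order-api",
            "resources": {"limits": {"cpu": "2"}},
            "readinessProbe": {"httpGet": {"path": "/health", "port": 8080}},
            "securityContext": {"runAsUser": 0, "capabilities": {"add": ["NET_ADMIN"]}},
        }]}},
    }
}
for finding in check_deployment(deployment):
    print(finding)
```

A check like this could run before a release so that a manifest with findings is fixed before it reaches the cluster.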

Figure: Configuration check

Achievements

Efficient resource utilization: Adjustments based on the analysis of thousands of pods reduced overall resource consumption by about 25%, saving millions of RMB annually.

System stability and business continuity: Backup strategies implemented on ACK One improved data safety and uptime.

Centralized multi‑cloud management: Unified control of ~30 K8s clusters across public, private, and edge environments lowered operational complexity.

Agile business expansion: Faster resource scaling enabled rapid response to market changes.

Optimized release workflow: Cost‑portrait recommendations were integrated into the CI/CD pipeline, reducing release failure rates.

Team skill growth: Cross‑functional collaboration on ACK One increased Kubernetes expertise.

Future Outlook

Cloud computing is now the backbone of the digital economy, and cloud‑native technologies continue to reshape how enterprises adopt and use the cloud. ZEEK will keep refining its FinOps practice across the inform, optimize, and operate phases to keep its infrastructure sustainable, lower costs, and unlock further technical benefits.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Kubernetes, Resource Management, Multi-Cluster, Cost Optimization, FinOps