KubeCost: Kubernetes-Based Resource Cost Analysis and Allocation System
KubeCost, developed by NetEase Cloud Music, is a low-intrusion, scalable Kubernetes cost analysis system. It allocates resource expenses using peak-based or usage-based billing models, supports hybrid and multi-cloud pricing, aggregates per-Pod CPU, memory, and GPU costs, and stores the results in ClickHouse to give business lines reliable financial insight.
This article introduces KubeCost, a Kubernetes-based resource cost analysis tool developed by NetEase Cloud Music to address IT cost management challenges in the cloud-native era.
Background and Challenges:
Many internet companies have entered a stable development phase in which cost control has become critical. IT costs typically account for about one third of total operating costs (the ratio of technology cost to human-resource cost is roughly 1:2 to 1:2.5). With the adoption of Kubernetes, containers, and DevOps practices, resource management has become more complex. NetEase Cloud Music reached peak resource utilization above 50% through containerization, oversubscription, unified scheduling, and hybrid cloud deployment, saving tens of millions annually. Two challenges remain: resource consumption keeps growing quickly because DevOps makes resources easy to obtain, and the "big ledger problem" makes it hard to allocate costs to individual business lines and evaluate their ROI.
Key Challenges Identified:
Decentralization: Traditional centralized financial budgeting is giving way to distributed, business-oriented decision-making.
Dynamic Changes: Cloud environments and elastic scaling make costs vary with business load.
Excess Waste: Easy access to resources often leads to over-provisioning.
KubeCost Features:
Multiple Billing Models: Supports annual reserved and pay-as-you-go pricing. Reserved resources are allocated by peak usage; spot resources and low-utilization periods are allocated by actual usage.
Hybrid/Multi-Cloud Support: Handles different pricing models across internal resources and public clouds (Aliyun, AWS).
Billing Model: Follows the OpenCost specification. Core principle: allocate = max(usage, request). The base billing unit is 10 minutes, aligned to wall-clock time for stability.
Supported Resource Types: CPU, memory, GPU, and more. Cost is calculated per Pod by summing the costs of its individual resources (CPU, memory, etc.).
Rich Filtering and Aggregation: Supports label-based filtering, and aggregation by namespace, cluster, and Pod labels.
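The billing rules listed above can be illustrated with a minimal Python sketch. It is not KubeCost's code: the unit prices, the record layout, and every function name here are hypothetical, chosen only to show how allocate = max(usage, request), 10-minute wall-clock windows, per-Pod summation, and label aggregation fit together.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

WINDOW_SECONDS = 600  # 10-minute base billing unit described in the article

# Hypothetical unit prices (currency per resource unit per hour); not from the source.
PRICES = {"cpu": 0.03, "mem": 0.004}


def align_window(ts: int) -> Tuple[int, int]:
    """Snap a Unix timestamp to the wall-clock 10-minute window that contains it."""
    start = ts - ts % WINDOW_SECONDS
    return start, start + WINDOW_SECONDS


def allocate(usage: float, request: float) -> float:
    """Core rule from the article: bill the larger of actual usage and the request."""
    return max(usage, request)


def pod_cost(usage: Dict[str, float], request: Dict[str, float]) -> float:
    """Sum the per-resource costs of one Pod for a single billing window."""
    hours = WINDOW_SECONDS / 3600
    return sum(
        allocate(usage[item], request[item]) * PRICES[item] * hours
        for item in PRICES
    )


def aggregate(records: List[dict], label: str) -> Dict[str, float]:
    """Total cost grouped by the value of one label (e.g. namespace)."""
    totals: Dict[str, float] = defaultdict(float)
    for rec in records:
        totals[rec["labels"].get(label, "__unlabeled__")] += rec["cost"]
    return dict(totals)


records = [
    {"labels": {"namespace": "music", "app": "feed"}, "cost": 2.0},
    {"labels": {"namespace": "music", "app": "search"}, "cost": 1.0},
    {"labels": {"namespace": "ads"}, "cost": 4.0},
]
by_namespace = aggregate(records, "namespace")
```

In this sketch a Pod that requests 2 CPU cores but uses only 0.5 is still billed for 2, which is what gives teams an incentive to right-size their requests.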
Architecture Design Principles:
Low Intrusion: Uses a sidecar-less, metrics-based collection approach
Reliability: 3+ replica deployment for ApiServer/etcd; Prometheus with dual backup; node failure has minimal impact
Scalability: Supports 100k+ Pods; uses ClickHouse for storage (~20 GB/month for 120k Pods at 10-minute intervals)
Extensibility: Plugin-based billing logic for flexibility
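To put the storage figure in perspective, the following back-of-the-envelope calculation estimates the on-disk footprint per Pod per billing window. It assumes a 30-day month and decimal gigabytes, neither of which the article states, and it says nothing about how many rows (one per billing item) each Pod produces in a window.

```python
PODS = 120_000              # Pod count quoted in the article
INTERVAL_MINUTES = 10       # base billing unit quoted in the article
DAYS_PER_MONTH = 30         # assumption: the article does not define "month"
MONTHLY_BYTES = 20 * 10**9  # assumption: ~20 GB read as decimal gigabytes

# Number of 10-minute windows in the assumed month.
intervals_per_month = DAYS_PER_MONTH * 24 * 60 // INTERVAL_MINUTES

# Number of (Pod, window) pairs billed in that month.
pod_intervals = PODS * intervals_per_month

# Average stored bytes per Pod per window, across all billing items.
bytes_per_pod_interval = MONTHLY_BYTES / pod_intervals

print(intervals_per_month, pod_intervals, round(bytes_per_pod_interval, 1))
```

Under these assumptions the result is a few tens of bytes per Pod per window, which is only plausible with the columnar compression ClickHouse applies to repetitive columns such as labels and item names.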
Data Model:
Uses the ClickHouse ReplacingMergeTree engine for efficient storage and fast retries:
CREATE TABLE IF NOT EXISTS kubecost.kube_billing_infos
(
    create_time Int64 COMMENT 'record create time',
    start_time Int64 COMMENT 'billing start time',
    end_time Int64 COMMENT 'billing end time',
    item String COMMENT 'billing item, example: cpu, mem, gpu, etc',
    cost Float64 COMMENT 'billing cost',
    currency String COMMENT 'billing currency',
    entity_primary_key String COMMENT 'entity primary key, cluster/namespace/pod/container',
    usage_info Map(String, Float64) COMMENT 'e.g. usage, request, allocate',
    label_info Map(String, String) COMMENT 'basic labels',
    price_info String COMMENT 'cost price info'
) ENGINE = ReplacingMergeTree(create_time)
    PARTITION BY toYYYYMM(FROM_UNIXTIME(start_time))
    ORDER BY (start_time, end_time, item, entity_primary_key);
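The reason this engine makes retries cheap is that ReplacingMergeTree keeps, among rows sharing the same sorting key, the one with the highest version column (here `create_time`), so a failed billing window can simply be recomputed and inserted again. The sketch below simulates that behaviour in plain Python rather than talking to ClickHouse; `merge_replacing` and `partition_of` are hypothetical helper names, and the partition helper assumes UTC, whereas ClickHouse uses the server or column time zone. Note that in ClickHouse the deduplication happens at background merge time, so queries that need exact results typically use `FINAL` or aggregate explicitly.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple


def sorting_key(row: dict) -> Tuple[int, int, str, str]:
    """Mirror the table's ORDER BY (start_time, end_time, item, entity_primary_key)."""
    return (row["start_time"], row["end_time"], row["item"], row["entity_primary_key"])


def merge_replacing(rows: List[dict]) -> List[dict]:
    """Keep one row per sorting key: the one with the highest create_time.

    On a tie the later-inserted row wins, matching ReplacingMergeTree semantics.
    """
    latest: Dict[tuple, dict] = {}
    for row in rows:
        key = sorting_key(row)
        if key not in latest or row["create_time"] >= latest[key]["create_time"]:
            latest[key] = row
    return sorted(latest.values(), key=sorting_key)


def partition_of(start_time: int) -> int:
    """Approximate toYYYYMM(FROM_UNIXTIME(start_time)), assuming UTC."""
    dt = datetime.fromtimestamp(start_time, tz=timezone.utc)
    return dt.year * 100 + dt.month


base = {"start_time": 1700000400, "end_time": 1700001000,
        "entity_primary_key": "c1/music/feed-0/app"}
rows = [
    {**base, "item": "cpu", "cost": 1.0, "create_time": 100},  # first attempt
    {**base, "item": "mem", "cost": 0.2, "create_time": 100},
    {**base, "item": "cpu", "cost": 1.5, "create_time": 200},  # retried window
]
merged = merge_replacing(rows)
```

After the simulated merge only the retried `cpu` row survives alongside the `mem` row, so a retry job never has to delete the earlier attempt first.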
