
Data Cost Quantification, Billing, and Optimization in a Data Platform

The data‑platform team introduced a self‑sustaining cost‑reduction framework that quantifies CPU, memory, and disk expenses using price‑per‑resource formulas, applies time‑weighted billing, generates multi‑level reports, and drives optimization through six actionable “swords” and incentive‑based operations, achieving roughly 17% offline‑cluster savings within six months.

Youzan Coder

1. Introduction

The article describes a data middle‑platform team’s effort to control rapidly growing compute and storage costs. After six months of business growth, resource consumption doubled, prompting a systematic approach to cost awareness and reduction.

2. Overall Approach

The team proposes a long‑term, self‑sustaining cost‑reduction mechanism with six key requirements: cost quantification, waste perception, ease of reduction, traceable processes, incentive mechanisms, and operational governance.

3. Cost Quantification

Cost is modeled as cost = resource_price * resource_consumption. The main resources are CPU, memory, and disk. The unit price of each resource is calculated using the following formulas:

cpu_price = total_cost * cpu_ratio / (total_cpu * load_factor)

memory_price = total_cost * memory_ratio / (total_memory * load_factor)

disk_price = total_cost * disk_ratio / (total_disk * load_factor)

Variables:

total_cost: total hardware investment for the data platform.

total_cpu, total_memory, total_disk: aggregate hardware capacity.

cpu_ratio, memory_ratio, disk_ratio: cost‑share ratios derived from market scarcity (they apportion total_cost across the three resources, so they sum to 1).

load_factor: effective utilization factor (e.g., 0.8), since hardware never runs at full capacity.
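The pricing formulas above can be sketched in a few lines. All of the numbers below (total spend, capacities, ratios) are hypothetical placeholders, not the article's real figures:

```python
# Sketch of the Section 3 unit-price formulas; every figure here is illustrative.

def unit_prices(total_cost, totals, ratios, load_factor=0.8):
    """Return price per unit of cpu, memory, and disk.

    totals - aggregate hardware capacity per resource (cores, GB, GB)
    ratios - cost-share ratios derived from scarcity; they sum to 1
    """
    return {
        r: total_cost * ratios[r] / (totals[r] * load_factor)
        for r in ("cpu", "memory", "disk")
    }

prices = unit_prices(
    total_cost=1_000_000,  # hypothetical total hardware spend
    totals={"cpu": 5_000, "memory": 40_000, "disk": 2_000_000},
    ratios={"cpu": 0.5, "memory": 0.3, "disk": 0.2},
)
```

Dividing by `load_factor` inflates the unit price so that billing at realistic (sub‑100%) utilization still recovers the full hardware cost.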

3.1 Resource Consumption

Three categories of consumption are considered:

Disk storage, including replication (e.g., HDFS’s default three copies).

CPU and memory usage, measured in cpu_seconds and memory_seconds.

Time of execution, enabling time‑weighted billing.

Formulas for consumption:

disk = data_size * replication_factor

cpu = cpu_seconds * (1 + loss_factor)

memory = memory_seconds * (1 + loss_factor)

loss_factor accounts for resource‑allocation overhead (0 when usage is collected from YARN counters, >0 for Spark Thrift Server, where allocation overhead must be estimated).
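The consumption formulas are equally direct. A minimal sketch, where the replication factor and loss_factor values are illustrative:

```python
# Sketch of the Section 3.1 consumption formulas; values are illustrative.

def disk_consumption(data_size, replication_factor=3):
    # HDFS stores each block replication_factor times, so billed disk
    # usage is the raw data size multiplied by the replication factor.
    return data_size * replication_factor

def compute_consumption(seconds, loss_factor=0.0):
    # loss_factor = 0 when usage comes from YARN counters;
    # loss_factor > 0 for Spark Thrift Server, where allocation
    # overhead is estimated rather than measured.
    return seconds * (1 + loss_factor)

disk = disk_consumption(100)                       # 100 GB raw -> 300 GB billed
cpu = compute_consumption(3_600, loss_factor=0.1)  # 3600 cpu-seconds, 10% overhead
```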

3.2 Time‑Weighted Billing

The cluster’s daily load is divided into three time slots — golden (peak), silver, and bronze — with per‑slot weights satisfying w1 > w2 > w3. Example weights: w1=1.2 (golden), w2=0.8 (silver), w3=0.4 (bronze). With the expected load split 60%/30%/10% across the slots, the weights are calibrated so that 0.6*w1 + 0.3*w2 + 0.1*w3 = 1, i.e., weighted billing matches unweighted billing on average. A job’s weighted CPU factor is computed as:

cpu_weight = (cs1*w1 + cs2*w2 + cs3*w3) / cpu_seconds

where cs1, cs2, cs3 are the cpu_seconds the job consumed in each slot (cs1 + cs2 + cs3 = cpu_seconds).

Analogous weighting is applied to memory.
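The time‑weighted factor can be sketched as follows, using the article’s example weights; the per‑slot cpu_seconds below are made up:

```python
# Sketch of Section 3.2 time-weighted billing. Slot weights follow the
# article's example (golden 1.2, silver 0.8, bronze 0.4); the per-slot
# cpu_seconds are hypothetical.

WEIGHTS = {"golden": 1.2, "silver": 0.8, "bronze": 0.4}

def cpu_weight(slot_seconds):
    """Average billing weight of a job's cpu_seconds across the slots."""
    total = sum(slot_seconds.values())
    weighted = sum(WEIGHTS[s] * sec for s, sec in slot_seconds.items())
    return weighted / total

# A job that runs mostly in the golden (peak) slot is billed above 1x:
peak_heavy = cpu_weight({"golden": 3000, "silver": 500, "bronze": 100})

# A job pushed entirely into the bronze (off-peak) slot pays only 0.4x:
off_peak = cpu_weight({"golden": 0, "silver": 0, "bronze": 100})
```

This is the economic lever behind the “delay non‑critical jobs to off‑peak” sword in Section 5: the same cpu_seconds cost three times less in the bronze slot than in the golden slot.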

3.3 Data Cost Calculation

Final cost components are:

cpu_cost = cpu_price * (cpu * cpu_weight)

memory_cost = memory_price * (memory * memory_weight)

disk_cost = disk_price * disk

Additional considerations include cost allocation among multiple output tables, ownership attribution, and handling of ad‑hoc queries.
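Putting Sections 3.1–3.3 together, the end‑to‑end cost of one job can be sketched as below. Prices, weights, and consumption figures are all hypothetical; note that disk cost carries no time weight, matching the formulas above:

```python
# Sketch combining the Section 3.3 cost formulas; all inputs are illustrative.

def job_cost(prices, cpu, cpu_weight, memory, memory_weight, disk):
    return {
        "cpu_cost": prices["cpu"] * cpu * cpu_weight,
        "memory_cost": prices["memory"] * memory * memory_weight,
        "disk_cost": prices["disk"] * disk,  # disk is not time-weighted
    }

prices = {"cpu": 0.001, "memory": 0.0001, "disk": 0.01}  # per-unit, made up
cost = job_cost(
    prices,
    cpu=3960, cpu_weight=1.1,       # weighted cpu-seconds consumption
    memory=7920, memory_weight=1.1, # weighted memory-seconds consumption
    disk=300,                       # GB on disk, replication included
)
total = sum(cost.values())
```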

4. Cost Billing

Billing reports are generated at global, department, and individual levels, covering total cost overview, trend analysis, high‑cost/top‑time tables, savings summary, and value metrics (usage count, business impact).
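Rolling job‑level costs up into the individual, department, and global views can be sketched with a simple aggregation. The job records and field names below are assumptions for illustration, not the article’s actual schema:

```python
# Sketch of rolling per-job costs up into the multi-level reports of
# Section 4. The records and field names are hypothetical.
from collections import defaultdict

jobs = [
    {"owner": "alice", "dept": "search", "cost": 120.0},
    {"owner": "alice", "dept": "search", "cost": 30.0},
    {"owner": "bob",   "dept": "ads",    "cost": 200.0},
]

by_owner = defaultdict(float)  # individual-level report
by_dept = defaultdict(float)   # department-level report
for job in jobs:
    by_owner[job["owner"]] += job["cost"]
    by_dept[job["dept"]] += job["cost"]
global_total = sum(job["cost"] for job in jobs)  # global overview

# Ranking owners by spend feeds the "high-cost tables" style of report:
top_owners = sorted(by_owner.items(), key=lambda kv: kv[1], reverse=True)
```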

5. Cost Optimization (Six “Swords”)

Decommission unused data (offline).

Delay start of non‑critical jobs to off‑peak periods.

Reduce job frequency (e.g., hourly to daily).

Replace legacy or duplicate pipelines with more efficient alternatives.

Task‑level tuning (e.g., Hive skew, SQL rewrite).

Merge small files (use Hive’s file‑merge strategy for Spark jobs).

Additional tactics include leveraging Hive cubes, providing registration for manual cost‑saving actions, and building dashboards for monitoring.

6. Cost‑Reduction Operations

The operation loop follows four principles: promotion, “nudge”, feedback, and incentives. Activities include displaying cost metrics on dashboards, regular reminders in meetings, sending personalized cost statements, targeting high‑cost owners, organizing weekly optimization sprints, collecting user feedback, and rewarding top savers with internal tokens.

7. Summary & Outlook

After six months, 40 participants performed 660 cost‑saving actions, reducing offline cluster spend by ~17% (over 20% of savings were self‑initiated). Future work will focus on finer‑grained operation, extending cost governance beyond offline clusters, attributing cost to business lines, and building a data‑value assessment framework.
