
How Bilibili Cut IT Costs by Billions with FinOps: A Deep Dive

This article details Bilibili's FinOps-driven approach to IT cost management, covering budgeting, resource utilization dashboards, financial metrics, cost modeling, SKU pricing, and operational practices that together saved the company hundreds of millions of yuan while supporting continued growth.

dbaplus Community

Background

Large internet companies need to reduce IT costs while still supporting growth. Bilibili launched a FinOps practice in 2022 to gain cost insight and to drive technical optimization and operational improvements, saving billions of yuan.

Budget Process Issues

Technical platforms are involved late in budgeting: they receive resource requirements only after business teams have submitted their budgets.

Budget‑approved items are whitelisted, so they bypass procurement checks, which weakens cost control.

Business teams lack a unified view of all their bills, which reduces cost awareness and the motivation to optimize.

Resource Utilization Dashboard

A data warehouse aggregates monitoring, asset‑management and hybrid‑cloud HCRM data to produce dashboards for bandwidth (CDN, cloud, IDC) and compute (servers, VMs, bare metal). Metrics are normalized across vendors; GPU usage is split by training vs inference. Dashboards support water‑mark management, efficiency measurement and alerting.
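A minimal sketch of how watermark-based alerting on such a dashboard might work. The thresholds (20% low, 80% high), the pool names and the function name are assumptions made for illustration; the article does not state the actual values Bilibili uses.

```python
def classify_utilization(utilization: float, low: float = 0.20, high: float = 0.80) -> str:
    """Classify a resource pool against hypothetical low/high watermarks."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    if utilization < low:
        return "underused"  # candidate for consolidation or retirement
    if utilization > high:
        return "hot"        # candidate for capacity expansion
    return "healthy"


# Hypothetical normalized utilization per pool (fraction of capacity in use).
pools = {"cdn-bandwidth": 0.91, "gpu-training": 0.55, "vm-batch": 0.12}

# Only pools outside the healthy band raise an alert.
alerts = {}
for name, utilization in pools.items():
    status = classify_utilization(utilization)
    if status != "healthy":
        alerts[name] = status

print(alerts)
```

Normalizing every metric to a fraction of capacity first is what lets one pair of thresholds apply across vendors and resource types.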

FinOps Framework

FinOps (FinOps Foundation) aligns engineering, finance and business to make data‑driven spending decisions. Roles include executive leadership, business owners, engineering/operations, finance & procurement, and FinOps practitioners. The lifecycle consists of cost insight → cost optimization → cost operations.

Experimental Path

Quantify cost to raise business awareness.

Run technical and operational cost‑reduction in parallel.

Embed cost metrics into project planning, procurement and lifecycle management.

Cost Modeling

CAPEX (capital expenditure) and OPEX (operating expense) are distinguished. CAPEX is converted to OPEX via a Total Cost of Ownership (TCO) model that spreads one‑time hardware cost over its useful life. Example monthly server TCO formula:

Server_TCO_month = Depreciation + Net + IDC + Line + f1

where Depreciation = CAPEX / lifespan_months is the server's monthly straight‑line depreciation, Net is the allocated network‑equipment depreciation, IDC covers rack, power and maintenance, Line is the monthly inter‑rack line fee, and f1 accounts for other per‑server monthly costs.
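The monthly TCO calculation can be sketched as follows. This assumes the hardware purchase price is counted once, as straight-line depreciation (CAPEX divided by lifespan in months); the class name, field names and all figures are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ServerCost:
    capex: float          # one-time purchase price of the server
    lifespan_months: int  # useful life used for straight-line depreciation
    net: float            # monthly share of network-equipment depreciation
    idc: float            # monthly rack, power and maintenance cost
    line: float           # monthly line fee
    other: float = 0.0    # f1: other per-server monthly costs

    def monthly_depreciation(self) -> float:
        # Spread the one-time hardware cost evenly over the useful life.
        return self.capex / self.lifespan_months

    def monthly_tco(self) -> float:
        return (self.monthly_depreciation() + self.net + self.idc
                + self.line + self.other)


# Made-up numbers: a 60,000-yuan server depreciated over 5 years.
server = ServerCost(capex=60_000, lifespan_months=60,
                    net=50, idc=400, line=30, other=20)
print(server.monthly_tco())  # 1500.0
```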

SKU pricing follows a cloud‑like model: unit_price = SKU_TCO / theoretical_service_capacity. Usage is classified as shared or exclusive, which determines how charges are calculated.
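As a rough illustration of the unit-price formula, the sketch below divides a SKU's monthly TCO by its theoretical capacity. The choice of CPU cores as the capacity unit and the numbers are assumptions, not figures from the article.

```python
def sku_unit_price(sku_tco_month: float, theoretical_capacity: float) -> float:
    """unit_price = SKU_TCO / theoretical_service_capacity."""
    if theoretical_capacity <= 0:
        raise ValueError("theoretical capacity must be positive")
    return sku_tco_month / theoretical_capacity


# Hypothetical SKU: 1,500 yuan monthly TCO, 96 sellable cores.
price_per_core_month = sku_unit_price(1500.0, 96)
print(price_per_core_month)  # 15.625
```

Pricing against theoretical capacity rather than observed usage keeps the unit price stable from month to month; the cost of unsold capacity then shows up as a platform-level efficiency gap instead of being pushed into the price.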

Cost Optimization Strategies

1. Bandwidth

Adopt narrow‑band HD and AV1 encoding to lower bitrate.

Use machine‑learning‑driven transcoding prediction.

Increase share of cheap PCDN/mCDN.

Build dedicated CDN lines to reduce origin traffic.

Layer content and route cold content to edge nodes.

Apply peak shaving and share bandwidth across services.
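To see why sharing bandwidth across services can help, consider two services whose peaks fall at different times. The sketch below uses invented hourly samples and assumes billing follows the peak of whatever is provisioned; real contracts may bill on other measures.

```python
# Hypothetical bandwidth samples in Gbps, one value per time slot.
live_streaming = [20, 30, 90, 40]  # peaks in the evening
app_downloads = [80, 60, 20, 30]   # peaks in the morning

# Provisioned separately, each service must cover its own peak.
separate_peak = max(live_streaming) + max(app_downloads)

# Sharing one pool, only the peak of the combined load matters.
combined_load = [a + b for a, b in zip(live_streaming, app_downloads)]
shared_peak = max(combined_load)

saving = separate_peak - shared_peak
print(separate_peak, shared_peak, saving)  # 170 110 60
```

The saving comes entirely from the peaks not coinciding; two services that peak in the same slot gain nothing from sharing.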

2. Server

Accelerate hardware refresh (Intel Skylake → Cascade Lake → Ice Lake → Sapphire Rapids; AMD Rome → Milan → Genoa; GPU generations) to lower cost per compute unit.

Virtualize and consolidate via Kubernetes‑based private‑cloud containers.

Increase pool‑level resource volume, poolability and allocation rates.

Apply VPA/HPA, over‑commit and workload mixing to improve utilization.
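For the HPA item above, the core scaling rule documented for the Kubernetes Horizontal Pod Autoscaler is desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). The sketch below implements only that formula; it leaves out the tolerance band, stabilization windows and min/max replica bounds that a real HPA applies, and the sample values are invented.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float) -> int:
    """Core HPA rule: scale replicas in proportion to metric / target."""
    if target_metric <= 0:
        raise ValueError("target metric must be positive")
    return math.ceil(current_replicas * current_metric / target_metric)


# 4 pods at 90% average CPU with a 60% target: scale out.
print(desired_replicas(4, 90.0, 60.0))   # 6
# 10 pods at 30% average CPU with a 60% target: scale in.
print(desired_replicas(10, 30.0, 60.0))  # 5
```

The scale-in case is where the cost benefit appears: idle replicas are released back to the pool instead of holding reserved capacity.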

3. Public Cloud

Choose billing mode based on workload pattern: bandwidth‑based for stable traffic, traffic‑based for bursty workloads.

Select appropriate network contracts (static BGP, ISP lines, hybrid‑cloud dedicated lines) according to stability and cost.

Plan capacity, release unused instances, and evaluate self‑built vs SaaS services.
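The billing-mode choice in the first item can be sketched as a simple comparison. This assumes bandwidth-based billing charges on monthly peak bandwidth and traffic-based billing charges per GB transferred; the prices and workload figures are hypothetical, and actual cloud contracts differ (for example, some bill on a percentile rather than the raw peak).

```python
def bandwidth_billed_cost(peak_mbps: float, price_per_mbps_month: float) -> float:
    """Assumed model: pay for the monthly peak bandwidth."""
    return peak_mbps * price_per_mbps_month


def traffic_billed_cost(total_gb: float, price_per_gb: float) -> float:
    """Assumed model: pay for the total volume transferred."""
    return total_gb * price_per_gb


def cheaper_mode(peak_mbps: float, total_gb: float,
                 price_per_mbps_month: float, price_per_gb: float) -> str:
    bw = bandwidth_billed_cost(peak_mbps, price_per_mbps_month)
    tr = traffic_billed_cost(total_gb, price_per_gb)
    return "bandwidth" if bw <= tr else "traffic"


# Hypothetical prices: 20 yuan per Mbps-month, 0.5 yuan per GB.
# Stable workload: the link stays busy all month, so volume is large.
stable = cheaper_mode(peak_mbps=1000, total_gb=250_000,
                      price_per_mbps_month=20, price_per_gb=0.5)
# Bursty workload: same peak, but little total volume.
bursty = cheaper_mode(peak_mbps=1000, total_gb=10_000,
                      price_per_mbps_month=20, price_per_gb=0.5)
print(stable, bursty)  # bandwidth traffic
```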

4. Optimization Loop

Platform generates bills → business reviews → bill analysis → targeted optimization → results reflected in the next billing cycle, forming a closed loop.

Operational Collaboration

Two pillars:

Cost operations: budget control, variance analysis, bill review, cost modeling, decision making.

Resource operations: monitor efficiency, retire idle assets, enforce containerization, clean up unused cloud resources.

Regular communication among executives, product owners, platform engineers, finance and procurement ensures swift remediation of cost anomalies.

Implementation Details

The billing system integrates public‑cloud and private‑cloud cost data, assigns each resource to an APPID from the CMDB, and enables multi‑dimensional cost views (project, department, business). Cost is calculated as Cost = unit_price × usage. Usage calculation distinguishes shared (limit‑based) and exclusive (capacity‑based) resources.
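A minimal sketch of the multi-dimensional roll-up, assuming each usage record carries an APPID and the CMDB maps that APPID to a department and a project. The record layout, SKU names and prices are hypothetical and only illustrate Cost = unit_price × usage aggregated along different dimensions.

```python
from collections import defaultdict

# Hypothetical CMDB: APPID -> organizational attributes.
cmdb = {
    "app-video": {"department": "media", "project": "playback"},
    "app-live": {"department": "media", "project": "live"},
    "app-ads": {"department": "commerce", "project": "ads"},
}

# Hypothetical SKU unit prices and metered usage.
unit_prices = {"cpu_core_hour": 0.5, "gpu_card_hour": 8.0}
usage_records = [
    {"appid": "app-video", "sku": "cpu_core_hour", "usage": 1000},
    {"appid": "app-live", "sku": "cpu_core_hour", "usage": 400},
    {"appid": "app-ads", "sku": "gpu_card_hour", "usage": 10},
    {"appid": "app-video", "sku": "gpu_card_hour", "usage": 5},
]


def cost_by(dimension: str) -> dict:
    """Sum unit_price * usage per value of the given CMDB dimension."""
    totals = defaultdict(float)
    for record in usage_records:
        cost = unit_prices[record["sku"]] * record["usage"]
        totals[cmdb[record["appid"]][dimension]] += cost
    return dict(totals)


print(cost_by("department"))  # {'media': 740.0, 'commerce': 80.0}
print(cost_by("project"))     # {'playback': 540.0, 'live': 200.0, 'ads': 80.0}
```

Because every record is keyed by APPID, adding a new view is just another attribute lookup in the CMDB rather than a change to the metering data.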

Example shared‑usage formula:

Cost_shared = Price_cpu * Usage_shared * t

Example exclusive‑usage formula:

Cost_exclusive = Price_cpu * (Capacity * loadfactor) * t

Charging exclusive pools on their theoretical maximum capacity encourages migration to shared pools, improving overall utilization.
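The incentive can be illustrated by pricing the same workload both ways. The sketch below follows the two formulas above; it treats Usage_shared as the configured CPU limit and t as hours in the billing period, and it sets loadfactor to 1.0, all of which are assumptions since the article does not define these terms precisely.

```python
def cost_shared(price_cpu: float, usage_shared: float, t: float) -> float:
    """Cost_shared = Price_cpu * Usage_shared * t (limit-based)."""
    return price_cpu * usage_shared * t


def cost_exclusive(price_cpu: float, capacity: float,
                   load_factor: float, t: float) -> float:
    """Cost_exclusive = Price_cpu * (Capacity * loadfactor) * t (capacity-based)."""
    return price_cpu * (capacity * load_factor) * t


PRICE_PER_CORE_HOUR = 0.5  # hypothetical unit price
HOURS = 720                # assumed 30-day billing period

# A service that needs a 16-core limit in a shared pool...
shared = cost_shared(PRICE_PER_CORE_HOUR, usage_shared=16, t=HOURS)
# ...versus holding a dedicated 64-core machine for the same work.
exclusive = cost_exclusive(PRICE_PER_CORE_HOUR, capacity=64,
                           load_factor=1.0, t=HOURS)
print(shared, exclusive)  # 5760.0 23040.0
```

In this example the exclusive bill is four times the shared one for the same workload, which is the gap that motivates teams to move into shared pools.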

Outcome

FinOps enabled Bilibili to achieve a full cost‑insight → cost‑optimization → cost‑operation loop, saving billions of yuan while supporting user growth. Future work focuses on real‑time data‑driven decisions, cost forecasting and further refinement of the FinOps practice.
