How Bilibili Cut IT Costs by Hundreds of Millions of Yuan with FinOps: A Deep Dive
This article details Bilibili's FinOps-driven approach to IT cost management, covering budgeting, resource utilization dashboards, financial metrics, cost modeling, SKU pricing, and operational practices that together saved the company hundreds of millions of yuan while supporting continued growth.
Background
Large internet companies need to reduce IT costs while still supporting growth. Bilibili launched a FinOps practice in 2022 to gain cost insight and to drive both technical optimization and operational improvement, which has saved hundreds of millions of yuan.
Budget Process Issues
Technical platforms are brought into budgeting late: they receive resource requirements only after business teams have submitted their budgets.
Budget-approved items are whitelisted and bypass procurement checks, which weakens cost control.
Business teams lack a unified view of all their bills, which reduces cost awareness and the motivation to optimize.
Resource Utilization Dashboard
A data warehouse aggregates monitoring, asset-management and hybrid-cloud HCRM data to produce dashboards for bandwidth (CDN, cloud, IDC) and compute (servers, VMs, bare metal). Metrics are normalized across vendors, and GPU usage is split into training and inference. The dashboards support utilization watermark management, efficiency measurement and alerting.
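As a rough illustration of watermark management, the sketch below classifies a pool's utilization against a low and a high watermark. The thresholds (0.30 and 0.70), the function name and the pool names are hypothetical and are not taken from Bilibili's dashboards.

```python
from dataclasses import dataclass


@dataclass
class Watermark:
    # Hypothetical thresholds; real values would be tuned per resource type.
    low: float = 0.30
    high: float = 0.70


def classify_utilization(utilization: float, wm: Watermark = Watermark()) -> str:
    """Return 'underused', 'healthy' or 'overloaded' for a utilization in [0, 1]."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    if utilization < wm.low:
        return "underused"   # candidate for consolidation or release
    if utilization > wm.high:
        return "overloaded"  # candidate for an alert or capacity expansion
    return "healthy"


# Made-up utilization figures for three pools.
pools = {"cdn-bandwidth": 0.82, "cpu-shared-pool": 0.45, "gpu-training": 0.12}
report = {name: classify_utilization(u) for name, u in pools.items()}
print(report)
```

In practice the two thresholds would likely differ between bandwidth, CPU and GPU pools, which is why they are kept in a small configuration object rather than hard-coded.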
FinOps Framework
FinOps (FinOps Foundation) aligns engineering, finance and business to make data‑driven spending decisions. Roles include executive leadership, business owners, engineering/operations, finance & procurement, and FinOps practitioners. The lifecycle consists of cost insight → cost optimization → cost operations.
Experimental Path
Quantify cost to raise business awareness.
Run technical and operational cost‑reduction in parallel.
Embed cost metrics into project planning, procurement and lifecycle management.
Cost Modeling
CAPEX (capital expenditure) and OPEX (operating expense) are distinguished. CAPEX is converted to OPEX via a Total Cost of Ownership (TCO) model that spreads one‑time hardware cost over its useful life. Example monthly server TCO formula:
Server_TCO_month = Depreciation + Net + IDC + Line + f1
where Depreciation is the server's monthly depreciation (CAPEX / lifespan_months), Net is network-equipment depreciation, IDC covers rack, power and maintenance, Line is the monthly inter-rack line fee, and f1 accounts for other per-server monthly costs.
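A minimal sketch of the monthly TCO calculation, assuming straight-line depreciation (the purchase price spread evenly over the useful life). The function name, parameter names and all figures are illustrative assumptions, not values from the article.

```python
def server_tco_month(capex: float, lifespan_months: int, net: float,
                     idc: float, line: float, other: float = 0.0) -> float:
    """Monthly TCO of one server under straight-line depreciation.

    net   -- monthly network-equipment depreciation share
    idc   -- monthly rack, power and maintenance cost
    line  -- monthly inter-rack line fee
    other -- remaining per-server monthly costs (f1 in the formula)
    """
    if lifespan_months <= 0:
        raise ValueError("lifespan_months must be positive")
    depreciation = capex / lifespan_months  # one-time CAPEX spread over useful life
    return depreciation + net + idc + line + other


# Illustrative numbers only (yuan): a 60,000 yuan server with a 5-year life.
tco = server_tco_month(capex=60_000, lifespan_months=60,
                       net=50, idc=400, line=30, other=20)
print(tco)  # 1500.0
```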
SKU pricing follows a cloud-like model: unit_price = SKU_TCO / theoretical_service_capacity. Usage is classified as shared or exclusive, which determines how the charge is calculated.
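The unit-price rule can be sketched as follows. The SKU figures (a monthly TCO of 1,500 yuan and 64 sellable CPU cores) are hypothetical and only show the arithmetic.

```python
def sku_unit_price(sku_tco_month: float, theoretical_capacity: float) -> float:
    """Price per unit of capacity (for example per CPU core) per month."""
    if theoretical_capacity <= 0:
        raise ValueError("theoretical_capacity must be positive")
    return sku_tco_month / theoretical_capacity


# Hypothetical SKU: monthly TCO of 1,500 yuan, 64 sellable CPU cores.
price_per_core_month = sku_unit_price(1500.0, 64)
print(price_per_core_month)  # 23.4375
```

Dividing by theoretical capacity rather than observed usage keeps the unit price stable from month to month, so changes in a team's bill reflect changes in its own consumption.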
Cost Optimization Strategies
1. Bandwidth
Adopt narrow‑band HD and AV1 encoding to lower bitrate.
Use machine‑learning‑driven transcoding prediction.
Increase share of cheap PCDN/mCDN.
Build dedicated CDN lines to reduce origin traffic.
Layer content and route cold content to edge nodes.
Shave peaks and share bandwidth across services.
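To see why sharing bandwidth across services helps, the sketch below compares provisioning each service for its own peak with provisioning one shared pool for the combined peak. The hourly traffic figures are invented for illustration, and the savings depend entirely on how far apart the services' peak hours are.

```python
def shared_peak(*series):
    """Peak of the summed traffic when services share one bandwidth pool."""
    return max(sum(points) for points in zip(*series))


# Hypothetical hourly bandwidth (Gbps) for two services with different peak hours.
video = [20, 30, 80, 100, 60]
download = [90, 70, 30, 20, 40]

separate = max(video) + max(download)   # each service provisioned for its own peak
shared = shared_peak(video, download)   # one pool provisioned for the combined peak

print(separate, shared, separate - shared)  # 190 120 70
```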
2. Server
Accelerate hardware refresh (Intel Skylake → Cascade Lake → Ice Lake → Sapphire Rapids; AMD Rome → Milan → Genoa; GPU generations) to lower cost per compute unit.
Virtualize and consolidate via Kubernetes‑based private‑cloud containers.
Increase pool‑level resource volume, poolability and allocation rates.
Apply VPA/HPA, over‑commit and workload mixing to improve utilization.
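For the HPA item, the sketch below applies the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric). It is a simplified illustration that omits the autoscaler's tolerance band and stabilization window; the function name and the replica bounds are assumptions, and the article does not say how Bilibili configures its autoscalers.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 100) -> int:
    """Replica count suggested by the HPA scaling formula, clamped to bounds."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Metrics are CPU utilization percentages.
print(desired_replicas(10, 30, 60))  # 5: underused, scale in
print(desired_replicas(4, 90, 60))   # 6: overloaded, scale out
```

Scaling in when utilization is well below target is what frees capacity for other workloads, which is how autoscaling contributes to higher pool-level utilization.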
3. Public Cloud
Choose billing mode based on workload pattern: bandwidth‑based for stable traffic, traffic‑based for bursty workloads.
Select appropriate network contracts (static BGP, ISP lines, hybrid‑cloud dedicated lines) according to stability and cost.
Plan capacity, release unused instances, and evaluate self‑built vs SaaS services.
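The billing-mode choice can be illustrated with a simple comparison, assuming bandwidth-based billing charges for peak bandwidth per month and traffic-based billing charges per gigabyte transferred. The prices and workloads below are hypothetical; real contracts often add tiers or percentile-based peaks.

```python
def cheaper_mode(peak_mbps: float, total_gb: float,
                 price_per_mbps_month: float, price_per_gb: float):
    """Return the cheaper billing mode and its monthly cost."""
    by_bandwidth = peak_mbps * price_per_mbps_month
    by_traffic = total_gb * price_per_gb
    if by_bandwidth <= by_traffic:
        return ("bandwidth", by_bandwidth)
    return ("traffic", by_traffic)


# Stable workload: peak 100 Mbps with a high sustained volume of 25,920 GB per month.
print(cheaper_mode(100, 25_920, 20.0, 0.5))  # ('bandwidth', 2000.0)

# Bursty workload: the same 100 Mbps peak but only 1,000 GB transferred per month.
print(cheaper_mode(100, 1_000, 20.0, 0.5))   # ('traffic', 500.0)
```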
4. Optimization Loop
Platform generates bills → business reviews → bill analysis → targeted optimization → results reflected in the next billing cycle, forming a closed loop.
Operational Collaboration
Two pillars:
Cost operations: budget control, variance analysis, bill review, cost modeling, decision making.
Resource operations: monitor efficiency, retire idle assets, enforce containerization, clean up unused cloud resources.
Regular communication among executives, product owners, platform engineers, finance and procurement ensures swift remediation of cost anomalies.
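The variance analysis listed under cost operations might look like the following sketch, which compares actual spend with budget per item and flags deviations above a threshold. The 10% threshold, the item names and the amounts are hypothetical.

```python
def budget_variance(budget: dict, actual: dict, threshold: float = 0.10) -> dict:
    """Map each budget item to (variance ratio, exceeds-threshold flag)."""
    result = {}
    for item, planned in budget.items():
        if planned <= 0:
            raise ValueError(f"budget for {item!r} must be positive")
        spent = actual.get(item, 0.0)
        ratio = (spent - planned) / planned
        result[item] = (round(ratio, 4), abs(ratio) > threshold)
    return result


# Hypothetical monthly figures in thousands of yuan.
budget = {"cdn": 1000.0, "servers": 2000.0}
actual = {"cdn": 1250.0, "servers": 1900.0}

variance = budget_variance(budget, actual)
print(variance)  # {'cdn': (0.25, True), 'servers': (-0.05, False)}
```

Flagged items are the ones that would be raised in the regular cross-team review so that the cause can be traced and corrected.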
Implementation Details
The billing system integrates public‑cloud and private‑cloud cost data, assigns each resource to an APPID from the CMDB, and enables multi‑dimensional cost views (project, department, business). Cost is calculated as Cost = unit_price × usage. Usage calculation distinguishes shared (limit‑based) and exclusive (capacity‑based) resources.
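A minimal sketch of the multi-dimensional view, assuming each bill line carries an APPID and the CMDB maps that APPID to a department and a project. The APPIDs, field names and amounts are hypothetical.

```python
from collections import defaultdict

# Hypothetical CMDB extract: APPID -> ownership dimensions.
CMDB = {
    "app-video-api": {"department": "video", "project": "playback"},
    "app-live-gw": {"department": "live", "project": "gateway"},
    "app-video-transcode": {"department": "video", "project": "transcode"},
}

# Hypothetical bill lines from public and private cloud: (appid, unit_price, usage).
BILL_LINES = [
    ("app-video-api", 2.0, 100),
    ("app-live-gw", 1.5, 200),
    ("app-video-transcode", 4.0, 50),
    ("app-video-api", 0.5, 40),
]


def cost_by(dimension: str, bill_lines=BILL_LINES, cmdb=CMDB) -> dict:
    """Aggregate cost (unit_price * usage) along one CMDB dimension."""
    totals = defaultdict(float)
    for appid, unit_price, usage in bill_lines:
        key = cmdb[appid][dimension]
        totals[key] += unit_price * usage
    return dict(totals)


print(cost_by("department"))  # {'video': 420.0, 'live': 300.0}
print(cost_by("project"))
```

Because every line is keyed by APPID, adding another view (for example by business line) only requires another attribute in the CMDB mapping.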
Example shared-usage formula:
Cost_shared = Price_cpu * Usage_shared * t
Example exclusive-usage formula:
Cost_exclusive = Price_cpu * (Capacity * loadfactor) * t
Charging exclusive pools on their theoretical maximum capacity encourages migration to shared pools, improving overall utilization.
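The two charging formulas can be compared side by side as below. The price, core counts and duration are made up, and loadfactor is treated as a plain multiplier on capacity because the article does not define it further.

```python
def cost_shared(price_cpu: float, usage_shared: float, t: float) -> float:
    """Shared pool: charge for the cores the tenant is limited to."""
    return price_cpu * usage_shared * t


def cost_exclusive(price_cpu: float, capacity: float,
                   load_factor: float, t: float) -> float:
    """Exclusive pool: charge for the pool's capacity, whatever the actual use."""
    return price_cpu * (capacity * load_factor) * t


# Hypothetical case: 2 yuan per core-day over 30 days.
# The workload needs about 16 cores but sits on an exclusive 64-core pool.
shared = cost_shared(price_cpu=2.0, usage_shared=16, t=30)
exclusive = cost_exclusive(price_cpu=2.0, capacity=64, load_factor=1.0, t=30)

print(shared, exclusive)  # 960.0 3840.0
```

With these numbers the same workload costs four times as much on the exclusive pool, which is the kind of price signal that nudges teams toward shared pools.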
Outcome
FinOps enabled Bilibili to close the full cost-insight → cost-optimization → cost-operations loop, saving hundreds of millions of yuan while supporting user growth. Future work focuses on real-time data-driven decisions, cost forecasting and further refinement of the FinOps practice.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
