How Bilibili Scales Capacity: VPA, HPA, and Cost‑Saving Strategies

This article summarizes Zhang He’s Bilibili SRE talk on building a capacity‑management system that visualizes resource usage, reduces costs, improves stability, and leverages Kubernetes VPA, HPA, pooling, and quota management to support massive live‑stream events and rapid feature releases.

dbaplus Community

Jun 24, 2023

Design Philosophy and Motivation

Capacity management at Bilibili targets three fundamental problems:

Visibility: utilization watermarks for clusters, resource pools and nodes are not exposed, making stability hard to guarantee.

Root‑cause tracing: frequent code, configuration and traffic‑shift changes obscure when and why capacity variations occur.

Autoscaling coverage: many bursty activities exceed the limits of existing Horizontal Pod Autoscaler (HPA) configurations.

Key Challenges

Internal dynamics – code releases, config updates, load‑testing, multi‑active traffic routing and cache expiration constantly reshape capacity models.

External spikes – live events, promotions and viral topics generate unpredictable traffic bursts.

Multiple bottlenecks – long service chains (upstream, downstream, middleware) make early detection difficult.

Manual emergency response – reliance on human intervention leads to long recovery times and risk of cascading failures.

Architecture Overview

The platform is built from the bottom up and consists of four layers:

Basic Capacity : collects metrics for clusters, resource pools, nodes and application profiles.

Elastic Resources : implements Vertical Pod Autoscaler (VPA) and HPA, adds pooling and quota controls, and provides visual dashboards.

PaaS Pooling : merges physical pools (comics, live, OGV) into a logical pool, shifting focus to logical quota management.

Quota Management : issues quota policies, integrates with the internal CMDB and ties over‑usage to the release platform for automatic throttling.

VPA‑Based Elastic Scaling

Each service defines a soft limit (recommended) and a hard limit (maximum). VPA uses real‑time CPU usage and service‑level profiles to compute request values, reducing over‑provisioning.

# VPA pipeline components
Generator   – expands high‑level rules (e.g., L0 tier) into individual VPA objects per service.
Recommender – pulls metrics (CPU usage, P99 latency, etc.) from the monitoring system and calculates optimal request values.
Updater     – patches the Pod spec with the new request values.
Webhook     – listens to deployment events and triggers a resource adjustment if needed.
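
The Recommender step can be pictured with a minimal sketch. It assumes the request is derived from a high percentile of recent CPU usage plus some headroom, is capped at the hard limit, and is flagged when it exceeds the soft limit. The function names, the 1.2 headroom factor and the nearest-rank percentile are illustrative assumptions, not details from the talk.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile of a non-empty list; q is an integer in 1..100."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def recommend_cpu_request(usage_cores, soft_limit, hard_limit, headroom=1.2, q=99):
    """Return (request_in_cores, exceeds_soft_limit).

    Assumption: the hard limit is an absolute cap, while the soft limit
    only marks recommendations that deserve a closer look.
    """
    target = percentile(usage_cores, q) * headroom
    request = min(target, hard_limit)
    return round(request, 2), request > soft_limit


if __name__ == "__main__":
    # 100 hypothetical CPU samples in cores, mostly idle with two spikes.
    usage = [1.0] * 98 + [2.0, 4.0]
    print(recommend_cpu_request(usage, soft_limit=2.0, hard_limit=8.0))
```

Using a percentile rather than the maximum keeps a single outlier sample (the 4.0 spike here) from inflating the request.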

During large‑scale events, non‑critical services (e.g., L2/L3 back‑ends) can have their soft limits lowered to free resources for core services.

Strategy Management

Metric management – configure which metric (CPU max, CPU P99, memory, QPS) drives the recommendation.

Template management – maintain per‑tier templates (L0, L1, L2…) that encode service‑type characteristics.

Pre‑estimation & A/B testing – simulate strategy impact before rollout.
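
As a rough illustration of how per-tier templates might be expanded into per-service policies (the job the Generator is described as doing), consider the sketch below. The template fields, metric names and headroom values are hypothetical; the talk does not publish the real templates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    metric: str      # which metric drives the recommendation
    headroom: float  # multiplier applied on top of that metric


# Illustrative values only, chosen so that higher tiers keep more headroom.
TEMPLATES = {
    "L0": Template(metric="cpu_max", headroom=1.5),
    "L1": Template(metric="cpu_p99", headroom=1.3),
    "L2": Template(metric="cpu_p99", headroom=1.1),
}


def generate_policies(services):
    """Expand tier templates into one policy dict per service.

    `services` maps service name -> tier. A service whose tier has no
    template is skipped rather than given a guessed policy.
    """
    policies = []
    for name, tier in sorted(services.items()):
        template = TEMPLATES.get(tier)
        if template is None:
            continue
        policies.append({
            "service": name,
            "tier": tier,
            "metric": template.metric,
            "headroom": template.headroom,
        })
    return policies


if __name__ == "__main__":
    print(generate_policies({"live-gateway": "L0", "report-job": "L2"}))
```

Keeping the tier-to-strategy mapping in one table means a strategy change is made once per tier instead of once per service.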

Data Operations

Coverage dashboards – show pool‑wide VPA adoption rate and per‑service adjustment magnitude.

Execution logs – record each recommendation and its applied result for audit.

Strategy analysis – compare pre‑estimation with actual outcomes to refine templates.

Blacklist & Alerting

A blacklist excludes high‑risk services (e.g., those under heavy load tests or newly released features) from VPA adjustments during unexpected spikes.

Alerting monitors failure rate, coverage ratio and redundancy; alerts are routed to SRE and platform owners when VPA actions deviate from expectations.
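
The blacklist and the failure-rate alert could look roughly like the following. This is a sketch under stated assumptions: recommendations are plain dicts with a `service` key, and the 5% alert threshold is an invented example value.

```python
def split_by_blacklist(recommendations, blacklist):
    """Return (to_apply, skipped) lists of recommendation dicts."""
    to_apply = [r for r in recommendations if r["service"] not in blacklist]
    skipped = [r for r in recommendations if r["service"] in blacklist]
    return to_apply, skipped


def failure_rate_alert(attempted, failed, threshold=0.05):
    """True when the share of failed VPA updates exceeds the threshold."""
    if attempted == 0:
        return False
    return failed / attempted > threshold


if __name__ == "__main__":
    recs = [
        {"service": "live-gateway", "cpu_request": 2.4},
        {"service": "new-feature-api", "cpu_request": 0.8},
    ]
    # Hypothetical blacklist: a freshly released service is left untouched.
    to_apply, skipped = split_by_blacklist(recs, {"new-feature-api"})
    print(len(to_apply), len(skipped))
    print(failure_rate_alert(attempted=200, failed=15))
```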

PaaS Pooling Implementation

Physical pools for comics, live streaming and OGV are unified into a logical pool. The rollout follows three concrete steps:

Standardized governance : remove special constraints, unify kernel versions, disable nolimit bindings, normalize logs and cpuset settings.

Platform support : introduce logical quota objects per organization, enforce quota limits on the merged pool, and extend VPA coverage to the pooled resources.

Executive endorsement : secure top‑down commitment to coordinate cross‑department resource sharing.

Quota Management Integration

The capacity platform publishes quota policies to an internal CMDB‑backed business tree. Each organization receives a quota allocation; excess usage triggers the release platform to throttle or reject further scaling attempts.
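
A quota check of this kind can be sketched as below. The `Quota` fields and the three outcomes are assumptions made for illustration; in particular, reading "throttle" as granting only the remaining quota is one possible interpretation, not something the talk specifies.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    org: str            # organization node in the business tree
    limit_cores: float  # allocated quota
    used_cores: float   # quota already consumed


def admit(quota, requested_cores):
    """Return (decision, granted_cores) for a scaling request.

    allow    - the full request fits in the remaining quota
    throttle - only the remaining quota is granted
    reject   - no quota is left
    """
    remaining = max(0.0, quota.limit_cores - quota.used_cores)
    if requested_cores <= remaining:
        return "allow", requested_cores
    if remaining > 0:
        return "throttle", remaining
    return "reject", 0.0


if __name__ == "__main__":
    live = Quota(org="live", limit_cores=100, used_cores=90)
    print(admit(live, 5))
    print(admit(live, 20))
```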

HPA Design and Observability

The HPA subsystem mirrors the VPA design (policy management, observability, alerting) but scales the number of replicas rather than per‑Pod resource requests.

Policy management : define per‑tier thresholds (e.g., L0 services expand when CPU > 30%). Metrics include CPU, memory and QPS.

Elastic pre‑check : before scaling, verify downstream capacity (DB connection pools, TiDB, caches, message queues) to avoid overload.

Observability : track coverage rate, scaling quality and instance count; dashboards display bulk enable/disable, coverage percentages and current replica numbers.

Alerting : generate alerts for scaling failures, abnormal HPA behavior, or downstream bottlenecks.
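
The threshold policy and the elastic pre-check can be combined in a small sketch. The replica formula is the standard Kubernetes HPA calculation, desired = ceil(current × currentMetric / targetMetric), with its tolerance band and min/max replica bounds omitted. The pre-check against a database connection budget, including the `conns_per_pod` and `db_max_conns` parameters, is a hypothetical example of a downstream check.

```python
import math


def desired_replicas(current_replicas, current_util, target_util):
    """Standard HPA ratio: scale replicas by current/target utilization."""
    return math.ceil(current_replicas * current_util / target_util)


def precheck_cap(desired, conns_per_pod, db_max_conns):
    """Cap the replica count at what the downstream connection budget allows."""
    allowed = db_max_conns // conns_per_pod
    return min(desired, allowed)


if __name__ == "__main__":
    # An L0 service with a 30% CPU target currently running at 45% on 10 replicas.
    want = desired_replicas(current_replicas=10, current_util=45, target_util=30)
    # Hypothetical budget: each Pod holds 20 DB connections, the DB accepts 200.
    print(want, precheck_cap(want, conns_per_pod=20, db_max_conns=200))
```

When the pre-check caps the result below what the ratio asks for, that gap is a natural trigger for the downstream-bottleneck alert.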

Capacity Inspection and Protection

Regular inspections visualize risk‑prone services, usage rates and quota health for developers, platform teams and SRE. An event‑driven pipeline aggregates changes from the release platform, HPA, and node management, enabling rapid root‑cause analysis of capacity variations.
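
To make the root-cause idea concrete, here is a minimal sketch that looks up platform events shortly before an observed capacity change. The event shape (dicts with `source`, `service` and `time`) and the 10-minute window are assumptions made for the example.

```python
from datetime import datetime, timedelta


def events_near(events, change_time, window=timedelta(minutes=10)):
    """Return events inside the window before the change, newest first."""
    start = change_time - window
    hits = [e for e in events if start <= e["time"] <= change_time]
    return sorted(hits, key=lambda e: e["time"], reverse=True)


if __name__ == "__main__":
    t0 = datetime(2022, 10, 1, 20, 0)  # hypothetical time of a usage jump
    events = [
        {"source": "release", "service": "live-gateway", "time": t0 - timedelta(minutes=3)},
        {"source": "hpa", "service": "live-gateway", "time": t0 - timedelta(minutes=1)},
        {"source": "node", "service": "-", "time": t0 - timedelta(hours=2)},
    ]
    for event in events_near(events, t0):
        print(event["source"], event["time"].isoformat())
```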

Operational Dashboards

Basic capacity charts – cluster, pool, node and application metrics.

Business‑level views – usage trends, hot services and pain points.

Capacity event streams – link platform actions (e.g., releases, scaling events) to resource changes.

Weekly reports – department‑specific and internal summaries of usage, efficiency gains and stability risks.

Achieved Benefits

No new physical machines were added for online PaaS workloads in the first half of 2022.

No additional hardware was procured for large‑scale events (S12), thanks to pooled resources and VPA/HPA elasticity.

Event support capacity grew >10× while provisioning time dropped from weeks to hours.

Smaller services experienced reduced outage risk due to larger, more distributed pools.

Urgent scaling needs (blue‑green releases, HPA oversell) are satisfied within minutes.
