How Bilibili Boosted Server Utilization with Kubernetes Co‑Location Strategies
This article explains how Bilibili’s large‑scale Kubernetes cloud platform reduces costs and improves machine utilization by applying co‑location (mixed‑deployment) techniques, including resource‑aware scheduling, dynamic isolation, and a dedicated management console across online, offline, and idle‑machine scenarios.
Background
In large internet companies, server fleets can reach tens of thousands of machines. Under cost‑reduction pressure, improving resource utilization while preserving service SLOs is critical to lowering procurement costs. On a Kubernetes cloud platform, low utilization has two main causes: over‑provisioned resource quotas and fluctuating workload demand.
To address over‑provisioning, the platform recommends reasonable resource configurations based on service profiling and applies elastic scaling. To exploit idle resources during demand valleys, the platform schedules offline tasks onto online clusters during off‑peak periods.
Co‑location Concept
Workloads are divided into online (latency‑sensitive micro‑services such as recommendation, advertising, and search) and offline (batch jobs such as video transcoding or MapReduce). Co‑location schedules workloads of different types onto the same physical machines and isolates them from one another while guaranteeing SLOs, thereby increasing utilization and reducing costs.
Co‑location is not merely placing containers on the same host; it requires a scheduler that selects nodes with sufficient free resources and isolation mechanisms to prevent interference with high‑priority online tasks.
Bilibili Co‑location Scenarios
1. Offline co‑location – Video transcoding jobs, which are compute‑intensive and can be retried, are scheduled onto online clusters during night‑time valleys, raising online cluster utilization and filling compute gaps for transcoding peaks.
This scenario places two requirements on scheduling: co‑location pods must not consume the quota needed by online pods, and the scheduler must dynamically sense the free co‑location resources on each node.
2. Offline‑to‑offline co‑location – Idle machines in offline clusters (e.g., training platforms) can run heavier big‑data tasks. Integration with YARN requires coordination between Kubernetes and YARN schedulers.
3. Idle‑machine co‑location – Reserved IDC machines are automatically added to Kubernetes for co‑location tasks and withdrawn when needed for emergencies.
Offline Co‑location Implementation
Overall Architecture
Key components include:
Task submission module: "caster" for online services and "crm" for offline batch submissions, supporting multi‑cluster scheduling and quota control.
Kubernetes scheduling module: the native kube‑scheduler for online tasks, a custom "job‑scheduler" for offline tasks, and a webhook that converts pod resource requests to extended resources (e.g., caster.io/colocation-cpu).
Colocation agent on each node: calculates and reports available co‑location resources, enforces isolation, and reports metrics to Prometheus.
Colocation config manager: centrally manages policies, enabling dynamic updates and feature toggles.
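The webhook conversion listed above can be sketched in Python. This is a minimal illustration under stated assumptions, not Bilibili's implementation: the label key caster.io/resource-type and the resource name caster.io/colocation-cpu come from the article, while caster.io/colocation-memory, the choice of millicores as the CPU unit, and the function names are hypothetical.

```python
import copy

COLOCATION_LABEL = "caster.io/resource-type"        # label key named in the article
COLOCATION_CPU = "caster.io/colocation-cpu"         # extended resource named in the article
COLOCATION_MEMORY = "caster.io/colocation-memory"   # hypothetical name, assumed for this sketch


def cpu_to_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as '2', '0.5' or '500m' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def rewrite_pod(pod: dict) -> dict:
    """Return a copy of the pod whose native cpu/memory entries are moved to extended resources.

    Pods without the co-location label are returned unchanged.
    """
    labels = pod.get("metadata", {}).get("labels", {})
    if labels.get(COLOCATION_LABEL) != "colocation":
        return pod
    out = copy.deepcopy(pod)
    for container in out["spec"]["containers"]:
        resources = container.setdefault("resources", {})
        for section in ("requests", "limits"):
            values = resources.get(section, {})
            if "cpu" in values:
                # Assumption: the extended CPU resource is counted in integer millicores.
                values[COLOCATION_CPU] = str(cpu_to_millicores(values.pop("cpu")))
            if "memory" in values:
                values[COLOCATION_MEMORY] = values.pop("memory")
    return out


example_pod = {
    "metadata": {"labels": {COLOCATION_LABEL: "colocation"}},
    "spec": {"containers": [{
        "name": "transcode",
        "resources": {"requests": {"cpu": "2", "memory": "4Gi"}},
    }]},
}
rewritten = rewrite_pod(example_pod)
```

Because the rewritten pod no longer requests native cpu or memory, it does not count against the node's native allocatable CPU and memory that online pods are scheduled against.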
Co‑location Task Scheduling
Native scheduler issues: co‑location pods would consume native quota, and the native scheduler is not aware of actual node load.
Extended‑resource based scheduling: pods are labeled (e.g., caster.io/resource-type: colocation) and a webhook rewrites requests to extended resources.
The job‑scheduler selects nodes based on the extended resources they report, using hash‑based ordering for homogeneous pods and caching pre‑selected nodes.
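The node selection step can be sketched as follows. The article does not describe how the job‑scheduler hashes or caches nodes, so this is only one plausible reading: nodes are filtered by their reported free extended CPU, and candidates are then ordered by a stable SHA-256 hash of the pod and node names so that identical pods spread across nodes. All function names and the data layout are assumptions, and the caching of pre‑selected nodes is left out.

```python
import hashlib
from typing import Dict, List, Optional


def feasible_nodes(free_millicores: Dict[str, int], request: int) -> List[str]:
    """Keep only nodes whose reported free co-location CPU covers the request."""
    return [name for name, free in free_millicores.items() if free >= request]


def hash_order(pod_name: str, candidates: List[str]) -> List[str]:
    """Order candidates by a stable hash of (pod, node); the order is deterministic per pod."""
    def key(node: str) -> str:
        return hashlib.sha256(f"{pod_name}/{node}".encode("utf-8")).hexdigest()
    return sorted(candidates, key=key)


def pick_node(pod_name: str, free_millicores: Dict[str, int], request: int) -> Optional[str]:
    """Pick a node for the pod and deduct the request from its free amount."""
    candidates = feasible_nodes(free_millicores, request)
    if not candidates:
        return None
    chosen = hash_order(pod_name, candidates)[0]
    free_millicores[chosen] -= request
    return chosen


# Hypothetical per-node free co-location CPU, as an agent might report it.
reported = {"node-a": 4000, "node-b": 1000, "node-c": 8000}
placement = pick_node("transcode-0", reported, 2000)
```

Hashing on the pod name rather than scoring every node gives homogeneous pods different first choices without comparing node scores, which is the property this sketch tries to illustrate.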
Co‑location Resource Calculation
Dynamic: the free co‑location amount is derived from real‑time online usage measured against a safety watermark.
Static: idle backup machines report a fixed amount.
Time‑based: policies can enable or disable co‑location during specific periods, with graceful eviction when a period ends.
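The three calculation modes above can be combined in a short Python sketch. The exact formula is not given in the article; the version below assumes the free amount is the watermark share of node capacity minus current online usage, floored at zero. The Policy fields, the percent representation of the watermark, and the function names are all hypothetical, and graceful eviction is not modeled.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional, Tuple


@dataclass
class Policy:
    watermark_percent: int = 60                   # assumed safety watermark, as a share of capacity
    static_millicores: Optional[int] = None       # fixed amount for idle backup machines
    window: Optional[Tuple[time, time]] = None    # (start, end); None means always enabled


def dynamic_free(capacity: int, online_usage: int, watermark_percent: int) -> int:
    """Free co-location CPU: watermark budget minus online usage, never negative."""
    budget = capacity * watermark_percent // 100
    return max(0, budget - online_usage)


def in_window(now: time, start: time, end: time) -> bool:
    """True if now falls in [start, end), including windows that wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def reported_millicores(policy: Policy, now: time, capacity: int, online_usage: int) -> int:
    """Amount of co-location CPU a node would report under the given policy."""
    if policy.window is not None and not in_window(now, *policy.window):
        return 0
    if policy.static_millicores is not None:
        return policy.static_millicores
    return dynamic_free(capacity, online_usage, policy.watermark_percent)


night_only = Policy(window=(time(22, 0), time(6, 0)))
at_night = reported_millicores(night_only, time(23, 30), capacity=64000, online_usage=20000)
at_noon = reported_millicores(night_only, time(12, 0), capacity=64000, online_usage=20000)
```

With a 64-core node, a 60% watermark, and 20 cores of online usage, the node reports 18400 millicores inside the night window and 0 outside it.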
Online QoS Assurance
Task scheduling directs co‑location pods to nodes with sufficient free resources.
Resource isolation places co‑location pods under a dedicated parent cgroup (the “co‑location big‑frame”) with minimal CPU shares, and combines it with dynamic CPU quota adjustments, cpuset binding, and tuning of memory quota and oom_score_adj.
Network bandwidth is limited via a CNI adaptor and Linux tc.
Eviction mechanisms trigger when resource usage exceeds thresholds, with cooldown periods to avoid thrashing.
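The threshold‑plus‑cooldown behaviour of eviction can be illustrated with a small Python sketch. The article does not say what the cooldown applies to; this example assumes that once an eviction fires, further evictions on the same node are suppressed until the cooldown has elapsed. The class name, the parameters, and the use of plain seconds for timestamps are hypothetical.

```python
from typing import Optional


class EvictionGate:
    """Decides whether a node should evict co-location tasks right now."""

    def __init__(self, threshold_percent: float, cooldown_seconds: float) -> None:
        self.threshold_percent = threshold_percent
        self.cooldown_seconds = cooldown_seconds
        self._last_eviction: Optional[float] = None

    def should_evict(self, usage_percent: float, now: float) -> bool:
        """Return True only if usage is above the threshold and no cooldown is active."""
        if usage_percent <= self.threshold_percent:
            return False
        if (self._last_eviction is not None
                and now - self._last_eviction < self.cooldown_seconds):
            # Still cooling down from the previous eviction; skip to avoid thrashing.
            return False
        self._last_eviction = now
        return True


gate = EvictionGate(threshold_percent=80.0, cooldown_seconds=60.0)
decisions = [
    gate.should_evict(90.0, now=0.0),     # above threshold, first eviction
    gate.should_evict(95.0, now=30.0),    # above threshold, but inside the cooldown
    gate.should_evict(95.0, now=61.0),    # cooldown elapsed
    gate.should_evict(50.0, now=200.0),   # below threshold
]
```

Passing the timestamp in as an argument, instead of reading the clock inside the method, keeps the decision logic easy to test.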
Offline‑to‑Offline Co‑location
This scenario runs the YARN NodeManager as a DaemonSet on co‑location nodes. The colocation agent reports available resources to the YARN ResourceManager (RM), which then schedules big‑data tasks onto suitable nodes, applying remote shuffle and task‑size‑aware scheduling to minimize the impact on non‑co‑location workloads.
Co‑location Management Platform
A web UI provides:
Strategy management: batch view and set node policies (safety water‑marks, hard limits), and assign nodes to groups.
Co‑location toggle: one‑click enable/disable per machine, with immediate task eviction.
Monitoring: per‑node and per‑group dashboards showing reported co‑location resources, task counts, and actual consumption.
Results
Most Bilibili cloud machines now participate in co‑location, achieving average CPU utilization of ~35% (peak ~55%). The shared compute power supports large‑scale video transcoding, AI moderation, and big‑data MapReduce, saving thousands of servers.
Conclusion
Bilibili’s Kubernetes‑based co‑location framework improves resource efficiency through non‑intrusive scheduling, fine‑grained isolation, and comprehensive observability. Future work includes kernel‑level isolation, unified scheduling, and further cost‑reduction optimizations.
dbaplus Community
