Bilibili Big Data Task Migration to Cloud‑Native Kubernetes Using Volcano Scheduler
This article shares Bilibili’s experience migrating its offline big‑data workloads to a cloud‑native Kubernetes environment using the Volcano scheduler, covering migration background, scheduler adaptation, hierarchical queue implementation, over‑commit framework (Amiyad), and future work to improve performance and resource utilization.
Migration Background
The migration aims to reduce costs through unified resource management, improve resource isolation with Kubernetes, ensure environment compatibility via containers, and increase operational efficiency by automating management on K8s.
Challenges
Two major challenges were addressed: (1) adapting compute components, including Spark client submission to K8s, task status and log collection, WebUI navigation, and Remote Shuffle Service integration; (2) adapting the scheduling engine, replacing YARN with Volcano for batch workloads on K8s.
Why Volcano
Volcano was chosen because it offers batch scheduling optimized for large‑scale parallel jobs on K8s, is natively supported by Spark 3.3.0's custom‑scheduler integration, aligns well with big‑data concepts such as Queue and PodGroup, and provides an extensible plugin architecture.
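Spark 3.3.0's custom‑scheduler support means Volcano can be enabled largely through configuration. A minimal sketch of the relevant Spark properties (the PodGroup template path is illustrative):

```properties
# Enable Volcano as the scheduler for driver and executor pods (Spark >= 3.3.0)
spark.kubernetes.scheduler.name=volcano
spark.kubernetes.driver.pod.featureSteps=org.apache.spark.deploy.k8s.features.VolcanoFeatureStep
spark.kubernetes.executor.pod.featureSteps=org.apache.spark.deploy.k8s.features.VolcanoFeatureStep
# Optional: a PodGroup template controlling the target queue and gang-scheduling
# minimums (illustrative path)
spark.kubernetes.scheduler.volcano.podGroupTemplateFile=/opt/spark/conf/podgroup-template.yaml
```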
Volcano Architecture
Volcano consists of three core components: the Scheduler (handles job and PodGroup placement via Actions and Plugins), the ControllerManager (manages CRD lifecycles like VCJob, PodGroup, Queue), and the Admission module (validates CRD resources).
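As a concrete reference for the Queue and PodGroup concepts the article leans on, a minimal pair of manifests (names and quantities are illustrative) looks roughly like:

```yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: spark-etl            # illustrative queue name
spec:
  weight: 4                  # share relative to sibling queues
  capability:
    cpu: "2000"
    memory: 8000Gi
---
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: spark-job-001        # illustrative job name
spec:
  queue: spark-etl
  minMember: 10              # gang scheduling: all 10 pods place together or not at all
  minResources:
    cpu: "40"
    memory: 160Gi
```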
Hierarchical Queue Adaptation
The initial adaptation built a first version of hierarchical queues by caching queue events and constructing parent‑child relationships once all queues had been discovered. The second version leveraged Volcano's native hierarchical‑queue support, adding proper queue‑level validation, priority handling, and resource balancing.
Other Adaptations
The Capacity plugin was extended to support hierarchical queues, allowing per‑queue resource specifications (MinCapacity, MaxCapacity) and eliminating the Guarantee concept. A custom Bigdata‑topology plugin was added to address pod‑level affinity for big‑data workloads.
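Assuming a Volcano version with native hierarchical queues, where a child Queue names its parent, the article's MinCapacity/MaxCapacity roughly correspond to Volcano's `deserved` and `capability` fields. An illustrative leaf queue:

```yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: ads-hourly               # illustrative leaf queue
spec:
  parent: root.ads               # hierarchical queues: attach under a parent queue
  deserved:                      # ~ MinCapacity: resources the queue can reclaim back
    cpu: "500"
    memory: 2000Gi
  capability:                    # ~ MaxCapacity: hard ceiling for the queue
    cpu: "1000"
    memory: 4000Gi
```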
Over‑Commit Framework Construction
Background
After the migration to K8s, memory and CPU utilization dropped dramatically compared to the YARN‑based offline cluster. To restore comparable utilization, an over‑commit framework (Amiyad) was built.
Amiyad Architecture
Amiyad follows a Master/Worker model. The Master performs health checks, dynamic over‑commit decisions, and overall policy enforcement. The Worker runs on every node, collecting resource metrics, interacting with the container runtime, and applying over‑commit adjustments.
Custom Resource
A new CRD, AmiyadExtendedResource, represents over‑committed resources. All over‑commit operations target this resource, and Volcano schedules based on it.
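The article does not show the CRD's schema, so the following is a hypothetical sketch of what a per‑node AmiyadExtendedResource object might look like; the API group, version, and every field name here are guesses for illustration only:

```yaml
apiVersion: amiyad.bilibili.co/v1alpha1    # hypothetical group/version
kind: AmiyadExtendedResource
metadata:
  name: node-10-0-0-1                      # assumption: one object per node
spec:
  node: node-10-0-0-1
  overcommit:                              # advertised (over-committed) capacity
    cpu: "110"                             # e.g. 100 physical cores x 1.1
    memory: 440Gi                          # e.g. 400 GiB physical x 1.1
```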
Components
StateStore Manager – aggregates node resource information.
Resource Manager – proposes over‑commit amounts and drives eviction based on resource health.
RuntimeHook Manager – implements pod‑level hooks to mutate resources and align them with AmiyadExtendedResource.
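The division of labor among these components can be sketched as follows. This is an illustrative model, not Amiyad's code: it uses the article's static 1.1× ratio, while the eviction watermark and all names are assumptions.

```python
# Illustrative sketch (not Bilibili's implementation): a Resource Manager
# deciding the over-committed capacity to advertise for a node, and whether
# eviction should fire, from stats the StateStore Manager would aggregate.
from dataclasses import dataclass

@dataclass
class NodeStats:
    allocatable_mem_gib: float   # physical allocatable memory
    used_mem_gib: float          # actually-used memory (from node metrics)

OVERCOMMIT_RATIO = 1.1   # static 1.1x, as in the article's results
EVICT_WATERMARK = 0.90   # hypothetical: evict when real usage exceeds 90%

def advertised_capacity(stats: NodeStats) -> float:
    """Capacity to write into the node's AmiyadExtendedResource."""
    return stats.allocatable_mem_gib * OVERCOMMIT_RATIO

def should_evict(stats: NodeStats) -> bool:
    """Trigger eviction when physical usage crosses the watermark."""
    return stats.used_mem_gib / stats.allocatable_mem_gib > EVICT_WATERMARK

node = NodeStats(allocatable_mem_gib=100.0, used_mem_gib=50.0)
print(round(advertised_capacity(node), 1))  # 110.0 -> scheduler sees 110 GiB
print(should_evict(node))                   # False -> usage is well below watermark
```

The key design point the article implies is that the scheduler never sees physical capacity directly: it only sees the over-committed figure, while eviction acts as the safety valve when real usage approaches the physical limit.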
Benefits
A static 1.1× over‑commit increased the 7‑day average memory utilization from 27.6% to 37.7% and CPU utilization from 28.3% to 33.3%, while keeping the task eviction rate below 0.5%.
Future Outlook
Continued work will focus on deeper Volcano adaptation for big‑data semantics, improving batch scheduling performance, enhancing QoS support beyond simple priority‑based eviction, and refining over‑commit policies to maintain stable resource usage during large‑scale migrations.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.