Tagged articles
2 articles
Bilibili Tech
Jun 4, 2024 · Big Data

Improving Resource Utilization and Isolation in Bilibili Big Data Clusters with the Amiya Over‑commit Component

By deploying the self‑developed Amiya over‑commit component together with kernel‑level cgroup memory isolation, explicit task priorities, OOM‑priority killing, and asynchronous reclamation, Bilibili’s big‑data clusters boosted daily resource utilization by about 15 %, eliminated DataNode OOM kills, cut memory‑reclaim latency to zero, and achieved a further 9 % overall efficiency gain.

OOM Priority · Performance Evaluation · Resource Management
18 min read
Baidu Geek Talk
Jul 18, 2022 · Artificial Intelligence

GPU Container Virtualization for AI Heterogeneous Computing: Architecture and Best Practices

The article surveys GPU container virtualization for AI heterogeneous computing, detailing utilization challenges, historical architectures, various virtualization methods, Baidu's dual-engine user- and kernel-space design with isolation and scheduling features, performance benefits, best‑practice scenarios, and deployment guidance, concluding with a technical Q&A.

AI computing · GPU virtualization · MPS
30 min read