Cloud Native · 14 min read

Baidu Intelligent Cloud GPU Container Virtualization 2.0: Advancements and Full-Scenario Practices

Baidu Intelligent Cloud’s GPU Container Virtualization 2.0 combines user‑mode and kernel‑mode isolation in a dual‑engine design that unifies scheduling of AI compute, rendering and encoding, supports mixed deployment and multi‑scheduler integration, and boosts GPU utilization across inference, offline tasks, autonomous‑driving simulation, and cloud‑gaming workloads.

Baidu Geek Talk

The presentation introduces Baidu Intelligent Cloud's GPU container virtualization 2.0, a dual-engine architecture that combines user-mode and kernel-mode isolation to meet diverse requirements for isolation, performance, and efficiency.

It begins by outlining the evolution from version 1.0, which already supported AI workloads through resource pooling and Kubernetes-based scheduling, with mixed-deployment patterns such as shared, preemptive, time-sharing, and tidal modes.

Version 2.0 adds isolation of GPU rendering and encoding capabilities, enabling unified scheduling of AI compute, rendering, and encoding resources. It also supports multi-scheduler integration, allowing customers to blend existing task schedulers with Baidu's platform within a single Kubernetes cluster.
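To make the "unified scheduling" idea concrete, the sketch below shows what a pod requesting fractional GPU compute plus isolated encoding capacity might look like, expressed as Kubernetes extended resources. The resource names (`baidu.com/cgpu-*`) and the scheduler name are illustrative assumptions, not the platform's actual identifiers:

```python
# Hypothetical pod spec: fractional GPU compute, a dedicated memory slice,
# and an isolated share of the hardware encoder, requested as Kubernetes
# extended resources. All baidu.com/cgpu-* names are assumed for illustration.
pod_spec = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "mixed-gpu-workload"},
    "spec": {
        "schedulerName": "cgpu-scheduler",  # hypothetical custom scheduler
        "containers": [{
            "name": "inference",
            "image": "registry.example.com/infer:latest",
            "resources": {"limits": {
                "baidu.com/cgpu-core": "50",     # 50% of SM compute
                "baidu.com/cgpu-memory": "8Gi",  # isolated GPU memory slice
                "baidu.com/cgpu-encode": "20",   # 20% of encoder throughput
            }},
        }],
    },
}

limits = pod_spec["spec"]["containers"][0]["resources"]["limits"]
print(limits["baidu.com/cgpu-core"])  # "50"
```

Because the quotas are ordinary extended resources, a second scheduler coexisting in the same cluster can account for them without special casing, which is the property multi-scheduler integration relies on.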

The talk details the technical implementation: the user-mode engine tracks the latest NVIDIA drivers and CUDA 12.1, while the kernel-mode engine supports driver versions 525/530/535. Resource pooling decouples workloads from hardware via remote calls, and the K8s scheduling layer abstracts the various mixing strategies.
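The remote-call decoupling can be pictured as a thin container-side stub that serializes each GPU API call and forwards it to a pool-side dispatcher that owns the device. This is only a toy sketch of the idea under stated assumptions: real systems intercept CUDA/driver APIs and use an actual RPC transport, whereas here a plain function call and mocked handlers stand in for both, and all names are invented for illustration:

```python
import json

def pool_dispatch(request_bytes: bytes) -> bytes:
    """Pool side: decode a serialized call and execute it against the
    device it owns (mocked here with canned handlers)."""
    req = json.loads(request_bytes)
    handlers = {
        "malloc": lambda args: {"ptr": 0x1000},  # fake device pointer
        "launch": lambda args: {"status": "ok", "kernel": args["kernel"]},
    }
    return json.dumps(handlers[req["op"]](req["args"])).encode()

class RemoteGPUStub:
    """Container side: presents a local-looking GPU API but forwards
    every call over the (simulated) wire to the pool."""
    def call(self, op: str, **args) -> dict:
        request = json.dumps({"op": op, "args": args}).encode()
        return json.loads(pool_dispatch(request))  # stands in for an RPC

gpu = RemoteGPUStub()
print(gpu.call("launch", kernel="vector_add"))
```

The payoff of this indirection is that the container no longer needs to be co-located with the physical GPU, which is what makes pooling and remote scheduling possible.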

Performance demonstrations show that AI and rendering workloads can each obtain roughly half of the GPU's compute when co-running, delivering the expected throughput and FPS. Encoding isolation in kernel mode enables fine-grained allocation of encoder throughput: a 20% quota yields 20% of the encoding capability.
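The allocation model implied by these numbers is strictly proportional: a quota of q percent of a GPU capability yields q percent of that capability's throughput. A one-line sketch, with illustrative (not measured) figures:

```python
def allocated_throughput(total_capacity: float, quota_pct: float) -> float:
    """Proportional model: a quota_pct share of a capability delivers
    the same share of its total throughput."""
    return total_capacity * quota_pct / 100.0

# Illustrative only: 20% of an encoder that sustains 16 concurrent streams
print(allocated_throughput(16, 20))   # 3.2
# Two co-running workloads at 50% each split the compute evenly
print(allocated_throughput(100, 50))  # 50.0
```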

Practice scenarios are presented: online inference benefits from the user-mode engine's low-latency scheduling, raising GPU utilization from 20% to 35%, while offline tasks use preemptive and time-sharing mixing to fill idle cycles. Development workloads gain multi-user isolation via the kernel-mode engine, with support for burst scheduling and shared storage.
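The preemptive-mixing idea above can be reduced to a toy rule: offline work runs only in the slots the online service leaves idle, and yields the moment online load returns. The trace below is synthetic, purely to illustrate the policy:

```python
def schedule(online_busy: list[bool]) -> list[str]:
    """Per time slot: run the online service if it has work,
    otherwise backfill the slot with offline tasks."""
    return ["online" if busy else "offline" for busy in online_busy]

# Synthetic load trace: online traffic in slots 0, 1, and 4
trace = schedule([True, True, False, False, True, False])
print(trace)  # ['online', 'online', 'offline', 'offline', 'online', 'offline']
```

Every "offline" slot in the output is a GPU cycle that would otherwise have been wasted, which is where the reported utilization gains come from.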

Additional use cases include autonomous driving simulation (simultaneous rendering and inference on a single GPU, doubling utilization) and cloud gaming on ARM platforms, where kernel‑mode memory isolation ensures QoS without violating real‑time constraints.

Overall, GPU container virtualization 2.0 achieves fine‑grained, unified scheduling of all GPU resources—compute, memory, rendering, and encoding—maximizing utilization and enabling diverse AI‑centric and graphics‑intensive workloads on cloud‑native infrastructure.

Tags: Cloud Native, GPU virtualization, Container Orchestration, AI workloads, encoding isolation, multi-scheduler, rendering isolation
Written by

Baidu Geek Talk

Follow us to discover more Baidu tech insights.
