How Baidu’s 7th‑Gen AI Confidential VM Achieves Full‑Stack Secure Compute
Baidu Intelligent Cloud’s seventh‑generation AI confidential virtual machine combines Intel TDX, NVIDIA GPUs, and BlueField DPUs to deliver end‑to‑end encrypted data paths, elastic multi‑GPU scaling, and near‑native performance, proving that high‑sensitivity AI workloads can run securely in the cloud without sacrificing speed.
1. AI Confidential Computing: Making Cloud Migration Secure
Enterprises are shifting from resource acquisition to trust establishment when moving compute and data to the cloud. Traditional perimeter-based security no longer applies, and the core question becomes whether data remains controllable during use. Confidential computing builds a Trusted Execution Environment (TEE) at the hardware level, moving the security boundary from the system perimeter to the computation itself.
Intel TDX Integration
Intel® Xeon® processors embed Trust Domain Extensions (TDX) to create hardware‑level TEEs. Combined with remote attestation, TDX ensures that data in use stays protected, providing dual remote verification (TDX + GPU Confidential Computing).
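The dual-attestation idea can be sketched as a simple gate: a workload secret is released only when both the CPU-side TDX quote and the GPU-side confidential-computing report verify. The sketch below is illustrative only, with placeholder verification logic and hypothetical helper names (`verify_tdx_quote`, `verify_gpu_report`, `release_workload_key` are not from any real SDK); a production verifier would validate full certificate chains and signed quote structures.

```python
# Illustrative dual remote attestation gate (NOT a real attestation SDK):
# the data key is released only if BOTH the TDX quote and the GPU
# confidential-computing report check out.

def verify_tdx_quote(quote: bytes, expected_mrtd: bytes) -> bool:
    """Placeholder check: quote carries the expected trust-domain
    measurement (MRTD). Real verification parses and validates a
    signed quote structure."""
    return quote[:len(expected_mrtd)] == expected_mrtd

def verify_gpu_report(report: bytes, expected_meas: bytes) -> bool:
    """Placeholder check on the GPU attestation report."""
    return report[:len(expected_meas)] == expected_meas

def release_workload_key(tdx_quote: bytes, gpu_report: bytes,
                         mrtd: bytes, gpu_meas: bytes, key: bytes) -> bytes:
    """Release the decryption key only when both domains attest."""
    if verify_tdx_quote(tdx_quote, mrtd) and verify_gpu_report(gpu_report, gpu_meas):
        return key
    raise PermissionError("attestation failed; key withheld")
```

The point of the structure is that neither attestation alone suffices: a compromised GPU path blocks key release even if the CPU TEE is intact, and vice versa.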
2. The 7th‑Gen AI Confidential VM: From Single‑Point to Full‑Stack
2.1 Limitations of the 6th‑Gen VM
The previous generation supported only a single GPU, making it suitable mainly for small models (7B/13B), and lacked DPU-based I/O offload. Consequently, network and storage I/O consumed CPU resources, limiting performance and elasticity.
2.2 Breakthroughs in the 7th‑Gen VM
The new VM introduces:
- Full-link confidential computing: CPU TDX + GPU Confidential Computing + PCIe encryption.
- Elastic multi-GPU scaling via NVLink/NVSwitch for high-speed interconnect.
- All-resource offload: the DPU handles I/O, freeing the CPU and delivering the host's full resources to the guest.
- Trusted verification with dual remote attestation.
- An out-of-the-box environment pre-installed with the latest LTS kernel, drivers, and CUDA.
These advances enable end‑to‑end encrypted data transfer between CPU and GPU, supporting large‑scale, high‑sensitivity AI workloads.
3. Building Full‑Stack Trust
3.1 DPU vDPA: Balancing Performance and Elastic Scheduling
Traditional VFIO passthrough offers near‑bare‑metal performance but prevents live migration. vDPA provides hardware‑accelerated data paths while keeping control logic in the virtualization layer, achieving a decoupled design that maintains both performance and elasticity.
Implementation uses BlueField DPU with vhost‑vDPA and virtio‑full‑emulation modules, communicating with QEMU via vhost‑user. VFIO manages device resources, while vDPA accelerates the data path, reducing VM exits and improving I/O‑intensive workload performance.
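Wiring like this can be illustrated with upstream QEMU's generic vhost-vDPA syntax (Baidu's production configuration is not published, and the device path below is an example): the DPU exposes a vDPA device node on the host, and QEMU attaches it as the backend of a standard virtio-net device, so the guest sees plain virtio while the data path runs in DPU hardware.

```shell
# Illustrative fragment using upstream QEMU's vhost-vDPA netdev syntax.
# /dev/vhost-vdpa-0 is the vDPA device node exposed by the DPU driver.
qemu-system-x86_64 \
    ... \
    -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vdpa0 \
    -device virtio-net-pci,netdev=vdpa0
```

Because the guest-facing device is ordinary virtio, the control plane stays in the virtualization layer and live migration remains possible, unlike VFIO passthrough.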
3.2 Trusted Link: Resolving the Private‑Shared Memory Conflict
In TDX, memory is marked as private or shared at the address level. Mis‑labeling shared regions (e.g., the notify region) triggers TDX access violations, breaking device functionality. Baidu’s firmware (TDVF) now marks critical I/O memory as shared during boot and enforces controlled data handling, eliminating boundary leakage.
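The private/shared split can be made concrete. In TDX, the attribute is carried in the guest-physical address itself: the top GPA bit (bit GPAW−1, where GPAW is the guest physical address width, 48 or 52) marks a page as shared. The sketch below models just that address convention; the function names are mine, not TDVF's.

```python
# Minimal model of TDX's private/shared GPA convention: the top
# guest-physical-address bit (bit GPAW-1) marks a page as shared.
# Firmware must map device I/O regions (e.g. virtio notify areas) via
# the shared alias, or guest accesses trigger TDX violations.

def shared_bit(gpaw: int = 48) -> int:
    """Mask for the shared bit given the guest physical address width."""
    return 1 << (gpaw - 1)

def mark_shared(gpa: int, gpaw: int = 48) -> int:
    """Return the shared alias of a private GPA."""
    return gpa | shared_bit(gpaw)

def is_shared(gpa: int, gpaw: int = 48) -> bool:
    """Does this GPA carry the shared attribute?"""
    return bool(gpa & shared_bit(gpaw))
```

Mislabeling goes both ways: touching a device's notify region through the private alias faults, while placing secrets in a shared page exposes them to the host, which is why the firmware must classify each region at boot.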
3.3 Protected PCIe for GPU Communication
NVIDIA Protected PCIe (PPCIe) encrypts the link between the CPU trust domain and the GPU, preventing plaintext exposure on the PCIe bus and extending confidentiality from a single point to the entire data path.
3.4 Address‑Space Management for Multi‑GPU
High‑performance GPUs require large BAR windows (up to 64 GB). Traditional 32‑bit firmware cannot directly access these regions, leading to VM exits and performance penalties. Baidu’s firmware now supports 64‑bit MMIO access and page‑per‑vq optimizations, ensuring correct notify region handling even in high‑address spaces.
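The constraint driving this change is simple arithmetic: a PCI BAR must be naturally aligned to its size, and the 32-bit MMIO hole below 4 GiB is only a few hundred megabytes wide, so a 64 GiB BAR can never land there. The sketch below illustrates the placement decision; the hole boundaries and high-window base are illustrative values, not spec constants.

```python
# Why large GPU BARs force 64-bit MMIO support: a BAR must be naturally
# aligned to its size, so a 64 GiB BAR cannot fit in the 32-bit MMIO
# hole below 4 GiB at all. Addresses here are illustrative.

GIB = 1 << 30

def fits_in_32bit_hole(bar_size: int, hole_base: int = 0xE000_0000,
                       hole_end: int = 0x1_0000_0000) -> bool:
    """Can a naturally-aligned BAR of bar_size bytes fit in the hole?"""
    base = (hole_base + bar_size - 1) & ~(bar_size - 1)  # align up to size
    return base + bar_size <= hole_end

def place_bar(bar_size: int, high_base: int = 0x40_0000_0000) -> int:
    """Place the BAR in the 32-bit hole if possible, else in a 64-bit
    window starting at an (illustrative) high base."""
    if fits_in_32bit_hole(bar_size):
        return (0xE000_0000 + bar_size - 1) & ~(bar_size - 1)
    return (high_base + bar_size - 1) & ~(bar_size - 1)
```

Once BARs live above 4 GiB, firmware and device emulation must handle 64-bit MMIO natively, including the notify regions that previously sat in low memory.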
3.5 Compatibility Fixes Contributed to QEMU
Memory region lookup logic was enhanced to correctly locate and handle high‑address BAR windows, fixing notify‑region failures. The patches have been submitted to the QEMU community (commit ffa8a3e3… and commit 55fa4be6…).
4. Core Performance Evaluation of the 7th‑Gen AI Confidential VM
4.1 Memory Performance
Enabling TDX results in memory bandwidth and latency virtually identical to a regular VM, with only negligible fluctuations within expected bounds.
4.2 I/O Performance
Virtio disk and network devices, protected by TDX, show performance on par with standard KVM VMs, making storage‑ and network‑intensive workloads safe to deploy.
4.3 GPU Performance
Typical GEMM workloads achieve ~99% of native performance. Host-to-device (H2D) and device-to-host (D2H) bandwidth varies by GPU model, so users can select a GPU that balances security and transfer efficiency.
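A figure like "~99% of native" comes from running the same GEMM harness inside and outside the TEE and taking the ratio. The sketch below shows the shape of such a harness using NumPy as a CPU stand-in (the article's numbers are from GPU runs, which this does not reproduce); the function names and sizes are illustrative.

```python
# Generic GEMM-throughput harness (NumPy stand-in for illustration).
# Running the same harness in a confidential VM and on bare metal, then
# taking the ratio, yields a relative-performance figure like "~99%".
import time
import numpy as np

def gemm_gflops(n: int = 512, iters: int = 5) -> float:
    """Time C = A @ B and return achieved GFLOP/s (2*n^3 FLOPs/GEMM)."""
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    a @ b                                   # warm-up
    t0 = time.perf_counter()
    for _ in range(iters):
        a @ b
    elapsed = time.perf_counter() - t0
    return 2 * n ** 3 * iters / elapsed / 1e9

def relative_perf(tee_gflops: float, native_gflops: float) -> float:
    """Ratio reported as TEE performance, e.g. 0.99 for ~99%."""
    return tee_gflops / native_gflops
```

Since TDX and PPCIe overheads concentrate on memory transitions and bus transfers rather than on-die compute, compute-bound kernels like GEMM are exactly where the penalty should be smallest.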
5. Constructing Trusted AI Compute Under Constraints
The evolution from CPU‑only trust to full‑stack GPU trust redefines the data usage paradigm in AI computing. By carefully balancing security boundaries with performance requirements, Baidu Intelligent Cloud provides a robust infrastructure for high‑sensitivity, high‑compute AI workloads.
For the full list of technical references, see the QEMU commits:
- https://gitlab.com/qemu-project/qemu/-/commit/ffa8a3e3b2e6ff017113b98d500d6a9e05b1560a
- https://gitlab.com/qemu-project/qemu/-/commit/55fa4be6f76a3e1b1caa33a8f0ab4dc217d32e49
Baidu Intelligent Cloud Tech Hub
We share the cloud tech topics you care about. Feel free to leave a message and tell us what you'd like to learn.
