How Baidu’s 7th‑Gen AI Confidential VM Achieves Full‑Stack Secure Compute

Baidu Intelligent Cloud’s seventh‑generation AI confidential virtual machine combines Intel TDX, NVIDIA GPUs, and BlueField DPUs to deliver end‑to‑end encrypted data paths, elastic multi‑GPU scaling, and near‑native performance, proving that high‑sensitivity AI workloads can run securely in the cloud without sacrificing speed.


1. AI Confidential Computing: Making Cloud Migration Secure

Enterprises moving compute and data to the cloud are shifting their focus from resource acquisition to trust establishment. Traditional perimeter-based security no longer applies, and the core question becomes whether data remains under its owner's control while in use. Confidential computing answers this by building a Trusted Execution Environment (TEE) at the hardware level, shrinking the security boundary from the system perimeter to the computation itself.

Intel TDX Integration

Intel® Xeon® processors embed Trust Domain Extensions (TDX) to create hardware-level TEEs. Combined with remote attestation, TDX ensures that data in use stays protected, and the platform supports dual remote attestation: the TDX trust domain and GPU Confidential Computing are each verified independently.
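The dual-attestation idea can be sketched as follows: secrets are released to the workload only if both the CPU-side TDX quote and the GPU-side attestation report verify. The function names and byte formats below are hypothetical stand-ins for real verifier services (e.g., an Intel attestation service and NVIDIA's GPU attestation), not Baidu's actual implementation.

```python
# Toy model of dual remote attestation: a workload is trusted only when
# BOTH the CPU-side TDX quote and the GPU attestation report verify.
# verify_tdx_quote / verify_gpu_report are hypothetical placeholders.

def verify_tdx_quote(quote: bytes, expected_mrtd: bytes) -> bool:
    """Placeholder: a real verifier checks the quote's signature chain
    and compares measurement registers against expected values."""
    return quote.startswith(b"TDX") and expected_mrtd in quote

def verify_gpu_report(report: bytes) -> bool:
    """Placeholder: a real verifier validates the GPU attestation report
    and confirms confidential-computing mode is enabled."""
    return b"CC_ENABLED" in report

def attest_vm(quote: bytes, gpu_report: bytes, expected_mrtd: bytes) -> bool:
    # Release secrets (e.g. model weights) only if BOTH domains attest.
    return verify_tdx_quote(quote, expected_mrtd) and verify_gpu_report(gpu_report)

print(attest_vm(b"TDX:mrtd-abc123", b"GPU:CC_ENABLED", b"mrtd-abc123"))  # True
print(attest_vm(b"TDX:mrtd-abc123", b"GPU:plain", b"mrtd-abc123"))      # False
```

The key property the sketch shows is conjunction: a passing CPU attestation alone is not enough once the GPU joins the trust boundary.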

2. The 7th‑Gen AI Confidential VM: From Single‑Point to Full‑Stack

2.1 Limitations of the 6th‑Gen VM

The previous generation supported only a single GPU, suitable for small models (7B/13B parameters), and lacked DPU-based I/O offload. As a result, network and storage I/O consumed CPU resources, limiting both performance and elasticity.

2.2 Breakthroughs in the 7th‑Gen VM

The new VM introduces:

Full‑link confidential computing: CPU TDX + GPU Confidential Computing + PCIe encryption.

Elastic multi‑GPU scaling via NVLink/NVSwitch for high‑speed interconnect.

All-resource offload: the DPU handles I/O, freeing the CPU and delivering the VM's full resources to the tenant.

Trusted verification with dual remote attestation.

Out‑of‑the‑box environment pre‑installed with the latest LTS kernel, drivers, and CUDA.

These advances enable end‑to‑end encrypted data transfer between CPU and GPU, supporting large‑scale, high‑sensitivity AI workloads.

3. Building Full‑Stack Trust

3.1 DPU vDPA: Balancing Performance and Elastic Scheduling

Traditional VFIO passthrough offers near‑bare‑metal performance but prevents live migration. vDPA provides hardware‑accelerated data paths while keeping control logic in the virtualization layer, achieving a decoupled design that maintains both performance and elasticity.

Implementation uses BlueField DPU with vhost‑vDPA and virtio‑full‑emulation modules, communicating with QEMU via vhost‑user. VFIO manages device resources, while vDPA accelerates the data path, reducing VM exits and improving I/O‑intensive workload performance.
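The control/data split that makes this work can be illustrated with a toy model: control-plane state stays in software (visible to the hypervisor, hence snapshottable for live migration), while data-plane descriptors go straight to a hardware queue on the DPU. All class and method names below are illustrative, not the actual vhost-vDPA API.

```python
# Toy model of the vDPA split: control-plane calls are mediated by the
# virtualization layer (so state can be saved for live migration), while
# data-plane I/O is pushed directly to a DPU hardware ring, avoiding
# VM exits and host-side copies in the real design.

class VdpaDevice:
    def __init__(self):
        self.config = {}      # control plane: owned by software
        self.hw_queue = []    # data plane: stands in for a DPU hardware ring

    # --- control path: handled in the virtualization layer ---
    def set_feature(self, key, value):
        self.config[key] = value

    def snapshot(self):
        """Live migration is possible because control state lives in software."""
        return dict(self.config)

    # --- data path: bypasses the host, goes straight to hardware ---
    def submit_io(self, descriptor):
        self.hw_queue.append(descriptor)

dev = VdpaDevice()
dev.set_feature("mtu", 9000)
dev.submit_io(b"pkt-0")
print(dev.snapshot())  # {'mtu': 9000} -- migratable control state
```

Contrast this with VFIO passthrough, where both planes live in hardware: performance is the same on the data path, but there is no software-visible control state to snapshot, which is why live migration breaks.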

3.2 Trusted Link: Resolving the Private‑Shared Memory Conflict

In TDX, memory is marked as private or shared at the address level. Mis‑labeling shared regions (e.g., the notify region) triggers TDX access violations, breaking device functionality. Baidu’s firmware (TDVF) now marks critical I/O memory as shared during boot and enforces controlled data handling, eliminating boundary leakage.
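A minimal model of the private/shared rule makes the failure mode concrete: in TDX every guest page defaults to private, and a device-side access to a page that firmware has not marked shared is an access violation. The classes below are a didactic sketch, not TDVF's actual data structures.

```python
# Toy model of TDX private vs. shared pages: device DMA may only touch
# pages explicitly marked shared. Firmware (TDVF) marks I/O regions such
# as virtio notify areas as shared at boot; a device-side access to a
# private page models a TDX access violation.

SHARED, PRIVATE = "shared", "private"

class GuestMemory:
    def __init__(self, num_pages):
        self.attr = [PRIVATE] * num_pages  # TDX default: all pages private

    def mark_shared(self, page):
        self.attr[page] = SHARED           # done by firmware for I/O regions

    def device_access(self, page):
        if self.attr[page] != SHARED:
            raise PermissionError(f"TDX violation: device touched private page {page}")
        return "ok"

mem = GuestMemory(num_pages=8)
mem.mark_shared(3)           # e.g. the virtio notify region
print(mem.device_access(3))  # ok
# mem.device_access(4) would raise PermissionError
```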

3.3 Protected PCIe for GPU Communication

NVIDIA Protected PCIe (PPCIe) encrypts the link between the CPU trust domain and the GPU, preventing plaintext exposure on the PCIe bus and extending confidentiality from a single point to the entire data path.
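The security property this buys is that a bus snooper sees only ciphertext, while both endpoints inside the trust boundary recover the plaintext. The toy below illustrates that property only; the XOR keystream stands in for the real hardware cipher and is in no way the PPCIe protocol, and the session key is a hypothetical one established at attestation time.

```python
# Illustration of the PPCIe *property*: traffic on the PCIe link is
# ciphertext, so a snooper learns nothing, while both trust-domain
# endpoints (CPU TEE and GPU) can recover the plaintext. A SHA-256-based
# XOR keystream stands in for the real hardware cipher -- this is NOT
# the actual PPCIe protocol.
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def xfer(data: bytes, key: bytes) -> bytes:
    """Symmetric: the same call encrypts plaintext and decrypts ciphertext."""
    ks = keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))

key = b"session-key-from-attestation"  # hypothetical session key
wire = xfer(b"model weights", key)     # what a PCIe snooper would observe
print(wire != b"model weights")        # True: ciphertext on the bus
print(xfer(wire, key))                 # b'model weights' recovered by the GPU
```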

3.4 Address‑Space Management for Multi‑GPU

High‑performance GPUs require large BAR windows (up to 64 GB). Traditional 32‑bit firmware cannot directly access these regions, leading to VM exits and performance penalties. Baidu’s firmware now supports 64‑bit MMIO access and page‑per‑vq optimizations, ensuring correct notify region handling even in high‑address spaces.
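The arithmetic behind this constraint is simple to check: 32-bit addressing tops out at 4 GB, so a 64 GB BAR window can never be placed below that limit and must be mapped in high (64-bit) address space. The sketch below is illustrative arithmetic, not firmware code.

```python
# Why 64-bit MMIO support matters: a 64 GB GPU BAR cannot fit below the
# 4 GB ceiling reachable with 32-bit addressing, so firmware must map
# and access it in high (64-bit) address space.

GIB = 1024 ** 3
LIMIT_32BIT = 4 * GIB            # top of 32-bit addressable space

def fits_in_32bit(bar_base: int, bar_size: int) -> bool:
    return bar_base + bar_size <= LIMIT_32BIT

small_bar = 256 * 1024 * 1024    # classic 256 MB BAR
large_bar = 64 * GIB             # modern GPU BAR window

print(fits_in_32bit(0xC000_0000, small_bar))  # True: fits under 4 GB
print(fits_in_32bit(0, large_bar))            # False: 64 GB can never fit
```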

3.5 Compatibility Fixes Contributed to QEMU

Memory region lookup logic was enhanced to correctly locate and handle high‑address BAR windows, fixing notify‑region failures. The patches have been submitted to the QEMU community (commit ffa8a3e3… and commit 55fa4be6…).

4. Core Performance Evaluation of the 7th‑Gen AI Confidential VM

4.1 Memory Performance

Enabling TDX results in memory bandwidth and latency virtually identical to a regular VM, with only negligible fluctuations within expected bounds.

4.2 I/O Performance

Virtio disk and network devices, protected by TDX, show performance on par with standard KVM VMs, making storage‑ and network‑intensive workloads safe to deploy.

4.3 GPU Performance

Typical GEMM workloads achieve ~99 % of native performance. While host‑to‑device (H2D) and device‑to‑host (D2H) bandwidth varies by GPU model, users can select appropriate GPUs to balance security and transfer efficiency.
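Relative performance figures like "~99% of native" are just a normalized throughput ratio. The numbers below are made up for illustration and are not Baidu's measurements.

```python
# How a "% of native" figure is computed: TEE throughput normalized by
# native throughput. The TFLOPS values are illustrative, not measured.

def relative_perf(tee_tflops: float, native_tflops: float) -> float:
    return tee_tflops / native_tflops

print(f"{relative_perf(310.0, 313.1):.1%}")  # 99.0% of native (made-up numbers)
```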

5. Constructing Trusted AI Compute Under Constraints

The evolution from CPU‑only trust to full‑stack GPU trust redefines the data usage paradigm in AI computing. By carefully balancing security boundaries with performance requirements, Baidu Intelligent Cloud provides a robust infrastructure for high‑sensitivity, high‑compute AI workloads.

For the full technical references, see the QEMU commits:

https://gitlab.com/qemu-project/qemu/-/commit/ffa8a3e3b2e6ff017113b98d500d6a9e05b1560a
https://gitlab.com/qemu-project/qemu/-/commit/55fa4be6f76a3e1b1caa33a8f0ab4dc217d32e49

Tags: AI, security, virtualization, cloud, Confidential Computing
Written by

Baidu Intelligent Cloud Tech Hub

We share the cloud tech topics you care about. Feel free to leave a message and tell us what you'd like to learn.
