How HyperGPU Unlocks Secure GPU Power for Large AI Models

This article introduces HyperGPU, a confidential‑computing infrastructure that transforms ordinary GPUs into trusted execution environments for large‑model inference, covering its background, design goals, architecture, security mechanisms, performance results, future optimizations, and open‑source plans.

DataFunSummit

Background

Dr. Xu Jimin of Ant Group described the need for confidential computing when data flows between parties: data silos, privacy and security risks, and the resulting rise of privacy‑preserving technologies such as multi‑party computation (MPC) and trusted execution environments (TEE) that protect data while it is being processed.

HyperGPU Design Goals

Universality: Enable TEE capabilities on a wide range of existing GPUs without requiring new hardware features.

Usability: No changes to user‑space applications are required; only system‑level software is modified, so deployment is transparent to existing applications.

Affordability: Upgrade ordinary compute resources to confidential‑computing capability at low cost.

Decoupling: Avoid vendor lock‑in by supporting GPUs from multiple vendors and by decoupling the root of trust from specific hardware.

HyperGPU Architecture

The solution is a pure‑software, virtualization‑based TEE stack built on three layers:

L0: Controls hardware access and performs encrypted bus communication.

L1: Hosts the hypervisor that isolates the host OS from the trusted environment.

L2: Runs the confidential enclave and the confidential virtual machine (CVM) that execute protected workloads.

Data flows from the host OS to the L1 layer, then to the L2 enclave where plaintext computation occurs, while the L0 layer enforces encrypted access to GPU memory and I/O.
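
To make the layering concrete, here is a minimal Python sketch of the access check the text assigns to L0: each protected GPU‑memory region belongs to one CVM, and L0 admits an access only if it comes from that CVM. This is a toy model, not HyperGPU's code. The names `L0Mediator`, `Request`, `Origin`, and the region name `kv_cache_0` are hypothetical, and treating L1 as untrusted follows the control‑plane isolation described in the security section.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Origin(Enum):
    """Which layer issued a GPU-memory access (hypothetical model)."""
    L1_HOST = "L1"  # high-privilege host-side software; untrusted in this sketch
    L2_CVM = "L2"   # a confidential VM or enclave


@dataclass(frozen=True)
class Request:
    origin: Origin
    cvm_id: Optional[int]  # only meaningful when origin is L2_CVM
    region: str            # name of the GPU-memory region being accessed


class L0Mediator:
    """Toy reference monitor: every GPU-memory access is checked against ownership."""

    def __init__(self) -> None:
        self._owner: Dict[str, int] = {}  # region name -> owning CVM id

    def assign(self, region: str, cvm_id: int) -> None:
        # Reassigning a region would require scrubbing it first, so refuse here.
        if region in self._owner:
            raise ValueError(f"region {region!r} is already owned")
        self._owner[region] = cvm_id

    def allow(self, req: Request) -> bool:
        owner = self._owner.get(req.region)
        if owner is None:
            return False  # unknown region: deny by default
        if req.origin is Origin.L1_HOST:
            return False  # host-side software never reaches protected memory
        return req.cvm_id == owner  # only the owning CVM is let through


mediator = L0Mediator()
mediator.assign("kv_cache_0", cvm_id=1)
print(mediator.allow(Request(Origin.L2_CVM, 1, "kv_cache_0")))      # True
print(mediator.allow(Request(Origin.L2_CVM, 2, "kv_cache_0")))      # False
print(mediator.allow(Request(Origin.L1_HOST, None, "kv_cache_0")))  # False
```

Default‑deny for unknown regions keeps the check safe when the ownership table is incomplete.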

Figure: HyperGPU architecture diagram

Trust Root Decoupling

Ant’s self‑developed TPM chip serves as the root of trust; it is compatible with Chinese domestic TPM specifications and has been certified by government authorities. The design abstracts the root of trust through virtualization, so it supports Intel and AMD CPUs as well as domestic CPUs such as Hygon (HaiGuang) and Zhaoxin.
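
The trust‑root abstraction can be pictured as a small interface that any backend (a hardware TPM chip, a virtual TPM exposed by the hypervisor, or a CPU‑vendor mechanism) implements. The Python sketch below is an assumption, not HyperGPU's actual interface: `TrustRoot`, `SoftwarePcrBank`, and `measure_boot_chain` are hypothetical names, and the in‑memory bank simply follows the standard TPM 2.0 SHA‑256 PCR extend rule, new = SHA‑256(old || digest).

```python
import hashlib
from abc import ABC, abstractmethod
from typing import List


class TrustRoot(ABC):
    """Hypothetical abstraction over a root of trust (hardware TPM, vTPM, ...)."""

    @abstractmethod
    def extend(self, index: int, digest: bytes) -> None: ...

    @abstractmethod
    def read(self, index: int) -> bytes: ...


class SoftwarePcrBank(TrustRoot):
    """In-memory stand-in using TPM 2.0 SHA-256 extend semantics."""

    def __init__(self, count: int = 24) -> None:
        self._pcrs = [bytes(32) for _ in range(count)]  # PCRs used here start as all zeros

    def extend(self, index: int, digest: bytes) -> None:
        if len(digest) != 32:
            raise ValueError("expected a SHA-256 digest")
        # new_value = SHA256(old_value || digest)
        self._pcrs[index] = hashlib.sha256(self._pcrs[index] + digest).digest()

    def read(self, index: int) -> bytes:
        return self._pcrs[index]


def measure_boot_chain(root: TrustRoot, components: List[bytes], index: int = 0) -> bytes:
    """Measure each component in boot order into one PCR and return its final value."""
    for blob in components:
        root.extend(index, hashlib.sha256(blob).digest())
    return root.read(index)
```

Because each extend folds the previous value into the hash, the final PCR value commits to both the content and the order of the measured components (for example, hypothetical L0 and L1 images); a verifier compares it against an expected value during attestation, regardless of which backend implements `TrustRoot`.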

Security Design

HyperGPU isolates attacks at both control and data planes:

Control‑plane isolation: L0 blocks attacks from high‑privilege software in L1, and attempts by a malicious CVM to access protected data are intercepted during authentication.

Data‑plane isolation: L0 prevents unauthorized CVMs from reading GPU memory, and the IOMMU restricts DMA so that only trusted devices can reach GPU memory.
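
The IOMMU rule in the data plane amounts to a per‑device allowlist of address windows that is checked on every DMA. The Python model below is a simplification for illustration: `IommuDomain`, its methods, and the PCI addresses and IOVA values in the usage lines are hypothetical, and a real IOMMU enforces this in hardware through translation tables programmed by privileged system software rather than in Python.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class IommuDomain:
    """Toy IOMMU domain: per-device lists of IOVA windows that DMA may touch."""
    windows: Dict[str, List[Tuple[int, int]]] = field(default_factory=dict)

    def map(self, device: str, start: int, length: int) -> None:
        if length <= 0:
            raise ValueError("length must be positive")
        # Store half-open [start, end) windows keyed by the device's PCI address.
        self.windows.setdefault(device, []).append((start, start + length))

    def dma_allowed(self, device: str, addr: int, length: int) -> bool:
        if length <= 0:
            return False
        end = addr + length
        # The whole transfer must fit inside one window mapped for this device;
        # a device with no mapping is refused, as an unmapped DMA would fault.
        return any(lo <= addr and end <= hi for lo, hi in self.windows.get(device, []))


dom = IommuDomain()
dom.map("0000:3b:00.0", 0x1000_0000, 0x10_0000)            # hypothetical GPU, 1 MiB window
print(dom.dma_allowed("0000:3b:00.0", 0x1000_0000, 4096))  # True
print(dom.dma_allowed("0000:5e:00.0", 0x1000_0000, 4096))  # False: device has no mapping
```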

Hardware attacks, side‑channel attacks, and DoS fall outside the isolation mechanisms above; their impact is reduced, rather than eliminated, by memory encryption and end‑to‑end ciphertext transmission.

GPU TEE Extension

Standard GPUs lack a hardware root of trust, so HyperGPU introduces a device‑fingerprint mechanism and a key‑agreement protocol to authenticate each GPU and encrypt traffic on the PCIe bus. The process is:

Collecting a hardware fingerprint from the GPU.

Deriving a private key from the fingerprint for key agreement.

Establishing a session key for bus encryption (control‑plane and data‑plane).
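
The three steps can be sketched in Python with only the standard library. This is an illustration under loud assumptions, not HyperGPU's protocol: the attribute names and values are placeholders, `hkdf` is a plain RFC 5869 HKDF‑SHA256, and the Diffie‑Hellman group (the Mersenne prime 2**127 - 1 with generator 3) is a toy chosen for readability and is far too weak for real use. A production design would use a standard group or ECDH and an authenticated cipher for the bus.

```python
import hashlib
import hmac
import secrets
from typing import Dict

# Toy Diffie-Hellman group: the Mersenne prime 2**127 - 1 with generator 3.
# Illustration only; far too small for real key agreement.
P = 2**127 - 1
G = 3


def hkdf(ikm: bytes, info: bytes, salt: bytes = b"", length: int = 32) -> bytes:
    """HKDF-SHA256 per RFC 5869: extract a PRK, then expand it to `length` bytes."""
    prk = hmac.new(salt or bytes(32), ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def collect_fingerprint(attributes: Dict[str, str]) -> bytes:
    """Step 1: hash a canonical, sorted encoding of hypothetical GPU attributes."""
    canonical = "\n".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode()).digest()


def private_key_from_fingerprint(fingerprint: bytes) -> int:
    """Step 2: derive a DH private exponent in [1, P - 2] from the fingerprint."""
    return 1 + int.from_bytes(hkdf(fingerprint, b"gpu-dh-private"), "big") % (P - 2)


def session_keys(own_private: int, peer_public: int) -> Dict[str, bytes]:
    """Step 3: compute the shared secret and split it into per-plane session keys."""
    if not 1 < peer_public < P - 1:
        raise ValueError("peer public value out of range")
    shared = pow(peer_public, own_private, P).to_bytes(16, "big")
    return {
        "control": hkdf(shared, b"control-plane"),
        "data": hkdf(shared, b"data-plane"),
    }


# GPU side: key derived from its fingerprint. CVM side: fresh ephemeral key.
fp = collect_fingerprint({"device_id": "example-id", "serial": "example-serial"})
gpu_private = private_key_from_fingerprint(fp)
gpu_public = pow(G, gpu_private, P)
cvm_private = 1 + secrets.randbelow(P - 2)
cvm_public = pow(G, cvm_private, P)
assert session_keys(gpu_private, cvm_public) == session_keys(cvm_private, gpu_public)
```

In this sketch the GPU's public value is reproducible from its fingerprint, so a verifier that recorded the expected value in advance can detect a substituted device, while separate HKDF labels keep control‑plane and data‑plane keys independent.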

Figure: Device fingerprint and key agreement

Performance Evaluation

Code changes are modest: about 1,000 lines in L0 and about 100 lines in L1. Deployment installs the host OS first, then the HyperGPU system. Measured overheads:

Inference performance loss: ≈1%, attributed to data‑conversion overhead.

Bus‑encryption overhead: ≈7–8% on NVIDIA H20 GPUs running LLaMA models.

Device fingerprint collection: ≈0.2 s; verification: ≈0.5 s; key agreement: ≈0.2 s.
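
To compare your own runs against these figures, overhead is a ratio between a protected and a baseline measurement, and the per‑GPU setup time is the sum of the three steps if they run one after another (an assumption; the source does not say whether any of them overlap). The helper `relative_overhead` is hypothetical, and the 1000 / 925 tokens‑per‑second pair is made‑up input, not a measured result.

```python
from typing import Dict


def relative_overhead(baseline: float, protected: float, higher_is_better: bool = True) -> float:
    """Fractional cost of running protected versus baseline.

    Throughput metrics (tokens/s, higher is better): 1 - protected / baseline.
    Latency metrics (seconds, lower is better):      protected / baseline - 1.
    """
    if baseline <= 0 or protected <= 0:
        raise ValueError("measurements must be positive")
    if higher_is_better:
        return 1 - protected / baseline
    return protected / baseline - 1


# Reported one-time setup steps, in seconds, assumed to run sequentially.
SETUP_STEPS: Dict[str, float] = {"fingerprint": 0.2, "verification": 0.5, "key_agreement": 0.2}
setup_total = sum(SETUP_STEPS.values())

print(f"setup about {setup_total:.1f} s")                        # setup about 0.9 s
print(f"overhead about {relative_overhead(1000.0, 925.0):.1%}")  # overhead about 7.5%
```

Setup presumably happens once per GPU initialization rather than per request, so for long‑running inference serving it amortizes quickly, while the ≈1% and ≈7–8% figures apply to every request.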

Figure: Performance results

Future Work and Outlook

Planned optimizations include reducing GPU‑side encryption overhead, making bus encryption more efficient, offering differentiated security policies, and supporting China's national cryptographic algorithms (SM2/SM3/SM4). The project also aims to broaden platform support (ARM and additional operating systems) and to work with cloud providers.

Open Source Plan

Ant’s privacy‑computing ecosystem (including HyperEnclave, HyperGPU, and related projects) will be open‑sourced later this year, enabling broader community adoption.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
