WarpDrive: GPU-Based Fully Homomorphic Encryption Acceleration Leveraging Tensor and CUDA Cores Accepted at HPCA 2025
Ant Group’s Computing Systems Lab announced that its GPU‑accelerated fully homomorphic encryption framework WarpDrive, which exploits Tensor and CUDA cores for high‑throughput NTT operations and parallel kernel designs, has been accepted as a paper at the IEEE HPCA 2025 conference.
HPCA 2025, a top computer architecture conference, recently announced its accepted papers, and "WarpDrive", the latest research from Ant Group’s Computing Systems Lab, is among them.
The International Symposium on High-Performance Computer Architecture (HPCA), organized by IEEE, covers processor architecture, parallel computing, and storage systems, and is regarded as a benchmark venue for computer architecture research. According to public information, fewer than ten papers with first authors from mainland China are accepted by HPCA each year.
Paper title: WarpDrive: GPU-Based Fully Homomorphic Encryption Acceleration Leveraging Tensor and CUDA Cores
WarpDrive is a GPU‑based FHE acceleration solution. The paper first proposes an efficient Tensor‑Core‑based implementation of the number-theoretic transform (NTT). It combines deep computation decomposition with a fine‑grained, warp‑level memory access design, which reduces the number of instructions the NTT requires and significantly lowers pipeline stalls.
Compared with the previous state‑of‑the‑art work TensorFHE, WarpDrive achieves up to 13.3× higher NTT throughput.
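The announcement does not spell out the decomposition WarpDrive uses. As a rough illustration of how an NTT can be mapped onto matrix hardware at all, the sketch below writes the transform as a matrix-vector product and splits every operand into 8-bit limbs, so that each partial product uses only small integers of the kind an INT8 Tensor Core instruction accepts. The modulus, transform length, root of unity, and all function names are toy assumptions chosen for readability, not values or APIs from the paper.

```python
# Toy parameters (assumptions for illustration only, not from the paper).
Q = 65537        # prime modulus, 2**16 + 1
N = 8            # transform length
OMEGA = 16       # primitive N-th root of unity modulo Q
LIMB_BITS = 8    # operand width of the hypothetical small-integer matrix unit
NUM_LIMBS = (Q.bit_length() + LIMB_BITS - 1) // LIMB_BITS

# Sanity check that OMEGA really has order N modulo Q.
assert pow(OMEGA, N, Q) == 1 and pow(OMEGA, N // 2, Q) == Q - 1


def split_limbs(x):
    """Split x (0 <= x < Q) into NUM_LIMBS little-endian 8-bit limbs."""
    return [(x >> (LIMB_BITS * i)) & 0xFF for i in range(NUM_LIMBS)]


def small_matvec(matrix, vector):
    """Integer matrix-vector product; every input entry is below 256."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def ntt_reference(a):
    """Direct O(N^2) definition: A[k] = sum_j a[j] * OMEGA^(j*k) mod Q."""
    return [sum(a[j] * pow(OMEGA, j * k, Q) for j in range(N)) % Q
            for k in range(N)]


def ntt_limb_matmul(a):
    """Same transform, built only from 8-bit-by-8-bit partial products."""
    twiddles = [[pow(OMEGA, j * k, Q) for j in range(N)] for k in range(N)]
    # One "plane" of the twiddle matrix and of the input per limb position.
    w_planes = [[[split_limbs(w)[s] for w in row] for row in twiddles]
                for s in range(NUM_LIMBS)]
    a_planes = [[split_limbs(x)[t] for x in a] for t in range(NUM_LIMBS)]

    out = [0] * N
    for s in range(NUM_LIMBS):
        for t in range(NUM_LIMBS):
            # Each partial sum stays far below 2**31, so it would fit a
            # 32-bit accumulator.
            partial = small_matvec(w_planes[s], a_planes[t])
            shift = LIMB_BITS * (s + t)
            for k in range(N):
                out[k] += partial[k] << shift
    # A single modular reduction at the end recombines the limb products.
    return [x % Q for x in out]


coeffs = [1, 2, 3, 4, 5, 6, 7, 8]
assert ntt_limb_matmul(coeffs) == ntt_reference(coeffs)
```

The limb split multiplies the number of small matrix products (here 3 × 3 = 9), which hints at why reducing instruction counts and pipeline stalls matters for this style of implementation.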
Building on this, WarpDrive introduces an NTT implementation framework that integrates both CUDA‑Core and Tensor‑Core solutions. Within this framework, the paper presents two CUDA‑Core NTT kernels and two hybrid kernels, achieving the first parallel use of both compute units in NTT operations, further boosting performance.
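The hybrid kernels themselves are not described in this announcement. To give a sense of why an NTT contains both matrix-shaped work and element-wise work that could, in principle, be divided between Tensor Cores and CUDA Cores, here is the textbook four-step decomposition of a length N1·N2 transform. The parameters and function names are illustrative assumptions, and the mapping of steps to compute units in the comments is a guess at one plausible split, not the paper's design.

```python
# Toy parameters (assumptions for illustration only, not from the paper).
Q = 65537            # prime modulus, 2**16 + 1
N1, N2 = 4, 4        # factorisation of the transform length
N = N1 * N2
OMEGA = 4            # primitive N-th root of unity modulo Q

assert pow(OMEGA, N, Q) == 1 and pow(OMEGA, N // 2, Q) == Q - 1


def dft_matrix(size, root):
    """size x size matrix with entries root^(r*c) mod Q."""
    return [[pow(root, r * c, Q) for c in range(size)] for r in range(size)]


def matmul_mod(x, y):
    """Dense matrix product modulo Q (the matrix-unit-shaped work)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) % Q
             for j in range(len(y[0]))] for i in range(len(x))]


def ntt_reference(a):
    """Direct O(N^2) definition of the cyclic NTT."""
    return [sum(a[j] * pow(OMEGA, j * k, Q) for j in range(N)) % Q
            for k in range(N)]


def ntt_four_step(a):
    # View the input as an N1 x N2 matrix: m[j1][j2] = a[j1*N2 + j2].
    m = [[a[j1 * N2 + j2] for j2 in range(N2)] for j1 in range(N1)]

    # Step 1: length-N1 NTTs down every column, as one matrix product.
    b = matmul_mod(dft_matrix(N1, pow(OMEGA, N2, Q)), m)

    # Step 2: element-wise twiddle multiplication (scalar-unit-shaped work).
    c = [[b[k1][j2] * pow(OMEGA, j2 * k1, Q) % Q for j2 in range(N2)]
         for k1 in range(N1)]

    # Step 3: length-N2 NTTs along every row, again one matrix product.
    d = matmul_mod(c, dft_matrix(N2, pow(OMEGA, N1, Q)))

    # Step 4: write results back in natural order, k = k1 + N1*k2.
    out = [0] * N
    for k1 in range(N1):
        for k2 in range(N2):
            out[k1 + N1 * k2] = d[k1][k2]
    return out


coeffs = list(range(1, N + 1))
assert ntt_four_step(coeffs) == ntt_reference(coeffs)
```

Steps 1 and 3 are dense matrix products, while step 2 is independent per element, so the two kinds of work have different hardware affinities.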
Additionally, the paper proposes a parallel‑enhanced kernel design (PE Kernel) that fully exploits intra‑ciphertext parallelism, allowing multiple RNS polynomials to be processed together within one GPU kernel. Using the CKKS scheme as an example, implementation and evaluation on an NVIDIA A100 GPU show that the PE Kernel increases compute utilization by 1.13–1.87× and memory utilization by 1.20–2.12×. Combined with the NTT and polynomial operation optimizations, WarpDrive improves homomorphic operation performance by up to 3.5× and application workload performance by up to 2.8× over TensorFHE.
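For readers unfamiliar with RNS polynomials: in a residue number system, each large coefficient is stored as its remainders modulo several small coprime moduli, and arithmetic on each residue "limb" is independent of the others. The sketch below illustrates that independence with a coefficient-wise multiplication, first with one pass per limb and then with a single flat index space covering every (limb, coefficient) pair, which is loosely analogous to handling several RNS polynomials in one kernel launch. The moduli, sizes, and function names are toy assumptions, and the code does not reproduce the PE Kernel design.

```python
from math import prod

# Toy parameters (assumptions for illustration only, not from the paper).
MODULI = [97, 193, 257]   # pairwise coprime RNS moduli
N = 4                     # coefficients per polynomial


def to_rns(coeffs):
    """One residue polynomial (limb) per modulus."""
    return [[c % q for c in coeffs] for q in MODULI]


def multiply_per_limb(x, y):
    """One pass per limb, mimicking one kernel launch per RNS polynomial."""
    return [[x[i][j] * y[i][j] % q for j in range(N)]
            for i, q in enumerate(MODULI)]


def multiply_fused(x, y):
    """One pass over a flat index space covering all limbs at once."""
    out = [[0] * N for _ in MODULI]
    for tid in range(len(MODULI) * N):      # tid plays the role of a thread id
        limb, j = divmod(tid, N)
        out[limb][j] = x[limb][j] * y[limb][j] % MODULI[limb]
    return out


def from_rns(limbs):
    """Chinese Remainder Theorem reconstruction modulo prod(MODULI)."""
    big_q = prod(MODULI)
    result = []
    for j in range(N):
        total = 0
        for i, q in enumerate(MODULI):
            q_hat = big_q // q
            total += limbs[i][j] * q_hat * pow(q_hat, -1, q)
        result.append(total % big_q)
    return result


a = [12345, 678, 90123, 4567]
b = [321, 9876, 54, 1000]
fused = multiply_fused(to_rns(a), to_rns(b))
assert fused == multiply_per_limb(to_rns(a), to_rns(b))
assert from_rns(fused) == [x * y % prod(MODULI) for x, y in zip(a, b)]
```

Because no limb depends on another, the fused form exposes more independent work per launch, which is the general kind of parallelism the utilization figures above refer to.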
HPCA 2025 will be held from March 1 to March 5, 2025 (US Pacific Time) in Las Vegas, USA, where the first author of the Ant Group paper will give a presentation on site.
AntTech
Technology is the core driver of Ant's future creation.