
FAST and Neo: New Hardware Accelerators for Scalable Fully Homomorphic Encryption

The article reviews two recent ISCA 2025 papers, FAST and Neo. FAST is a dedicated hardware accelerator and Neo is a GPU‑based framework; together they apply hoisting, KLSS key switching, and Tensor Core optimizations to significantly boost the performance of fully homomorphic encryption workloads.

AntTech

As data‑security demands grow, fully homomorphic encryption (FHE) is still held back by its high computational cost. At ISCA 2025, Ant Technology Research Institute presented two papers that tackle this through hardware architecture optimization and compute‑unit improvements, markedly increasing the speed of encrypted computation.

Paper 1: FAST – An FHE Accelerator for Scalable‑parallelism with Tunable‑bit. The FAST accelerator incorporates recent cryptographic optimizations such as hoisting and the KLSS key‑switching decomposition method. By dynamically supporting multiple key‑switching strategies and employing a scalable, precision‑tunable multiplier, FAST achieves a 1.8× performance gain over prior designs.
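Hoisting rests on a simple algebraic fact: the Galois automorphism used for a rotation is linear, so the expensive digit decomposition of a ciphertext component can be computed once and reused for every rotation. The sketch below checks that identity on a toy negacyclic ring with a plain base‑B gadget decomposition. It is an illustration under stated assumptions, not FAST's datapath: the parameters `N`, `Q`, `B`, `L` and all function names are hypothetical, and practical CKKS implementations use an RNS‑based hybrid decomposition (ModUp) rather than base‑B digits.

```python
import random

# Toy parameters (hypothetical, far too small for security).
N = 8            # ring dimension: Z_Q[X]/(X^N + 1)
Q = 2 ** 20      # ciphertext modulus
B = 2 ** 5       # gadget base
L = 4            # number of digits, chosen so that B**L == Q


def automorphism(poly, k):
    """Apply the Galois map X -> X^k (k odd) to a polynomial in Z_Q[X]/(X^N + 1)."""
    out = [0] * N
    for i, c in enumerate(poly):
        j = (i * k) % (2 * N)
        if j < N:
            out[j] = (out[j] + c) % Q
        else:  # X^j = -X^(j - N) because X^N = -1
            out[j - N] = (out[j - N] - c) % Q
    return out


def decompose(poly):
    """Split each coefficient into L base-B digits: poly = sum_j B**j * digits[j]."""
    return [[(c // B ** j) % B for c in poly] for j in range(L)]


def recompose(digits):
    """Inverse of decompose, computed mod Q."""
    return [sum(B ** j * d[i] for j, d in enumerate(digits)) % Q for i in range(N)]


def hoisted_rotations(c1, ks):
    """Decompose once (the hoisted step), then apply each automorphism to the digits."""
    digits = decompose(c1)  # expensive step, performed a single time
    return {k: [automorphism(d, k) for d in digits] for k in ks}


c1 = [random.randrange(Q) for _ in range(N)]
rotated = hoisted_rotations(c1, ks=[3, 5, 7])
for k, digits_k in rotated.items():
    # Linearity: recombining the rotated digits yields the rotated polynomial.
    assert recompose(digits_k) == automorphism(c1, k)
print("hoisting identity holds for k =", sorted(rotated))
```

In an actual key switch, each rotated digit would next be multiplied by the matching rotation‑key component; hoisting removes the repeated decompositions (and, in RNS form, the repeated ModUp work) across rotations of the same ciphertext.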

The core innovation lies in recognizing that different key‑switching methods incur different costs and have different precision requirements depending on the ciphertext level, and in designing a universal framework that adapts to each of them. A specialized hardware architecture and a novel data organization further exploit these cryptographic optimizations.
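One common way to build a multiplier whose width can be tuned is to compose a wide product from fixed‑size sub‑multipliers and shift‑add their partial products, so the same units serve either several narrow multiplications or one wide one. The sketch below shows that composition in software. The 18‑bit limb width, the function names, and the assumption that a schoolbook split of this kind resembles FAST's multiplier are all hypothetical; the article does not describe the multiplier's internals.

```python
import random


def split_limbs(x, limb_bits, n_limbs):
    """Split a non-negative integer into n_limbs little-endian limbs of limb_bits each."""
    mask = (1 << limb_bits) - 1
    return [(x >> (limb_bits * i)) & mask for i in range(n_limbs)]


def composed_multiply(a, b, width, limb_bits):
    """Multiply two width-bit integers using only limb_bits x limb_bits sub-products.

    Returns (product, number_of_sub_multiplications).
    """
    n = -(-width // limb_bits)  # ceil(width / limb_bits)
    a_limbs = split_limbs(a, limb_bits, n)
    b_limbs = split_limbs(b, limb_bits, n)
    acc, count = 0, 0
    for i, x in enumerate(a_limbs):
        for j, y in enumerate(b_limbs):
            acc += (x * y) << (limb_bits * (i + j))  # one narrow sub-multiplier
            count += 1
    return acc, count


def modmul(a, b, q, width, limb_bits):
    """Modular multiply on top of the composed multiplier (reduction done naively here)."""
    product, _ = composed_multiply(a, b, width, limb_bits)
    return product % q


# The same 18-bit sub-multiplier serves 36-bit and 54-bit operands by changing the limb count.
for width in (36, 54):
    a, b = random.getrandbits(width), random.getrandbits(width)
    product, count = composed_multiply(a, b, width, limb_bits=18)
    assert product == a * b
    print(f"{width}-bit operands -> {count} sub-multiplications of 18x18 bits")
```

Changing `limb_bits` trades the number of sub‑multiplications against the width of each unit, which is the kind of knob a tunable‑bit design can expose to match the precision a given key‑switching method needs.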

As the first accelerator to simultaneously support hoisting and KLSS, FAST demonstrates the strong potential of co‑optimizing cryptography and hardware design, offering new directions for secure computation research.

Paper 2: Neo – Towards Efficient Fully Homomorphic Encryption Acceleration using Tensor Core. Neo is a GPU‑based FHE acceleration framework built to make full use of Tensor Cores. By reformulating key operators such as Base Conversion and Inner Product from element‑wise calculations into matrix multiplications, and by applying data‑reordering strategies, Neo improves data reuse and achieves up to 3.7× speed‑up for individual operators.
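Base Conversion maps a polynomial's RNS residues modulo q_0, ..., q_{L-1} to residues modulo new primes p_0, ..., p_{K-1}. With Q = q_0 · ... · q_{L-1} and q̂_i = Q / q_i, the standard fast basis extension computes y_j = Σ_i [x_i · q̂_i⁻¹ mod q_i] · (q̂_i mod p_j) mod p_j. Every coefficient uses the same constants q̂_i mod p_j, so the whole operation is an (N × L) matrix of scaled residues times an (L × K) constant matrix. The sketch below, with tiny hypothetical primes and helper names, checks that the matrix form matches the element‑wise loop; it illustrates the reformulation only and does not reproduce Neo's data layout or Tensor Core tiling.

```python
import random

# Hypothetical toy RNS bases (real FHE primes are roughly 30-60 bits).
q_basis = [17, 19, 23]
p_basis = [29, 31]
Q = 17 * 19 * 23
N = 6  # number of polynomial coefficients

q_hat = [Q // qi for qi in q_basis]
q_hat_inv = [pow(qh, -1, qi) for qh, qi in zip(q_hat, q_basis)]  # Python 3.8+


def matmul(A, B):
    """Plain integer matrix product (stand-in for a Tensor Core GEMM)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def base_conv_elementwise(residues):
    """Fast base conversion, one coefficient and one target prime at a time."""
    out = []
    for x in residues:
        out.append([
            sum((x[i] * q_hat_inv[i]) % q_basis[i] * (q_hat[i] % p)
                for i in range(len(q_basis))) % p
            for p in p_basis
        ])
    return out


def base_conv_matmul(residues):
    """Same computation as one (N x L) @ (L x K) product followed by a reduction."""
    A = [[(x[i] * q_hat_inv[i]) % q_basis[i] for i in range(len(q_basis))] for x in residues]
    B = [[q_hat[i] % p for p in p_basis] for i in range(len(q_basis))]  # shared constants
    C = matmul(A, B)
    return [[c % p for c, p in zip(row, p_basis)] for row in C]


coeffs = [random.randrange(Q) for _ in range(N)]
residues = [[c % qi for qi in q_basis] for c in coeffs]
assert base_conv_matmul(residues) == base_conv_elementwise(residues)
# Fast base conversion is approximate: each output equals (c + e*Q) mod p for some 0 <= e < L.
for c, row in zip(coeffs, base_conv_matmul(residues)):
    for y, p in zip(row, p_basis):
        assert any((c + e * Q) % p == y for e in range(len(q_basis)))
```

The constant matrix `B` is reused by every coefficient row, which is the data reuse a GEMM unit can exploit. With realistic 30‑60‑bit residues, the operands exceed Tensor Core input widths and must first be split into chunks.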

Unlike traditional INT8‑only solutions, Neo also uses the FP64 units of the Tensor Cores to accelerate high‑bit‑width operations. Because FP64 provides 53 bits of exact integer precision, large integers need to be split into fewer segments, which yields a 1.65× throughput improvement.
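The benefit of the FP64 path comes from the 53‑bit significand of IEEE double precision: a dot product of K terms is exact as long as every partial sum stays below 2^53, which allows integer chunks of about (53 − log2 K) / 2 bits instead of 8‑bit operands. The sketch below assumes unsigned operands split into equal chunks; it counts how many chunk‑pair products a 36‑bit word needs on each path and checks that an FP64‑chunked dot product reproduces the exact integer result. The 36‑bit word size, K = 64, and the helper names are hypothetical, and this is not Neo's actual splitting scheme.

```python
import math
import random

MANTISSA_BITS = 53  # IEEE 754 binary64: 52 stored bits + 1 implicit bit


def fp64_chunk_bits(k):
    """Largest b such that a k-term dot product of b-bit chunks is exact in FP64."""
    return (MANTISSA_BITS - math.ceil(math.log2(k))) // 2


def split(x, bits, n):
    """Split x into n little-endian chunks of the given bit width."""
    mask = (1 << bits) - 1
    return [(x >> (bits * i)) & mask for i in range(n)]


def chunked_dot_fp64(a, b, word_bits, k):
    """Exact integer dot product of word_bits-wide vectors using only FP64 chunk dot products."""
    bits = fp64_chunk_bits(k)
    n = -(-word_bits // bits)  # chunks per word
    a_chunks = [split(x, bits, n) for x in a]
    b_chunks = [split(y, bits, n) for y in b]
    total = 0
    for i in range(n):
        for j in range(n):
            # Each of these n*n dot products stands in for one FP64 GEMM on Tensor Cores.
            partial = sum(float(x[i]) * float(y[j]) for x, y in zip(a_chunks, b_chunks))
            total += int(partial) << (bits * (i + j))
    return total


WORD_BITS, K = 36, 64
b_fp64 = fp64_chunk_bits(K)
int8_pairs = math.ceil(WORD_BITS / 8) ** 2      # unsigned 8-bit chunks
fp64_pairs = math.ceil(WORD_BITS / b_fp64) ** 2
print(f"FP64 chunk size: {b_fp64} bits")
print(f"chunk-pair products per {WORD_BITS}-bit multiply: INT8 {int8_pairs}, FP64 {fp64_pairs}")

a = [random.getrandbits(WORD_BITS) for _ in range(K)]
b = [random.getrandbits(WORD_BITS) for _ in range(K)]
assert chunked_dot_fp64(a, b, WORD_BITS, K) == sum(x * y for x, y in zip(a, b))
```

In this toy count a 36‑bit word needs 25 chunk‑pair products on the INT8 path and 4 on the FP64 path. That only illustrates why fewer, wider chunks help; it is not a derivation of the reported 1.65× figure, which also depends on the GPU's relative INT8 and FP64 Tensor Core throughput.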

Experimental results on an NVIDIA A100 GPU show Neo’s HMult and other critical operations surpass the state‑of‑the‑art TensorFHE by more than threefold, delivering an average 3.28× acceleration in real applications and providing a high‑performance, flexible solution for privacy‑preserving computing.

Paper Highlights

1. FAST is the first accelerator to support both hoisting and KLSS key switching, delivering a 1.8× efficiency boost for FHE applications.

2. Neo’s GPGPU approach combines memory‑access optimizations with the Tensor Core FP64 path to achieve an average 3.28× speed‑up on real fully homomorphic encryption applications.

The live session will feature the authors discussing design ideas and validation processes. It will be streamed on WeChat Channels (AntTech), Bilibili, and other platforms on May 22, 2025, 18:00‑20:00.

Live Viewing Guide

⏰ Time: 2025‑05‑22 18:00‑20:00

👀 Platforms: WeChat Video Channels (Ant Technology Research Institute, AntTech), Bilibili (Ant Technology Research Institute).
