
Nimbus: Secure and Efficient Two‑Party Inference for Transformers

The article introduces Nimbus, a novel two‑party privacy‑preserving inference framework for Transformer models that accelerates linear‑layer matrix multiplication and activation‑function evaluation through an outer‑product encoding and distribution‑aware polynomial approximation, achieving 2.7‑4.7× speedup over prior work while maintaining model accuracy.


NeurIPS 2024, one of the top conferences in artificial intelligence, featured a paper titled "Nimbus: Secure and Efficient Two‑Party Inference for Transformers" authored by Ant Group’s Yiyu team in collaboration with Shanghai Jiao Tong University.

The paper proposes Nimbus, a two‑party secure inference framework specifically designed for Transformer neural networks, aiming to protect both model and user data privacy while delivering high‑performance inference suitable for large‑model scenarios.

Linear Layer – Efficient Matrix Multiplication via Outer‑Product Encoding

Traditional two‑party inference for the linear layers of Transformers relies on homomorphic encryption, incurring costs from encryption/decryption, ciphertext matrix multiplication, and ciphertext communication. Nimbus redesigns the multiplication protocol to eliminate input‑ciphertext communication by exploiting the fact that the model weights are static across inferences, and introduces an outer‑product based encoding that sharply reduces both computation and output‑ciphertext communication, yielding a substantial efficiency gain.
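To see the algebraic identity the outer‑product encoding builds on, note that a matrix product can be accumulated column‑by‑column as a sum of outer products, which avoids the per‑element inner products (and the ciphertext rotations they entail) of the naive encoding. The plaintext sketch below only illustrates this identity; the variable names and shapes are illustrative, not taken from the paper's implementation.

```python
import numpy as np

# Outer-product view of matrix multiplication:
#   X @ W == sum_k outer(X[:, k], W[k, :])
# Nimbus-style encodings exploit this so each (static) weight row is
# packed once and scaled by one input column, instead of computing
# many rotated inner products per output element.

rng = np.random.default_rng(0)
X = rng.integers(-8, 8, size=(4, 6))   # activation share: 4 tokens, hidden dim 6
W = rng.integers(-8, 8, size=(6, 5))   # static model weights: 6 -> 5

acc = np.zeros((4, 5), dtype=np.int64)
for k in range(W.shape[0]):
    # one "column of X times row of W" outer product per accumulation step
    acc += np.outer(X[:, k], W[k, :])

assert np.array_equal(acc, X @ W)
print("outer-product accumulation matches X @ W")
```

In the actual protocol the weight rows live inside ciphertexts and the scaling is a plaintext‑ciphertext multiplication, but the accumulation pattern is the same.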

Non‑Linear Layer – Distribution‑Aware Piecewise Polynomial Approximation

For activation functions such as softmax (exponential) and GELU, existing secure methods fit piecewise polynomials assuming a uniform input distribution, leading to unnecessary high‑degree polynomials. Nimbus observes that Transformer activation inputs follow a highly non‑uniform distribution (e.g., 85% of GELU inputs are negative), and therefore allocates finer polynomial approximations to high‑probability intervals while tolerating larger errors in low‑probability regions, reducing both polynomial degree and ciphertext operations.
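The idea above can be sketched in plaintext: instead of fitting a polynomial over a uniform grid, fit it against samples drawn from the distribution the activations actually follow, so the degree budget is spent where inputs concentrate. The interval breakpoints, degrees, and input distribution below are illustrative assumptions, not the paper's actual parameters.

```python
import numpy as np

# Distribution-aware piecewise polynomial approximation of GELU (sketch).
# High-probability region gets a low-degree least-squares fit; the rare
# tails are handled by the cheap asymptotes GELU(x) ~ 0 and GELU(x) ~ x.

def gelu(x):
    # tanh approximation of GELU, as commonly used in Transformers
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

rng = np.random.default_rng(1)
# Assumed input distribution: mostly negative, mirroring the observation
# that the bulk of GELU inputs in Transformers fall below zero.
samples = rng.normal(loc=-1.0, scale=2.0, size=50_000)

# Fit a single degree-4 polynomial on the dense middle interval only,
# weighting the fit by where samples actually land.
mid = samples[(samples >= -4.0) & (samples < 3.0)]
coeffs = np.polyfit(mid, gelu(mid), deg=4)

def gelu_approx(x):
    x = np.asarray(x, dtype=float)
    inner = np.polyval(coeffs, x)
    # tails: GELU(x) ~ 0 for very negative x, GELU(x) ~ x for large x
    return np.where(x < -4.0, 0.0, np.where(x >= 3.0, x, inner))

err = np.abs(gelu_approx(samples) - gelu(samples))
print(f"mean |error| over sampled inputs: {err.mean():.4f}")
```

Because the error is measured (and minimized) under the input distribution rather than uniformly, a low‑degree polynomial suffices, which in the secure setting translates directly into fewer ciphertext multiplications.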

Experimental Results

Implemented on the Secretflow‑SPU platform, Nimbus was evaluated on Transformers of various sizes and input lengths. Compared with the recent BumbleBee system (NDSS 2024), Nimbus achieved 2.7‑4.7× overall speedup, with matrix multiplication accelerated by 2.9‑12.5× and activation functions by 2.9‑4.0×, all while preserving model accuracy.

For more details, the SPU codebase is available at https://github.com/secretflow/spu.

Live‑Stream Deep Dive

A live online session titled "Yiyu Live #25x Paper Show #11" will be held on December 5, jointly by the Yiyu open‑source community and Ant Technology Research Institute to provide an in‑depth walkthrough of the paper; interested readers are invited to reserve a spot.

Tags: transformers, privacy-preserving AI, cryptography, secure inference, secret sharing, two-party computation
Written by AntTech

Technology is the core driver of Ant's future creation.