Nimbus: Secure and Efficient Two‑Party Inference for Transformers
The article introduces Nimbus, a two‑party privacy‑preserving inference framework for Transformer models. Nimbus accelerates linear‑layer matrix multiplication with an outer‑product encoding and speeds up activation‑function evaluation with a distribution‑aware polynomial approximation, achieving a 2.7–4.7× speedup over prior work while maintaining model accuracy.
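The "distribution‑aware" idea can be illustrated with a minimal sketch: rather than minimizing worst‑case error over a wide interval, fit a low‑degree polynomial to an activation function using a weighted least‑squares objective that emphasizes where activations actually concentrate. The code below is an illustrative assumption, not Nimbus's actual protocol: it fits a degree‑4 polynomial to GELU (tanh form) under an assumed standard‑normal input distribution.

```python
import numpy as np

def gelu(x):
    # Standard tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

# Sample points over a wide range; weight them by an ASSUMED N(0, 1)
# activation distribution (the real distribution would be measured per layer).
xs = np.linspace(-5.0, 5.0, 2001)
density = np.exp(-xs**2 / 2.0)

# Weighted least-squares fit of a degree-4 polynomial: polyfit applies the
# weight w to each residual, so sqrt(density) yields density-weighted squares.
coeffs = np.polyfit(xs, gelu(xs), deg=4, w=np.sqrt(density))
poly = np.poly1d(coeffs)

# The fit should be accurate where inputs concentrate, here [-2, 2]
typical = np.linspace(-2.0, 2.0, 401)
max_err = float(np.max(np.abs(poly(typical) - gelu(typical))))
```

A low‑degree polynomial like this is cheap to evaluate under secret sharing (only additions and multiplications), which is why such approximations replace transcendental activations in two‑party inference.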