Boosting Secure AI: HAWK Accelerator and FHEFusion Compiler Break New Ground
This article highlights two works from Ant Group's research team: HAWK, a fixed-word key-decomposition switching accelerator that tackles the hardware obstacles of FHE key switching, and FHEFusion, a compiler framework that applies operator fusion to dramatically speed up CKKS-based DNN inference. We cover their designs, key optimizations, and experimental results.
Background
Fully Homomorphic Encryption (FHE) enables computation on encrypted data, which is critical for privacy‑preserving AI, cloud, medical, and financial services. The growing demand for large‑scale FHE inference has motivated research on hardware‑software co‑optimization to reduce the high computational and memory costs of core FHE primitives.
HAWK: Fixed‑Word Key Decomposition Switching Accelerator
Key switching is a fundamental operation in FHE schemes such as CKKS. While key‑decomposition switching offers better asymptotic complexity than traditional key‑switching, hardware implementations encounter four major obstacles:
Explosive computation when using short word lengths.
Significant increase in key storage size.
Dynamic word‑length requirements that conflict with fixed‑word hardware pipelines.
Lack of efficient hardware support for rounding operations, leading to precision loss.
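The first two obstacles follow directly from how gadget (base) decomposition works: a modulus of log q bits split into w-bit words yields ⌈log q / w⌉ digits, and both the key-switching work and the switching-key storage grow roughly linearly in that digit count. A minimal sketch, assuming an illustrative 1200-bit total modulus (the specific number and the function names are hypothetical, not HAWK parameters):

```python
from math import ceil

def digit_count(log_q: int, word_bits: int) -> int:
    """Number of digits when a log_q-bit modulus is decomposed
    into word_bits-wide words (gadget/base decomposition)."""
    return ceil(log_q / word_bits)

LOG_Q = 1200  # hypothetical total modulus size for a deep CKKS circuit

for w in (16, 32, 64):
    d = digit_count(LOG_Q, w)
    # Key-switching work and switching-key storage both grow
    # roughly linearly in the digit count d.
    print(f"word={w:>2} bits -> digits={d}, relative key size ~{d}x")
```

Halving the word length roughly doubles the digit count, which is why short words explode both computation and key storage.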
To address these issues, the authors propose a fixed‑word key‑decomposition switching method that enforces a uniform word length across all stages, thereby aligning with fixed‑word datapaths and reducing both computational complexity and key storage overhead. The method includes:
Half‑RConv optimization: a partial radix‑conversion technique that halves the number of required convolution steps, cutting the arithmetic workload by roughly 50% for typical parameter settings.
Hybrid on‑chip/off‑chip strategy: selective buffering of intermediate key components allows the accelerator to operate within limited on‑chip SRAM while still benefiting from the reduced word‑length regime.
Hardware‑friendly rounding: a deterministic rounding unit that eliminates rounding error without requiring expensive floating‑point units, preserving ciphertext correctness.
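The two core ideas above, a uniform word length and integer-only rounding, can be illustrated with a small plaintext sketch (this is an assumption-laden simplification of the paper's method, not HAWK's actual datapath; the word width and helper names are invented for illustration):

```python
W = 32          # uniform word length enforced across all stages
BASE = 1 << W

def decompose(x: int, num_words: int) -> list[int]:
    """Split x into fixed W-bit words (least-significant first)."""
    return [(x >> (W * i)) & (BASE - 1) for i in range(num_words)]

def recompose(words: list[int]) -> int:
    """Exact reconstruction from the fixed-width words."""
    return sum(w << (W * i) for i, w in enumerate(words))

def round_div(x: int, q: int) -> int:
    """Deterministic round-to-nearest of x/q using only integer ops,
    mirroring a hardware-friendly rounding unit (no floating point)."""
    return (x + q // 2) // q

x = 0x1234_5678_9ABC_DEF0
assert recompose(decompose(x, 2)) == x  # lossless fixed-word split
```

Because every stage uses the same W-bit words, the datapath never has to reconfigure its multipliers or buffers mid-pipeline, which is what makes the method friendly to fixed-word hardware.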
The resulting architecture is configurable: a control register selects between the traditional key‑switching flow and the fixed‑word decomposition flow, enabling designers to evaluate trade‑offs on the same silicon.
FHEFusion: Operator Fusion in FHE Compilers for Depth‑Efficient DNN Inference
Operator fusion reduces the multiplicative depth of homomorphic DNN inference, which directly lowers the noise budget consumption and runtime. Existing approaches either rely on manual kernel tuning—missing cross‑operator opportunities—or on generic pattern‑matching compilers that lack awareness of FHE‑specific constraints.
FHEFusion introduces a new intermediate representation (IR) for the CKKS scheme that augments the standard computational graph with FHE‑aware primitives:
Masking: applies a ciphertext‑level mask to zero out irrelevant slots, enabling reuse of ciphertexts across layers.
Compaction (implemented as Strided_Slice): extracts and packs a subset of slots, exposing additional fusion possibilities.
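The effect of the two primitives is easiest to see on plaintext slot vectors. A minimal simulation, assuming the usual CKKS slot-packing model (the function names are illustrative, not FHEFusion's API; in a real ciphertext, masking is a ciphertext-plaintext multiply and compaction is built from rotations, masks, and additions):

```python
def masking(slots: list[float], keep: set[int]) -> list[float]:
    """Zero out irrelevant slots (in CKKS: multiply by a 0/1 plaintext mask)."""
    return [v if i in keep else 0.0 for i, v in enumerate(slots)]

def strided_slice(slots: list[float], start: int, stride: int) -> list[float]:
    """Compaction: extract every stride-th slot starting at `start`
    and pack the survivors densely."""
    return slots[start::stride]

ct = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(masking(ct, {0, 2}))       # keeps slots 0 and 2, zeroes the rest
print(strided_slice(ct, 1, 2))   # packs slots 1, 3, 5, 7
```

Masking lets one ciphertext serve several downstream layers, while compaction re-densifies slots so later operations work on fewer, fuller ciphertexts.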
Guided by a set of algebraic rewrite rules (e.g., merging consecutive masking and compaction operations into a single operation) and an FHE‑aware cost model that estimates multiplicative depth and ciphertext size, the compiler automatically identifies fusion patterns over this IR that minimize depth.
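One such rewrite rule can be sketched concretely: two consecutive maskings are equivalent to a single masking over the intersection of their kept slots, which replaces two ciphertext-plaintext multiplies (two consumed levels) with one. A plaintext sketch under that assumption (the helper names are illustrative, not FHEFusion's actual rule set):

```python
def masking(slots: list[float], keep: set[int]) -> list[float]:
    # One ciphertext-plaintext multiply: zero out slots not in `keep`.
    return [v if i in keep else 0.0 for i, v in enumerate(slots)]

def merge_masks(k1: set[int], k2: set[int]) -> set[int]:
    # Rewrite rule: mask(mask(ct, k1), k2) == mask(ct, k1 & k2),
    # saving one multiply and hence one multiplicative level.
    return k1 & k2

ct = [1.0, 2.0, 3.0, 4.0]
unfused = masking(masking(ct, {0, 1, 2}), {1, 2, 3})   # depth cost: 2
fused = masking(ct, merge_masks({0, 1, 2}, {1, 2, 3})) # depth cost: 1
assert fused == unfused  # same result, one fewer level consumed
```

A cost model that scores each candidate by estimated depth and ciphertext count would always prefer the fused form here, since the outputs are provably identical.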
FHEFusion is integrated into the state‑of‑the‑art FHE compiler ANT‑ACE. Empirical evaluation on a CPU backend shows speedups ranging from 1.2× to 3.02× (average 1.40×) across seven DNN models derived from thirteen ReLU approximations, while maintaining inference accuracy.