Privacy-Preserving Machine Learning: Balancing Data Utility and Confidentiality
Abstract
Privacy-Preserving Machine Learning (PPML) is a core paradigm that combines privacy computing and artificial intelligence. By leveraging federated learning, differential privacy, homomorphic encryption, and secure multi-party computation, PPML enables model training and inference on encrypted or distributed data, achieving a balance between data value extraction and privacy protection.
1. Fundamentals
1.1 Background
Recent rapid advances in AI, especially large language models built on the Transformer architecture, have reshaped domains such as medical diagnosis, financial risk control, and smart home assistants. However, Machine Learning as a Service (MLaaS) centralizes data for both training and inference, creating significant privacy risks. PPML has emerged to address these concerns.
1.2 What is Privacy-Preserving Machine Learning?
PPML aims to protect data privacy during both training and inference by using cryptography and distributed computing to build a bridge of trust among data providers, model trainers, and model users, achieving “data usable but not visible”.
PPML can be classified into three categories:
Data perturbation‑based PPML: Perturbs data or model parameters with differential-privacy noise or anonymization, trading a controlled amount of utility for privacy.
Cryptography‑based PPML: Designs efficient secure computation protocols (e.g., homomorphic encryption, secure multi‑party computation) that offer strong security at the cost of higher computation and communication overhead.
Federated Learning: Enables multiple parties to jointly train a model without sharing raw data, exchanging only model updates.
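As a minimal illustration of the perturbation-based category, the Laplace mechanism below adds noise calibrated to a query's sensitivity before releasing the result. This is a generic sketch; the function name and parameters are illustrative, not taken from any specific framework.

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng=None):
    """Release `value` with epsilon-differential privacy by adding
    Laplace noise with scale sensitivity / epsilon."""
    rng = rng if rng is not None else np.random.default_rng()
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return value + noise

# Example: privatize a mean over n records, each bounded in [0, 1].
data = np.array([0.2, 0.4, 0.9, 0.7])
true_mean = float(data.mean())
# Changing one record moves the mean by at most 1/n, so sensitivity = 1/n.
private_mean = laplace_mechanism(true_mean, sensitivity=1 / len(data), epsilon=1.0)
```

Smaller epsilon means more noise and stronger privacy; choosing it is exactly the utility/privacy balance described above.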
2. Cryptography‑Based PPML Solutions
2.1 Technical Development
Advances in secret sharing, oblivious transfer, garbled circuits, and homomorphic encryption have provided the foundation for secure computation protocols that protect machine‑learning operators end‑to‑end. Over the past decade, research has moved from theory to practical deployments.
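Additive secret sharing, one of the primitives listed above, can be sketched in a few lines. This is a toy illustration of the idea (splitting a value into random shares over a ring), not a hardened implementation:

```python
import secrets

MOD = 2 ** 64  # arithmetic in the ring Z_{2^64}, as used by many MPC backends

def share(secret, n_parties=3):
    """Split an integer into n additive shares that sum to it mod 2^64.
    Any subset of n-1 shares is uniformly random and reveals nothing."""
    shares = [secrets.randbelow(MOD) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MOD)
    return shares

def reconstruct(shares):
    """Recombine all shares to recover the secret."""
    return sum(shares) % MOD

# Addition is "free": each party adds its shares locally, no communication.
a, b = share(10), share(20)
c = [(x + y) % MOD for x, y in zip(a, b)]
```

Linear operations compose share-wise like this; multiplications and comparisons are where the protocol-level research effort goes.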
Key research directions focus on:
Efficiency optimization: Algorithmic improvements and hardware acceleration to reduce computational complexity and communication cost.
Functionality expansion: Extending support from secure inference to secure training, and from lightweight networks to large Transformer models.
Representative research directions include:
Protocol design: Example – SIGMA introduces efficient function secret sharing for Transformer non‑linear operators, achieving an 11.5–19.4× speedup over generic frameworks.
Model optimization: Example – Delphi uses neural architecture search to replace MPC‑expensive non‑linear activations, and MPCFormer adds knowledge distillation so the simplified model preserves the original's utility.
System optimization: Example – NEXUS uses GPU acceleration for homomorphically encrypted Transformer inference, achieving a 42.3× speedup over CPU‑based solutions.
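To make the model-optimization direction concrete, the sketch below swaps ReLU for a quadratic polynomial, which MPC can evaluate with one multiplication and additions instead of an expensive secure comparison. The coefficients are purely illustrative, not the ones Delphi's search or MPCFormer's distillation would actually produce:

```python
import numpy as np

def relu(x):
    """Standard ReLU; under MPC, max(x, 0) needs a costly comparison protocol."""
    return np.maximum(x, 0.0)

def quad_act(x):
    """Quadratic stand-in for ReLU: only multiplications and additions,
    which are cheap on secret shares. Coefficients are illustrative."""
    return 0.125 * x ** 2 + 0.5 * x + 0.25

# Approximation error on a bounded input range; distillation/retraining
# is what recovers the accuracy lost to this kind of substitution.
xs = np.linspace(-2.0, 2.0, 9)
gap = np.max(np.abs(relu(xs) - quad_act(xs)))
```

The substitution is only safe on bounded activations, which is why these works pair it with architecture search or retraining rather than applying it blindly.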
2.2 Generic Frameworks
Existing PPML solutions often require custom secure protocols for each operator, which is cumbersome for ML researchers. Recent frameworks abstract cryptographic details and provide APIs compatible with mainstream ML libraries.
1) TF Encrypted
TF Encrypted is an open‑source framework built on TensorFlow that allows users to write code similarly to standard TensorFlow while the underlying library handles secure computation via protocols such as server‑aided Pond, SecureNN, and ABY3.
import sys
sys.path.append('/path/to/tf-encrypted')

import tensorflow as tf
import tf_encrypted as tfe

# The input provider contributes a private tensor; other parties never see it.
@tfe.local_computation('input-provider')
def provide_input():
    return tf.ones(shape=(2, 5))

# A private model variable, secret-shared among the compute parties.
w = tfe.define_private_variable(tf.ones(shape=(5, 3)))
x = provide_input()

# Eager-style secure matrix multiplication; reveal() decrypts the result.
y = tfe.matmul(x, w)
res = y.reveal().to_native()

# The same computation wrapped as a traced tfe.function.
@tfe.function
def matmul_func(x, w):
    y = tfe.matmul(x, w)
    return y.reveal().to_native()

res = matmul_func(x, w)

Version requirements: Python >= 3.7, TensorFlow == 2.9.1, g++ >= 14.0.0.
2) CrypTen
CrypTen, developed by Facebook AI Research, builds on PyTorch and provides a familiar API for secure multi‑party computation. A simple PyTorch snippet becomes privacy‑preserving by replacing tensors with crypten.cryptensor objects.

import torch
import crypten

crypten.init()

# Plain PyTorch
x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])
z = x + y

# CrypTen version: the same arithmetic on secret-shared tensors
x_enc = crypten.cryptensor([1, 2, 3])
y_enc = crypten.cryptensor([4, 5, 6])
z_enc = x_enc + y_enc

Both frameworks hide the underlying MPC protocols, allowing users to focus on model design.
3) SecretFlow‑SPU
SecretFlow‑SPU further bridges the gap by supporting front‑ends such as JAX, TensorFlow, and PyTorch. Code is compiled into an intermediate representation (PPHLO) and then executed on hardware‑abstracted MPC protocols (ABY3, Cheetah, SEMI2K, etc.), enabling secure computation of a wide range of tensor operations.
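Under the hood, such backends evaluate tensor operations directly on secret shares. The toy two-party sketch below multiplies shared values using a precomputed Beaver triple, the standard technique these protocols build on. It is not SPU's actual code: the triple is generated locally here for brevity, whereas a real system would produce it in a separate offline phase.

```python
import secrets

P = 2_147_483_647  # a prime modulus; real frameworks typically use rings like Z_{2^64}

def share2(v):
    """Two-party additive sharing of v mod P."""
    r = secrets.randbelow(P)
    return [r, (v - r) % P]

def open_(sh):
    """Both parties publish their shares to reconstruct the value."""
    return sum(sh) % P

def beaver_mul(x_sh, y_sh):
    """Multiply secret-shared x and y using a Beaver triple (a, b, c = a*b).
    Only d = x - a and e = y - b are ever opened, and both are uniform."""
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    a_sh, b_sh, c_sh = share2(a), share2(b), share2((a * b) % P)
    d = open_([(xi - ai) % P for xi, ai in zip(x_sh, a_sh)])
    e = open_([(yi - bi) % P for yi, bi in zip(y_sh, b_sh)])
    # x*y = d*e + d*b + e*a + c; party 0 absorbs the public d*e term.
    return [((d * e if i == 0 else 0) + d * b_sh[i] + e * a_sh[i] + c_sh[i]) % P
            for i in range(2)]
```

Every multiplication consumes one triple and one round of communication, which is why triple generation dominates the offline cost of these systems.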
3. Applications
Medical Diagnosis: Federated learning enables cross‑institutional disease prediction models without sharing raw patient data.
Financial Risk Control: Credit scoring models trained via federated learning and evaluated on homomorphically encrypted data protect user financial information.
Advertising & Marketing: Joint training across e‑commerce and social platforms leverages differential privacy and federated learning to deliver personalized recommendations while complying with privacy regulations.
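The federated pattern recurring in these applications reduces, on the server side, to size-weighted averaging of client model updates (FedAvg): only parameters travel, never raw records. A minimal sketch with illustrative shapes and names:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Server-side FedAvg: average each layer's parameters across clients,
    weighting each client by its local dataset size."""
    total = sum(client_sizes)
    return [
        sum(w[layer] * (n / total) for w, n in zip(client_weights, client_sizes))
        for layer in range(len(client_weights[0]))
    ]

# Two hospitals train locally and upload only parameters (toy one-layer models).
w1 = [np.array([1.0, 1.0])]   # trained on 100 local records
w2 = [np.array([3.0, 3.0])]   # trained on 300 local records
global_w = fedavg([w1, w2], client_sizes=[100, 300])  # -> [2.5, 2.5]
```

In deployed systems this aggregation is often combined with the other techniques in this article, e.g. differentially private noise on updates or secure aggregation so the server never sees any individual client's parameters.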
4. Conclusion
PPML, as the convergence of privacy computing and AI, is reshaping data‑driven industry paradigms. By employing federated learning, differential privacy, homomorphic encryption, and secure multi‑party computation, PPML achieves “data usable but not visible”, addressing the privacy paradox of traditional AI. Its adoption across healthcare, finance, education, and advertising demonstrates a transition from theory to large‑scale deployment, becoming a key enabler for compliant data circulation.
With tightening global privacy regulations (e.g., GDPR, China’s Personal Information Protection Law), PPML’s importance will grow. Future directions include lightweight algorithm optimization, hardware acceleration, and standardized protocols to further promote widespread adoption.
5. References
[1] Gupta K., et al. "SIGMA: Secure GPT inference with function secret sharing." Cryptology ePrint Archive, 2023.
[2] Knott B., Venkataraman S., Hannun A., et al. "CrypTen: Secure multi‑party computation meets machine learning." NeurIPS, 2021.
[3] Mishra P., Lehmkuhl R., Srinivasan A., et al. "Delphi: A cryptographic inference service for neural networks." USENIX Security, 2020.
[4] Li D., et al. "MPCFormer: Fast, performant and private Transformer inference with MPC." arXiv, 2022.
[5] Zhang J., Liu J., Yang X., et al. "Secure Transformer Inference Made Non‑interactive." Cryptology ePrint Archive, 2024.
[6] Ma J., Zheng Y., Feng J., et al. "SecretFlow‑SPU: A performant and user‑friendly framework for privacy‑preserving machine learning." USENIX ATC, 2023.
Alimama Tech
Official Alimama tech channel, showcasing all of Alimama's technical innovations.