Privacy-Preserving Graph Learning and Recommendation: Techniques, Challenges, and Platform Overview
This article reviews the rapid development of privacy-preserving computation, explains its classification, discusses differential privacy, secure multi‑party computation, federated and split learning, and demonstrates how these techniques can be combined for graph learning and recommendation systems, culminating in a description of the JinZhiTa privacy‑computing platform.
In recent years, the rapid growth of the digital economy and the implementation of data security regulations such as China’s Data Security Law and Personal Information Protection Law have driven the flourishing of privacy‑computing technologies.
Background: Data silos hinder the full utilization of graph data (e.g., transaction graphs, social graphs, gene graphs). Merging graph data from multiple parties can provide a more complete view of entities, improving risk assessment and credit decisions.
Privacy‑Computing Technical Classification:
Cryptography (MPC, homomorphic encryption, private set intersection, zero‑knowledge proofs, etc.)
Anonymization (k‑anonymity, differential privacy)
Distributed Learning (federated learning, split learning)
Trusted Hardware (TEE, TPM)
Differential Privacy: Introduced by Dwork (2006), it perturbs data, model parameters, or gradients with calibrated noise (the Laplace or Gaussian mechanism, or randomized output selection via the exponential mechanism) to achieve ε‑DP, trading privacy strength against utility.
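As a minimal sketch of the Laplace mechanism described above (the function name and parameters here are illustrative, not from the talk), a scalar query is released with noise scaled to its L1 sensitivity divided by ε:

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng=None):
    """Release `value` with epsilon-DP by adding Laplace(sensitivity/epsilon) noise.

    `sensitivity` is the L1 sensitivity of the query: the maximum change in
    its output when one record is added to or removed from the dataset.
    """
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon
    return value + rng.laplace(loc=0.0, scale=scale)

# Example: a counting query has sensitivity 1, so smaller epsilon
# means a larger noise scale and stronger privacy.
true_count = 1000
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)
```

The same pattern applies to gradients or model parameters, with sensitivity typically enforced by clipping before noise is added.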
Secure Multi‑Party Computation (MPC): Allows parties to jointly compute functions without revealing raw inputs. Secret sharing splits data into random shares; the protocol ensures no intermediate information leaks.
Federated Learning: Clients keep data locally while a central server aggregates model updates, optionally protected by DP or MPC to prevent leakage.
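The server-side aggregation step can be sketched as a sample-size-weighted average of client parameters (FedAvg-style; the function below is an illustrative sketch, not the platform's implementation):

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of client model parameters (FedAvg-style).

    client_weights: list of parameter arrays, one per client.
    client_sizes:   number of local training samples at each client.
    """
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Two clients with different amounts of local data:
w1, w2 = np.array([1.0, 2.0]), np.array([3.0, 4.0])
agg = fedavg([w1, w2], client_sizes=[100, 300])
# agg == 0.25 * w1 + 0.75 * w2 == [2.5, 3.5]
```

In a protected variant, clients would submit secret-shared or noised updates so the server only ever sees the aggregate.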
Split Learning: The computation graph is partitioned; data‑related layers run on the client side, while the server handles the remaining non‑linear layers, reducing communication and preserving privacy.
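A toy forward pass showing the cut point (layer sizes and the two-layer network are illustrative): only the intermediate "smashed" activation crosses the boundary, never the raw features.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Client side: holds the raw data and the first, data-facing layer ---
X = rng.normal(size=(8, 16))             # private raw features, never leave the client
W_client = rng.normal(size=(16, 32))

def client_forward(X):
    # Only this "smashed" activation is sent to the server, not X itself.
    return np.maximum(X @ W_client, 0.0)  # ReLU

# --- Server side: holds the remaining layers ---
W_server = rng.normal(size=(32, 4))

def server_forward(h):
    return h @ W_server                   # e.g., logits for 4 classes

smashed = client_forward(X)
logits = server_forward(smashed)          # shape (8, 4)
```

During training, gradients flow back across the same cut: the server returns the gradient with respect to the smashed activation, and the client continues backpropagation locally.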
Combining Techniques for Graph Learning:
Use split learning to keep the first few GNN layers on data owners, then apply MPC or secure aggregation for the deeper, non‑linear layers.
Combine MPC with split learning to achieve near‑original model accuracy while protecting intermediate representations.
Integrate federated learning with split learning to handle horizontally partitioned graph data across many clients.
Apply random permutation for linear sub‑computations to improve efficiency when full MPC is too costly.
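The random-permutation trick in the last bullet can be sketched as follows (a simplified illustration, not the talk's protocol): because a linear map commutes with row permutation, the client can shuffle its samples before the server computes the linear sub-step, then un-shuffle the result locally, so the server never learns which row belongs to which entity.

```python
import numpy as np

rng = np.random.default_rng(1)

# Client's private samples and a linear layer to be evaluated server-side.
X = rng.normal(size=(6, 5))
W = rng.normal(size=(5, 3))

# Client shuffles the rows before sending: permutation commutes with
# the linear map, so the server works on rows whose identity it cannot link.
perm = rng.permutation(X.shape[0])
X_perm = X[perm]

Y_perm = X_perm @ W        # computed on the server

# Client undoes the shuffle locally.
inv = np.argsort(perm)
Y = Y_perm[inv]
assert np.allclose(Y, X @ W)
```

Note the caveat: permutation hides only row identity, not the feature values themselves, which is why the talk pairs it with MPC or secure aggregation for the sensitive sub-computations.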
Privacy‑Preserving Recommendation:
2C (to‑consumer) scenario: User data stays on the device; linear model components run locally, while higher‑order feature interactions are computed on the server under a federated‑learning protocol.
2B (to‑business) scenario: Cross‑domain recommendation uses differential privacy to release a low‑rank dense matrix from the source domain, then aligns it with the target domain for knowledge transfer.
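One way the 2B release step could look (a hypothetical sketch; the talk does not specify the mechanism, and a real deployment would calibrate sensitivity to the bounded rating range and account for the SVD step in the privacy analysis): truncate the source-domain interaction matrix to low rank, then perturb it before it leaves the source domain.

```python
import numpy as np

def dp_low_rank_release(R, rank, epsilon, sensitivity=1.0, seed=0):
    """Release a noisy rank-k approximation of interaction matrix R.

    Illustrative sketch: truncate via SVD, then add Laplace noise scaled
    to sensitivity/epsilon to the dense reconstruction before release.
    """
    rng = np.random.default_rng(seed)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    low_rank = U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank]
    noise = rng.laplace(scale=sensitivity / epsilon, size=low_rank.shape)
    return low_rank + noise

# Tiny user-item rating matrix from the source domain:
R = np.array([[5., 3., 0.], [4., 0., 0.], [1., 1., 5.]])
released = dp_low_rank_release(R, rank=2, epsilon=1.0)
```

The target domain would then align the released dense matrix with its own entities for knowledge transfer.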
JinZhiTa Privacy‑Computing Platform:
Developed since 2018 as part of a national big‑data credit assessment project, the platform integrates MPC, federated learning, and differential privacy to support data quality inspection, hierarchical data classification, and secure collaborative analytics across government and commercial data sources.
It enables containerized private nodes for each data owner, a scheduler to dispatch tasks, and supports linear models, tree models, neural networks, graph neural networks, clustering, and unsupervised methods.
Q&A Highlights:
Non‑linear operations in split learning can be offloaded to the server to avoid costly MPC on activation functions.
Privacy guarantees of differential privacy are proved by defining neighboring datasets and bounding the output distribution with ε (and optionally δ).
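The guarantee referenced above has a standard formal statement. A randomized mechanism $M$ is $(\varepsilon, \delta)$-differentially private if, for every pair of neighboring datasets $D, D'$ (differing in one record) and every measurable set of outputs $S$:

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[M(D') \in S] + \delta
```

Pure ε‑DP is the special case $\delta = 0$; the Laplace mechanism achieves it, while the Gaussian mechanism requires $\delta > 0$.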
Unsupervised privacy‑preserving learning (e.g., K‑means via MPC) is still research‑level and rarely deployed.
Overall, the talk emphasizes a pragmatic, scenario‑driven approach to combine various privacy‑computing techniques to achieve a balance among security, efficiency, and utility.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.