Federated Learning Technology Application Innovation Exploration
This presentation reviews the rapid rise of privacy‑preserving computation and federated learning since 2018, explains the fundamentals and classifications of federated learning, and details five technical innovations implemented by China Telecom: a flexible standard architecture, data‑pollution detection, online inference resistant to membership‑inference attacks, asynchronous optimization, and contribution‑value assessment. Together these demonstrate practical AI solutions for large‑scale data security and privacy.
Since 2018, concepts such as privacy‑preserving computation, federated learning, and secure multi‑party computation have appeared increasingly often in both industry and academia, with 2020 called the "year of privacy computing" and 2021 the year of its commercial rollout.
In this talk, Dr. Zhou Xuhua, Deputy Director of the Security Technology Research Institute at China Telecom Research Institute, shares practical problems and solutions he encountered while applying federated learning in the telecom sector.
The agenda covers two main parts: (1) an introduction to federated learning; (2) technical innovation explorations.
Federated learning (or federated machine learning) enables multiple parties to jointly train and predict machine‑learning models without moving raw data off‑platform, protecting data security, personal privacy, and regulatory compliance while unlocking the value of large‑scale data collaboration.
Compared with secure multi‑party computation, federated learning is easier for decision makers to accept because it only transmits intermediate model parameters, not raw data.
Federated learning can be classified into three types based on data characteristics: horizontal federated learning (similar feature space, different samples), vertical federated learning (different feature space, overlapping samples), and federated transfer learning (both feature and sample overlap are low).
Horizontal federated learning is suited for scenarios where enterprises share similar features but have distinct user bases; vertical federated learning is used when enterprises have complementary features for the same users; federated transfer learning is applied when both feature and sample overlap are minimal.
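The classification above can be sketched as a simple decision rule on sample and feature overlap. The 0.5 threshold below is an illustrative assumption; in practice the boundary between the three settings is a judgment call, not a fixed cutoff.

```python
def federated_learning_type(sample_overlap: float, feature_overlap: float,
                            threshold: float = 0.5) -> str:
    """Classify the federated setting from the fraction of overlapping
    samples (e.g. shared user IDs) and overlapping features.
    The threshold of 0.5 is an illustrative assumption, not a standard."""
    if feature_overlap >= threshold and sample_overlap < threshold:
        # Similar feature space, largely distinct user bases.
        return "horizontal"
    if sample_overlap >= threshold and feature_overlap < threshold:
        # Same users, complementary features.
        return "vertical"
    # Both overlaps low: transfer learning bridges the gap.
    return "transfer"
```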
Compared with other privacy‑preserving techniques, federated learning (1) adds secure intermediate‑parameter exchange to address data‑leakage risks of centralized training, (2) follows the "data stays on platform, model moves" principle, gaining decision‑maker trust, (3) can achieve model performance comparable to centralized learning, especially in vertical federated settings, and (4) can be combined with differential privacy, homomorphic encryption, and other techniques.
Innovation Exploration 1: Flexible Standard and Transaction‑Center Architectures
China Telecom built a standard federated‑learning architecture that allows the telecom side to keep all private data locally while providing a unified interface for external partners to submit federated‑learning tasks, supporting billions of data records and high‑latency, low‑bandwidth network conditions.
The transaction‑center architecture separates management modules from computation modules, enabling unified control and data‑local processing, and provides a single entry point for partners.
Innovation Exploration 2: Data‑Pollution Detection for Vertical Federated Learning
Because participants cannot directly inspect each other's data, polluted or malicious data may go unnoticed. The proposed solution extracts statistical features from clean training data, builds probability distributions for each feature, and sets thresholds to evaluate the likelihood that new data is valid.
This mechanism can be applied to (1) batch‑wise detection within a single training round, (2) cross‑partner detection in horizontal federated learning, and (3) preventing malicious parties from stealing data or models during training.
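The detection step described above can be sketched as follows. The talk does not specify which statistical features or distributions are used, so this minimal version assumes an independent Gaussian model per feature and a 3‑sigma threshold, both of which are illustrative choices.

```python
import numpy as np

def fit_feature_stats(clean_data: np.ndarray):
    """Estimate per-feature mean and standard deviation from
    known-clean training data (the reference distribution)."""
    return clean_data.mean(axis=0), clean_data.std(axis=0) + 1e-9

def pollution_flags(batch: np.ndarray, mean: np.ndarray, std: np.ndarray,
                    z_threshold: float = 3.0) -> np.ndarray:
    """Flag samples whose value in any feature deviates more than
    z_threshold standard deviations from the clean-data distribution.
    Gaussian per-feature modeling and the 3-sigma cut are assumptions,
    not the exact mechanism described in the talk."""
    z = np.abs((batch - mean) / std)
    return (z > z_threshold).any(axis=1)  # True = suspected polluted
```

A batch submitted in a new training round can then be screened before its intermediate values enter the joint computation.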
Innovation Exploration 3: Federated Linear Model Online Inference Resistant to Membership Inference Attacks
Standard federated inference may expose user identifiers and allow inference attacks. The solution combines filtering, homomorphic encryption, and random‑mask multiplication to hide the exact user request from the responder while still enabling accurate predictions.
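The random‑mask‑multiplication idea can be illustrated for a linear model: the requester scales its feature vector by a secret random factor, the responder scores the masked vector, and the requester divides the mask back out. This is a simplified sketch; the deployed scheme also combines filtering and homomorphic encryption, which are omitted here, and all function names are illustrative.

```python
import random

def make_responder(weights):
    """Responder holds the linear model and scores whatever vector
    it receives, never seeing the raw features."""
    def score(vec):
        return sum(w * v for w, v in zip(weights, vec))
    return score

def masked_linear_inference(x, query_fn):
    """Requester hides x with a multiplicative mask r; since the model
    is linear, w . (r*x) = r * (w . x), so dividing by r recovers the
    true score without revealing x to the responder."""
    r = random.uniform(1.0, 1000.0)      # secret mask, known only to requester
    masked_x = [r * xi for xi in x]      # responder only sees the masked vector
    masked_score = query_fn(masked_x)    # responder computes w . (r*x)
    return masked_score / r              # unmask: equals w . x
```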
Innovation Exploration 4: Asynchronous Optimization for Vertical Federated Learning
To avoid the "slowest‑node" bottleneck, an asynchronous auxiliary component caches intermediate feature values, allowing each participant to continue training locally while receiving the next‑step model parameters from the auxiliary, thus improving training efficiency without sacrificing model quality.
Innovation Exploration 5: Quantifying Participant Contribution Using Shapley Value
To incentivize high‑quality data providers, a contribution‑assessment method based on Shapley value and Monte‑Carlo simulation evaluates each party’s feature impact on the global model, enabling fair reward distribution while keeping computational cost manageable.
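The Monte‑Carlo estimation of Shapley values can be sketched as follows: sample random join orders of the parties and average each party's marginal gain in model utility. The utility function `value_fn` is an assumed callable (e.g. returning validation AUC for a coalition); the talk does not specify the metric.

```python
import random

def monte_carlo_shapley(parties, value_fn, n_samples=2000, seed=0):
    """Estimate each party's Shapley value by sampling random
    permutations and averaging marginal contributions, avoiding the
    exponential cost of enumerating all coalitions."""
    rng = random.Random(seed)
    contrib = {p: 0.0 for p in parties}
    for _ in range(n_samples):
        order = parties[:]
        rng.shuffle(order)           # one random join order
        coalition = set()
        prev = value_fn(coalition)   # utility of the empty coalition
        for p in order:
            coalition.add(p)
            cur = value_fn(coalition)
            contrib[p] += cur - prev # p's marginal gain in this order
            prev = cur
    return {p: c / n_samples for p, c in contrib.items()}
```

For an additive utility the estimate recovers each party's standalone value exactly; for real model utilities, more samples trade compute for accuracy, which is how the method keeps cost manageable.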
The speaker, Dr. Zhou Xuhua, holds a Ph.D. from Shanghai Jiao‑Tong University and leads privacy‑preserving computation research at China Telecom, responsible for the "Hongshu‑Kaiyang" privacy‑computing system and its industry deployments.
Thank you for listening; the session concludes with a reminder to scan the QR code for the Data Security and Privacy Computing Summit replay.