
Vertical Federated Learning: Characteristics, Research Directions, and Performance Optimization

This article introduces federated learning, traces its evolution, compares horizontal and vertical federated learning, analyzes the unique computational traits of vertical FL, and presents practical performance‑optimization techniques such as offline computation, sparse‑data handling, communication compression, and homomorphic encryption integration.

DataFunSummit

Federated learning (FL) is a privacy‑preserving machine learning paradigm that allows multiple participants to jointly train a model while keeping their raw data locally.

The term was coined by Google around 2016, but its roots lie in earlier research on privacy‑preserving data mining, analysis, and machine learning.

FL is commonly categorized into horizontal FL, vertical FL, and federated transfer learning. Horizontal FL suits parties whose datasets share the same feature space but cover different samples, while vertical FL targets parties that share the same sample IDs but hold different feature dimensions.
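The difference between the two settings can be sketched with a toy matrix (illustrative shapes and party names are mine, not from the talk):

```python
import numpy as np

# Toy dataset: 4 samples (rows) x 4 features (columns), plus sample IDs.
sample_ids = np.array([101, 102, 103, 104])
X = np.arange(16).reshape(4, 4)

# Horizontal FL: parties hold DIFFERENT samples over the SAME feature space.
bank_a_rows, bank_b_rows = X[:2, :], X[2:, :]      # row-wise split

# Vertical FL: parties hold the SAME samples but DIFFERENT feature columns,
# aligned by sample ID (e.g. a telecom and a bank describing the same users).
telecom_cols, bank_cols = X[:, :2], X[:, 2:]       # column-wise split
```

Stacking the vertical shards column-wise (or the horizontal shards row-wise) recovers the full matrix, which is exactly what neither party is allowed to do in the clear.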

Vertical FL is widely needed in industries such as telecom, finance, and advertising, where combining complementary feature sets can improve risk‑control or recommendation models.

Research on vertical FL is relatively scarce; open challenges include achieving “lossless” accuracy (matching what centralized training would produce), the lack of provable security guarantees, and significant communication and computation overhead from extensive ciphertext operations.

Typical vertical FL algorithms include logistic regression and XGBoost. Both require heavy encrypted computations, frequent inter‑party communication, and large data transfer.

Performance‑optimization practices presented include:

Offline computation: pre‑compute expensive operations (e.g., Paillier exponentiation) offline to accelerate online training.
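To illustrate the idea, here is a toy Paillier sketch (deliberately tiny, insecure parameters; the names are mine, not the talk's implementation). The modular exponentiation r^n mod n² that randomizes each ciphertext does not depend on the message, so a pool of these obfuscators can be computed before training starts, leaving only one cheap multiplication on the online path:

```python
import math
import random

# Toy Paillier key (insecure size, for illustration; real keys are 2048+ bits).
p, q = 293, 433
n, n_sq = p * q, (p * q) ** 2

def make_obfuscator():
    """The expensive, message-independent part of encryption: r^n mod n^2."""
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    return pow(r, n, n_sq)

# Offline phase: pre-compute a pool of obfuscators before training starts.
pool = [make_obfuscator() for _ in range(64)]

def encrypt(m):
    # Online phase: with g = n + 1, g^m mod n^2 == 1 + n*m, so encrypting
    # costs just one multiplication once r^n has been pre-computed.
    return ((1 + n * m) * pool.pop()) % n_sq
```

Paillier's additive homomorphism is unaffected: multiplying two such ciphertexts mod n² still encrypts the sum of the plaintexts.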

Sparse data computation: exploit sparsity in high‑dimensional data with optimized sparse matrix multiplication and histogram techniques.
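A minimal sketch of the sparsity idea (plain Python, toy data of my own): when features are stored sparsely, the gradient accumulation X^T·d touches only non-zero entries, and in the encrypted vertical FL setting every skipped zero saves a costly ciphertext operation:

```python
# Each sample's features stored as {feature_index: value}; zeros are omitted.
samples = [
    {0: 1.0, 7: 2.0},
    {3: 4.0},
    {0: 0.5, 3: 1.0, 9: 3.0},
]
residuals = [0.2, -0.1, 0.4]
n_features = 10

def sparse_gradient(samples, residuals, n_features):
    """Accumulate X^T * residuals while touching only non-zero entries."""
    grad = [0.0] * n_features
    for row, d in zip(samples, residuals):
        for j, v in row.items():   # iterate stored entries only
            grad[j] += v * d
    return grad
```

The same skip-the-zeros principle underlies histogram-based tree building, where gradient statistics are accumulated only into the bins that non-zero feature values actually hit.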

Communication compression: pack multiple plaintexts or ciphertexts (e.g., Paillier packing) to reduce transmission volume.
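The packing idea can be sketched as follows (slot width and names are my assumptions): several small non-negative integers are laid into disjoint bit ranges of one big integer, so a single Paillier encryption and transmission carries all of them, and integer addition of two packed values adds slot-wise as long as no slot overflows:

```python
SLOT_BITS = 32  # fixed slot width; must leave headroom for accumulated sums

def pack(values, slot_bits=SLOT_BITS):
    """Pack small non-negative integers into one big integer, low slot first."""
    packed = 0
    for i, v in enumerate(values):
        assert 0 <= v < (1 << slot_bits), "value must fit its slot"
        packed |= v << (i * slot_bits)
    return packed

def unpack(packed, count, slot_bits=SLOT_BITS):
    """Recover the slot values from a packed integer."""
    mask = (1 << slot_bits) - 1
    return [(packed >> (i * slot_bits)) & mask for i in range(count)]
```

Because Paillier addition acts on the underlying integers, adding two packed ciphertexts adds every slot in one homomorphic operation.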

Fully homomorphic encryption (FHE): use SIMD slot packing and ciphertext bundling to batch many values into a single ciphertext and amortize the cost of encrypted calculations; lattice‑based FHE schemes additionally offer post‑quantum security.

Multi‑technology fusion: combine MPC primitives with machine‑learning operators and integrate local plaintext computation with HE/SS techniques, aiming for “no‑third‑party” solutions.
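One MPC primitive in such fusions can be sketched as additive secret sharing (toy modulus; a sketch under my own assumptions, not the talk's implementation). Each party holds a random-looking share, no single share reveals the secret, and shares of two secrets can be added locally so that reconstruction yields the sum, with no third party involved:

```python
import random

PRIME = 2**61 - 1  # toy modulus; real deployments size it to the value range

def share(secret, n_parties=2):
    """Split a value into additive shares that sum to it modulo PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Recombine shares; only the full set reveals the secret."""
    return sum(shares) % PRIME
```

Local additions on shares compose with HE-encrypted or plaintext computation, which is what makes mixing the primitives per operator attractive.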

These optimizations collectively address the three main computational characteristics of vertical FL—intensive ciphertext computation, extensive inter‑party communication, and large communication payloads—resulting in noticeable speed‑ups and reduced bandwidth usage.

The article concludes with a brief speaker bio and a note that the presentation is part of DataFunTalk’s ongoing series on privacy‑computing technologies.

Tags: Performance optimization, machine learning, Federated Learning, privacy computing, Vertical FL
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
