UCL Team Uses Federated Learning to Train Blood Morphology Models Without Sharing Data

A UCL computer‑science team presents a federated learning framework for white‑blood‑cell morphology analysis that preserves patient privacy, leverages heterogeneous clinical slide data from multiple sites, and achieves superior cross‑site performance and generalisation to unseen institutions compared with centralized training.

HyperAI Super Neural

Blood‑cell morphology examination is essential for diagnosing hematologic diseases but is labor‑intensive and relies on expert pathologists, who are scarce in low‑ and middle‑income countries. Deep‑learning models can automate cell classification, yet they require large, diverse datasets that are often siloed across hospitals; variations in staining, imaging equipment, and rare cell types cause poor generalisation.

Federated Learning Framework for White‑Blood‑Cell Analysis

Researchers from University College London (UCL) propose a federated learning (FL) framework that enables collaborative model training without exchanging raw images. Multiple clinical sites contribute blood‑smear slides; local clients train models on their own data and share only parameter updates with a central server, which aggregates them into a global model. Raw data never leaves each site, while the global model learns robust, domain‑invariant feature representations.
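The communication pattern can be sketched as a single FL round. This is an illustrative NumPy toy, not the authors' code: the quadratic "loss" and the function names are hypothetical stand‑ins; the point is that clients exchange only parameters, never data.

```python
import numpy as np

def local_update(global_params, client_data, lr=0.1):
    """One client's local training step on a toy quadratic loss.
    Raw data never leaves the client; only updated parameters do."""
    grad = global_params - np.mean(client_data, axis=0)  # toy gradient
    return global_params - lr * grad

def federated_round(global_params, clients):
    """Server broadcasts the global model, clients train locally,
    and the server aggregates the returned parameter updates."""
    updates = [local_update(global_params, data) for data in clients]
    return np.mean(np.stack(updates), axis=0)            # FedAvg-style mean
```

In the real system each `local_update` would be several epochs of gradient descent on the client's slides, and the aggregation step would be one of the strategies compared below.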

Dataset and Clinical Heterogeneity

The study uses two independent datasets from different hospitals, each containing 11 shared cell types (e.g., neutrophils, eosinophils, basophils, promyelocytes). The datasets retain staining and imaging differences to test FL under realistic heterogeneity. An external validation set from a Barcelona hospital (12,992 images) with distinct equipment and patient demographics is held out for evaluating unseen‑institution performance.

Model Architectures and Training Protocol

Two deep‑learning backbones are evaluated:

ResNet‑34: a classic CNN initialized with ImageNet weights.

DINOv2‑Small: a self‑supervised Vision Transformer that captures global image features.

Both architectures employ selective fine‑tuning (freezing early layers, training the last three residual blocks for ResNet‑34 and blocks 8‑11 for DINOv2‑Small). Training follows a unified FL protocol: 5 global communication rounds, each with 5 local epochs (total 25 epochs). The centralized baseline uses 25 epochs with 4‑fold cross‑validation. Images are resized to 224×224 px and augmented with modest translations (±10 %) and rotations (±5°) to retain morphological detail.
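The selective fine‑tuning scheme can be sketched as a freezing plan. This is an illustrative Python sketch, not the authors' code; the group names (`layer1`–`layer4`, `block0`–`block11`, `head`) are assumptions following common naming for these backbones.

```python
def trainable_groups(backbone: str) -> dict:
    """Return a {parameter_group: trainable?} map under selective fine-tuning.
    Early layers stay frozen; only the last blocks and the classifier head
    receive gradient updates."""
    if backbone == "resnet34":
        groups = [f"layer{i}" for i in range(1, 5)]      # 4 residual stages
        unfrozen = {"layer2", "layer3", "layer4"}        # last three stages
    elif backbone == "dinov2_small":
        groups = [f"block{i}" for i in range(12)]        # 12 transformer blocks
        unfrozen = {f"block{i}" for i in range(8, 12)}   # blocks 8-11
    else:
        raise ValueError(f"unknown backbone: {backbone}")
    plan = {g: (g in unfrozen) for g in groups}
    plan["head"] = True                                  # classifier always trains
    return plan
```

Freezing early layers keeps the generic low‑level features from pretraining intact while adapting the high‑level representations to blood‑cell morphology.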

Federated Aggregation Strategies

Four FL aggregation methods are compared:

FedAvg: weighted average of client parameters (sensitive to extreme class distributions).

FedMedian: coordinate‑wise median (robust to outliers but may suppress minority signals).

FedProx: adds a proximal term to the local loss for stable convergence on non‑IID data.

FedOpt: uses adaptive Adam‑style optimization on aggregated gradients, dynamically adjusting learning rates for heterogeneous clients.
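The two simplest strategies can be written down directly. A minimal NumPy sketch (illustrative, with parameters flattened into vectors) shows why FedMedian is outlier‑robust where FedAvg is not:

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """FedAvg: average client parameter vectors, weighted by dataset size.
    A single extreme client can pull the global model far off."""
    weights = np.asarray(client_sizes, dtype=float)
    weights /= weights.sum()
    stacked = np.stack(client_params)            # shape: (n_clients, n_params)
    return np.tensordot(weights, stacked, axes=1)

def fedmedian(client_params):
    """FedMedian: coordinate-wise median across clients.
    Ignores outliers, but may also suppress genuine minority-class signal."""
    return np.median(np.stack(client_params), axis=0)
```

With three clients where one holds extreme parameters, the weighted mean is dragged toward the outlier while the median stays near the majority, mirroring the trade‑off described above.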

Class imbalance is mitigated with focal loss, weighted random sampling, gradient accumulation, and gradient clipping (max norm = 1.0).
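Focal loss is the core of this mitigation: it down‑weights easy, well‑classified examples so that rare, hard classes dominate the gradient. A minimal NumPy sketch (the α and γ defaults are common choices, not values reported by the study):

```python
import numpy as np

def focal_loss(probs, targets, alpha=0.25, gamma=2.0, eps=1e-9):
    """Multi-class focal loss: -alpha * (1 - p_t)^gamma * log(p_t).

    probs:   (n, c) predicted class probabilities (rows sum to 1)
    targets: (n,) integer class labels
    The (1 - p_t)**gamma factor shrinks the loss of confident, correct
    predictions, focusing training on misclassified rare cells."""
    pt = probs[np.arange(len(targets)), targets]
    return float(np.mean(-alpha * (1.0 - pt) ** gamma * np.log(pt + eps)))
```

A confidently correct prediction (p_t = 0.9) contributes far less loss than an uncertain one (p_t = 0.5), which is exactly the behaviour needed when common cell types would otherwise swamp the promyelocytes and band neutrophils.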

Evaluation Metrics

Performance is measured by balanced accuracy, emphasizing cross‑institution generalisation. Additional per‑class F1 scores assess rare‑cell detection.
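Balanced accuracy is the mean of per‑class recalls, so each cell type counts equally regardless of how many examples it has. A minimal NumPy sketch:

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall: each class contributes equally,
    so a model that ignores rare classes is penalized."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))
```

On an imbalanced set where a model predicts only the majority class, plain accuracy can look high while balanced accuracy drops to near chance, which is why it suits rare‑cell evaluation.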

Joint Test‑Set Results

When evaluated on the combined client data, the four aggregation methods behave quite differently. FedOpt exhibits high variance across backbones (ResNet‑34 balanced accuracy = 0.3638, DINOv2‑Small = 0.5594). FedAvg and FedProx remain stable with both backbones. FedMedian is the most consistent (ResNet‑34 = 0.5738, DINOv2‑Small = 0.5797). Overall, federated training raises balanced accuracy from 52 % under single‑site training to 58 %, remaining competitive with centralized training while preserving privacy.

External Validation on Unseen Institution

On the Barcelona validation set, FedMedian and FedOpt surpass centralized training (balanced accuracy ≈ 67 % vs 64 %). Exposure to heterogeneous client data during FL helps the model learn more generalisable morphological features.

FedMedian notably boosts rare‑cell F1 scores: band neutrophils 0.62 vs 0.30 (↑107 %), promyelocytes 0.61 vs 0.35 (↑74 %). However, metamyelocytes remain challenging (F1 = 0.02‑0.30) across all methods, reflecting the difficulty of learning robust representations for extremely rare classes.

Architecture‑Aggregation Interaction Patterns

Key observations:

FedMedian provides cross‑architecture stability but can hinder rare‑class performance.

FedOpt preserves minority‑class signals yet is sensitive to the underlying architecture.

DINOv2‑Small’s transformer backbone shows higher robustness to non‑IID distributions than ResNet‑34, which is more affected by gradient conflicts.

Conclusions

The study demonstrates that federated learning can deliver privacy‑preserving, high‑performing blood‑morphology analysis suitable for resource‑limited medical settings. While FL models may slightly lag centralized models trained on pooled data, they achieve comparable accuracy without compromising patient confidentiality, and they generalise better to new institutions.

Broader Implications

Federated learning is positioned as a key technology for breaking medical “data islands,” enabling scalable, compliant AI deployment across hospitals. By supporting distributed, privacy‑aware training, FL lays the groundwork for large‑scale medical AI models, cross‑institution clinical decision support, and collaborative research platforms.

Tags: privacy, Federated Learning, Medical Imaging, Blood Morphology, DINOv2, ResNet-34, White Blood Cells
Written by

HyperAI Super Neural

Deconstructing the sophistication and universality of technology, covering cutting-edge AI for Science case studies.
