How Delphi-2M Uses Generative Transformers to Predict Over 1,000 Diseases

A new AI system called Delphi-2M, built on an enhanced generative‑transformer architecture and trained with UK Biobank data, can forecast the risk of more than a thousand diseases up to twenty years in advance, achieving an average AUC of 0.67 in external validation while offering explainable, extensible predictions for personalized health.

Data Party THU

Overview

Delphi‑2M is a generative‑transformer model that predicts a person’s risk of developing more than 1,000 distinct diseases. The model extends the GPT architecture to capture temporal dependencies among past health events and to fuse heterogeneous prognostic data such as electronic health records, lifestyle questionnaires, prescription histories, and laboratory measurements.
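One common way to give a transformer a sense of elapsed time is to replace integer token positions with a sinusoidal encoding of a continuous quantity such as age. The sketch below illustrates that general idea only; the function name `age_encoding`, the dimension, and the `max_period` value are assumptions for illustration, and the article does not state that Delphi‑2M is implemented this way.

```python
import math

def age_encoding(age_years, dim=8, max_period=100.0):
    """Sinusoidal encoding of a continuous age (hypothetical helper).

    Returns `dim` values: interleaved sin/cos pairs at geometrically
    spaced frequencies, as in standard transformer position encodings.
    """
    if dim <= 0 or dim % 2 != 0:
        raise ValueError("dim must be a positive even number")
    half = dim // 2
    encoding = []
    for i in range(half):
        # Frequencies fall from 1 to roughly 1/max_period across the pairs.
        frequency = 1.0 / (max_period ** (i / half))
        angle = age_years * frequency
        encoding.append(math.sin(angle))
        encoding.append(math.cos(angle))
    return encoding

# Two events at different ages get different, smoothly varying encodings.
at_40 = age_encoding(40.0)
at_65 = age_encoding(65.5)
```

Because the input is a real number rather than a sequence index, two diagnoses made ten years apart are encoded differently from two made ten days apart, which an index-based encoding cannot express.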

Delphi-2M overview

Training Data

The model was trained on data from about 400,000 UK Biobank participants, using longitudinal health records that span multiple decades. Diagnoses were coded with the International Classification of Diseases, 10th Revision (ICD‑10), and its top‑level codes define the more than 1,000 disease categories the model predicts.
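As a rough illustration of the ICD‑10 step, the sketch below collapses full diagnosis codes to their three-character category and assigns each category an integer token id. It assumes that "top‑level" means the three-character ICD‑10 category; the helper names `top_level_category` and `build_vocabulary` are hypothetical and not taken from the paper.

```python
import re

# Assumption: a top-level category is the first three characters of the code,
# e.g. "I21.4" -> "I21".
ICD10_PREFIX = re.compile(r"^[A-Z][0-9][0-9A-Z]")

def top_level_category(code):
    """Return the three-character ICD-10 category for a full code."""
    cleaned = code.strip().upper().replace(".", "")
    match = ICD10_PREFIX.match(cleaned)
    if match is None:
        raise ValueError(f"not an ICD-10 code: {code!r}")
    return match.group(0)

def build_vocabulary(codes):
    """Map each distinct top-level category to an integer token id."""
    categories = sorted({top_level_category(c) for c in codes})
    return {category: index for index, category in enumerate(categories)}

records = ["I21.4", "I21.9", "E11.9", "e11", "J45.0"]
vocab = build_vocabulary(records)
print(vocab)
```

Collapsing subcodes this way keeps the vocabulary small enough for a language-model-style output layer, at the cost of losing distinctions between subtypes of the same condition.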

External Validation

To assess generalisation, the pretrained weights were applied directly to external data (1.9 million individuals from Danish health registries, according to the paper) without any fine‑tuning or parameter adjustment. In this out‑of‑sample evaluation the model achieved an average area under the curve (AUC) of 0.67 (standard deviation 0.09), compared with 0.69 (SD 0.09) on the internal UK Biobank longitudinal test set. Despite the modest drop, accuracy is comparable to that of existing single‑disease models, while the model delivers risk estimates for all 1,000+ conditions simultaneously.
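To make the "average AUC (SD)" summary concrete, here is a minimal sketch that computes a plain AUC per disease and then the mean and standard deviation across diseases. The toy scores and labels are invented for illustration, the use of the sample standard deviation is an assumption, and any stratification the paper may apply (for example by age or sex) is not reproduced here.

```python
from statistics import mean, stdev

def auc(scores, labels):
    """AUC as the probability that a random case outranks a random control.

    Ties between a case and a control count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))

# Toy data (not from the paper): predicted risks and observed outcomes.
per_disease = {
    "E11": ([0.9, 0.4, 0.35, 0.1], [1, 0, 1, 0]),
    "I21": ([0.8, 0.7, 0.2, 0.1], [1, 1, 0, 0]),
}
aucs = {name: auc(s, y) for name, (s, y) in per_disease.items()}
average_auc = mean(aucs.values())
spread = stdev(aucs.values())  # sample SD across diseases (an assumption)
print(aucs, average_auc, spread)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so an average of 0.67 across more than 1,000 diseases indicates useful but far from perfect discrimination, with some diseases predicted much better than others.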

Explainability analysis

Risk Projection

Delphi‑2M can generate 20‑year risk trajectories for each disease by integrating the aforementioned data modalities. The transformer backbone makes it straightforward to add new data layers (e.g., additional lifestyle factors, self‑reported health status, prescription records, biomarker panels), enabling richer, more personalised forecasts.
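One simple way to turn yearly event rates into a 20‑year risk trajectory is the standard survival relationship: cumulative risk = 1 − exp(−sum of hazards). The sketch below assumes the model yields one rate per year for a given disease; the rates themselves are invented, and the paper may instead derive trajectories by sampling synthetic futures, so treat this as an illustration rather than the authors' procedure.

```python
import math

def cumulative_risk(annual_rates):
    """Cumulative risk after each year, given piecewise-constant yearly hazards."""
    risks = []
    total_hazard = 0.0
    for rate in annual_rates:
        if rate < 0:
            raise ValueError("rates must be non-negative")
        total_hazard += rate
        # Probability of at least one event by the end of this year.
        risks.append(1.0 - math.exp(-total_hazard))
    return risks

# Hypothetical rates: 0.2% in year one, rising 8% per year with age.
rates = [0.002 * 1.08 ** year for year in range(20)]
trajectory = cumulative_risk(rates)
print(f"20-year risk: {trajectory[-1]:.1%}")
```

The trajectory can only rise over time, and it saturates below 100%, which is why summing yearly probabilities directly would overstate long-horizon risk.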

Explainability

From an explainable‑AI perspective, the authors analysed feature importance for individual disease predictions, highlighting which variables most strongly drive each risk estimate. This analysis suggests that the model’s predictions rest on clinically interpretable signals.
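The idea of per-prediction feature importance can be sketched with occlusion: remove one past event at a time and record how much the predicted risk drops. This is a deliberately simple stand-in, not the attribution method used in the paper, and `toy_risk` with its weights is a hypothetical model invented for the example.

```python
import math

# Hypothetical weights standing in for a trained model (not from the paper).
TOY_WEIGHTS = {"E11": 0.9, "I10": 0.6, "J45": 0.0}

def toy_risk(history):
    """Toy logistic risk model over a list of distinct diagnosis codes."""
    score = -3.0 + sum(TOY_WEIGHTS.get(code, 0.0) for code in history)
    return 1.0 / (1.0 + math.exp(-score))

def occlusion_attribution(risk_fn, history):
    """Risk drop caused by removing each event; assumes codes are distinct."""
    baseline = risk_fn(history)
    attributions = {}
    for index, code in enumerate(history):
        reduced = history[:index] + history[index + 1:]
        attributions[code] = baseline - risk_fn(reduced)
    return attributions

history = ["E11", "I10", "J45"]
scores = occlusion_attribution(toy_risk, history)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

Occlusion needs one extra model call per event and ignores interactions between events, which is one reason more principled attribution methods are usually preferred for published analyses.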

Limitations

Model performance varies with the diversity and quality of input health‑data sources.

AI‑driven risk scores should complement, not replace, established diagnostic pathways.

Key References

Full technical details are available in the paper “Learning the natural history of human disease with generative transformers”, published in Nature. URL: https://www.nature.com/articles/s41586-025-09529-3


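Source: ScienceAI. The original article is about 1,200 words, with a suggested reading time of 5 minutes.
Artificial intelligence (AI) can now predict disease risk for humans!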
Written by

Data Party THU

Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.
