From Computer Vision to Medical AI: Prof. Xie's Work Hits Nature, NeurIPS, CVPR

Professor Xie's team at Shanghai Jiao Tong University reports rapid progress in AI for Science, detailing multimodal medical AI models, large open datasets, language and vision‑language models, and knowledge‑enhanced representations that outperform existing baselines across multiple benchmarks.

HyperAI Super Neural

Medical AI as a Growing Trend

AI for Science has accelerated, bringing new research ideas and challenging applications to fields such as medicine. Large language models have been evaluated on the United States Medical Licensing Examination, with scores rising from around 50 points before 2022 to roughly 90 points with GPT‑4, prompting many medical schools to launch new "Intelligent Medicine" programs.

Team Goal: Build a Generalist Medical AI System

Since Professor Xie returned to China in 2022, the team has aimed to create a multimodal generalist model for medicine. The model accepts diverse inputs—images, audio, patient records—and produces visual outputs (segmentation, detection) and textual outputs (diagnoses, reports) to assist clinicians.

Open High‑Quality Medical Datasets

The team has constructed extensive open resources. Textual data include over 30,000 medical books (≈40 billion tokens), 4.8 million PubMed Central papers (≈750 billion tokens), and multilingual books in eight languages. They also created 124 “Super Instructions” covering 1.35 million samples for 1,350 medical tasks.
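Instruction data of this kind is typically stored as task/instruction/input/output records. The sketch below shows one plausible layout for a single sample; the field names and contents are illustrative assumptions, not the team's actual "Super Instructions" schema.

```python
import json

# Illustrative instruction-tuning sample. The field layout is an
# assumption for the sketch, not the published data format.
sample = {
    "task": "report_summarization",  # one of many medical tasks
    "instruction": "Summarize the key findings of this radiology report.",
    "input": "The lungs are clear. No pleural effusion. Heart size normal.",
    "output": "Normal chest radiograph with no acute findings.",
}

# Serialize for storage alongside other samples of the same task.
print(json.dumps(sample, indent=2))
```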

For vision‑language data, they crawled ~200 k cases from Radiopaedia, collected figure–caption pairs from papers, and gathered >30 k volumes from radiology reports. They also unified ~120 public radiology datasets under a single standard, yielding >35 k 2D/3D scans (MR, CT, PET, US) with 400 k fine‑grained organ annotations covering 500 organs, all released openly.
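Unifying heterogeneous datasets into one standard usually means mapping each source onto a common record schema with a controlled vocabulary for fields like modality. The sketch below illustrates the idea; all field names and the alias table are assumptions, not the team's actual standard.

```python
from dataclasses import dataclass, field

@dataclass
class RadiologyScan:
    """Illustrative unified record for one 2D/3D radiology scan."""
    scan_id: str
    modality: str            # normalized label, e.g. "MR", "CT", "PET", "US"
    dimensionality: int      # 2 or 3
    spacing_mm: tuple        # voxel spacing, normalized to millimetres
    organ_masks: dict = field(default_factory=dict)  # organ name -> mask path

def normalize_modality(raw: str) -> str:
    """Map heterogeneous source labels onto one controlled vocabulary."""
    aliases = {"mri": "MR", "mr": "MR", "ct": "CT",
               "pet": "PET", "us": "US", "ultrasound": "US"}
    return aliases[raw.strip().lower()]

# One source dataset's "MRI" label becomes the canonical "MR".
scan = RadiologyScan(
    scan_id="demo-0001",
    modality=normalize_modality("MRI"),
    dimensionality=3,
    spacing_mm=(1.0, 1.0, 1.0),
    organ_masks={"liver": "masks/demo-0001_liver.nii.gz"},
)
print(scan.modality)  # MR
```

The payoff of a schema like this is that fine-grained organ annotations from ~120 sources become queryable through one interface instead of 120.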

Iterative Development of Professional Medical LLMs

Language Model

In April 2023 they released PMC‑LLaMA, an open‑source medical large language model trained on the assembled medical and paper data, followed by instruction fine‑tuning. Yale researchers cite PMC‑LLaMA as the earliest open medical LLM and use it as a baseline, though the team acknowledges a performance gap to closed‑source models.

Subsequently they published a multilingual medical LLM trained on 25 billion medical tokens in six languages (English, Chinese, Japanese, French, Russian, Spanish) and introduced a new benchmark for evaluation.

Vision‑Language Models

Using the curated datasets, they trained three vision‑language models: PMC‑CLIP, MedVInT, and RadFM. PMC‑CLIP earned the Young Scientist Publication Impact Award at MICCAI 2023. RadFM, a radiology foundation model, ingests interleaved image‑text pairs and can directly answer questions about radiology images.
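One way to picture an interleaved image–text input of the kind RadFM consumes is a token sequence where image placeholders sit between text spans, with the images supplied in order alongside. Everything in the sketch below (the function, the `<image>` placeholder token) is an illustrative assumption, not RadFM's actual API.

```python
# Illustrative: build an interleaved image-text sequence of the style
# a radiology foundation model might consume. "<image>" is an assumed
# placeholder token standing in for the model's real image embedding slot.
def build_interleaved_input(segments):
    """segments: list of ("text", str) or ("image", image_ref) pairs.
    Returns a prompt string with placeholders plus the ordered image refs."""
    prompt_parts, images = [], []
    for kind, payload in segments:
        if kind == "text":
            prompt_parts.append(payload)
        elif kind == "image":
            prompt_parts.append("<image>")
            images.append(payload)
        else:
            raise ValueError(f"unknown segment kind: {kind}")
    return " ".join(prompt_parts), images

prompt, imgs = build_interleaved_input([
    ("text", "Chest CT of a 54-year-old patient:"),
    ("image", "ct_axial.nii.gz"),
    ("text", "Is there evidence of pleural effusion?"),
])
print(prompt)
```

Keeping text and images in one ordered sequence is what lets such a model answer a question *about* a specific scan rather than treating image and question as unrelated inputs.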

Knowledge‑Enhanced Representation Learning

The team sources general medical knowledge from the web and UMLS, and domain‑specific knowledge from case reports, radiology images, and anatomy resources, constructing a knowledge graph and a knowledge tree focused on cancer diagnosis. Injecting this structured knowledge into models yields performance far above baselines from Microsoft and Stanford.
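Structured knowledge of this sort is commonly stored as subject–relation–object triples (the format UMLS uses), and one standard way to inject it into a language model is to verbalize triples into sentences for the training corpus. The sketch below shows that pattern; the triples and templates are invented for illustration and are not the team's actual graph or injection method.

```python
# Illustrative UMLS-style triples; the entries are invented for the sketch.
triples = [
    ("hepatocellular carcinoma", "is_a", "liver neoplasm"),
    ("hepatocellular carcinoma", "finding_site", "liver"),
    ("liver neoplasm", "may_be_diagnosed_by", "contrast-enhanced CT"),
]

def verbalize(triple):
    """Turn one (subject, relation, object) triple into a sentence
    that can be appended to a pretraining corpus."""
    subj, rel, obj = triple
    templates = {
        "is_a": "{s} is a type of {o}.",
        "finding_site": "{s} is typically found in the {o}.",
        "may_be_diagnosed_by": "{s} may be diagnosed by {o}.",
    }
    return templates[rel].format(s=subj, o=obj)

for t in triples:
    print(verbalize(t))
```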

For pathology, their paper "Knowledge‑enhanced Visual‑Language Pretraining for Computational Pathology" was accepted as an oral presentation at ECCV 2024. For radiology, a model described in "Large‑scale long‑tailed disease diagnosis on radiology images" (Nature Communications) can output disease predictions directly from imaging data.

Overall Achievements

The workflow comprises (1) building the largest open radiology image dataset (≈200 k images, 41 k patients, 930 diseases), (2) creating multimodal, multilingual models enriched with domain knowledge, and (3) releasing corresponding benchmarks, thereby advancing open medical AI research.

Tags: large language models, vision-language, medical AI, knowledge graphs, multimodal models, open datasets
Written by

HyperAI Super Neural

Deconstructing the sophistication and universality of technology, covering cutting-edge AI for Science case studies.
