Artificial Intelligence 21 min read

Building Real‑World Medical Knowledge Graphs and Clinical Event Graphs: Methods, Pipelines, and Applications

This article explains how YiduCore processes heterogeneous medical data (EMR, HIS, LIS, RIS, literature) to construct real‑world medical knowledge graphs and clinical event graphs. It details the pipeline stages (entity extraction, normalization, graph cleaning, PSR scoring, graph embedding) and presents applications such as intelligent diagnosis, question answering, automated medical record generation, and clinical trial patient recruitment.

DataFunTalk

YiduCloud, founded in 2014, provides a data‑intelligent infrastructure called YiduCore that aggregates and processes large‑scale, multi‑source, heterogeneous medical data from EMR, HIS, LIS, RIS, literature, guidelines, and drug labels to build real‑world disease models.

Key medical data concepts include:

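EMR (electronic medical record) – narrative clinical notes.

HIS (hospital information system) – structured information about examinations, prescriptions, and orders.

LIS (laboratory information system) – laboratory test results.

PACS/RIS (picture archiving and communication system / radiology information system) – medical images and imaging reports.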

The knowledge‑graph construction pipeline first merges patient‑level and visit‑level data across systems, then de‑identifies the records and applies data quality control. Entities are extracted using dictionary‑based matching and LSTM‑CRF NER models, then standardized via regular expressions and medical dictionaries.
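The standardization step can be illustrated with a minimal sketch. The synonym table, the lab-value pattern, and the function names `normalize_term` and `parse_lab` are hypothetical and chosen only for illustration; the talk does not describe YiduCore's actual dictionaries or rules.

```python
import re

# Hypothetical mini-dictionary mapping surface forms to one standard term.
SYNONYMS = {
    "htn": "hypertension",
    "high blood pressure": "hypertension",
    "t2dm": "type 2 diabetes mellitus",
    "type ii diabetes": "type 2 diabetes mellitus",
}

# Hypothetical pattern for a lab mention such as "Glucose 7.8 mmol/L".
LAB_PATTERN = re.compile(
    r"(?P<name>[A-Za-z ]+?)\s*(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z/%]+)"
)


def normalize_term(mention: str) -> str:
    """Lowercase, collapse whitespace, then map to a standard term if known."""
    key = re.sub(r"\s+", " ", mention.strip().lower())
    return SYNONYMS.get(key, key)


def parse_lab(text: str):
    """Split a lab mention into name, numeric value, and unit (None if no match)."""
    match = LAB_PATTERN.fullmatch(text.strip())
    if match is None:
        return None
    return {
        "name": match.group("name").strip().lower(),
        "value": float(match.group("value")),
        "unit": match.group("unit"),
    }


if __name__ == "__main__":
    print(normalize_term("High  Blood Pressure"))
    print(parse_lab("Glucose 7.8 mmol/L"))
```

In a real pipeline the dictionary would come from curated terminologies, and mentions that match nothing would presumably be routed to review rather than passed through unchanged.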

Standardized entities are linked to form a graph, followed by property calculation, graph cleaning, ranking, and optional graph embedding. To make the triples more useful, each (subject, predicate, object) triple is extended with a fourth element: a PSR attribute composed of probability, specificity, and reliability. The PSR score is then used to rank the relationships between entities.
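As a rough illustration of how a PSR attribute might be derived from co-occurrence counts, the sketch below scores diagnosis-to-medication edges. The talk does not give the formulas, so every definition here is an assumption: probability is taken as P(object | subject), specificity as P(subject | object), reliability as a support-based shrinkage `n / (n + k)`, and the combined score as their product.

```python
from collections import Counter


def psr_scores(visits, k=5.0):
    """Compute assumed P, S, R values for each (diagnosis, medication) pair.

    visits: iterable of (diagnosis, medications) tuples, one per visit.
    k: hypothetical smoothing constant; larger k demands more support.
    """
    subject_count = Counter()
    object_count = Counter()
    pair_count = Counter()
    for diagnosis, medications in visits:
        subject_count[diagnosis] += 1
        for med in set(medications):
            object_count[med] += 1
            pair_count[(diagnosis, med)] += 1

    scores = {}
    for (diagnosis, med), n in pair_count.items():
        p = n / subject_count[diagnosis]  # assumed: P(med | diagnosis)
        s = n / object_count[med]         # assumed: P(diagnosis | med)
        r = n / (n + k)                   # assumed: confidence from support
        scores[(diagnosis, med)] = {"P": p, "S": s, "R": r, "PSR": p * s * r}
    return scores


def rank_objects(scores, subject):
    """Return objects linked to `subject`, highest PSR first."""
    candidates = [(o, v["PSR"]) for (s, o), v in scores.items() if s == subject]
    return sorted(candidates, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    toy_visits = [  # fabricated toy data, for illustration only
        ("hypertension", {"amlodipine", "aspirin"}),
        ("hypertension", {"amlodipine"}),
        ("diabetes", {"metformin", "aspirin"}),
        ("hypertension", {"amlodipine"}),
    ]
    print(rank_objects(psr_scores(toy_visits), "hypertension"))
```

The point of combining the three factors is that a drug prescribed for almost every diagnosis scores low on specificity, while a pair seen only once scores low on reliability, so neither dominates the ranking.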

Graph embedding learns vector representations for subjects, predicates, and objects, optimizing a loss that aligns summed subject‑predicate vectors with object vectors while preserving conditional probabilities.
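The translational part of this objective resembles TransE-style training, where the subject vector plus the predicate vector should land near the object vector. The sketch below shows that part only, with a margin loss and corrupted-object negatives; whether YiduCore uses exactly this loss is not stated, and the conditional-probability term mentioned above is not modeled. All names and hyperparameters are illustrative assumptions.

```python
import random


def distance(s, p, o):
    """Squared L2 distance between (s + p) and o."""
    return sum((si + pi - oi) ** 2 for si, pi, oi in zip(s, p, o))


def margin_loss(d_pos, d_neg, margin=1.0):
    """Hinge loss: a true triple should be closer than a corrupted one."""
    return max(0.0, margin + d_pos - d_neg)


def train(triples, entities, relations, dim=8, lr=0.05, epochs=200,
          margin=1.0, seed=0):
    """Plain SGD over (subject, predicate, object) triples.

    Norm constraints on entity vectors are omitted for brevity.
    """
    rng = random.Random(seed)
    ent = {e: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for e in entities}
    rel = {r: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for r in relations}
    known = set(triples)
    for _ in range(epochs):
        for s, p, o in triples:
            # Corrupt the object to build a negative sample.
            candidates = [e for e in entities
                          if e != s and (s, p, e) not in known]
            if not candidates:
                continue
            o_neg = rng.choice(candidates)
            d_pos = distance(ent[s], rel[p], ent[o])
            d_neg = distance(ent[s], rel[p], ent[o_neg])
            if margin_loss(d_pos, d_neg, margin) == 0.0:
                continue  # margin already satisfied, no update needed
            for i in range(dim):
                g_pos = 2 * (ent[s][i] + rel[p][i] - ent[o][i])
                g_neg = 2 * (ent[s][i] + rel[p][i] - ent[o_neg][i])
                # Gradient of (d_pos - d_neg) with respect to each vector.
                ent[s][i] -= lr * (g_pos - g_neg)
                rel[p][i] -= lr * (g_pos - g_neg)
                ent[o][i] += lr * g_pos
                ent[o_neg][i] -= lr * g_neg
    return ent, rel


if __name__ == "__main__":
    triples = [("pneumonia", "has_symptom", "cough"),
               ("pneumonia", "treated_by", "antibiotic")]
    entities = ["pneumonia", "cough", "antibiotic", "fracture"]
    ent, rel = train(triples, entities, ["has_symptom", "treated_by"])
    print(distance(ent["pneumonia"], rel["has_symptom"], ent["cough"]))
```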

Applications demonstrated include:

Intelligent diagnosis: Bayesian inference over the disease‑probability graph suggests likely diseases and recommends relevant tests.

Information‑retrieval ranking: PSR scores rank medications for a given diagnosis, reducing physician search time.

Intelligent Q&A: Multi‑turn symptom interrogation mimics doctor questioning to suggest diseases and referral departments.

Neural‑network integration: Adding a graph‑embedding layer to a Bi‑LSTM model improves convergence and accuracy for next‑step medication prediction.

Clinical event graph: Captures generic and specialty‑specific events (e.g., tumor surgery, chemotherapy) to enable timeline visualisation, automated medical‑record generation (NLG), and event‑based patient recruitment for trials.
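To make the intelligent-diagnosis item above concrete, here is a minimal sketch of Bayesian inference over disease-to-finding probabilities. It assumes findings are conditionally independent given the disease (a naive Bayes simplification that the talk does not confirm), and the priors and likelihoods are invented toy numbers, not values from any real graph.

```python
import math

# Hypothetical priors and edge probabilities P(finding | disease).
PRIOR = {"flu": 0.6, "pneumonia": 0.4}
LIKELIHOOD = {
    "flu": {"fever": 0.8, "cough": 0.7, "chest pain": 0.1},
    "pneumonia": {"fever": 0.7, "cough": 0.8, "chest pain": 0.5},
}


def posterior(observed, prior=PRIOR, likelihood=LIKELIHOOD, floor=1e-3):
    """Return P(disease | observed findings) under a naive Bayes assumption.

    floor: probability used when the graph has no edge for a finding.
    """
    log_scores = {}
    for disease, p in prior.items():
        log_p = math.log(p)
        for finding in observed:
            log_p += math.log(likelihood[disease].get(finding, floor))
        log_scores[disease] = log_p
    # Normalize in log space for numerical stability.
    top = max(log_scores.values())
    unnormalized = {d: math.exp(v - top) for d, v in log_scores.items()}
    total = sum(unnormalized.values())
    return {d: v / total for d, v in unnormalized.items()}


if __name__ == "__main__":
    print(posterior(["fever", "chest pain"]))
```

The same posterior could in principle drive test recommendation by asking which unobserved finding would change the ranking most, though that step is not sketched here.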

Event extraction relies on structured text extraction and medical‑knowledge‑driven logical reasoning, with specific focus on chemotherapy regimen identification, treatment purpose (neoadjuvant vs. adjuvant), and efficacy evaluation: complete response (CR), partial response (PR), stable disease (SD), or progressive disease (PD).
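The knowledge-driven reasoning for chemotherapy purpose can be sketched with one simplified rule: chemotherapy given before the first surgery is labeled neoadjuvant, and chemotherapy given afterwards is labeled adjuvant. This is an assumed stand-in for whatever rules the actual system applies, and the `Event` class and function name are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass
class Event:
    kind: str         # "surgery" or "chemotherapy"
    when: date
    detail: str = ""  # e.g. regimen name


def label_chemo_purpose(events: List[Event]) -> List[Tuple[date, str, str]]:
    """Label each chemotherapy event relative to the first surgery date."""
    surgery_dates = sorted(e.when for e in events if e.kind == "surgery")
    first_surgery = surgery_dates[0] if surgery_dates else None

    labeled = []
    for event in sorted(events, key=lambda e: e.when):
        if event.kind != "chemotherapy":
            continue
        if first_surgery is None:
            purpose = "undetermined"  # no surgery on the timeline
        elif event.when < first_surgery:
            purpose = "neoadjuvant"
        else:
            purpose = "adjuvant"
        labeled.append((event.when, event.detail, purpose))
    return labeled


if __name__ == "__main__":
    timeline = [  # fabricated toy timeline
        Event("chemotherapy", date(2020, 4, 15), "regimen B"),
        Event("surgery", date(2020, 3, 1), "tumor resection"),
        Event("chemotherapy", date(2020, 1, 10), "regimen A"),
    ]
    for row in label_chemo_purpose(timeline):
        print(row)
```

Sorting events by date before labeling also yields the ordered sequence needed for timeline visualisation.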

Future work addresses challenges in Chinese medical NLP such as lack of standardized terminology, limited high‑quality annotated data, and model interpretability, pursued through collaborations with academia, standards bodies, and open competitions.

The talk concludes with a Q&A session covering model choices, data reliability, schema standards, and the current state of medical knowledge‑graph research.

Tags: Big Data, AI, NLP, graph embedding, clinical event graph, Medical Knowledge Graph
