OpenKG COVID‑19 Knowledge Graphs: Datasets, Schemas, and Applications

The OpenKG initiative, together with dozens of university and industry partners, has released a series of open COVID‑19 knowledge graphs covering encyclopedia, research, clinical, hero, and hotspot‑event data, with prevention and resource graphs planned. This article summarizes their data sources, scale, schema designs, and potential AI‑driven applications such as semantic search and intelligent question answering.

DataFunTalk

In response to the COVID‑19 outbreak, OpenKG and experts from institutions such as Tongji University, Zhejiang University, Southeast University, Xiaomi AI Lab, Wuhan University of Science and Technology, and Fudan University collaboratively built multiple COVID‑19 knowledge graphs and released them under a CC‑BY‑SA license.

The first released graph, the COVID‑19 Encyclopedia KG (Version 1.0), extracts entities such as viruses, diseases, and bacteria from Baidu Baike, Hudong Baike, and Chinese Wikipedia. It contains 2,617 instances and 14,411 triples from Baidu Baike, 1,626 instances and 10,980 triples from Hudong Baike, and 765 instances and 10,053 triples from Chinese Wikipedia, and it supports semantic retrieval and intelligent Q&A for COVID‑19 terminology.
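To put the per-source figures side by side, the short sketch below sums them and computes triples per instance. The counts are the ones quoted above; the dictionary layout and function names are illustrative assumptions. The sums are raw totals and may double-count entities that appear in more than one encyclopedia, since the article does not report a deduplicated figure.

```python
# Per-source counts as quoted in the article for the Encyclopedia KG (Version 1.0).
# The dictionary layout and function names are illustrative assumptions.
SOURCE_COUNTS = {
    "Baidu Baike": {"instances": 2617, "triples": 14411},
    "Hudong Baike": {"instances": 1626, "triples": 10980},
    "Chinese Wikipedia": {"instances": 765, "triples": 10053},
}


def totals(counts):
    """Sum instance and triple counts across all sources (no deduplication)."""
    return {
        key: sum(source[key] for source in counts.values())
        for key in ("instances", "triples")
    }


def triples_per_instance(counts):
    """Average number of triples per instance for each source, rounded to 2 places."""
    return {
        name: round(c["triples"] / c["instances"], 2)
        for name, c in counts.items()
    }


ENCYCLOPEDIA_TOTALS = totals(SOURCE_COUNTS)
DENSITY = triples_per_instance(SOURCE_COUNTS)

print(ENCYCLOPEDIA_TOTALS)  # {'instances': 5008, 'triples': 35444}
print(DENSITY)
```

By this measure the Chinese Wikipedia subset has the fewest instances but the most triples per instance.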

The COVID‑19 Research KG aggregates taxonomy data from NCBI, constructing a virus family tree with parent, species, genus, and family relations. It currently holds 205,494 instances, 1,634 concepts, and 1,934,206 triples, and aims to enable tasks such as virus classification, mutation prediction, host identification, and drug repurposing.
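As a rough illustration of how a parent relation supports classification, the sketch below walks from a virus up to its family. The mapping is a small hand-written sample based on the public NCBI taxonomy lineage of SARS-CoV-2, not data read from the OpenKG release, and the names `PARENT_OF`, `RANK_OF`, `lineage`, and `ancestor_at_rank` are assumptions made for this example.

```python
# Hand-written sample of child -> parent links (illustrative, not loaded from the dataset).
PARENT_OF = {
    "SARS-CoV-2": "Severe acute respiratory syndrome-related coronavirus",
    "Severe acute respiratory syndrome-related coronavirus": "Sarbecovirus",
    "Sarbecovirus": "Betacoronavirus",
    "Betacoronavirus": "Orthocoronavirinae",
    "Orthocoronavirinae": "Coronaviridae",
}

# Taxonomic rank of some nodes in the sample (hypothetical attribute name).
RANK_OF = {
    "Severe acute respiratory syndrome-related coronavirus": "species",
    "Sarbecovirus": "subgenus",
    "Betacoronavirus": "genus",
    "Orthocoronavirinae": "subfamily",
    "Coronaviridae": "family",
}


def lineage(node, parent_of):
    """Return the path from node up to the topmost known ancestor."""
    path = [node]
    seen = {node}
    while path[-1] in parent_of:
        parent = parent_of[path[-1]]
        if parent in seen:  # guard against malformed, cyclic data
            raise ValueError(f"cycle detected at {parent!r}")
        seen.add(parent)
        path.append(parent)
    return path


def ancestor_at_rank(node, rank, parent_of, rank_of):
    """Return the first node on the lineage with the given rank, or None."""
    for candidate in lineage(node, parent_of):
        if rank_of.get(candidate) == rank:
            return candidate
    return None


print(ancestor_at_rank("SARS-CoV-2", "genus", PARENT_OF, RANK_OF))   # Betacoronavirus
print(ancestor_at_rank("SARS-CoV-2", "family", PARENT_OF, RANK_OF))  # Coronaviridae
```

In the full graph the same traversal would run over the parent triples rather than an in-memory dictionary.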

The COVID‑19 Clinical KG integrates diagnostic criteria, treatment protocols, and epidemiological statistics from official guidelines, Chinese medical encyclopedias, and TCM knowledge platforms. It provides a structured basis for clinical question answering and recommendation systems.

The COVID‑19 Hero KG records biographical information about medical experts, frontline heroes, and opinion leaders, linking them to entities in the other graphs. It covers 30 individuals (5 experts and 25 fallen heroes) and comprises 20 concepts, 439 instances, 50 numeric attributes, and 463 object attributes.

The COVID‑19 Hotspot‑Event KG captures major pandemic events along with their timestamps, sources, and summaries. This enables forward and backward temporal indexing and supports provenance verification, which could potentially be strengthened with blockchain for tamper‑proofing.
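One simple way to picture forward and backward temporal indexing is a date-sorted list searched with binary search. The sketch below is a minimal in-memory version under that assumption; it is not the project's implementation. The `Event` fields mirror the timestamp, source, and summary attributes mentioned above, while the class name, method names, sample events, and `example-source-*` values are placeholders chosen for illustration.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    when: date
    summary: str
    source: str  # placeholder values below; real entries would cite a verifiable source


class TimelineIndex:
    """Date-sorted event list supporting forward and backward lookups."""

    def __init__(self, events):
        self._events = sorted(events, key=lambda e: e.when)
        self._dates = [e.when for e in self._events]

    def forward(self, start):
        """Events on or after `start`, oldest first."""
        return self._events[bisect_left(self._dates, start):]

    def backward(self, end):
        """Events on or before `end`, newest first."""
        return self._events[:bisect_right(self._dates, end)][::-1]


# Illustrative sample entries, not taken from the OpenKG dataset.
EVENTS = [
    Event(date(2020, 3, 11), "WHO characterizes COVID-19 as a pandemic", "example-source-c"),
    Event(date(2020, 1, 23), "Wuhan suspends outbound travel", "example-source-a"),
    Event(date(2020, 1, 30), "WHO declares a Public Health Emergency of International Concern", "example-source-b"),
]

TIMELINE = TimelineIndex(EVENTS)

for event in TIMELINE.backward(date(2020, 1, 30)):
    print(event.when.isoformat(), event.summary)
```

Each lookup costs a binary search plus the size of the returned slice, which is adequate for a timeline of this kind.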

Each graph is accompanied by a detailed schema diagram (shown as images in the original source) and a data‑specification document that lists concepts, instances, numeric and object attributes, and triple counts. Future releases will add cross‑dataset linking, additional entity types (genes, proteins, hosts), and more comprehensive schemas.

All datasets are freely downloadable from OpenKG (e.g., http://www.openkg.cn/dataset/2019-ncov-baike and http://www.openkg.cn/dataset/2019-ncov-research). The project acknowledges contributors from multiple universities and companies, highlighting the collaborative effort behind these open knowledge graphs.

Tags: AI, semantic search, Knowledge Graph, dataset, open data, COVID-19