Artificial Intelligence · 13 min read

Fundamentals and Practical Implementation of Knowledge Graphs and Attribute Extraction

The article surveys the evolution and core components of knowledge graphs, from early Linked Data concepts to modern semantic networks, and details the end‑to‑end pipeline of data acquisition, cleaning, extraction, and fusion. It then showcases Tencent Cloud’s Merak framework and encyclopedia KG, highlighting model choices, performance benchmarks, and real‑world applications such as recommendation and intelligent Q&A.

Tencent Cloud Developer

The term “knowledge graph” was popularized by Google in 2012, using semantic retrieval to collect and process information from multilingual data sources such as Freebase and Wikipedia. The underlying ideas are older: Tim Berners‑Lee proposed Linked Data in 2006, and research on Semantic Link Networks dates back to 2004.

Building a knowledge graph typically requires a technology stack that includes data acquisition, data cleaning, knowledge extraction, knowledge fusion, and graph storage. The article outlines this workflow with a schematic diagram.

At its core, a knowledge graph is a semantic network composed of nodes (entities or concepts) and edges (relationships). Entities represent real‑world objects, while edges capture the relations between them (e.g., the singer Liu Dehua and his attributes such as birthdate, spouse, height, and movies).
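The node‑and‑edge structure described above can be sketched as a set of (subject, predicate, object) triples. This is a minimal illustration using the Liu Dehua example; the attribute names are illustrative, not the article’s actual schema.

```python
# Minimal sketch: a knowledge graph as a set of (subject, predicate, object) triples.
# Entities are nodes; each triple is a labeled edge between a node and a value or node.
triples = [
    ("Liu Dehua", "occupation", "singer"),
    ("Liu Dehua", "birthdate", "1961-09-27"),
    ("Liu Dehua", "spouse", "Carol Chu"),
    ("Liu Dehua", "starred_in", "Infernal Affairs"),
]

def neighbors(entity, triples):
    """Return all (predicate, object) edges leaving an entity node."""
    return [(p, o) for s, p, o in triples if s == entity]

print(neighbors("Liu Dehua", triples))
```

Walking `neighbors` from node to node is exactly the graph traversal that powers downstream tasks such as Q&A over the graph.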

Knowledge graphs are widely applied in personalized recommendation, address resolution, search engines, intelligent Q&A, and education. Tencent Cloud’s knowledge‑graph team has deployed the technology in short‑video recommendation, intelligent Q&A, and other scenarios, and provides a mini‑program for visualization and query.

From 0 to 1: Mastering Attribute Extraction

Attribute extraction is part of knowledge extraction, which converts unstructured data into a format suitable for graph databases. Knowledge extraction includes entity extraction, relation extraction, attribute extraction, and concept extraction. Entity and attribute extraction can be modeled as sequence‑labeling tasks, while relation extraction is typically framed as a classification task.
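The sequence‑labeling framing can be made concrete with BIO tags: each token gets an `O` label or a `B-`/`I-` label marking the start or continuation of an attribute span. The sentence and label names below are illustrative examples, not the article’s dataset.

```python
# Sketch: attribute extraction framed as BIO sequence labeling.
# Labels mark attribute spans; the sentence and tag set are illustrative.
tokens = ["Liu", "Dehua", "was", "born", "in", "Hong", "Kong", "in", "1961"]
labels = ["O", "O", "O", "O", "O",
          "B-BIRTHPLACE", "I-BIRTHPLACE", "O", "B-BIRTHDATE"]

def decode_spans(tokens, labels):
    """Group BIO labels back into (attribute_type, text) spans."""
    spans, current = [], None
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):
            if current:
                spans.append(current)
            current = (lab[2:], [tok])
        elif lab.startswith("I-") and current and current[0] == lab[2:]:
            current[1].append(tok)
        else:
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return [(t, " ".join(ws)) for t, ws in spans]

print(decode_spans(tokens, labels))
# → [('BIRTHPLACE', 'Hong Kong'), ('BIRTHDATE', '1961')]
```

A model (CRF, Bi‑LSTM+CRF, or BERT) predicts the `labels` column; `decode_spans` then recovers the structured attributes that go into the graph.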

Tencent Cloud’s Merak (天璇) knowledge‑extraction framework offers a one‑stop solution for these tasks. It supports multiple models such as BERT, Bi‑LSTM+CRF, and Attention‑CNN, and provides several advantages:

One‑stop algorithm solution with configurable pipelines for data processing, model training, and deployment.

Abstracted model layers for easier understanding, assembly, and extensibility.

Support for mainstream models (BERT, Bi‑LSTM+CRF, Attention‑CNN, etc.).

CPU/GPU multi‑card distributed training and high‑quality Chinese BERT pretrained models.

Experimental results show Merak achieves industry‑leading performance on relation and attribute extraction tasks, with low training time and high prediction accuracy.

The article discusses trade‑offs in knowledge‑graph construction: overly coarse graphs may lack utility for fine‑grained tasks (e.g., QA), while overly detailed graphs increase cost and noise.

For attribute extraction, the article uses a person‑attribute example (gender, education, birthplace, birthdate, hometown, alma mater). It explains why attribute extraction can be treated as a sequence‑labeling problem and reviews the evolution of sequence labeling from HMM/CRF to deep neural networks and Transformer‑based models.
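Across that whole lineage, from HMM and CRF through neural labelers, inference comes down to Viterbi decoding: finding the highest‑scoring label path given per‑token emission scores and label‑to‑label transition scores. Below is a toy sketch; the label set and all score values are assumptions for illustration only.

```python
# Toy Viterbi decode over per-token emission scores plus a transition matrix,
# as used by HMM/CRF-style sequence labelers. All scores are illustrative.
labels = ["O", "B-ATTR", "I-ATTR"]
# emission[t][i]: score of label i at token t (normally produced by a model)
emission = [
    [2.0, 0.5, 0.1],
    [0.3, 2.5, 0.2],
    [0.1, 0.4, 2.2],
]
# transition[i][j]: score of moving from label i to label j.
# "I-ATTR" directly after "O" is heavily penalized, mimicking a BIO constraint.
transition = [
    [0.5, 0.5, -5.0],
    [0.2, 0.1,  1.0],
    [0.3, 0.2,  0.8],
]

def viterbi(emission, transition):
    n, k = len(emission), len(emission[0])
    score = [emission[0][:]]   # best path score ending in each label
    back = []                  # backpointers for path recovery
    for t in range(1, n):
        row, ptr = [], []
        for j in range(k):
            best_i = max(range(k), key=lambda i: score[-1][i] + transition[i][j])
            row.append(score[-1][best_i] + transition[best_i][j] + emission[t][j])
            ptr.append(best_i)
        score.append(row)
        back.append(ptr)
    # Trace the best path backwards from the highest final score
    j = max(range(k), key=lambda j: score[-1][j])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    return [labels[i] for i in reversed(path)]

print(viterbi(emission, transition))  # → ['O', 'B-ATTR', 'I-ATTR']
```

The transition matrix is what a CRF layer adds on top of per‑token predictions: it lets the decoder reject label sequences (like `O` → `I-ATTR`) that a purely token‑wise classifier could emit.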

BERT fine‑tuning yields higher accuracy than Bi‑LSTM+CRF, but requires far more parameters (≈300M) and computational resources. Bi‑LSTM+CRF is an end‑to‑end architecture with a much smaller footprint.

The training pipeline includes downloading a pretrained Chinese BERT model, placing training samples in ./../people_attribute_extraction, and following the quick‑start steps illustrated in the article.

Results on the person‑attribute dataset show the BERT + fully‑connected approach achieving an F1 score of ~0.985.
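For reference, span‑level F1 is the harmonic mean of precision and recall over extracted attribute spans. The counts below are illustrative, chosen only to show how an F1 of 0.985 arises; they are not the article’s actual evaluation data.

```python
# Sketch: span-level F1 as reported for extraction tasks.
# tp/fp/fn counts are illustrative, not the article's evaluation data.
def f1_score(tp, fp, fn):
    precision = tp / (tp + fp)   # correct spans / all predicted spans
    recall = tp / (tp + fn)      # correct spans / all gold spans
    return 2 * precision * recall / (precision + recall)

# e.g. 985 correctly extracted spans, 15 spurious, 15 missed
print(round(f1_score(985, 15, 15), 3))  # → 0.985
```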

An industry news flash mentions NVIDIA training a massive Transformer model (Megatron‑LM) with 8.3 billion parameters using 512 V100 GPUs, surpassing publicly disclosed Google models.

Finally, the article introduces Tencent Cloud Encyclopedia Knowledge Graph, a large‑scale general‑purpose KG covering 51 domains, 221 types, and 4,320 attributes, with over 97 million entities and 1 billion triples. It provides APIs for entity, relation, and triple queries (using TQL), free‑tier access, and SDK integration examples.
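The article does not spell out TQL syntax, but the triple‑query pattern behind such APIs can be sketched in memory: match (subject, predicate, object) with wildcards. The data and the `query_triples` helper below are illustrative assumptions, not Tencent Cloud’s actual API.

```python
# Hedged sketch of the triple-query pattern behind a KG query API:
# match (subject, predicate, object), with None acting as a wildcard.
# Data and helper are illustrative, not Tencent Cloud's TQL.
TRIPLES = [
    ("Liu Dehua", "occupation", "singer"),
    ("Liu Dehua", "spouse", "Carol Chu"),
    ("Infernal Affairs", "genre", "crime film"),
]

def query_triples(s=None, p=None, o=None):
    """Return all triples matching the pattern; None matches anything."""
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

print(query_triples(s="Liu Dehua"))    # entity query: all edges of one node
print(query_triples(p="occupation"))   # relation query: all edges of one type
```

Entity, relation, and triple queries are all instances of this one pattern with different slots fixed, which is why triple stores expose it as their core primitive.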

The article concludes with a summary of the knowledge‑graph industry status, key technical points of knowledge extraction, and applications of Tencent Cloud’s encyclopedia KG, followed by references to seminal papers on Transformer, BERT, XLNet, and related models.

Tags: AI · Knowledge Graph · BERT · attribute extraction · knowledge extraction · Merak framework · semantic network
Written by

Tencent Cloud Developer

Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
