What Is a Knowledge Graph? From Basics to Embedding Techniques

This article introduces knowledge graphs, defining them as semantic networks or multi‑relational graphs, explains entities and relations, compares RDF and graph‑database storage, outlines construction steps including entity extraction and ontology building, reviews embedding models like TransE/H/R/D, and explores applications in search, finance, recommendation, and language models.

ByteDance SE Lab

Introduction

The term "knowledge graph" originally referred to the knowledge base behind Google's semantic search. Academically, a knowledge graph is a semantic network or multi-relational graph whose nodes represent entities and whose edges represent relations; both nodes and edges may carry attributes.

Entities and Relations

In a knowledge graph, entities denote real‑world objects such as people, places, concepts, drugs, or companies, while relations describe the connections between entities. Both can carry attributes, e.g., a person node may have age and position, and a relation may have a start‑time attribute.
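As a minimal sketch of this structure, the following Python models entities and relations as small records that each carry an attribute dictionary. The class names `Entity` and `Relation`, the helper `describe`, and the sample data (Alice, Acme Corp) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A node: a real-world object plus free-form attributes."""
    name: str
    kind: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Relation:
    """A directed edge between two entities, with its own attributes."""
    head: Entity
    label: str
    tail: Entity
    attributes: dict = field(default_factory=dict)


# Hypothetical example data, not taken from the article.
alice = Entity("Alice", "Person", {"age": 34, "position": "CFO"})
acme = Entity("Acme Corp", "Company")
works_at = Relation(alice, "works_at", acme, {"start_time": "2019-03-01"})


def describe(rel: Relation) -> str:
    """Render a relation as a readable head/label/tail statement."""
    return f"{rel.head.name} -[{rel.label}]-> {rel.tail.name}"
```

Calling `describe(works_at)` renders the edge as a single statement, while the start time stays available in `works_at.attributes`.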

Storage

Knowledge graphs are stored mainly in two ways. RDF-based stores represent facts as triple statements and emphasize easy publishing and sharing. Graph databases focus on efficient graph queries and allow attributes on both nodes and edges. For multi-hop queries, graph databases are often reported to be orders of magnitude faster (up to thousands of times) than relational joins, and they accommodate schema changes flexibly.
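The two storage views can be contrasted with a small sketch, assuming an in-memory list of triples rather than any real RDF store or graph database. `match` mimics a triple-pattern lookup, while `build_adjacency` and `within_hops` mimic the per-node edge index that makes multi-hop traversal cheap. All names and facts here are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical facts as (subject, predicate, object) triples, RDF-style.
TRIPLES = [
    ("Alice", "works_at", "Acme"),
    ("Acme", "located_in", "NYC"),
    ("NYC", "part_of", "USA"),
    ("Bob", "works_at", "Acme"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def build_adjacency(triples):
    """Index outgoing edges per node so neighbors are found without a scan."""
    adjacency = defaultdict(list)
    for s, p, o in triples:
        adjacency[s].append((p, o))
    return adjacency


def within_hops(adjacency, start, max_hops):
    """Breadth-first search: map each reachable node to its hop distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for _, neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    seen.pop(start)
    return seen
```

In this sketch, every pattern lookup scans the whole triple list, whereas the traversal only touches the edges of nodes it actually visits. That difference is the intuition behind the multi-hop advantage, not a benchmark of any real system.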

Construction

Building a knowledge graph starts with data extraction from two sources: structured business data (databases) and unstructured data (web text, open documents). Structured data often requires minimal preprocessing, while unstructured data needs natural‑language processing for entity extraction and relation extraction.

Named Entity Recognition & Relation Extraction

NER identifies and classifies entities in text (e.g., "NYC" as Location). Relation extraction then links entities with semantic relations (e.g., "hotel" – "near" – "Times Square").
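A toy version of both steps can be sketched with a dictionary lookup and trigger words. This is a deliberately simple rule-based stand-in, not the statistical or neural models used in practice; the gazetteer, the trigger list, and the function names are assumptions made for illustration.

```python
import re

# Hypothetical gazetteer mapping surface forms to entity types.
GAZETTEER = {
    "NYC": "Location",
    "Times Square": "Location",
    "hotel": "Facility",
}

# Hypothetical trigger words that signal a relation between two entities.
RELATION_TRIGGERS = ["near", "in"]


def recognize_entities(text):
    """Return (start, surface_form, type) for each gazetteer hit, in text order."""
    hits = []
    for surface, kind in GAZETTEER.items():
        for m in re.finditer(r"\b" + re.escape(surface) + r"\b", text):
            hits.append((m.start(), surface, kind))
    return sorted(hits)


def extract_relations(text):
    """Link consecutive entities when a trigger word appears between them."""
    entities = recognize_entities(text)
    triples = []
    for (s1, e1, _), (s2, e2, _) in zip(entities, entities[1:]):
        between = text[s1 + len(e1):s2]
        for trigger in RELATION_TRIGGERS:
            if re.search(r"\b" + re.escape(trigger) + r"\b", between):
                triples.append((e1, trigger, e2))
                break  # keep the first matching trigger only
    return triples
```

For the sentence "The hotel is near Times Square in NYC.", `extract_relations` yields one triple linking the hotel to Times Square via "near" and another linking Times Square to NYC via "in".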

Entity Unification & Coreference Resolution

Entity unification merges different surface forms of the same entity (e.g., "NYC" and "New York"), reducing sparsity. Coreference resolution determines the referent of pronouns such as "it" or "he".
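Both ideas can be illustrated with a short sketch, assuming a hand-written alias table and a nearest-antecedent heuristic for pronouns. Real systems use learned entity-linking and coreference models; `ALIASES`, `PRONOUNS`, and the function names below are hypothetical.

```python
# Hypothetical alias table: lowercase surface form -> canonical entity name.
ALIASES = {
    "nyc": "New York City",
    "new york": "New York City",
    "new york city": "New York City",
}

PRONOUNS = {"it", "he", "she", "they"}


def canonicalize(mention):
    """Map a surface form to its canonical name; unknown mentions pass through."""
    cleaned = mention.strip()
    return ALIASES.get(cleaned.lower(), cleaned)


def unify_triples(triples):
    """Rewrite heads and tails to canonical names and drop duplicate triples."""
    unified = []
    for head, label, tail in triples:
        triple = (canonicalize(head), label, canonicalize(tail))
        if triple not in unified:
            unified.append(triple)
    return unified


def resolve_pronouns(mentions):
    """Naive coreference: replace a pronoun with the most recent non-pronoun mention."""
    resolved, last_entity = [], None
    for mention in mentions:
        is_pronoun = mention.lower() in PRONOUNS
        if is_pronoun and last_entity is not None:
            resolved.append(last_entity)
        else:
            resolved.append(mention)
            if not is_pronoun:
                last_entity = mention
    return resolved
```

Merging "NYC" and "New York" into one node collapses two otherwise separate facts into one, which is how unification reduces sparsity.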

Knowledge Computation

Ontology Construction – compute entity similarity, extract hierarchical relations, and generate a concept schema.

Knowledge Reasoning – apply logical, graph‑based, or deep‑learning inference to fill missing links.

Quality Assessment – quantify confidence and discard low‑confidence facts, often with human verification.

Knowledge Update – continuously incorporate new data, updating both concept and instance layers.
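Two of the steps above, quality assessment and knowledge reasoning, can be sketched as a confidence filter followed by a single logical rule (transitivity of a relation). The facts, the 0.5 threshold, and the choice to multiply premise confidences are assumptions of this sketch rather than anything prescribed by the article.

```python
# Hypothetical facts with confidence scores in [0, 1].
FACTS = {
    ("Times Square", "located_in", "Manhattan"): 0.95,
    ("Manhattan", "located_in", "NYC"): 0.90,
    ("NYC", "located_in", "Mars"): 0.10,
}


def filter_by_confidence(facts, threshold=0.5):
    """Quality assessment: keep only facts at or above the threshold."""
    return {fact: conf for fact, conf in facts.items() if conf >= threshold}


def infer_transitive(facts, label="located_in"):
    """Knowledge reasoning: add missing links implied by transitivity of `label`.

    An inferred fact gets the product of its premises' confidences,
    which is an assumption of this sketch, not a standard rule.
    """
    inferred = dict(facts)
    changed = True
    while changed:  # repeat until no new fact can be derived
        changed = False
        for (x, p1, y), conf1 in list(inferred.items()):
            for (y2, p2, z), conf2 in list(inferred.items()):
                if p1 == p2 == label and y == y2 and x != z:
                    new_fact = (x, label, z)
                    if new_fact not in inferred:
                        inferred[new_fact] = conf1 * conf2
                        changed = True
    return inferred
```

Filtering first matters here: if the low-confidence "Mars" fact were kept, the rule would propagate it to every location upstream of NYC.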

Embedding Techniques

Embedding transforms entities and relations into continuous vectors for downstream tasks. Major models include:

TransE – enforces head + relation ≈ tail; suitable for one‑to‑one relations.

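TransH – projects entities onto a relation-specific hyperplane before translation, so the same entity can be represented differently under different relations; this helps with one-to-many, many-to-one, and many-to-many relations.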

TransR – maps entities into relation‑specific spaces via projection matrices.

TransD – gives each entity and each relation two vectors, one capturing its meaning and one used to build projection matrices dynamically, which avoids the large per-relation matrices of TransR and reduces the parameter count.
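The scoring functions of the four models can be sketched in plain Python, with vectors as lists and lower scores meaning more plausible triples. This covers only the scoring step, not training (negative sampling, margin loss, normalization), and the TransD variant assumes entity and relation vectors share one dimension. Function and parameter names are illustrative.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    return math.sqrt(dot(v, v))


def transe_score(h, r, t):
    """TransE: distance ||h + r - t||."""
    return norm([hi + ri - ti for hi, ri, ti in zip(h, r, t)])


def project_to_hyperplane(v, w):
    """Remove from v its component along the unit normal w."""
    scale = dot(v, w)
    return [vi - scale * wi for vi, wi in zip(v, w)]


def transh_score(h, d_r, t, w_r):
    """TransH: project h and t onto the hyperplane with unit normal w_r, then translate by d_r."""
    return transe_score(project_to_hyperplane(h, w_r), d_r,
                        project_to_hyperplane(t, w_r))


def matvec(m, v):
    return [dot(row, v) for row in m]


def transr_score(h, r, t, m_r):
    """TransR: map h and t into the relation space with matrix m_r, then translate by r."""
    return transe_score(matvec(m_r, h), r, matvec(m_r, t))


def transd_project(e, e_p, r_p):
    """TransD projection (r_p e_p^T + I) e, computed without forming the matrix."""
    scale = dot(e_p, e)
    return [scale * rpi + ei for rpi, ei in zip(r_p, e)]


def transd_score(h, h_p, r, r_p, t, t_p):
    """TransD: project h and t with their dynamically built matrices, then translate by r."""
    return transe_score(transd_project(h, h_p, r_p), r,
                        transd_project(t, t_p, r_p))
```

One way to see the difference: with `w_r = [0, 0, 1]`, TransH ignores the third coordinate, so two tails that differ only along that axis score identically for the same head. TransE would have to push both tails toward the single point `h + r`.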

Applications

Knowledge graphs enhance semantic search by expanding queries with synonyms and hierarchical terms, enable answer‑retrieval systems, and support relation‑search interfaces.
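Query expansion of this kind can be sketched as a lookup over synonym and hierarchy edges. The two dictionaries stand in for edges that would normally be read from the knowledge graph; their contents and the function names are hypothetical.

```python
# Hypothetical knowledge-graph edges used for query expansion.
SYNONYMS = {
    "laptop": ["notebook"],
}
NARROWER = {
    "computer": ["laptop", "desktop"],
}


def expand_term(term):
    """Return the term followed by its synonyms and narrower (child) terms."""
    expanded = [term]
    for candidate in SYNONYMS.get(term, []) + NARROWER.get(term, []):
        if candidate not in expanded:
            expanded.append(candidate)
    return expanded


def expand_query(query):
    """Expand each whitespace-separated term of the lowercased query."""
    return [expand_term(term) for term in query.lower().split()]
```

Each inner list can then be treated as an OR group by the search backend, so a query for "computer" also retrieves documents that mention only "laptop" or "desktop".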

In finance, they provide structured multi‑source data for services such as visual analytics, risk assessment, fraud detection, and event impact analysis.

For recommendation, graph‑based similarity, rule‑based paths, and embedding‑driven methods (e.g., DKN) improve accuracy and diversity.
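The graph-based similarity approach can be sketched as neighbor overlap: two items are similar when they link to many of the same entities in the graph. This uses Jaccard similarity as an assumed measure and is not DKN or any other embedding-driven method; the item data and function names are hypothetical.

```python
# Hypothetical item -> set of linked KG entities (genre, director, actor).
ITEM_ENTITIES = {
    "Movie A": {"sci-fi", "Director X", "Actor P"},
    "Movie B": {"sci-fi", "Director X", "Actor Q"},
    "Movie C": {"romance", "Director Y", "Actor Q"},
}


def jaccard(a, b):
    """Shared neighbors divided by all neighbors; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(liked_item, k=2):
    """Rank the other items by neighbor overlap with the liked item."""
    liked = ITEM_ENTITIES[liked_item]
    scored = [
        (jaccard(liked, entities), item)
        for item, entities in ITEM_ENTITIES.items()
        if item != liked_item
    ]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in scored[:k]]
```

A user who liked "Movie A" is offered "Movie B" first because the two share a genre and a director in the graph.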

Language models like ERNIE integrate external KG facts during pre‑training, using masked entity prediction to fuse textual and knowledge representations, thereby improving tasks such as entity typing and relation classification.
