Ant Group's Knowledge Graph: Overview, Construction, Applications, and Integration with Large Models
This article presents Ant Group's work on knowledge graphs, covering the fundamentals, construction pipeline, fusion techniques, cognitive modeling, real‑world applications, and the emerging synergy between knowledge graphs and large language models, while highlighting technical challenges and future directions.
Overview
Knowledge graphs model complex relationships between entities using graph structures, enabling semantic understanding for search, QA, and large‑scale data analysis. Ant Group leverages them for cognitive intelligence across various business scenarios.
1. What is a Knowledge Graph
A knowledge graph captures both semantic and structural relations between entities, and it can be combined with deep learning to enhance representation.
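To make the idea concrete, a graph of this kind is commonly stored as (head, relation, tail) triples. The sketch below is a minimal in-memory illustration; the `TripleStore` class and the sample entities are hypothetical and do not come from Ant Group's system.

```python
from collections import defaultdict


class TripleStore:
    """Tiny in-memory graph of (head, relation, tail) triples (illustrative only)."""

    def __init__(self):
        # Maps each head entity to the set of (relation, tail) pairs leaving it.
        self._out = defaultdict(set)

    def add(self, head, relation, tail):
        self._out[head].add((relation, tail))

    def neighbors(self, head, relation=None):
        """Return the sorted tails reachable from `head`, optionally filtered by relation."""
        edges = self._out.get(head, ())
        return sorted(t for r, t in edges if relation is None or r == relation)


# Hypothetical sample data.
kg = TripleStore()
kg.add("coffee_shop_A", "located_in", "Hangzhou")
kg.add("coffee_shop_A", "sells", "latte")
kg.add("latte", "is_a", "coffee_drink")

print(kg.neighbors("coffee_shop_A", "sells"))
```

Structural relations are the edges themselves, while semantic relations such as `is_a` let downstream services generalize from a specific entity to its category.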
2. Why Build a Knowledge Graph
Standardize heterogeneous data sources.
Accumulate domain knowledge.
Enable knowledge reuse for downstream services.
Support knowledge reasoning for risk control, credit, claims, marketing, etc.
3. Knowledge Graph Construction Overview
The construction paradigm consists of five parts:
Business data as the cold‑start source.
Fusion with external graphs via entity alignment.
Integration of structured domain knowledge bases.
Information extraction from unstructured/semi‑structured data.
Incorporation of domain concepts and expert rules.
From an algorithmic view, capabilities include knowledge inference and matching. From a deployment view, the stack spans graph engines, NLP and multimodal platforms, graph construction tools, inference modules, generic algorithm services, and business applications.
4. Graph Construction Details
The six‑step pipeline:
Data source acquisition.
Knowledge modeling (concepts, entities, events).
Knowledge acquisition via a processing platform.
Storage (HA3 and graph stores).
Knowledge operation (editing, online query, extraction).
Continuous learning for model iteration.
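The knowledge modeling step (concepts, entities, events) can be pictured as a small typed schema. The following is only a sketch under assumptions: the class names, fields, and sample values are hypothetical, and the article does not describe Ant Group's actual schema format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    name: str
    parent: Optional[str] = None  # Name of the broader concept, if any.


@dataclass
class Entity:
    id: str
    concept: str  # Name of the concept this entity instantiates.
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Event:
    id: str
    type: str
    participants: List[str]  # Entity ids involved in the event.
    timestamp: str  # ISO 8601 string.


class Schema:
    """Holds the concept hierarchy and checks entities against it."""

    def __init__(self, concepts):
        self.concepts = {c.name: c for c in concepts}

    def ancestors(self, name):
        """Walk parent links from `name` up to the root concept."""
        chain = []
        current = self.concepts[name].parent
        while current is not None:
            chain.append(current)
            current = self.concepts[current].parent
        return chain

    def validate(self, entity):
        """An entity is valid only if its concept is declared in the schema."""
        return entity.concept in self.concepts


# Hypothetical sample data.
schema = Schema([
    Concept("Thing"),
    Concept("Merchant", "Thing"),
    Concept("Restaurant", "Merchant"),
])
shop = Entity("e1", "Restaurant", {"city": "Hangzhou"})
visit = Event("ev1", "purchase", ["user_42", "e1"], "2023-05-01T12:00:00")
print(schema.ancestors(shop.concept), schema.validate(shop))
```

Validating acquired knowledge against a declared schema before storage is one simple way to keep later editing and querying consistent.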
Key techniques:
Entity classification with expert knowledge: semantic label embeddings, contrastive learning, logical rule constraints.
Domain vocabulary injection for entity recognition using boundary and semantic contrastive learning.
Few‑shot relation extraction with logical rule‑based reasoning and fine‑grained difference perception.
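Contrastive learning appears in several of the techniques above. As a rough illustration, the sketch below computes a generic InfoNCE-style loss, which pulls an anchor embedding toward a positive example and pushes it away from negatives. This is a standard textbook formulation, not necessarily the loss Ant Group uses; the vectors and the temperature value are made up for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss: -log( exp(s_pos / T) / sum_k exp(s_k / T) )."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]


# Hypothetical 2-d embeddings.
anchor = [1.0, 0.0]
well_aligned = info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
print(well_aligned < misaligned)
```

The loss is small when the positive is the most similar item to the anchor and grows when a negative is closer, which is the behavior that label-embedding, boundary, and semantic contrast objectives all rely on.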
5. Graph Fusion
Fusion merges graphs from different business domains, enabling cross‑business knowledge reuse, reducing data duplication, and accelerating value delivery.
Entity alignment is performed with BERT‑INT, described as a state‑of‑the‑art model, which comprises a representation module and an interaction module. Alignment runs in two stages: recall, based on title similarity, followed by ranking, based on titles, attributes, and neighbors.
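The recall-then-rank flow can be sketched without any model at all. The example below swaps BERT‑INT's learned representations for plain string similarity (`difflib.SequenceMatcher`) and Jaccard overlap, purely to show the two-stage structure; the weights, field names, and sample entities are assumptions, not details from the source.

```python
from difflib import SequenceMatcher


def title_sim(a, b):
    """String similarity in [0, 1]; a stand-in for learned title embeddings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def jaccard(a, b):
    """Set overlap in [0, 1]."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def recall(query, candidates, k=2):
    """Stage 1: keep the k candidates whose titles are most similar to the query."""
    by_title = sorted(
        candidates,
        key=lambda c: title_sim(query["title"], c["title"]),
        reverse=True,
    )
    return by_title[:k]


def rank(query, recalled, weights=(0.5, 0.25, 0.25)):
    """Stage 2: pick the best candidate using title, attributes, and neighbors."""
    w_title, w_attr, w_nbr = weights  # Hypothetical weights.

    def score(c):
        return (
            w_title * title_sim(query["title"], c["title"])
            + w_attr * jaccard(query["attrs"].items(), c["attrs"].items())
            + w_nbr * jaccard(query["neighbors"], c["neighbors"])
        )

    return max(recalled, key=score)


# Hypothetical entities from two business graphs.
query = {"id": "a1", "title": "Starbucks Coffee West Lake",
         "attrs": {"city": "Hangzhou", "category": "cafe"},
         "neighbors": ["West Lake", "Hangzhou"]}
candidates = [
    {"id": "b1", "title": "Starbucks West Lake Store",
     "attrs": {"city": "Hangzhou", "category": "cafe"},
     "neighbors": ["West Lake", "Hangzhou"]},
    {"id": "b2", "title": "Starbucks Coffee East Lake",
     "attrs": {"city": "Wuhan", "category": "cafe"},
     "neighbors": ["East Lake", "Wuhan"]},
    {"id": "b3", "title": "West Lake Noodle House",
     "attrs": {"city": "Hangzhou", "category": "restaurant"},
     "neighbors": ["West Lake", "Hangzhou"]},
]

shortlist = recall(query, candidates)
best = rank(query, shortlist)
print([c["id"] for c in shortlist], best["id"])
```

In this toy data the title-only recall stage prefers the East Lake store, and the ranking stage corrects that by also weighing attributes and neighbors, which is the reason for separating a cheap recall step from a richer ranking step.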
6. Graph Cognition
Ant Group adopts an encoder‑decoder framework. The encoders are graph neural networks that produce low‑dimensional embeddings, and the decoders use those embeddings for tasks such as link prediction. The embeddings are dense, storage‑efficient, and suitable for fusing multi‑source data.
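A stripped-down version of the encoder-decoder idea is sketched below. The encoder is a single untrained mean-aggregation step (a real graph neural network would add learned weights and nonlinearities), and the decoder scores a candidate link with a sigmoid over the dot product. All node names and feature values are hypothetical.

```python
import math


def mean_aggregate(features, adjacency):
    """Encoder: average each node's vector with those of its neighbors."""
    encoded = {}
    for node, vec in features.items():
        group = [vec] + [features[n] for n in adjacency.get(node, [])]
        encoded[node] = [sum(col) / len(group) for col in zip(*group)]
    return encoded


def link_score(embeddings, u, v):
    """Decoder: probability-like score for an edge between u and v."""
    dot = sum(a * b for a, b in zip(embeddings[u], embeddings[v]))
    return 1.0 / (1.0 + math.exp(-dot))


# Hypothetical 2-d input features and a single known edge (user - cafe).
features = {"user": [1.0, 0.0], "cafe": [0.8, 0.2], "gym": [-1.0, 0.5]}
adjacency = {"user": ["cafe"], "cafe": ["user"], "gym": []}

z = mean_aggregate(features, adjacency)
print(link_score(z, "user", "cafe") > link_score(z, "user", "gym"))
```

Because the decoder only sees fixed-size vectors, the same embeddings can be stored compactly and reused across tasks, which matches the storage and fusion benefits noted above.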
7. Graph Applications
Typical use cases include:
Structured matching recall for Alipay mini‑program search.
Real‑time user intent prediction in recommendation systems (AlipayKG).
Dynamic graph‑based coupon recommendation with temporal modeling.
Insurance claim expert rule reasoning using medical knowledge graphs.
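As an illustration of the last use case, expert rules can be evaluated against a medical graph to produce an explainable decision. The rules, triples, and function names below are invented for the sketch and are not Ant Group's actual claim logic.

```python
# Hypothetical medical triples: (head, relation, tail).
MEDICAL_KG = {
    ("type_2_diabetes", "is_a", "diabetes"),
    ("diabetes", "is_a", "chronic_disease"),
    ("metformin", "treats", "type_2_diabetes"),
    ("cosmetic_surgery", "is_a", "elective_procedure"),
}


def ancestors(kg, node):
    """Collect every category reachable from `node` through is_a edges."""
    found, frontier = set(), [node]
    while frontier:
        current = frontier.pop()
        for head, relation, tail in kg:
            if head == current and relation == "is_a" and tail not in found:
                found.add(tail)
                frontier.append(tail)
    return found


def review_claim(kg, claim, covered_categories):
    """Apply two illustrative expert rules and return (approved, reasons)."""
    reasons = []
    categories = ancestors(kg, claim["diagnosis"]) | {claim["diagnosis"]}
    # Rule 1: the diagnosis must fall under a covered category.
    if not categories & covered_categories:
        reasons.append("diagnosis not in a covered category")
    # Rule 2: the graph must link the treatment to the diagnosis.
    if (claim["treatment"], "treats", claim["diagnosis"]) not in kg:
        reasons.append("treatment not linked to diagnosis")
    return (not reasons, reasons)


ok = review_claim(
    MEDICAL_KG,
    {"diagnosis": "type_2_diabetes", "treatment": "metformin"},
    {"chronic_disease"},
)
rejected = review_claim(
    MEDICAL_KG,
    {"diagnosis": "type_2_diabetes", "treatment": "cosmetic_surgery"},
    {"chronic_disease"},
)
print(ok, rejected)
```

Returning the list of failed rules alongside the decision reflects the interpretability that rule-based reasoning over a graph is meant to provide.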
8. Knowledge Graphs and Large Models
Large models provide general knowledge and flexibility, while knowledge graphs offer accuracy and interpretability. Three integration routes are discussed:
Enhancing large models with graph knowledge.
Using large models to improve graph construction (information extraction, modeling, reasoning).
Co‑training and co‑inference where graphs supply priors and constraints to mitigate hallucination and improve timeliness.
Applications include knowledge‑enhanced QA systems and retrieval‑augmented generation.
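The first integration route, feeding graph knowledge to a large model, can be sketched as a retrieval step followed by prompt assembly. The example below stops at building the prompt and does not call any model; the matching strategy (substring lookup), the function names, and the prompt wording are all assumptions made for illustration.

```python
# Illustrative triples: (head, relation, tail).
TRIPLES = [
    ("Alipay", "developed_by", "Ant Group"),
    ("Ant Group", "headquartered_in", "Hangzhou"),
    ("BERT-INT", "used_for", "entity alignment"),
]


def retrieve(question, triples, limit=5):
    """Return triples whose head or tail is mentioned in the question."""
    q = question.lower()
    hits = [t for t in triples if t[0].lower() in q or t[2].lower() in q]
    return hits[:limit]


def build_prompt(question, triples):
    """Ground the question in retrieved facts before it reaches a model."""
    facts = retrieve(question, triples)
    lines = [f"- {h} {r.replace('_', ' ')} {t}" for h, r, t in facts]
    context = "\n".join(lines) if lines else "- (no matching facts)"
    return (
        "Answer using only the facts below. If they are insufficient, say so.\n"
        f"Facts:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_prompt("Who developed Alipay?", TRIPLES)
print(prompt)
```

Restricting the model to retrieved facts is one simple way graphs can act as a constraint against hallucination, and updating the triples refreshes the answers without retraining the model.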
9. Summary and Outlook
Future directions focus on deeper NLP and QA integration, leveraging graphs for large‑model hallucination detection and detoxification, and developing domain‑specific large models combined with graph knowledge.
Overall, Ant Group's knowledge graph ecosystem demonstrates a comprehensive pipeline from data ingestion to AI‑driven business impact, and outlines promising collaborations with emerging large language models.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.