
Vertical Domain Knowledge Graph Construction with OpenIE Techniques

This article explores the challenges of enterprise knowledge management and presents a comprehensive OpenIE-based approach for building vertical domain knowledge graphs, covering data extraction, SPO triple generation, case studies, and applications such as chatbots, semantic search, and intelligent QA.

DataFunSummit

Nov 29, 2020

Guest: Du Zhendong @ Yunwen Technology

Editor: Su Wenyu

Platforms: DataFunTalk, AI Initiators

Introduction: The Knowledge Graph (KG) was introduced by Google in 2012 as an efficient way to represent knowledge. Compared with traditional information management, a KG makes it faster and easier to retrieve the logical relationships between pieces of knowledge and supports intelligent reasoning. Vertical‑domain KGs target specific industries and can be applied to search, intelligent QA, knowledge mining, and decision support, which makes the techniques for constructing them important in practice.

1. Enterprise Knowledge Management Status

Many traditional enterprises still keep large volumes of paper documents, so a heavy backlog of historical data has accumulated. Their ERP or proprietary knowledge‑management systems are tightly coupled, which makes upgrades difficult, and data silos prevent unified management.

From a knowledge‑management perspective, enterprises face fragmented knowledge, scattered management, disorganized knowledge exchange, piecemeal learning, slow training, and difficulty improving team capability, all of which reduce efficiency.

Intelligent management of enterprise data is a pressing problem; knowledge graphs offer a technical solution.

2. Overview of Knowledge Extraction Methods

2.1 Knowledge Graph Service Process

The KG pipeline consists of three parts: knowledge extraction, graph generation, and graph consumption. Extraction transforms semi‑structured or unstructured data into a unified format. Generation builds a schema, resolves conflicts, and maintains the graph. Consumption drives the graph’s value through applications such as intelligent QA, knowledge search, association analysis, and decision support.

2.2 Knowledge Extraction

Enterprise KGs differ from open‑domain KGs; they rely on industry‑specific schemas, and the scale of entities and edges depends on data volume.

Different data sources call for different extraction methods. Structured relational data can be converted to graph triples via D2R (database‑to‑RDF) mapping. Semi‑structured data (e.g., contracts, tables) can be processed with wrappers: scripts that, loosely comparable to Python decorators, wrap a data source with its configuration, preprocessing steps, and regex transformations.
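As a minimal sketch of the D2R idea, the snippet below turns relational rows into (subject, predicate, object) triples. The mapping format, the column names, and the predicate names are all hypothetical and chosen only for illustration; real D2R tools use their own mapping languages.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical mapping: which column identifies the subject, and which
# predicate name each remaining column is translated to.
MAPPING = {
    "subject_column": "product_id",
    "predicates": {"name": "has_name", "supplier_id": "supplied_by"},
}


def rows_to_triples(rows: List[Dict[str, str]], mapping: dict) -> List[Triple]:
    """Convert relational rows into (subject, predicate, object) triples."""
    triples: List[Triple] = []
    for row in rows:
        subject = row[mapping["subject_column"]]
        for column, predicate in mapping["predicates"].items():
            value = row.get(column)
            if value:  # skip NULL or empty cells
                triples.append((subject, predicate, value))
    return triples


rows = [
    {"product_id": "P001", "name": "Valve A", "supplier_id": "S9"},
    {"product_id": "P002", "name": "Pump B", "supplier_id": ""},
]
triples = rows_to_triples(rows, MAPPING)
for triple in triples:
    print(triple)
```

Keeping the mapping as data rather than code means a new table can be onboarded by writing a new mapping instead of a new script.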

While wrappers work well for semi‑structured data, they are less effective for pure text extraction.
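The wrapper approach for semi‑structured documents can be sketched as follows, assuming a contract whose fields appear as "Label: value" lines. The configuration keys, field names, and regular expressions are hypothetical; the point is only that one configuration bundles preprocessing and regex rules for a given layout.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical wrapper configuration for one contract layout.
CONTRACT_WRAPPER = {
    "subject_field": "contract_no",
    "fields": {
        "contract_no": r"Contract No\.?\s*:\s*(\S+)",
        "party_a": r"Party A\s*:\s*(.+)",
        "amount": r"Amount\s*:\s*([\d,\.]+)",
    },
}


def preprocess(text: str) -> str:
    """Collapse repeated spaces and drop empty lines."""
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


def apply_wrapper(text: str, wrapper: dict) -> List[Tuple[str, str, str]]:
    """Run every field pattern and emit triples keyed by the subject field."""
    clean = preprocess(text)
    values: Dict[str, str] = {}
    for field, pattern in wrapper["fields"].items():
        match = re.search(pattern, clean)
        if match:
            values[field] = match.group(1).strip()
    subject = values.get(wrapper["subject_field"])
    if subject is None:
        return []  # without a subject there is nothing to attach values to
    return [
        (subject, field, value)
        for field, value in values.items()
        if field != wrapper["subject_field"]
    ]


sample = """
Contract No:   HT-2020-001
Party A:  Example Manufacturing Co.
Amount:   120,000.00
"""
triples = apply_wrapper(sample, CONTRACT_WRAPPER)
print(triples)
```

Rules like these depend on a stable layout, which is why the same approach breaks down on free‑running text.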

3. Text Knowledge Extraction Landscape

Two main paradigms exist: OpenIE (open‑domain information extraction) and CloseIE (closed‑domain information extraction). In practice, CloseIE is more common in industry because OpenIE precision often falls below 30%, owing to data heterogeneity and the lack of large Chinese open‑domain datasets.

4. Terminology Discovery

High‑precision entity recognition is the first key step. New‑word discovery identifies candidate terms, but not all are useful entities. Combining NER models with ensemble techniques improves term coverage.
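One common way to generate candidate terms is to score adjacent tokens by pointwise mutual information (PMI). The talk does not say which new‑word discovery method was used, so the sketch below is an assumption, and the toy corpus and thresholds are made up.

```python
import math
from collections import Counter
from typing import List, Tuple


def pmi_candidates(
    sentences: List[List[str]], min_count: int = 2, min_pmi: float = 1.0
) -> List[Tuple[str, float]]:
    """Score adjacent token pairs by PMI and keep the frequent, cohesive ones."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    total_unigrams = sum(unigrams.values())
    total_bigrams = sum(bigrams.values())

    scored = []
    for (left, right), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / total_bigrams
        p_left = unigrams[left] / total_unigrams
        p_right = unigrams[right] / total_unigrams
        pmi = math.log(p_pair / (p_left * p_right))
        if pmi >= min_pmi:
            scored.append((f"{left} {right}", pmi))
    return sorted(scored, key=lambda item: -item[1])


corpus = [
    ["the", "knowledge", "graph", "stores", "triples"],
    ["a", "knowledge", "graph", "links", "entities"],
    ["the", "graph", "stores", "the", "entities"],
]
candidates = pmi_candidates(corpus)
names = [name for name, _ in candidates]
print(names)
```

On this toy corpus both "knowledge graph" and "graph stores" pass the thresholds, which illustrates why statistical candidates still need an NER model or other filter before they are treated as entities.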

5. Closed‑Domain Information Extraction

Closed‑domain extraction relies on NER but can also use rule/template parsing for domain‑specific patterns.

6. Chinese Event Extraction

Event extraction benefits from defined schemas; when text variance is low, template‑based methods work, otherwise deep‑learning models are needed. For datasets under 1,000 instances, BERT may not outperform simpler models.
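For the low‑variance case, a template‑based extractor can be as small as a few regular expressions tied to an event schema. The sketch below assumes a hypothetical "acquisition" event with acquirer, target, and price roles; the templates and sample text are illustrative, not taken from the talk.

```python
import re
from typing import Dict, List

# Hypothetical templates for one event type; named groups are the schema roles.
ACQUISITION_TEMPLATES = [
    re.compile(
        r"(?P<acquirer>[A-Z]\w+) acquired (?P<target>[A-Z]\w+) "
        r"for (?P<price>\$[\d\.]+ (?:million|billion))"
    ),
    re.compile(r"(?P<target>[A-Z]\w+) was acquired by (?P<acquirer>[A-Z]\w+)"),
]


def extract_events(text: str) -> List[Dict[str, str]]:
    """Return one event per sentence that matches any template."""
    events = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for template in ACQUISITION_TEMPLATES:
            match = template.search(sentence)
            if match:
                event = {"type": "acquisition"}
                event.update(match.groupdict())
                events.append(event)
                break  # the first matching template wins for this sentence
    return events


text = "Acme acquired Globex for $2.5 billion. Initech was acquired by Hooli."
events = extract_events(text)
for event in events:
    print(event)
```

Each new phrasing needs another template, which is the point at which a learned model becomes the more practical choice.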

7. OpenIE‑Based SPO Extraction

7.1 SPO Definition

S (Subject) is the entity, P (Predicate) is the relation or attribute, and O (Object) is either a value (if P is an attribute) or another entity (if P is a relation). Accurate SPO triples can be directly inserted into the graph.
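The attribute/relation distinction can be made concrete with a small data structure. This is an illustrative sketch only: the class names `SPO` and `TinyGraph` and the sample triples are hypothetical, and a production system would write to a graph database instead of in‑memory dictionaries.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SPO:
    subject: str
    predicate: str
    obj: str
    is_relation: bool  # True: obj is another entity; False: obj is a literal value


class TinyGraph:
    """In-memory stand-in for a graph store."""

    def __init__(self) -> None:
        self.attributes = defaultdict(dict)  # entity -> {attribute: value}
        self.edges = defaultdict(set)        # entity -> {(relation, entity)}

    def insert(self, triple: SPO) -> None:
        if triple.is_relation:
            # Relation: connect two entity nodes with a labeled edge.
            self.edges[triple.subject].add((triple.predicate, triple.obj))
        else:
            # Attribute: store the value on the subject node itself.
            self.attributes[triple.subject][triple.predicate] = triple.obj


graph = TinyGraph()
graph.insert(SPO("Pump B", "rated_power", "5 kW", is_relation=False))
graph.insert(SPO("Pump B", "supplied_by", "Supplier S9", is_relation=True))
print(dict(graph.attributes))
print(dict(graph.edges))
```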

7.2 Baidu Triple Extraction Competition

The competition focused on pure‑text SPO extraction. Su Jianlin’s winning solution recast sequence labeling as predicting the head and tail positions of each span. However, the dataset defines only 50 SPO types, so models trained on it do not generalize to unseen types, which is a challenge for vertical‑domain OpenIE.
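The decoding side of head‑tail span prediction can be sketched as below. It assumes a model has already produced, for every token, a probability of being a span start and a probability of being a span end. The probabilities here are invented, and pairing each start with the nearest following end is one simple heuristic, not necessarily what the competition solution did.

```python
from typing import List


def decode_spans(
    tokens: List[str],
    start_probs: List[float],
    end_probs: List[float],
    threshold: float = 0.5,
) -> List[str]:
    """Pair each predicted start with the nearest predicted end at or after it."""
    spans = []
    for i, p_start in enumerate(start_probs):
        if p_start < threshold:
            continue
        for j in range(i, len(end_probs)):
            if end_probs[j] >= threshold:
                spans.append(" ".join(tokens[i : j + 1]))
                break  # stop at the nearest end position
    return spans


tokens = ["Du", "Zhendong", "works", "at", "Yunwen", "Technology"]
start_probs = [0.9, 0.1, 0.0, 0.0, 0.8, 0.2]  # made-up model outputs
end_probs = [0.1, 0.95, 0.0, 0.0, 0.1, 0.9]   # made-up model outputs
spans = decode_spans(tokens, start_probs, end_probs)
print(spans)
```

Compared with per‑token BIO tags, two independent start/end predictions make it easier to represent overlapping spans, which matters when one sentence contains several triples.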

7.3 Closed‑Domain Triple Extraction

In closed‑domain scenarios, SPO schemas can be predefined. The predicate sometimes does not appear in the source text and must then be inferred from the predefined predicate types, as in the approach used for the Baidu competition.

7.4 Open‑Domain Triple Extraction

Open‑domain extraction requires the predicate to appear in the text and often combines reading comprehension, entity recognition, and joint training to identify S‑P‑O triples for arbitrary documents.

8. Graph Application Cases

8.1 Chatbot

The KG powers a chatbot that routes user queries to specialized bots (task‑oriented, reading‑comprehension, graph‑QA). The system integrates KB‑QA and risk‑decision modules to deliver comprehensive answers without altering existing infrastructure.

8.2 Knowledge Search

KG‑based search goes beyond keyword matching, providing domain‑agnostic, graph‑structured results that are more intuitive and informative.

8.3 Intelligent QA

Yunwen’s AI architecture combines multiple bots (task, reading‑comprehension, graph‑QA) via a multi‑strategy router, delivering cognitive‑level answers and integrating risk‑decision modules for enterprise use cases.
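A multi‑strategy router of this kind can be sketched as a function that asks every bot for an answer with a confidence score and returns the best one. Everything below is hypothetical: the bot functions, the confidence values, and the tiny in‑memory graph stand in for real task, reading‑comprehension, and graph‑QA services, and the talk does not describe how Yunwen's router actually scores candidates.

```python
from typing import Callable, List, Optional, Tuple

Answer = Tuple[float, str]  # (confidence, text)
Bot = Callable[[str], Optional[Answer]]

# Hypothetical graph content: (entity, attribute) -> value.
KG = {("Pump B", "rated power"): "5 kW"}


def task_bot(query: str) -> Optional[Answer]:
    """Handles a fixed set of task intents."""
    if "reset password" in query.lower():
        return (0.9, "Starting the password reset workflow.")
    return None


def graph_qa_bot(query: str) -> Optional[Answer]:
    """Answers when both an entity and one of its attributes are mentioned."""
    lowered = query.lower()
    for (entity, attribute), value in KG.items():
        if entity.lower() in lowered and attribute in lowered:
            return (0.8, f"The {attribute} of {entity} is {value}.")
    return None


def reading_bot(query: str) -> Optional[Answer]:
    """Placeholder for a reading-comprehension model; low-confidence fallback."""
    return (0.3, "Here is the most relevant passage I could find.")


def route(query: str, bots: List[Bot]) -> str:
    """Collect answers from every bot and return the most confident one."""
    answers = []
    for bot in bots:
        result = bot(query)
        if result is not None:
            answers.append(result)
    if not answers:
        return "Sorry, I could not find an answer."
    return max(answers, key=lambda answer: answer[0])[1]


BOTS = [task_bot, graph_qa_bot, reading_bot]
reply = route("What is the rated power of Pump B?", BOTS)
print(reply)
```

Because each bot exposes the same interface, new strategies can be added to the list without changing the router itself.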

9. Speaker Profile

Du Zhendong: Head of the NLP Research Institute at Yunwen Technology, with 8 years of ML and text‑mining experience and 6 years in Chinese NLP. He is proficient with PyTorch and TensorFlow and is responsible for large‑scale recommendation, multi‑turn dialogue, and knowledge‑graph projects. He is a co‑author of national AI standards and the author of “Artificial Intelligence Practice” and “AI in Jiangsu”.

10. Book Recommendation

Du’s new book “Conversational AI: Natural Language Processing and Human‑Machine Interaction” is now available.

