How Cognitive Concept Graphs Power Modern Search Understanding

This article explains the motivation, challenges, architecture, and algorithms behind building a large‑scale cognitive concept graph for search, detailing data construction, concept mining, fusion, confidence scoring, hierarchical structuring, validation, service algorithms, platform access, and real‑world applications such as intent recognition and entity recommendation.

Alibaba Cloud Developer

Sep 11, 2019

Background

Concepts are the fundamental units of human cognition, representing abstract reflections of objective things and serving as the building blocks of thought. Constructing a classification system for billions of entities and linking them in a cognitive concept graph is a crucial step toward endowing machines with cognitive abilities.

What Is a Cognitive Concept?

Since Aristotle, humans have organized concepts using taxonomies. Modern knowledge bases like Cyc, WordNet, and HowNet provide high‑quality but limited‑scale concept hierarchies. In search, a cognitive concept refers to the abstract description represented by a user‑mentioned phrase or entity.

Challenges

Massive redundancy: a single instance may carry hundreds of concept tags, many of them near‑synonyms (e.g., “song” vs. “track”).

Varying confidence levels for different tags (e.g., “company” vs. “role”).

Difficulty mining long‑tail domain terms and entities.

Need to extract concepts from non‑entity phrases such as symptoms.

Building hierarchical relationships for millions of concepts.

Filtering out noisy concepts like “hope” or “ideal”.

System Overview

Leveraging the Shenma Search Knowledge Graph and its entity repository, we built a cognitive concept graph that connects user search intent with external commonsense and domain knowledge, providing unified data for search, recommendation, and knowledge‑driven intelligent services.

The graph contains rich concept instances (both entities like "Liu Dehua" and non‑entities like "we"), multi‑granularity concepts (e.g., "actor", "pink‑boy in the entertainment circle"), and hierarchical relations (isA).

Levels:

Level 1: Domain nodes (e.g., “medical”, “music”).

Level 2: Specific cognitive concepts (e.g., “actor”).

Level 3: Fine‑grained user‑oriented concepts (e.g., “pink‑boy in the entertainment circle”).

Instance layer: mentions of concepts.

Features and Advantages

Dynamic: each concept instance carries weighted candidates generated from query distribution (e.g., "Zhou Jielun" → artist 0.58, singer 0.26, actor 0.13).

Highly automated: daily updates of high‑level concepts and evaluation.

Fine granularity: most instances include detailed user‑level concepts.

Broad coverage across domains such as people, medicine, history, automotive, music.

Algorithm Framework

The framework consists of data construction and algorithm services.

Data Construction Process

Steps include concept mining & fusion, confidence calculation, hierarchy building, and concept validation. After importing data, the graph can infer new hierarchical relations and feed back into the knowledge base, forming a closed data loop.

Domain Concept Mining

For regular entities, concept tags come from entity attributes, encyclopedia tags, and rule‑based extraction. For long‑tail domain entities, we train a skip‑gram model on domain texts, select seed words, cluster words by transitive similarity, and filter the clusters with rules. This process extracted over 10,000 domain terms.
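The clustering step for long‑tail terms can be illustrated with a minimal sketch. It assumes word vectors are already available (the toy 2‑D vectors below stand in for skip‑gram embeddings), and it interprets "transitive similarity" as: two words land in the same cluster if a chain of sufficiently similar pairs connects them. The function names, the threshold, and the toy data are all hypothetical, not taken from the original system.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_by_transitive_similarity(vectors, seeds, threshold=0.9):
    """Union-find clustering: words joined by a chain of similar pairs
    share a cluster. Only clusters containing a seed word are kept."""
    words = list(vectors)
    parent = {w: w for w in words}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for w in words:
        clusters[find(w)].add(w)
    seed_set = set(seeds)
    return [sorted(c) for c in clusters.values() if c & seed_set]


# Toy stand-ins for skip-gram vectors (assumption, illustration only).
toy_vectors = {
    "hypertension": (1.0, 0.0),
    "hypotension": (0.9, 0.3),
    "arrhythmia": (0.7, 0.6),
    "carburetor": (0.0, 1.0),
}
domain_clusters = cluster_by_transitive_similarity(
    toy_vectors, seeds=["hypertension"], threshold=0.9
)
print(domain_clusters)
```

In this toy data "hypertension" and "arrhythmia" are not directly similar enough, but both are close to "hypotension", so the chain pulls all three into the seed's cluster while "carburetor" is left out. A rule‑based filter would then run over each cluster.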

Phrase Concept Mining

Search queries contain many phrases lacking corresponding entities. We perform unsupervised phrase mining, then train a classifier to label concepts. Frequent‑pattern mining (TopMine) and topic modeling are used to segment text and merge tokens based on contextual scores.
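A simplified sketch of the frequent‑pattern part of this step follows. It is written in the spirit of TopMine's bottom‑up merging but is not that library's implementation: adjacent tokens are merged greedily while a significance score (observed co‑occurrence versus what independence would predict) stays above a threshold. The scoring formula, the threshold, and the toy query log are assumptions; the topic‑modeling and classifier stages are not shown.

```python
import math
from collections import Counter


def count_ngrams(corpus, max_len=3):
    """Count every contiguous n-gram (as a tuple) up to max_len tokens."""
    counts = Counter()
    for tokens in corpus:
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def significance(left, right, counts, total):
    """How much more often left+right occurs than independence predicts."""
    joint = counts[left + right]
    if joint == 0:
        return float("-inf")
    expected = counts[left] * counts[right] / total
    return (joint - expected) / math.sqrt(joint)


def segment(tokens, counts, total, threshold=1.0):
    """Greedily merge the best-scoring adjacent pair until none qualifies."""
    phrases = [(t,) for t in tokens]
    while len(phrases) > 1:
        scores = [
            significance(phrases[i], phrases[i + 1], counts, total)
            for i in range(len(phrases) - 1)
        ]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] < threshold:
            break
        phrases[best:best + 2] = [phrases[best] + phrases[best + 1]]
    return [" ".join(p) for p in phrases]


# Hypothetical tokenized query log.
corpus = [
    ["sore", "throat", "remedy"],
    ["sore", "throat", "causes"],
    ["sore", "throat", "in", "children"],
    ["remedy", "for", "cough"],
    ["cough", "in", "children"],
]
counts = count_ngrams(corpus)
total_tokens = sum(len(q) for q in corpus)
segmented = segment(["sore", "throat", "remedy"], counts, total_tokens)
print(segmented)
```

Here "sore throat" is merged because the pair co‑occurs far more often than chance, while "remedy" stays separate. The mined phrases would then go to a classifier for concept labeling.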

Concept Fusion

Redundant tags from different sources are merged: level 2 concepts use synonym dictionaries; level 3 concepts use character‑ and word‑level embeddings with a similarity threshold (1e‑3) followed by manual review.
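The two fusion paths can be sketched as follows. This is only an illustration under stated assumptions: the synonym dictionary is made up, character‑bigram count vectors stand in for the learned character‑ and word‑level embeddings, and the cosine cutoff of 0.6 is arbitrary (the text gives 1e‑3 without saying whether it is a similarity or a distance, so it is not reused here). The output is a list of candidate pairs intended for manual review, not an automatic merge.

```python
import math
from collections import Counter

# Hypothetical synonym dictionary for level 2 concepts.
SYNONYMS = {"track": "song", "tune": "song"}


def canonical(tag):
    """Map a level 2 tag to its canonical form via the dictionary."""
    return SYNONYMS.get(tag, tag)


def char_bigrams(text):
    """Character-bigram counts: a crude stand-in for an embedding."""
    text = text.replace(" ", "")
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def merge_candidates(tags, threshold=0.6):
    """Return (tag_a, tag_b, score) pairs similar enough to review."""
    pairs = []
    for i, a in enumerate(tags):
        for b in tags[i + 1:]:
            score = cosine(char_bigrams(a), char_bigrams(b))
            if score >= threshold:
                pairs.append((a, b, round(score, 2)))
    return pairs


level3_tags = ["classic rock songs", "classic rock song", "jazz pianists"]
review_queue = merge_candidates(level3_tags)
print(canonical("track"), review_queue)
```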

Concept Confidence

Entity‑level confidence is derived from popularity signals (e.g., BM25 scores) and then normalized. Query‑level confidence aggregates query‑tag frequencies over one month and normalizes them. The two confidences are then fused; stop‑words and noisy concepts are filtered out, and domain‑specific concepts are re‑weighted.
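As a rough sketch of the fusion step: the text does not give the fusion formula, so the example below assumes a simple linear interpolation between the two normalized distributions, followed by filtering and renormalization. The weight `alpha`, the blocked‑tag set, and the scores are hypothetical, and domain‑specific re‑weighting is omitted.

```python
def normalize(scores):
    """Scale non-negative scores so they sum to 1."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()} if total else {}


def fuse(entity_scores, query_counts, alpha=0.5, blocked=frozenset()):
    """Interpolate entity-level and query-level confidence per tag.

    alpha is an assumed mixing weight; blocked holds stop-words and
    noisy concepts that should be dropped before renormalizing.
    """
    entity_conf = normalize(entity_scores)
    query_conf = normalize(query_counts)
    tags = (entity_conf.keys() | query_conf.keys()) - blocked
    fused = {
        t: alpha * entity_conf.get(t, 0.0) + (1 - alpha) * query_conf.get(t, 0.0)
        for t in tags
    }
    return normalize(fused)


# Hypothetical raw signals for one instance.
entity_scores = {"singer": 6.0, "actor": 2.0, "thing": 2.0}
query_counts = {"singer": 700, "actor": 300}
confidence = fuse(entity_scores, query_counts, alpha=0.5, blocked={"thing"})
print(confidence)
```

The result is a per‑instance distribution over concept tags, the same shape as the weighted candidates described earlier for "Zhou Jielun".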

Hierarchy Construction

Concept instances and candidates are attached to hierarchies using two methods: (1) mapping tables built from Shenma information flow topics and Tencent concept graph topics for level 1–2; (2) query classification and rules for level 3. Instances with a similarity score > 0.3 are linked.
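A minimal sketch of the linking rule, assuming the similarity scores between an instance and its candidate concepts have already been computed elsewhere. The mapping table contents and the scores are hypothetical; only the 0.3 cutoff comes from the text.

```python
# Hypothetical level 2 -> level 1 mapping table.
LEVEL2_TO_LEVEL1 = {"actor": "entertainment", "singer": "music"}
LINK_THRESHOLD = 0.3  # cutoff stated in the text


def link_instance(instance, scored_concepts):
    """Emit isA edges for concepts scoring above the threshold, plus the
    edge from each linked concept to its level 1 domain when known."""
    edges = []
    for concept, score in scored_concepts.items():
        if score > LINK_THRESHOLD:
            edges.append((instance, "isA", concept))
            domain = LEVEL2_TO_LEVEL1.get(concept)
            if domain:
                edges.append((concept, "isA", domain))
    return edges


# Hypothetical similarity scores for one instance.
edges = link_instance("Zhou Jielun", {"singer": 0.58, "actor": 0.13})
print(edges)
```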

Concept Validation

Noise is filtered with rules and a GBDT model that consider features such as instance length, confidence, and part of speech, removing about 500,000 non‑concept instances. Ongoing validation addresses ambiguous or heterogeneous concepts.

Service Algorithms

Dictionary Matching & NER Boundary Tagging

We build dictionaries from graph instances and level 2/3 concepts, match queries against them using a trie with bidirectional maximum matching, and refine the matched boundaries with an NER service.
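The dictionary‑matching step can be sketched as below. The example works on whitespace tokens so that it reads in English (a Chinese system would more likely match on characters), and it resolves disagreements between the forward and backward passes with one common heuristic: prefer the segmentation with fewer segments. That tie‑breaking rule, the class and function names, and the dictionary entries are assumptions; the NER refinement is not shown.

```python
class Trie:
    """Token-level trie; the key None marks the end of an entry."""

    def __init__(self, entries=()):
        self.root = {}
        for entry in entries:
            self.insert(entry)

    def insert(self, tokens):
        node = self.root
        for tok in tokens:
            node = node.setdefault(tok, {})
        node[None] = True

    def longest_match(self, tokens, start):
        """Length of the longest entry starting at tokens[start], or 0."""
        node, best = self.root, 0
        for i in range(start, len(tokens)):
            node = node.get(tokens[i])
            if node is None:
                break
            if None in node:
                best = i - start + 1
        return best


def forward_match(tokens, trie):
    """Greedy left-to-right maximum matching."""
    out, i = [], 0
    while i < len(tokens):
        n = trie.longest_match(tokens, i) or 1  # unmatched token stands alone
        out.append(tuple(tokens[i:i + n]))
        i += n
    return out


def bidirectional_match(tokens, entries):
    """Run forward and backward passes; keep the one with fewer segments."""
    fwd = forward_match(tokens, Trie(entries))
    # The backward pass is a forward pass over reversed input and entries.
    rev_trie = Trie(tuple(reversed(e)) for e in entries)
    bwd_rev = forward_match(tokens[::-1], rev_trie)
    bwd = [tuple(reversed(seg)) for seg in reversed(bwd_rev)]
    return bwd if len(bwd) <= len(fwd) else fwd


# Hypothetical dictionary entries.
entries = [
    ("shanghai", "international"),
    ("international", "studies", "university"),
]
query = "shanghai international studies university".split()
matched = bidirectional_match(query, entries)
print(matched)
```

The forward pass alone would grab "shanghai international" and strand the remaining two tokens; the backward pass recovers the longer entry, which is why the two directions are combined.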

BERT‑Based Entity Typing Disambiguation

A BERT model pretrained on ~1 billion search logs (without the next‑sentence prediction loss) provides embeddings for out‑of‑vocabulary (OOV) words; a downstream MLP classifies entity mentions into disambiguated concepts.

Platform Access

Data can be accessed via a web UI (https://concept.proxy.taobao.org), an API, ODPS tables, or Pangu data dumps.

Applications

Intent Recognition

By mapping query concepts (e.g., "Song Jiang" → "unpopular Liangshan hero", "fictional character"), the system accurately infers user intent, improving retrieval, ranking, and recommendation.

Entity Recommendation

When users search for multiple universities in Shanghai, the graph discovers the shared concept "Shanghai higher education institutions" and recommends related schools such as Shanghai International Studies University and Tongji University.
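One way to picture this lookup is the sketch below. It assumes an in‑memory map from instances to concept tags and picks the most specific shared concept as the one covering the fewest instances; both the toy data and that specificity heuristic are assumptions for illustration, not the production logic.

```python
from collections import Counter

SHANGHAI = "Shanghai higher education institutions"
# Hypothetical instance -> concept tags, for illustration only.
CONCEPTS = {
    "Fudan University": {SHANGHAI, "university"},
    "Shanghai Jiao Tong University": {SHANGHAI, "university"},
    "Tongji University": {SHANGHAI, "university"},
    "Shanghai International Studies University": {SHANGHAI, "university"},
    "Peking University": {"Beijing higher education institutions", "university"},
}


def recommend(searched, concepts):
    """Find the most specific concept shared by all searched instances
    and return it with the other instances that carry it."""
    if not searched:
        return None, []
    shared = set.intersection(*(concepts[e] for e in searched))
    if not shared:
        return None, []
    # Fewer instances under a concept is treated as "more specific".
    sizes = Counter(c for tags in concepts.values() for c in tags)
    best = min(shared, key=lambda c: (sizes[c], c))
    related = sorted(
        e for e, tags in concepts.items() if best in tags and e not in searched
    )
    return best, related


shared_concept, suggestions = recommend(
    ["Fudan University", "Shanghai Jiao Tong University"], CONCEPTS
)
print(shared_concept, suggestions)
```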

Future Plans and Reflections

Knowledge empowerment: apply the cognitive concept graph to more business scenarios.

Deeper user understanding: extract finer‑grained user needs from queries.

Content comprehension: link content with concepts for better text understanding.

Knowledge expansion: enrich the graph with more concepts.

Intelligent reasoning: automate large‑scale hierarchical construction.

References

Foundations of Cognitive Intelligence (认知智能基础) – https://www.atatech.org/articles/136437

A User‑Centered Concept Mining System for Query and Document Understanding at Tencent – KDD201

Concept‑based Short Text Classification and Ranking – CIKM2014

Query Understanding through Knowledge‑Based Conceptualization – IJCAI2015

Deep Short Text Classification with Knowledge Powered Attention – AAAI2019

Inferring Concept Hierarchies from Text Corpora via Hyperbolic Embeddings – ACL2019

Combining Knowledge with Deep Convolutional Neural Networks for Short Text Classification – IJCAI2017

COMET: Commonsense Transformers for Automatic Knowledge Graph Construction – ACL2019

Entity Suggestion with Conceptual Explanation – IJCAI2017

Matching Article Pairs with Graphical Decomposition and Convolutions – ACL2019
