How Alibaba Builds a Massive E‑Commerce Concept Graph to Power Search & Recommendation
This article explains how Alibaba’s Search & Recommendation team constructs a large‑scale e‑commerce concept graph—defining e‑commerce concepts, mining them from queries and titles, building an ontology, linking concepts to entities, and applying the graph to improve personalized search and recommendation.
Background
Although e‑commerce search and recommendation algorithms have advanced, they still suffer from problems such as duplicate recommendations and a lack of novelty, because they follow an "item‑to‑item" paradigm rather than being driven directly by user demand.
What Is an E‑Commerce Concept?
An e‑commerce concept is a short, semantically complete phrase that represents a user demand, e.g., "dress", "children's anti‑lost", "BBQ essentials". Concepts must follow basic principles and are divided into three categories: shopping scenario, extensive category, and general concept.
Concept Mining
Concepts are mined primarily from user search queries and product titles. The mining pipeline consists of candidate generation (using AutoPhrase and a sequential pattern extractor combined with a 2‑gram language model) and concept classification (a discriminative model that fuses language model embeddings, sequence information, and rule‑based features).
Candidate Generation
Patterns are extracted from labeled positive and negative concepts, weighted, and combined with a statistical language model to prune candidates, ensuring syntactic correctness.
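The language-model pruning step can be illustrated with a small sketch. It assumes a toy corpus of tokenized phrases, add-one smoothing, and an arbitrary threshold; the function names (train_bigram_lm, avg_log_prob, prune) and all values are hypothetical and are not taken from the Alibaba pipeline.

```python
import math
from collections import Counter

def train_bigram_lm(corpus):
    """Count unigrams and bigrams over tokenized phrases, with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in corpus:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def avg_log_prob(tokens, unigrams, bigrams):
    """Average add-one-smoothed bigram log-probability per transition."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + tokens + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total / (len(padded) - 1)

def prune(candidates, unigrams, bigrams, threshold):
    """Keep only candidates whose word order the language model finds plausible."""
    return [c for c in candidates if avg_log_prob(c, unigrams, bigrams) >= threshold]

# Toy corpus standing in for queries and titles (assumption).
corpus = [
    ["outdoor", "bbq", "essentials"],
    ["bbq", "essentials"],
    ["summer", "dress"],
    ["bohemian", "dress"],
    ["outdoor", "summer", "dress"],
]
unigrams, bigrams = train_bigram_lm(corpus)

candidates = [
    ["bbq", "essentials"],
    ["essentials", "bbq"],
    ["summer", "dress"],
    ["dress", "summer"],
]
# The threshold is an illustrative value chosen for this toy data.
kept = prune(candidates, unigrams, bigrams, threshold=-1.9)
print(kept)
```

On this toy data the reversed phrases score lower than their natural word order, so they fall below the threshold. A production system would train the model on a much larger corpus and tune the threshold against labeled concepts.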
Concept Classification
A Wide & Deep model trained on sequence features, along with ELMo embeddings, judges whether each candidate is a valid concept. It is designed to cope with the short, noisy text found in queries and titles.
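To make the Wide & Deep idea concrete, here is a minimal forward-pass sketch in plain Python. The wide part is a linear model over rule-based features, the deep part is a one-hidden-layer network over a dense phrase embedding, and the two logits are summed before a sigmoid. The feature names, weights, and the three-dimensional embedding are made-up illustrative values; in the article the dense input would come from ELMo and sequence features, and the real architecture is not described in enough detail to reproduce.

```python
import math

def relu(x):
    return [max(0.0, v) for v in x]

def dense(x, weights, bias):
    """Fully connected layer: weights holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def wide_and_deep_score(rule_features, embedding, params):
    # Wide part: linear model over sparse, hand-crafted (rule-based) features.
    wide_logit = sum(
        params["wide_w"].get(name, 0.0) * value
        for name, value in rule_features.items()
    )
    # Deep part: a small MLP over a dense phrase embedding.
    hidden = relu(dense(embedding, params["deep_w1"], params["deep_b1"]))
    deep_logit = dense(hidden, params["deep_w2"], params["deep_b2"])[0]
    # Joint prediction: both logits are summed, then squashed to a probability.
    logit = wide_logit + deep_logit + params["bias"]
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical parameters; a trained model would learn these.
params = {
    "wide_w": {"ends_with_category_word": 1.5, "only_stopwords": -2.0},
    "deep_w1": [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]],
    "deep_b1": [0.0, 0.1],
    "deep_w2": [[1.0, -0.5]],
    "deep_b2": [0.0],
    "bias": -0.5,
}

embedding = [0.2, 0.4, 0.6]  # stand-in for a contextual phrase embedding
score_with_rule = wide_and_deep_score({"ends_with_category_word": 1.0}, embedding, params)
score_without_rule = wide_and_deep_score({"ends_with_category_word": 0.0}, embedding, params)
print(score_with_rule, score_without_rule)
```

The design point is that memorized rules (wide) and generalizing dense representations (deep) contribute to the same decision, so a strong rule signal can shift the score even when the embedding alone is uninformative.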
Ontology
To give concepts richer semantics, an e‑commerce ontology is built, defining entities, concepts, attributes, and relationships. The ontology follows a schema.org‑like hierarchy rooted at Thing, with nine top‑level subclasses such as Action, CreativeWork, Product, and Person. Each subclass inherits attributes and relations from its parent.
Attributes such as alias, description, image, and name are defined for Thing. The Category subclass adds a specific category type attribute. Over 140 attributes and relations are modeled.
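Attribute inheritance in such a hierarchy can be sketched as follows. The class and attribute names for Thing come from the text above; the OntologyClass structure, the all_attributes helper, the categoryType attribute name, and the placement of Category directly under Thing are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class OntologyClass:
    name: str
    parent: Optional[str] = None
    own_attributes: List[str] = field(default_factory=list)

def all_attributes(ontology: Dict[str, OntologyClass], class_name: str) -> List[str]:
    """Collect attributes from the root down to class_name, parents first."""
    chain = []
    current: Optional[str] = class_name
    while current is not None:
        chain.append(ontology[current])
        current = ontology[current].parent
    attrs: List[str] = []
    for cls in reversed(chain):
        for attr in cls.own_attributes:
            if attr not in attrs:
                attrs.append(attr)
    return attrs

ontology = {
    "Thing": OntologyClass("Thing", None, ["alias", "description", "image", "name"]),
    # Assumed parent and attribute name; the article only says Category adds a category type.
    "Category": OntologyClass("Category", "Thing", ["categoryType"]),
    "Product": OntologyClass("Product", "Thing"),
}

print(all_attributes(ontology, "Category"))
print(all_attributes(ontology, "Product"))
```

Storing only each class's own attributes and resolving the rest by walking up the parent chain keeps the schema compact as the number of attributes and relations grows.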
From Knowledge Graph to Cognitive Graph
Concepts are linked to ontology nodes through a tagging process. Because many concepts are short, word‑sense disambiguation is performed using a sequence labeling model with attention to domain context, followed by fine‑grained tagging to resolve one‑to‑many type mappings.
Edge Types
The graph defines 19 relation types, including isA (e.g., "Bohemian dress isA dress") and is_related_to (linking concepts to items). Building isA relations involves pattern‑based extraction and vector‑based prediction, followed by cleaning steps such as deduplication and cycle removal.
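The cleaning steps for isA edges can be sketched as below. This is one possible strategy, not necessarily the one Alibaba uses: duplicates are collapsed by keeping the highest confidence score, and edges are then added in descending score order, skipping any edge that would close a cycle. The edge scores and the function names (dedupe, reaches, remove_cycles) are hypothetical.

```python
from collections import defaultdict

def dedupe(edges):
    """Collapse repeated (child, parent) pairs, keeping the highest score."""
    best = {}
    for child, parent, score in edges:
        key = (child, parent)
        if key not in best or score > best[key]:
            best[key] = score
    return [(c, p, s) for (c, p), s in best.items()]

def reaches(graph, start, target):
    """Iterative DFS: is target reachable from start via child -> parent edges?"""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph[node])
    return False

def remove_cycles(edges):
    """Greedily keep high-score edges, dropping any edge that would create a cycle."""
    graph = defaultdict(list)
    kept = []
    for child, parent, score in sorted(edges, key=lambda e: -e[2]):
        # Adding child -> parent closes a cycle if child is already an ancestor of parent.
        if child == parent or reaches(graph, parent, child):
            continue
        graph[child].append(parent)
        kept.append((child, parent, score))
    return kept

raw_edges = [
    ("bohemian dress", "dress", 0.9),
    ("bohemian dress", "dress", 0.7),      # duplicate with a lower score
    ("dress", "clothing", 0.8),
    ("clothing", "bohemian dress", 0.3),   # would close a cycle
]
clean_edges = remove_cycles(dedupe(raw_edges))
print(clean_edges)
```

Processing edges from most to least confident means that when a cycle exists, the weakest edge in it is the one discarded.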
Applications
Explicit
The concept graph powers theme cards on the homepage (e.g., "Shopping Encyclopedia") and scenario‑based recommendations on product detail pages.
Implicit
By providing richer node‑edge data centered on concepts, the graph enhances search and recommendation algorithms with deeper semantic understanding, enabling explainable recommendation, knowledge graph embedding, and reasoning‑based recommendation.
Conclusion and Outlook
Building a cognitive graph requires massive resources across algorithms, engineering, operations, and crowdsourcing. This article offers a high‑level view from an algorithmic perspective; many components remain under active research and optimization.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
