How Alibaba’s AliCoCo Knowledge Graph Revolutionizes E‑Commerce Search & Recommendation

AliCoCo, Alibaba’s large‑scale e‑commerce cognitive concept net, models user needs as graph nodes and links e‑commerce concepts, primitive concepts, a taxonomy, and items. It uses BiLSTM‑CRF tagging, projection learning, and knowledge‑enhanced models to improve search relevance, recommendation diversity, and the overall user experience.

Alibaba Cloud Developer

Background

Traditional e‑commerce search and recommendation rely heavily on keyword matching and historical behavior, which cannot fully capture diverse user intents such as scenario‑based needs (e.g., "outdoor barbecue"). The underlying category‑property‑value data model lacks the breadth and depth to represent these intents, creating a semantic gap between user demand and algorithmic understanding.

AliCoCo

AliCoCo (Alibaba E‑commerce Cognitive Concept Net) is a concept graph that explicitly represents user demands as short phrases (e‑commerce concepts) and connects them to items through a hierarchy of four layers:

E‑commerce Concepts

Primitive Concepts

Taxonomy

Items

This structure serves as the foundation for Alibaba’s core e‑commerce engine.
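
The four layers can be pictured as a small in-memory graph. The following is a minimal sketch, not Alibaba's implementation: the class names `PrimitiveConcept`, `EcommerceConcept`, and `ConceptNet`, the item ids, and the assignment of "outdoor" to Location and "barbecue" to Action are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PrimitiveConcept:
    name: str            # e.g. "barbecue"
    taxonomy_class: str  # top-level taxonomy class (assumed assignment)


@dataclass
class EcommerceConcept:
    phrase: str                                     # e.g. "outdoor barbecue"
    primitives: list = field(default_factory=list)  # linked primitive concepts
    item_ids: set = field(default_factory=set)      # ids of associated items


class ConceptNet:
    """Toy container tying e-commerce concepts to primitives and items."""

    def __init__(self):
        self.concepts = {}

    def add_concept(self, phrase, primitives):
        concept = EcommerceConcept(phrase, list(primitives))
        self.concepts[phrase] = concept
        return concept

    def link_item(self, phrase, item_id):
        self.concepts[phrase].item_ids.add(item_id)

    def items_for(self, phrase):
        return sorted(self.concepts[phrase].item_ids)

    def concepts_for_item(self, item_id):
        return sorted(p for p, c in self.concepts.items() if item_id in c.item_ids)


net = ConceptNet()
net.add_concept(
    "outdoor barbecue",
    [PrimitiveConcept("outdoor", "Location"), PrimitiveConcept("barbecue", "Action")],
)
net.link_item("outdoor barbecue", "item-001")  # hypothetical item ids
net.link_item("outdoor barbecue", "item-002")

print(net.items_for("outdoor barbecue"))
print(net.concepts_for_item("item-001"))
```

A production graph of this size would live in a dedicated store rather than in Python dictionaries; the sketch only shows which layer points at which.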

Taxonomy

The taxonomy is a massive tree covering millions of primitive concept instances. About 20 top‑level classes (e.g., Time, Location, Action, Function, Category, IP) are manually defined, each further refined into sub‑classes and leaf nodes. Instances such as "barbecue", "outdoor", and "sunny" are placed under appropriate classes, forming a rich ontology comparable to Freebase or DBpedia but enriched with both entities and concepts.
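
A tree of this kind can be sketched with parent pointers, so that any instance can report its chain of classes. The `TaxonomyNode` class below is hypothetical, and the "Kitchen tools" sub-class and "grill" instance are invented for the example; only the top-level class names come from the text above.

```python
class TaxonomyNode:
    """One node in a toy taxonomy tree (class, sub-class, or leaf instance)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}

    def add_child(self, name):
        # Reuse an existing child so repeated inserts do not duplicate nodes.
        node = self.children.get(name)
        if node is None:
            node = TaxonomyNode(name, parent=self)
            self.children[name] = node
        return node

    def path(self):
        # Walk up to the root, then reverse to get a root-to-node path.
        names = []
        node = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))

    def is_under(self, class_name):
        # True if class_name is a proper ancestor of this node.
        return class_name in self.path()[:-1]


root = TaxonomyNode("root")
category = root.add_child("Category")        # top-level class from the text
tools = category.add_child("Kitchen tools")  # hypothetical sub-class
grill = tools.add_child("grill")             # hypothetical leaf instance

print(" > ".join(grill.path()))
```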

Primitive Concepts

Primitive concepts are fine‑grained words that describe e‑commerce concepts. Two main tasks are addressed:

Vocabulary mining – using ontology alignment and a BiLSTM‑CRF model to extract new terms from large‑scale corpora.

Hypernym discovery – combining pattern‑based unsupervised methods with projection‑learning supervised methods to build hierarchical relations.
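
The pattern-based side of hypernym discovery can be illustrated with a single Hearst-style lexical pattern ("X such as A, B and C"). This is only a sketch: the article does not say which patterns AliCoCo uses, the function name `extract_hypernym_pairs` is made up, and the regular expression handles only single-word hypernyms in English text.

```python
import re

# "<hypernym> such as <hyponym>, <hyponym> and <hyponym>"
SUCH_AS = re.compile(r"(\w+) such as ([\w ,]+)")

# Separators inside the hyponym list: commas (optionally followed by and/or), "and", "or".
SEPARATORS = re.compile(r"\s*,\s*(?:and\s+|or\s+)?|\s+and\s+|\s+or\s+")


def extract_hypernym_pairs(sentence):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence.lower()):
        hypernym = match.group(1)
        for hyponym in SEPARATORS.split(match.group(2)):
            hyponym = hyponym.strip()
            if hyponym:
                pairs.append((hyponym, hypernym))
    return pairs


pairs = extract_hypernym_pairs("The shop stocks cookware such as pans, pots and grills.")
print(pairs)
```

Patterns like this are precise but sparse, which is one reason to pair them with a supervised method such as projection learning, as the text describes.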

E‑commerce Concepts

An e‑commerce concept is a short phrase that satisfies five criteria: it expresses a consumer demand, reads fluently, is reasonable, has a clear target audience, and contains no typographical errors. Concepts are built in two stages and then linked to the primitive‑concept layer:

Candidate generation – AutoPhrase mining over query logs, item titles, and reviews, plus pattern‑based phrase construction (e.g., "[event] used for [function][category]").

Concept discrimination – a knowledge‑enhanced Wide&Deep model that combines BiLSTM features, POS/NER tags, Wikipedia gloss embeddings, and BERT perplexity scores to decide whether a candidate is a valid concept.

Linking to primitive concepts – short‑text NER with a fuzzy‑CRF layer to handle words that admit multiple possible labels.
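
Pattern-based candidate generation amounts to filling the slots of a template with words from the matching taxonomy classes. The sketch below assumes a tiny hand-written vocabulary and an English-order rendering of the pattern ("{function} {category} for {event}"); the function name `compose_candidates` and all vocabulary entries are hypothetical.

```python
from itertools import product
from string import Formatter


def compose_candidates(template, vocabulary):
    """Fill every slot in the template with every word of the matching class."""
    # Slot names appear in the template as {event}, {function}, {category}, ...
    slots = [name for _, name, _, _ in Formatter().parse(template) if name]
    choices = [vocabulary[slot] for slot in slots]
    return [
        template.format(**dict(zip(slots, combo)))
        for combo in product(*choices)
    ]


# Hypothetical primitive concepts grouped by taxonomy class.
vocabulary = {
    "event": ["traveling", "camping"],
    "function": ["warm"],
    "category": ["hat", "gloves"],
}

candidates = compose_candidates("{function} {category} for {event}", vocabulary)
for phrase in candidates:
    print(phrase)
```

Blind composition also produces implausible phrases, which is why the generated candidates still have to pass the concept discrimination stage.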

Item Association

After building the concept layers, items (billions of products) are linked to e‑commerce concepts via semantic matching. The model incorporates primitive‑concept features and Wikipedia glosses to mitigate semantic drift and improve relevance, especially for short queries.
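
To see why glosses help with short texts, consider a toy matcher in which plain token overlap stands in for the neural semantic matching model. This is a deliberately simplified substitute, not the model described above; the function names, the item title, and the gloss text are all invented for the example.

```python
def tokens(text):
    return set(text.lower().split())


def match_score(concept, item_title, glosses=None):
    """Overlap coefficient between (gloss-expanded) concept tokens and title tokens."""
    concept_tokens = tokens(concept)
    expanded = set(concept_tokens)
    if glosses:
        # Add the words of each available gloss to the concept's token set.
        for word in concept_tokens:
            expanded |= tokens(glosses.get(word, ""))
    title_tokens = tokens(item_title)
    if not expanded or not title_tokens:
        return 0.0
    return len(expanded & title_tokens) / min(len(expanded), len(title_tokens))


# Hypothetical gloss text standing in for a Wikipedia gloss.
glosses = {"barbecue": "cooking food on a grill over charcoal"}

title = "charcoal grill with skewers"
plain = match_score("outdoor barbecue", title)
enriched = match_score("outdoor barbecue", title, glosses)
print(plain, enriched)
```

Without the gloss, the concept and the title share no words and the score is zero. With the gloss, "grill" and "charcoal" connect them, which is the kind of vocabulary gap that external knowledge is meant to close.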

Applications

Search: Enriching queries with concept‑item links and hierarchical relations improves relevance. Knowledge cards surface concept‑based product collections (e.g., "baking tools" when searching for "baking"), and the system supports voice‑based QA such as "What do I need for an outdoor barbecue?".

Recommendation: Concept cards are inserted into the mobile homepage feed, providing theme‑based recommendations and explainable reasons (e.g., showing "outdoor barbecue" as the rationale for a recommendation). The feature has run stably for over a year and is reported to increase user satisfaction.

Summary and Future Work

AliCoCo 1.0 contains 2.8 million primitive concepts and 5.3 million e‑commerce concepts, covering over 98 % of Taobao/Tmall items. On average, each item is linked to 14 primitive concepts and 135 e‑commerce concepts. Query coverage rose from 35 % to 75 %. Future directions include adding more commonsense relations, modeling concept‑item links probabilistically, and extending the graph to other languages and industries.

