How Virtual Category Trees Boost E‑Commerce Search and Recommendation

This article explains how Alibaba builds a virtual category (CPV) system for Taobao. The system merges similar categories, splits overly coarse ones, and assembles the results into a hierarchical virtual category tree, using methods such as PMI, title term similarity, frequent itemset mining, and graph‑embedding techniques. The stated outcome is reduced user fatigue and improved CTR.

Alibaba Cloud Developer

Introduction

Each product belongs to a single leaf category in Taobao's tree‑structured category system, which contains over 20,000 categories. The virtual category (CPV) system is built on top of this hierarchy to improve product publishing, management, and search relevance.

Problem Background

Two main issues arise in recommendation scenarios: (1) different top‑level categories may contain identically named sub‑categories, causing duplicate recommendations and user fatigue; (2) some categories are too coarse, grouping many diverse products together, which harms recommendation precision.

Overall Solution

The solution consists of three parts: merging similar categories into a new virtual node, splitting coarse categories into finer virtual nodes, and iteratively constructing a virtual category tree.

Category Merging Scheme

The goal is to identify similar categories and merge them into a virtual node. Categories count as similar when they have identical names, near‑identical names, or the same product type aimed at different target audiences.
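Once similar category pairs are known, grouping them into virtual nodes can be sketched with a union‑find structure. This is a minimal illustration, not the article's implementation: the function name `merge_categories`, the category labels, and the choice to merge transitively (if A is similar to B and B to C, all three share one node) are assumptions.

```python
from collections import defaultdict

def merge_categories(similar_pairs):
    """Group categories linked by similarity pairs into virtual nodes.

    Assumption: similarity is treated as transitive, so chains of
    similar pairs collapse into a single virtual node.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for cat in list(parent):
        groups[find(cat)].add(cat)
    # Sort for a stable, readable result
    return sorted(sorted(group) for group in groups.values())

# Hypothetical similar pairs, e.g. produced by any of the discovery methods
pairs = [
    ("Women > Jeans", "Men > Jeans"),
    ("Men > Jeans", "Kids > Jeans"),
    ("Phones > Cases", "Accessories > Phone Cases"),
]
virtual_nodes = merge_categories(pairs)
for node in virtual_nodes:
    print(node)
```

Transitive merging can chain loosely related categories together, so in practice a similarity threshold or a review step would likely sit in front of it.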

Similarity Discovery Methods

Query‑based category prediction using pointwise mutual information (PMI).

Title term similarity by extracting and comparing category words from product titles and queries.

User‑behavior analysis via frequent itemset mining (FP‑Growth) and PMI.

Graph embedding (DeepWalk + Skip‑Gram) to learn category embeddings and compute cosine similarity.

PMI Method

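PMI measures the association between two categories from how often they co‑occur in query category predictions: $\mathrm{PMI}(x, y) = \log \frac{P(x, y)}{P(x)\,P(y)}$, where $P(x, y)$ is the probability that categories $x$ and $y$ are predicted for the same query and $P(x)$, $P(y)$ are their individual probabilities. Strongly associated category pairs are merged.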
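The PMI formula can be illustrated with a small sketch that estimates probabilities from counts. The input format (one set of predicted categories per query), the function name `category_pmi`, and the sample data are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations

def category_pmi(query_predictions):
    """Compute PMI for every category pair that co-occurs at least once.

    query_predictions: list of sets, each holding the categories
    predicted for one query (assumed input format).
    """
    n = len(query_predictions)
    single = Counter()
    pair = Counter()
    for cats in query_predictions:
        cats = set(cats)
        single.update(cats)
        pair.update(combinations(sorted(cats), 2))

    scores = {}
    for (x, y), n_xy in pair.items():
        p_xy = n_xy / n
        p_x, p_y = single[x] / n, single[y] / n
        scores[(x, y)] = math.log(p_xy / (p_x * p_y))
    return scores

# Hypothetical category predictions for four queries
queries = [
    {"men_jeans", "women_jeans"},
    {"men_jeans", "women_jeans"},
    {"men_jeans", "phone_case"},
    {"phone_case"},
]
scores = category_pmi(queries)
for pair_key, value in sorted(scores.items()):
    print(pair_key, round(value, 3))
```

A positive score means the two categories co‑occur more often than independence would predict; a negative score means less often. With little data, PMI tends to overrate rare pairs, so a minimum co‑occurrence count is a common safeguard.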

Title Term Method

Product titles and associated queries are tagged to extract category words, forming term lists for each category. Pairwise similarity of these term lists identifies similar categories.
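The article does not say which similarity measure is applied to the term lists. As one plausible choice, the sketch below uses Jaccard similarity over term sets; the measure, the threshold of 0.5, the function names, and the sample terms are all assumptions.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two term collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def similar_by_terms(category_terms, threshold=0.5):
    """Return (category, category, score) for pairs at or above the threshold."""
    result = []
    for c1, c2 in combinations(sorted(category_terms), 2):
        score = jaccard(category_terms[c1], category_terms[c2])
        if score >= threshold:
            result.append((c1, c2, score))
    return result

# Hypothetical category words extracted from titles and queries
category_terms = {
    "Women > Jeans": ["jeans", "denim", "pants"],
    "Men > Jeans": ["jeans", "denim", "trousers"],
    "Phones > Cases": ["case", "cover"],
}
matches = similar_by_terms(category_terms)
print(matches)
```

A weighted measure, such as cosine similarity over term frequencies, would be a natural alternative when some category words are far more common than others.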

User‑Behavior Method

Frequent itemset mining (FP‑Growth) extracts category pairs that often appear together in user sessions. The resulting pairs are combined with PMI scores for robustness.
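For pairs specifically, frequent itemset mining reduces to counting how many sessions contain both categories and keeping those above a minimum support. The sketch below does this by direct pair counting rather than FP‑Growth, which matters mainly for larger itemsets and larger data; the session format, the function name, and the `min_support` value are assumptions.

```python
from collections import Counter
from itertools import combinations

def frequent_category_pairs(sessions, min_support=2):
    """Count category pairs per session and keep those seen often enough.

    sessions: list of category lists, one per user session (assumed format).
    A pair is counted at most once per session.
    """
    counts = Counter()
    for session in sessions:
        counts.update(combinations(sorted(set(session)), 2))
    return {pair: c for pair, c in counts.items() if c >= min_support}

# Hypothetical user sessions, each listing the categories browsed
sessions = [
    ["men_jeans", "women_jeans", "belts"],
    ["men_jeans", "women_jeans"],
    ["belts", "phone_case"],
    ["women_jeans", "men_jeans", "men_jeans"],
]
frequent = frequent_category_pairs(sessions)
print(frequent)
```

The surviving pairs could then be scored with PMI computed over the same sessions, so that pairs that are frequent only because both categories are popular get filtered out.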

Graph Embedding Method

Random walks on the category graph generate node sequences, which are fed into a Skip‑Gram model to learn category embeddings (the DeepWalk approach). Categories whose embeddings have high cosine similarity are treated as related.
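The two ends of this pipeline can be sketched without external libraries: generating the random walks and comparing embeddings by cosine similarity. Training the Skip‑Gram model itself is left out here and would be done with a word‑embedding library. The adjacency‑list graph format, the walk parameters, and the function names are assumptions.

```python
import math
import random

def random_walks(graph, walk_length=5, walks_per_node=2, seed=0):
    """Generate uniform random walks starting from every node.

    graph: dict mapping a node to a list of neighbor nodes (assumed format).
    """
    rng = random.Random(seed)
    walks = []
    for start in sorted(graph):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph.get(walk[-1], [])
                if not neighbors:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical undirected category graph as adjacency lists
graph = {
    "men_jeans": ["women_jeans", "belts"],
    "women_jeans": ["men_jeans", "belts"],
    "belts": ["men_jeans", "women_jeans"],
}
walks = random_walks(graph)
for walk in walks:
    print(" -> ".join(walk))
```

Each walk plays the role of a sentence and each category the role of a word, which is what lets a Skip‑Gram model learn embeddings from them.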

Category Splitting Scheme

Coarse categories are split into finer virtual nodes using two approaches:

Hierarchical (hypernym‑hyponym) relationships derived from a lexical taxonomy.

Product clustering on user‑behavior graphs, using label propagation and DeepWalk item embeddings to group similar items into clusters.
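The clustering approach can be illustrated with a basic label propagation pass over an item graph. This is a simplified sketch under stated assumptions: the graph links items that users interacted with together, edges are unweighted, and ties are broken by picking the smallest label. The function name and item ids are hypothetical.

```python
import random
from collections import Counter

def label_propagation(graph, max_iters=20, seed=0):
    """Cluster nodes by repeatedly adopting the most common neighbor label.

    graph: dict mapping an item to a list of neighboring items (assumed format).
    """
    rng = random.Random(seed)
    labels = {node: node for node in graph}  # every item starts in its own cluster
    nodes = sorted(graph)
    for _ in range(max_iters):
        rng.shuffle(nodes)  # asynchronous updates in random order
        changed = False
        for node in nodes:
            if not graph[node]:
                continue  # isolated item keeps its own label
            counts = Counter(labels[n] for n in graph[node])
            top = max(counts.values())
            # Deterministic tie-break: smallest label among the most frequent
            best = min(label for label, c in counts.items() if c == top)
            if labels[node] != best:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels

# Hypothetical item graph inside one coarse category: two separate item groups
graph = {
    "dress_1": ["dress_2", "dress_3"],
    "dress_2": ["dress_1", "dress_3"],
    "dress_3": ["dress_1", "dress_2"],
    "coat_1": ["coat_2", "coat_3"],
    "coat_2": ["coat_1", "coat_3"],
    "coat_3": ["coat_1", "coat_2"],
}
labels = label_propagation(graph)
print(labels)
```

Each resulting cluster is a candidate finer virtual node under the coarse category. Real behavior graphs are weighted and noisy, so edge weights and a minimum cluster size would likely be needed.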

Results

Deploying virtual categories in homepage, promotion, and post‑purchase flows reduced user fatigue by over X% and improved click‑through rates. The improvements were validated through manual evaluation and online A/B tests.

Conclusion and Future Work

The virtual category system is an ongoing effort that will continue to evolve. Future work includes refining the virtual category tree, extending its use to search scenarios, and further optimizing merging and splitting algorithms.

