
Advances in E‑commerce Search: Embedding, Knowledge Graphs, and Retrieval Models

This article reviews recent research on e‑commerce search, covering transformer‑based complementary rankings, Alibaba's cognitive concept net and its extension, joint deep retrieval with product quantization, personalized semantic retrieval, multi‑granularity deep semantic retrieval, and graph‑attention networks for long‑tail shop search.

DataFunSummit

In the era of big data, e‑commerce search faces challenges such as bridging the semantic gap between queries and product titles, handling sparse shop features, and providing personalized results. Most retrieval and ranking pipelines now rely on embedding techniques, and this article discusses several recent academic contributions addressing these issues.

CoRT: Complementary Rankings from Transformers

CoRT scores query–passage pairs with a BERT‑based similarity model and merges the resulting ranking with a traditional BM25 ranking, improving overall accuracy and demonstrating that classic lexical retrieval can still complement BERT‑driven models rather than being replaced by them.
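One simple, standard way to merge a lexical ranking with a neural one is reciprocal rank fusion; the sketch below uses it for illustration and is not necessarily the exact merging strategy of the CoRT paper:

```python
# Sketch: merging a BM25 ranking with a neural (e.g., BERT-based) ranking
# via reciprocal rank fusion (RRF). Doc ids and k=60 are illustrative.

def reciprocal_rank_fusion(rankings, k=60):
    """Combine several ranked lists of doc ids into one fused ranking.

    A document's fused score is the sum of 1 / (k + rank) over all
    lists in which it appears (ranks are 1-based), so documents that
    rank well in either list rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["d1", "d2", "d3", "d4"]   # lexical candidates
bert_ranking = ["d3", "d1", "d5", "d2"]   # neural candidates
fused = reciprocal_rank_fusion([bm25_ranking, bert_ranking])
```

Documents found by only one retriever (d4, d5 above) still enter the fused list, which is what makes the two rankings complementary.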

AliCoCo: Alibaba E‑commerce Cognitive Concept Net

AliCoCo builds a four‑layer product graph (items, e‑commerce concepts, primitive concepts, taxonomy) to handle imprecise search scenarios such as "outdoor barbecue". Primitive concepts are extracted with a combination of rules, human annotation, and an LSTM‑CRF sequence labeler, while relationships are constructed via rules or supervised learning over word‑level representations, enriched with external knowledge to reduce semantic drift.

AliCoCo2: Commonsense Knowledge Extraction, Representation and Application in E‑commerce

AliCoCo2 extends the primitive‑concept layer with richer entities and relations, learns graph representations with GAT, and concatenates them with BERT embeddings. Binary relations use TransE, tree‑structured relations use Poincaré embeddings, and N‑ary relations compute distances between entity and concept embeddings (e.g., e_c) before feeding them to a multilayer perceptron.
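The TransE objective used for binary relations can be stated in a few lines. The sketch below uses toy embedding values rather than learned parameters:

```python
# Sketch: TransE scoring for binary relations, as used for the
# binary-relation case in AliCoCo2's representation learning.
# TransE models a triple (h, r, t) as h + r ≈ t, so a lower
# L2 distance ||h + r - t|| means a more plausible triple.
import numpy as np

def transe_score(head, relation, tail):
    """Return the L2 distance between the translated head and the tail."""
    return float(np.linalg.norm(head + relation - tail))

h = np.array([0.1, 0.2])        # head entity embedding (toy values)
r = np.array([0.3, 0.1])        # relation embedding
t_good = np.array([0.4, 0.3])   # tail that satisfies h + r ≈ t
t_bad = np.array([1.0, -1.0])   # implausible tail

good = transe_score(h, r, t_good)
bad = transe_score(h, r, t_bad)
```

Training pushes `good`-style distances toward zero and `bad`-style distances above a margin; Poincaré embeddings play the analogous role for tree‑structured relations, where hyperbolic distance captures hierarchy.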

Joint Learning of Deep Retrieval Model and Product Quantization based Embedding Index

The paper proposes a joint training framework that parameterizes the clustering centroids to reduce sub‑space partitioning error. Item embeddings are first transformed by an orthogonal projection (yielding x'), matched against a centroid matrix v to find the nearest row v_r, and then split into D subvectors whose per‑subspace centroids are learned. The quantized embedding is finally reconstructed from the residual codes by multiplying back with a rotation matrix R.
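The underlying product‑quantization step — split, assign to nearest centroid, reconstruct — can be sketched as follows. The codebooks here are fixed toy values; in the paper they are learned jointly with the retrieval model:

```python
# Sketch: classic product quantization (encode/decode), the indexing
# technique the paper trains end-to-end. Two sub-spaces with four
# centroids each are illustrative sizes, not the paper's configuration.
import numpy as np

def pq_encode(x, codebooks):
    """Split x into len(codebooks) subvectors and store, for each,
    the index of its nearest centroid in the matching codebook."""
    subvectors = np.split(x, len(codebooks))
    return [int(np.argmin(np.linalg.norm(cb - sv, axis=1)))
            for cb, sv in zip(codebooks, subvectors)]

def pq_decode(codes, codebooks):
    """Reconstruct an approximation of x from the stored centroid ids."""
    return np.concatenate([cb[c] for cb, c in zip(codebooks, codes)])

rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(4, 2)) for _ in range(2)]  # 2 sub-spaces, 4 centroids each
x = rng.normal(size=4)
codes = pq_encode(x, codebooks)    # compact integer codes
x_hat = pq_decode(codes, codebooks)  # approximate reconstruction
```

Because the squared reconstruction error decomposes across sub‑spaces, assigning each subvector to its own nearest centroid minimizes the total error — which is also why errors in the sub‑space partitioning itself matter, motivating the paper's learned centroids.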

Towards Personalized and Semantic Retrieval (DPSR)

DPSR is a dual‑tower model where the user tower encodes demographics, historical clicks, and query intents (multiple heads for ambiguous queries like "apple"). The item tower processes title, brand, and category. Multi‑head attention aggregates intent vectors, and a weighted loss combines user and query signals to improve personalization and disambiguation.
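How multiple intent heads disambiguate a query can be seen in a small scoring sketch. The softmax‑weighted aggregation below is one simple choice and a simplification of the paper's attention mechanism; all vectors are toy values:

```python
# Sketch: scoring in a dual-tower setup where the user/query tower emits
# several intent vectors (e.g., for an ambiguous query like "apple") and
# an item is scored against all of them at once.
import numpy as np

def score_item(intent_vectors, item_vector, beta=1.0):
    """Inner product of the item against each intent head, aggregated
    with a softmax weighting so the best-matching intent dominates."""
    dots = intent_vectors @ item_vector
    weights = np.exp(beta * dots)
    weights /= weights.sum()
    return float(weights @ dots)

intents = np.array([[1.0, 0.0],    # "fruit" intent head
                    [0.0, 1.0]])   # "electronics" intent head
phone_item = np.array([0.1, 0.9])    # strongly electronics-like item
generic_item = np.array([0.5, 0.5])  # matches neither intent well

s_phone = score_item(intents, phone_item)
s_generic = score_item(intents, generic_item)
```

Raising `beta` makes the aggregation approach a hard max over heads, so an item only needs to match one of the query's plausible meanings to score well.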

Embedding‑based Product Retrieval in Taobao Search

The authors introduce MGDSPR, a multi‑granularity deep semantic retrieval model that represents queries at six granularities (character c, word w, transformer Trm, etc.). After extracting these features, attention fuses them with long‑term, short‑term, and real‑time user behavior to produce the final user‑tower vector, which is then matched with item vectors via inner product.
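The fusion step can be sketched as a single attention pass in which the query representation attends over behavior vectors. The dimensions, random features, and single‑layer attention below are simplifications of MGDSPR, for illustration only:

```python
# Sketch: attention-based fusion of multi-granularity query features with
# user-behavior vectors into one user-tower vector, matched against item
# vectors by inner product.
import numpy as np

def fuse(query_grains, behavior_vectors):
    """Pool the multi-granularity query vectors, use the result to attend
    over the user's behavior vectors, and add the behavior summary."""
    q = query_grains.mean(axis=0)            # pooled query representation
    attn = behavior_vectors @ q              # attention logits
    attn = np.exp(attn - attn.max())
    attn /= attn.sum()                       # softmax over behaviors
    return q + attn @ behavior_vectors       # fused user-tower vector

grains = np.random.default_rng(1).normal(size=(6, 8))     # six granularities
behaviors = np.random.default_rng(2).normal(size=(5, 8))  # behavior sequence
user_vec = fuse(grains, behaviors)

items = np.random.default_rng(3).normal(size=(10, 8))     # item-tower vectors
best_item = int(np.argmax(items @ user_vec))              # inner-product match
```

Because the final match is a plain inner product, the item vectors can be pre‑computed and served from an approximate‑nearest‑neighbor index.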

A Dual Heterogeneous Graph Attention Network for Long‑Tail Shop Search

To address sparse shop‑search data, the paper combines a dual‑tower architecture with a heterogeneous graph attention network (DHGAT) and a token‑knowledge‑pairing system (TKPS). User features are one‑hot encoded, while DHGAT aggregates node features in two attention layers. TKPS uses product titles as bridges between queries and shops, and the loss incorporates both cross‑entropy and neighbor‑aware terms.
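A single attention‑aggregation step of the kind DHGAT stacks in its two layers looks like the following. Scoring neighbors by dot product with the node's own feature is a simplification of a learned attention function, and the feature values are toy numbers:

```python
# Sketch: one graph-attention aggregation step — a node's new feature is
# an attention-weighted sum of its neighbors' features, which lets sparse
# (long-tail) shop nodes borrow signal from better-observed neighbors.
import numpy as np

def gat_aggregate(node_feat, neighbor_feats):
    """Aggregate neighbor features with softmax attention weights."""
    scores = neighbor_feats @ node_feat      # attention logits
    scores = np.exp(scores - scores.max())
    scores /= scores.sum()                   # softmax over neighbors
    return scores @ neighbor_feats           # weighted aggregation

node = np.array([1.0, 0.0])
neighbors = np.array([[0.9, 0.1],   # similar neighbor, weighted highest
                      [0.0, 1.0],   # dissimilar neighbor, weighted lowest
                      [0.8, 0.2]])
agg = gat_aggregate(node, neighbors)
```

In a heterogeneous graph the neighbors come from different node types (queries, shops, products), so DHGAT applies separate attention per type before combining; the single homogeneous step above is the common core.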

Conclusion

Product search differs from document search in that queries and item descriptions are short, making intent detection difficult. Effective solutions therefore integrate rich item metadata, user behavior, and comprehensive knowledge graphs to handle scenario‑based, personalized, and shop‑level searches.

References

[1] CoRT: Complementary Rankings from Transformers

[2] AliCoCo: Alibaba E‑commerce Cognitive Concept Net

[3] AliCoCo2: Commonsense Knowledge Extraction, Representation and Application in E‑commerce

[4] Joint Learning of Deep Retrieval Model and Product Quantization based Embedding Index

[5] Towards Personalized and Semantic Retrieval: An End‑to‑End Solution for E‑commerce Search via Embedding Learning

[6] Embedding‑based Product Retrieval in Taobao Search

[7] A Dual Heterogeneous Graph Attention Network to Improve Long‑Tail Performance for Shop Search in E‑Commerce

Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
