
Building and Applying a Multi‑Language Product Knowledge Graph at Shopee

This presentation details Shopee's approach to constructing a multilingual product knowledge graph, covering ontology modeling, data acquisition, fusion techniques, and practical applications, while discussing challenges, model architectures, and future directions for large‑scale e‑commerce AI systems.

DataFunTalk

Shopee, a global e‑commerce platform, faces the challenge of handling multilingual and mixed‑language product data across many markets, prompting the development of a comprehensive product knowledge graph and associated algorithms.

Table of Contents

1. Knowledge Modeling
2. Knowledge Acquisition
3. Knowledge Fusion
4. Knowledge Application
5. Knowledge Graph Outlook

01 Knowledge Modeling

1. Knowledge Ontology

The ontology layer defines product categories and attributes, forming the backbone of the graph; each category combines attribute types and values to represent detailed product entities.
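
As an illustration, the ontology layer can be pictured as a small tree of categories, each owning a set of attribute types with allowed values. The sketch below is a minimal one under stated assumptions: the class names `AttributeType` and `Category`, the sample category names, and the attribute values are all hypothetical and are not taken from Shopee's schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AttributeType:
    """An attribute type (e.g. "Brand") and the values allowed for it."""
    name: str
    values: set = field(default_factory=set)


@dataclass
class Category:
    """A category node that owns a set of attribute types (hypothetical schema)."""
    name: str
    parent: Optional["Category"] = None
    attributes: dict = field(default_factory=dict)

    def add_attribute(self, attr_name, values):
        """Register an attribute type and its allowed values on this category."""
        self.attributes[attr_name] = AttributeType(attr_name, set(values))

    def path(self):
        """Return the category names from the root down to this node."""
        names = []
        node = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))

    def is_valid(self, attr_name, value):
        """Check that a (type, value) pair is defined for this category."""
        attr = self.attributes.get(attr_name)
        return attr is not None and value in attr.values


# Illustrative data only.
root = Category("Mobile & Gadgets")
phones = Category("Mobile Phones", parent=root)
phones.add_attribute("Brand", ["Samsung", "Xiaomi"])
```

Keeping the allowed values on the category node makes it cheap to reject an attribute value that the ontology does not define for that category.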

2. Ontology – Entity

Entities (items, SKUs) are linked to the ontology, enabling large‑scale structured product representation.
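
A minimal sketch of how items and SKUs might point back into the ontology follows. The field names (`category_id`, `variation`) and the helper `index_by_category` are assumptions made for illustration; the talk does not describe the storage format.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SKU:
    """One purchasable variation of an item (hypothetical fields)."""
    sku_id: str
    variation: dict  # e.g. {"Color": "Black", "Storage": "128GB"}


@dataclass
class Item:
    """A listing that is linked to an ontology category by its id."""
    item_id: str
    title: str
    category_id: str
    attributes: dict = field(default_factory=dict)
    skus: list = field(default_factory=list)


def index_by_category(items):
    """Group item ids by the ontology category they are linked to."""
    index = defaultdict(list)
    for item in items:
        index[item.category_id].append(item.item_id)
    return dict(index)


# Illustrative data only.
items = [
    Item("i1", "Acme X1 Phone", "cat_phone", {"Brand": "Acme"},
         [SKU("s1", {"Color": "Black"}), SKU("s2", {"Color": "Blue"})]),
    Item("i2", "Acme X2 Phone", "cat_phone"),
    Item("i3", "Zed Blender", "cat_kitchen"),
]
by_category = index_by_category(items)
```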

3. Ontology – Uplift All in One

To keep the ontology up to date, Shopee runs a New Phrase Mining pipeline based on SpanNER with an added information‑bottleneck layer, which improves handling of out‑of‑vocabulary (OOV) terms and yields a reported accuracy gain.
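
Span-based NER models score candidate token spans rather than tagging tokens one by one. The sketch below shows only that span-enumeration step plus a filter against known phrases; the span classifier and the information-bottleneck layer are not reproduced. The function names, the `max_len` parameter, and the sample vocabulary are assumptions for illustration.

```python
def enumerate_spans(tokens, max_len=4):
    """Yield (start, end, text) for every span of up to max_len tokens.

    `end` is exclusive, so tokens[start:end] is the span.
    """
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            yield start, end, " ".join(tokens[start:end])


def new_phrase_candidates(tokens, known_phrases, max_len=4):
    """Return spans that are not yet in the known vocabulary.

    In a real pipeline a trained span classifier would score each
    candidate; here every unknown span is simply returned.
    """
    known = {phrase.lower() for phrase in known_phrases}
    return [
        text
        for _, _, text in enumerate_spans(tokens, max_len)
        if text.lower() not in known
    ]


# Illustrative data only.
tokens = ["wireless", "ear", "buds"]
candidates = new_phrase_candidates(tokens, {"wireless", "ear", "buds"}, max_len=2)
```

Because every span is a candidate, a multi-token phrase that is absent from the vocabulary can still surface as a unit, which is the property that makes span enumeration attractive for mining new phrases.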

02 Knowledge Acquisition

1. Challenges

The main challenges are handling thousands of fine‑grained categories, multi‑language corpora, and a very large number of attribute combinations (260K+), all while keeping service precision above 90%.

2. Item Category Classification

Solutions include hierarchical classifiers, end‑to‑end models, and an Align‑before‑Fuse multimodal framework (image‑text contrastive learning, masked language modeling, and momentum distillation), reaching accuracy of 85‑90% or higher across markets.
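
Of the three training objectives named above, image-text contrastive learning is the easiest to show in isolation. The following is a dependency-free sketch of a symmetric contrastive (InfoNCE-style) loss over a batch of paired embeddings. It is not Shopee's implementation: the function names, the toy embeddings, and the fixed temperature of 0.07 are assumptions, and masked language modeling and momentum distillation are left out.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def _cross_entropy(logits, target):
    """Numerically stable cross-entropy for one row of logits."""
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(v - top) for v in logits))
    return log_sum - logits[target]


def image_text_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Average of image-to-text and text-to-image contrastive losses.

    Row k of image_embs is assumed to be paired with row k of text_embs.
    """
    imgs = [_normalize(v) for v in image_embs]
    txts = [_normalize(v) for v in text_embs]
    n = len(imgs)
    # sim[i][j] = cosine similarity of image i and text j, scaled by temperature.
    sim = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
           for i in imgs]
    i2t = sum(_cross_entropy(sim[k], k) for k in range(n)) / n
    t2i = sum(_cross_entropy([sim[r][k] for r in range(n)], k)
              for k in range(n)) / n
    return (i2t + t2i) / 2


# Toy embeddings: matched pairs should give a much lower loss than mismatched ones.
aligned = image_text_contrastive_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]])
shuffled = image_text_contrastive_loss([[1, 0], [0, 1]], [[0, 1], [1, 0]])
```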

3. Item Attribute Recognition

Four recognition streams—string‑match, rule‑based, NER, and image models—extract attribute types and values, followed by confidence‑based integration and normalization.
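
The integration step can be sketched as follows, assuming each stream emits `(stream, attribute type, value, confidence)` tuples. The normalization table `CANONICAL`, the `min_conf` threshold, and the rule of keeping the single highest-confidence value per attribute type are illustrative assumptions; the talk does not specify the actual integration rule.

```python
from collections import defaultdict

# Hypothetical normalization table mapping surface forms to canonical values.
CANONICAL = {"blk": "Black", "black": "Black", "hitam": "Black"}


def normalize(value):
    """Map a raw value to its canonical form, falling back to title case."""
    cleaned = value.strip()
    return CANONICAL.get(cleaned.lower(), cleaned.title())


def integrate(predictions, min_conf=0.5):
    """Keep the highest-confidence normalized value per attribute type.

    predictions: iterable of (stream, attr_type, value, confidence).
    Returns {attr_type: (value, confidence)}.
    """
    scores = defaultdict(dict)
    for _stream, attr_type, value, conf in predictions:
        if conf < min_conf:
            continue  # drop low-confidence predictions
        canon = normalize(value)
        scores[attr_type][canon] = max(scores[attr_type].get(canon, 0.0), conf)
    return {
        attr_type: max(values.items(), key=lambda kv: kv[1])
        for attr_type, values in scores.items()
    }


# Illustrative predictions from the four streams.
predictions = [
    ("string_match", "Color", "hitam", 0.70),
    ("ner", "Color", "Black", 0.92),
    ("image", "Color", "Blue", 0.60),
    ("rule", "Brand", "acme", 0.40),
]
result = integrate(predictions)
```

Normalizing before comparing confidences lets predictions in different languages ("hitam", "Black") support the same canonical value instead of competing with each other.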

4. Entity Fusion & Error Correction

A CrossEncoder trained with multi‑task learning, together with Sentence‑BERT Siamese networks, detects misclassified items; active learning reduces labeling effort while boosting recall.
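
The talk does not say which active-learning strategy is used. One common choice is uncertainty sampling, sketched below under that assumption: items whose predicted misclassification probability is closest to 0.5 are sent to annotators first. The function name and the sample scores are hypothetical.

```python
def select_for_labeling(scored_items, budget):
    """Pick the `budget` items the model is least certain about.

    scored_items: list of (item_id, p_misclassified) pairs, where
    p_misclassified is the model's probability that the item sits
    in the wrong category.
    """
    ranked = sorted(scored_items, key=lambda pair: abs(pair[1] - 0.5))
    return [item_id for item_id, _ in ranked[:budget]]


# Illustrative scores only.
scored = [("a", 0.98), ("b", 0.52), ("c", 0.05), ("d", 0.41)]
to_label = select_for_labeling(scored, budget=2)
```

Items the model is already confident about ("a" and "c" here) add little new information when labeled, so the annotation budget goes to the borderline cases.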

03 Knowledge Fusion

1. Ontology Fusion

Ontology fusion maps Shopee’s category, attribute, and value hierarchies to external taxonomies, using synonym and lexical matching.
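
A minimal sketch of synonym plus lexical matching is shown below, using Python's standard `difflib.SequenceMatcher` for the lexical score. The synonym table, the 0.8 threshold, and the category names are assumptions for illustration only.

```python
from difflib import SequenceMatcher

# Hypothetical synonym table: surface form -> canonical name.
SYNONYMS = {"cellphone": "mobile phone", "handphone": "mobile phone"}


def canonical(name):
    """Lowercase, collapse whitespace, then apply the synonym table."""
    key = " ".join(name.lower().split())
    return SYNONYMS.get(key, key)


def map_category(name, external_names, threshold=0.8):
    """Return the best-matching external category, or None if none is close."""
    source = canonical(name)
    best_name, best_score = None, 0.0
    for candidate in external_names:
        score = SequenceMatcher(None, source, canonical(candidate)).ratio()
        if score > best_score:
            best_name, best_score = candidate, score
    return best_name if best_score >= threshold else None


# Illustrative external taxonomy.
external = ["Mobile Phone", "Laptops", "Tablets"]
```

Here `map_category("Handphone", external)` resolves through the synonym table, while `map_category("Laptop", external)` relies on lexical similarity alone. Names with no close match return `None` so they can be routed to manual review.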

2. Entity Fusion

Entity fusion identifies relationships such as same‑model, similar, or related products, using graph‑text similarity and SPU (Standard Product Unit) nodes for fine‑grained aggregation.
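
To make the aggregation idea concrete, the sketch below groups item titles into SPU-like clusters using token-level Jaccard similarity and a union-find structure. It covers only a simple text-similarity signal; the graph side of the similarity is omitted, and the 0.6 threshold, function names, and titles are assumptions.

```python
def jaccard(title_a, title_b):
    """Token-level Jaccard similarity between two titles."""
    a, b = set(title_a.lower().split()), set(title_b.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_into_spus(titles, threshold=0.6):
    """Cluster title indices whose pairwise similarity passes the threshold.

    Returns a sorted list of index groups; each group stands for one SPU.
    """
    parent = list(range(len(titles)))

    def find(i):
        # Path-halving find for the union-find structure.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(titles)):
        for j in range(i + 1, len(titles)):
            if jaccard(titles[i], titles[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(titles)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Illustrative titles only.
titles = [
    "Acme X1 Phone 128GB Black",
    "acme x1 phone 128gb black original",
    "Zed Blender 500W",
]
spus = group_into_spus(titles)
```

The pairwise loop is quadratic in the number of titles, so a production system would need candidate blocking or approximate nearest-neighbor search before comparing pairs.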

3. Information Fusion

Information fusion combines attribute extraction, entity linking, and SPU construction to produce a unified product knowledge graph.

04 Knowledge Application

The graph powers various services: market insight for operations, intelligent category recommendation, price and logistics completion for merchants, and personalized recommendation and search enhancement for consumers.

05 Knowledge Graph Outlook

Future work envisions tighter integration with large language models (LLMs) for data augmentation, prompt‑based reasoning, and hybrid pipelines, while maintaining domain‑specific models for high‑precision vertical tasks.

Thank you for your attention.

Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
