
Shopee's E‑commerce Knowledge Graph Construction and Integration with Large Models

This article describes how Shopee built an e-commerce knowledge graph: the challenges, the construction pipeline, AI-driven extraction and fusion techniques, multilingual and multimodal modeling, and applications ranging from search and recommendation to AI assistants, as well as how the graph is kept up to date in real time.

DataFunTalk

Shopee shares its experience in constructing an e‑commerce knowledge graph and combining it with large AI models, outlining the motivations, technical challenges, and solutions.

The knowledge graph is meant to bridge buyer intent and seller-provided information, unify product data across markets, and improve the user experience. Building it means coping with heterogeneous data sources, inconsistent language, and massive scale.

The key challenges are the diversity of information coming from multiple sources, uneven data quality, language differences, and sheer volume: billions of items across eight markets and six languages.

Compared with purely deep-learning approaches, knowledge graphs are more interpretable because they represent entities, relations, and attributes explicitly; the trade-off is that they require careful construction and domain expertise.

The construction pipeline has five stages: defining an ontology; extracting information from product pages, images, and reviews; assessing text and image quality; performing multimodal extraction; and handling entity disambiguation and attribute alignment.
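
The ontology stage can be pictured as a per-category schema that gates what the later extraction stages are allowed to write into the graph. The sketch below assumes a simple in-memory schema; `AttributeSpec`, `CategoryOntology`, and the phone attributes are hypothetical illustrations, not Shopee's actual ontology.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class AttributeSpec:
    """One attribute defined by the ontology (hypothetical schema)."""
    name: str
    allowed_values: Optional[Set[str]] = None  # None means free-form values are accepted
    unit: Optional[str] = None                 # canonical unit, if the attribute is numeric


@dataclass
class CategoryOntology:
    """Attributes that items in one category may carry."""
    category: str
    attributes: Dict[str, AttributeSpec] = field(default_factory=dict)

    def validate(self, extracted: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
        """Split extracted key/value pairs into accepted pairs and rejection reasons."""
        kept: Dict[str, str] = {}
        rejected: List[str] = []
        for key, value in extracted.items():
            spec = self.attributes.get(key)
            if spec is None:
                rejected.append(f"unknown attribute: {key}")
            elif spec.allowed_values is not None and value not in spec.allowed_values:
                rejected.append(f"invalid value for {key}: {value}")
            else:
                kept[key] = value
        return kept, rejected


phones = CategoryOntology("Mobile Phones", {
    "brand": AttributeSpec("brand"),
    "storage": AttributeSpec("storage", unit="GB"),
    "network": AttributeSpec("network", allowed_values={"4G", "5G"}),
})
kept, rejected = phones.validate({"brand": "Xiaomi", "network": "6G", "colour": "black"})
print(kept)      # {'brand': 'Xiaomi'}
print(rejected)  # ['invalid value for network: 6G', 'unknown attribute: colour']
```

Here the unknown key `colour` (a spelling variant of an attribute the schema might define as `color`) is rejected rather than silently added; in a fuller pipeline such keys could be routed to the attribute-alignment step instead of being dropped.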

Text extraction combines rule-based filters, multi-task BERT models, named entity recognition (NER), and prompt-tuned T5. Multimodal alignment uses models such as ALBEF, BLIP, LaBSE-DINOv2-ViT, and multilingual encoders. Model compression, for example with MiniLM, is also explored.
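
Rule-based components are typically cheap, high-precision passes that run before or alongside learned extractors. As an illustration only, the sketch below pulls RAM and storage capacity out of a phone title with regular expressions; the patterns, attribute names, and `extract_specs` are assumptions, and the talk does not describe Shopee's actual rules.

```python
import re
from typing import Dict

# Hypothetical high-precision patterns for phone titles.
RAM_RE = re.compile(r"(\d+)\s*GB\s*RAM\b", re.IGNORECASE)
# Storage: a GB/TB figure that is NOT immediately followed by "RAM".
STORAGE_RE = re.compile(r"(\d+)\s*(GB|TB)\b(?!\s*RAM)", re.IGNORECASE)


def extract_specs(title: str) -> Dict[str, str]:
    """Return the attributes the rules can extract with high confidence."""
    specs: Dict[str, str] = {}
    ram = RAM_RE.search(title)
    if ram:
        specs["ram"] = f"{ram.group(1)}GB"
    storage = STORAGE_RE.search(title)
    if storage:
        specs["storage"] = f"{storage.group(1)}{storage.group(2).upper()}"
    return specs


print(extract_specs("Xiaomi Redmi Note 12 8GB RAM 256GB 5G"))
# {'ram': '8GB', 'storage': '256GB'}
print(extract_specs("Wireless Earbuds Bluetooth 5.3"))
# {}
```

In a hybrid setup, titles the rules cannot handle (like the second one) would be left to the learned extractors.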

Knowledge fusion tackles entity linking, spelling correction, synonym handling, and unit conversion, using similarity measures, translation, and learned standardization models.
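
Three of these fusion operations can be shown in miniature: synonym normalization with a lookup table, unit conversion to a canonical unit, and entity linking by string similarity using the standard library's `difflib`. The tables, the 0.8 cutoff, and the function names are hypothetical; the talk describes learned standardization models and richer similarity measures, which this sketch does not attempt.

```python
import difflib
import re
from typing import Optional

# Hypothetical lookup tables; real ones would be curated or learned per market.
SYNONYMS = {"grey": "gray", "colour": "color", "navy": "dark blue"}
GRAMS_PER_UNIT = {"g": 1.0, "kg": 1000.0, "lb": 453.59237}
CANONICAL_BRANDS = ["Apple", "Oppo", "Samsung", "Xiaomi"]


def normalize_synonym(value: str) -> str:
    """Lowercase a surface form and map it to a canonical spelling if the table has one."""
    key = value.strip().lower()
    return SYNONYMS.get(key, key)


def to_grams(value: str) -> Optional[float]:
    """Convert strings like '1.5 kg' or '2lb' to grams; None if unparseable."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(g|kg|lb)\s*", value, re.IGNORECASE)
    if m is None:
        return None
    return float(m.group(1)) * GRAMS_PER_UNIT[m.group(2).lower()]


def link_brand(raw: str, cutoff: float = 0.8) -> Optional[str]:
    """Link a raw brand string to a canonical brand by string similarity."""
    by_lower = {b.lower(): b for b in CANONICAL_BRANDS}
    matches = difflib.get_close_matches(raw.strip().lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else None


print(normalize_synonym("Grey"))  # gray
print(to_grams("1.5 kg"))         # 1500.0
print(link_brand("samsnug"))      # Samsung (misspelling resolved)
print(link_brand("Nokia"))        # None (no canonical brand is close enough)
```

The similarity cutoff trades recall for precision: lowering it links more misspellings but also risks merging genuinely different brands.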

Knowledge processing covers reasoning and inconsistency detection: rule-based association mining, graph-embedding inference, and analogical reasoning are used to complete and validate the graph.
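
The rule-based association idea can be sketched as mining single-attribute rules of the form "attribute A = x implies attribute B = y" with support and confidence thresholds, then flagging items that contradict a high-confidence rule. The thresholds, the toy catalog, and the function names are assumptions; the talk does not specify the mining algorithm.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, List, Tuple

Pair = Tuple[str, str]    # (attribute, value)
Rule = Tuple[Pair, Pair]  # antecedent -> consequent


def mine_rules(items: List[Dict[str, str]], min_support: int = 3,
               min_confidence: float = 0.8) -> Dict[Rule, float]:
    """Mine rules (a = x) -> (b = y) that hold with enough support and confidence."""
    antecedent_counts: Counter = Counter()
    rule_counts: Counter = Counter()
    for item in items:
        pairs = list(item.items())
        antecedent_counts.update(pairs)
        for lhs, rhs in permutations(pairs, 2):
            rule_counts[(lhs, rhs)] += 1
    rules: Dict[Rule, float] = {}
    for (lhs, rhs), support in rule_counts.items():
        confidence = support / antecedent_counts[lhs]
        if support >= min_support and confidence >= min_confidence:
            rules[(lhs, rhs)] = confidence
    return rules


def find_violations(item: Dict[str, str], rules: Dict[Rule, float]) -> List[str]:
    """List the mined rules that this item contradicts."""
    violations = []
    for ((lhs_attr, lhs_val), (rhs_attr, rhs_val)), confidence in rules.items():
        if item.get(lhs_attr) == lhs_val and rhs_attr in item and item[rhs_attr] != rhs_val:
            violations.append(
                f"{lhs_attr}={lhs_val} implies {rhs_attr}={rhs_val} "
                f"(confidence {confidence:.2f}), but item has {rhs_attr}={item[rhs_attr]}"
            )
    return violations


catalog = [{"model": "iPhone 15", "brand": "Apple"} for _ in range(4)]
catalog.append({"model": "iPhone 15", "brand": "Samsung"})  # a likely data-entry error

rules = mine_rules(catalog)
for message in find_violations(catalog[-1], rules):
    print(message)
# model=iPhone 15 implies brand=Apple (confidence 0.80), but item has brand=Samsung
```

Note that the erroneous item itself pulls the rule's confidence down to 0.80, so a threshold set too strictly would hide exactly the inconsistency the check is meant to catch.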

Practical applications span search (better query understanding), recommendation (fine-grained category recall), operations (seller data-quality checks and product selection), AI assistants, and multimodal generation of product content (titles, descriptions, images, and videos).
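
For search, one way the graph can sharpen query understanding is by mapping query phrases onto attribute slots. The sketch below does greedy longest-match lookup against a hypothetical lexicon that, in practice, would be generated from the graph's entities and synonyms; the lexicon contents and `parse_query` are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon derived from the graph: surface form -> (attribute, canonical value).
LEXICON = {
    "nike": ("brand", "Nike"),
    "red": ("color", "Red"),
    "running shoes": ("category", "Running Shoes"),
    "shoes": ("category", "Shoes"),
}


def parse_query(query: str, max_ngram: int = 3) -> Tuple[Dict[str, str], List[str]]:
    """Greedily match the longest known phrase at each position; keep the rest as free text."""
    tokens = query.lower().split()
    slots: Dict[str, str] = {}
    leftover: List[str] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_ngram, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in LEXICON:
                attr, value = LEXICON[phrase]
                slots.setdefault(attr, value)  # first match wins for each attribute
                i += n
                break
        else:  # no phrase starting at position i is in the lexicon
            leftover.append(tokens[i])
            i += 1
    return slots, leftover


slots, leftover = parse_query("red nike running shoes for men")
print(slots)     # {'color': 'Red', 'brand': 'Nike', 'category': 'Running Shoes'}
print(leftover)  # ['for', 'men']
```

Preferring the longest match keeps "running shoes" from being read as the broader category "shoes"; the leftover tokens could still be used for ordinary text matching.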

Integration with large models follows three patterns: feeding the graph into the models, using the models to enrich the graph, and co-training the two. Open problems include graph completion, hallucination mitigation, and keeping knowledge current; for real-time updates, the talk points to retrieval-augmented generation (RAG) and selective layer execution.
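
RAG helps with freshness because facts are fetched from the graph at query time rather than baked into model weights. The sketch below covers only the retrieval and prompt-assembly half under simple assumptions: triples are ranked by token overlap with the question, and the model call itself is left out. The triples, the scoring, and the prompt wording are hypothetical, and selective layer execution is not illustrated.

```python
import re
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def tokens(text: str) -> Set[str]:
    """Lowercased word tokens, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve_triples(question: str, triples: List[Triple], k: int = 3) -> List[Triple]:
    """Rank triples by token overlap with the question; keep the top k with any overlap."""
    q = tokens(question)
    scored = [(len(q & tokens(" ".join(t))), t) for t in triples]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [t for score, t in scored[:k] if score > 0]


def build_prompt(question: str, facts: List[Triple]) -> str:
    """Assemble a grounded prompt; the LLM call itself is out of scope here."""
    fact_lines = "\n".join(f"- {s} | {p} | {o}" for s, p, o in facts)
    return (
        "Answer using only the facts below. If they are insufficient, say so.\n"
        f"Facts:\n{fact_lines}\n"
        f"Question: {question}"
    )


graph = [
    ("Redmi Note 12", "battery capacity", "5000 mAh"),
    ("Redmi Note 12", "brand", "Xiaomi"),
    ("iPhone 15", "brand", "Apple"),
]
question = "What is the battery capacity of the Redmi Note 12?"
print(build_prompt(question, retrieve_triples(question, graph)))
```

Editing the triple list (for example, after a seller updates a listing) changes what the next prompt contains without any retraining. Token overlap is only a stand-in here; embedding-based retrieval would be a more realistic choice.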

The article concludes with a Q&A session covering quality scoring, entity disambiguation, low‑resource training, tool recommendations, and the relationship between knowledge graphs and product catalogs.

Written by DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
