
Multimodal and Graph Neural Network Techniques for eBay Recommendation Systems

This article details eBay's practical experience integrating multimodal data and graph neural networks into its recommendation pipeline, covering pain‑point analysis, a twin‑tower multimodal embedding model with triplet loss and TransH, engineering design, experimental results, and key takeaways for future AI‑driven product development.

DataFunSummit

Introduction

eBay continuously improves recommendation quality by enforcing image-text consistency and by leveraging graph neural networks (GNNs) to enrich features, mitigate sparsity and cold-start issues, and enhance the user experience.

Pain Point Analysis

The marketplace faces challenges such as low-quality seller images, titles that do not match their images, long-tail effects, and the limitations of single-modality text models, all of which degrade recommendation relevance.

Multimodal Model Design

A multimodal embedding pipeline combines pretrained text embeddings (BERT/LLaMA2) and visual embeddings (ResNet-50) within a twin-tower architecture. The twin towers share parameters and process joint image-text vectors. Triplet loss pulls similar items together, TransH projects the different modalities onto a common hyperplane, and a mismatch-detection module predicts image-title inconsistency.
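The two alignment ingredients named above can be sketched in a few lines. This is an illustrative numpy toy, not eBay's implementation: the dimensions, the hyperplane normal `w`, and the margin value are all hypothetical, and production training would use a framework such as PyTorch.

```python
import numpy as np

def transh_project(e, w):
    """TransH-style projection of embedding e onto the hyperplane
    whose unit normal is w, giving modalities a common comparison space."""
    w = w / np.linalg.norm(w)
    return e - np.dot(e, w) * w

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge triplet loss: pull the anchor toward the positive item
    and push it at least `margin` farther from the negative item."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Project text and image embeddings onto a shared hyperplane before comparing.
rng = np.random.default_rng(0)
w = rng.normal(size=8)         # hyperplane normal (hypothetical, learned in practice)
text_emb = rng.normal(size=8)  # stand-in for a BERT/LLaMA2 text embedding
img_emb = rng.normal(size=8)   # stand-in for a ResNet-50 image embedding
t_proj = transh_project(text_emb, w)
i_proj = transh_project(img_emb, w)
```

After projection both vectors are orthogonal to the hyperplane normal, so distances between them are measured within the shared hyperplane, which is the point of using TransH here.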

Engineering Design

The system comprises real-time streaming of user behavior, batch processing of item embeddings, and Faiss-based KNN for online serving. A/B tests show significant gains in conversion and user engagement across all pages.
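The serving-side lookup reduces to nearest-neighbor search over precomputed item embeddings. Below is a brute-force numpy stand-in for the Faiss index (Faiss would answer the same query approximately and much faster at eBay's scale); the sizes and the query construction are illustrative only.

```python
import numpy as np

def knn_search(index_embs, query, k=3):
    """Brute-force cosine KNN over item embeddings: a stand-in for the
    Faiss index used in online serving. Normalizing both sides turns
    inner product into cosine similarity."""
    index_embs = index_embs / np.linalg.norm(index_embs, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)
    scores = index_embs @ query
    top = np.argsort(-scores)[:k]       # indices of the k most similar items
    return top, scores[top]

rng = np.random.default_rng(1)
items = rng.normal(size=(100, 16))      # batch-computed item embeddings
user = items[42] + 0.01 * rng.normal(size=16)  # a user embedding near item 42
ids, sims = knn_search(items, user, k=3)
```

In production the same query shape holds, but the batch job writes embeddings into a Faiss index and the online service issues the top-k search against it.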

Why Graph Models

A bipartite user-item interaction graph is constructed, sampled using GraphSAGE or GAT, and integrated into the twin towers via graph aggregation. This captures collaborative signals, alleviates sparsity, and improves recall accuracy.
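The sample-then-aggregate step can be sketched as one GraphSAGE mean-aggregator layer over a tiny bipartite graph. This is a minimal numpy illustration under assumed shapes; the adjacency lists, weights, and fan-out `k` are hypothetical, and GAT would replace the mean with learned attention weights.

```python
import numpy as np

def sample_neighbors(adj, node, k, rng):
    """Uniformly sample up to k neighbors of `node` from adjacency lists,
    as GraphSAGE does to bound per-node computation."""
    nbrs = adj[node]
    if len(nbrs) <= k:
        return nbrs
    return list(rng.choice(nbrs, size=k, replace=False))

def sage_mean_layer(h, adj, W_self, W_nbr, k, rng):
    """One GraphSAGE layer: combine each node's own features with the
    mean of its sampled neighbors' features, then apply ReLU."""
    out = []
    for v in range(len(adj)):
        nbrs = sample_neighbors(adj, v, k, rng)
        nbr_mean = h[nbrs].mean(axis=0) if nbrs else np.zeros(h.shape[1])
        z = W_self @ h[v] + W_nbr @ nbr_mean
        out.append(np.maximum(z, 0.0))
    return np.stack(out)

# Tiny bipartite user-item graph: users are nodes 0-1, items are nodes 2-4.
adj = {0: [2, 3], 1: [3, 4], 2: [0], 3: [0, 1], 4: [1]}
rng = np.random.default_rng(2)
h = rng.normal(size=(5, 4))             # initial node features
W_self = rng.normal(size=(4, 4)) * 0.1
W_nbr = rng.normal(size=(4, 4)) * 0.1
h1 = sage_mean_layer(h, adj, W_self, W_nbr, k=2, rng=rng)
```

Stacking two such layers lets a user node absorb signals from items bought by similar users, which is the collaborative signal the article credits with alleviating sparsity.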

Model Structure

The user side uses a fusion model that aggregates user and neighbor information, while the item side employs GraphSAGE/GAT for neighbor aggregation. The outputs are fused embeddings used for recall, ranking, and re-ranking.
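A simple way to picture the final fusion step is a weighted blend of the base (multimodal) embedding with the graph-aggregated embedding, followed by inner-product scoring at recall time. The blend weight `alpha` and all vectors here are hypothetical; the article does not specify the fusion function, so this is one plausible sketch.

```python
import numpy as np

def fuse(base, graph_agg, alpha=0.5):
    """Illustrative late fusion: weighted sum of the multimodal embedding
    and the graph-aggregated embedding, then L2-normalize."""
    z = alpha * base + (1 - alpha) * graph_agg
    return z / np.linalg.norm(z)

def recall_scores(user_emb, item_embs):
    """Inner-product scores used to rank candidate items at recall time."""
    return item_embs @ user_emb

rng = np.random.default_rng(3)
user = fuse(rng.normal(size=8), rng.normal(size=8))
items = np.stack([fuse(rng.normal(size=8), rng.normal(size=8)) for _ in range(5)])
scores = recall_scores(user, items)
```

Because both sides are normalized, the scores are cosine similarities, which keeps recall, ranking, and re-ranking stages working on a comparable scale.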

Takeaways

Multimodal large models (e.g., CLIP + LLaMA2) open new possibilities such as AI designers. Translating technical insights into product decisions and fostering close collaboration between algorithm engineers and product managers are critical for innovation.

Q&A Highlights

The discussion covers embedding extraction, the impact of fusion on performance, differences between eBay and other platforms, and real-time updating of user and item embeddings.

Tags: machine learning, recommendation, embedding, multimodal, GNN, graph neural network, eBay
Written by DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
