How Vector Embeddings Power E‑Commerce Search and Recommendation at NetEase Yanxuan
This article explains how Yanxuan built a comprehensive vector system—from product embeddings and graph models to large‑scale similarity computation—and applied it across search, recommendation, and purchase prediction tasks, highlighting practical algorithms, infrastructure, and future directions.
Vector System Overview
Vectorization is increasingly used in industry; Yanxuan began exploring it in late 2018 across search and recommendation tasks, expanding from product recall to search ranking, discovery, suggestions, cross‑category recommendation, multi‑interest recall, and more.
Algorithm Models
Initially graph‑embedding techniques such as LINE and Node2Vec were used, followed by YoutubeDNN‑style models. A two‑step strategy was adopted: first learn high‑quality product vectors, then aggregate other objects (users, queries) via a dedicated module. This approach outperformed earlier attempts.
Product Vector Learning
Product vectors are learned from user behavior (continuous clicks, purchases) and product attributes. By fusing click, purchase, and attribute data, a robust product embedding is obtained.
Training data is built as a directed, weighted graph of product relationships. The loss jointly optimizes the similarity between a center product and both its click context and the global purchase sequence, with negatives sampled from within each training batch.
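The in-batch negative sampling mentioned above can be sketched roughly as follows. This is a minimal illustration, not Yanxuan's production loss: it assumes each row of `context` is the positive example for the corresponding row of `center`, and treats every other row in the batch as a negative.

```python
import numpy as np

def in_batch_loss(center, context):
    # Similarity of every center vector to every context vector in the
    # batch: the diagonal holds the positive pairs, and off-diagonal
    # entries serve as negatives ("negative sampling within each batch").
    logits = center @ context.T
    logits = logits - logits.max(axis=1, keepdims=True)  # numeric stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))  # cross-entropy on the diagonal

rng = np.random.default_rng(0)
center = rng.normal(size=(8, 16))
context = center + 0.1 * rng.normal(size=(8, 16))  # positives near their centers
loss = in_batch_loss(center, context)
```

Because negatives come from the batch itself, no separate negative-sampling pass over the catalog is needed, which keeps each training step cheap.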
Daily full retraining produces vector spaces that are not directly comparable across days. Two solutions address this: an affine transformation that aligns each day's vectors to the previous day's space, and incremental training that initializes from the previous day's vectors and fine-tunes with a lower learning rate. The latter performed better.
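The affine-alignment option can be sketched as a least-squares fit over anchor products present on both days. The function name and setup below are illustrative assumptions, not the article's actual implementation:

```python
import numpy as np

def align_to_previous_day(new_vecs, old_vecs):
    # Solve the least-squares problem  new_vecs @ W ~= old_vecs  over
    # anchor products trained on both days; applying W to every newly
    # trained vector keeps the space stable for downstream consumers.
    W, *_ = np.linalg.lstsq(new_vecs, old_vecs, rcond=None)
    return W

rng = np.random.default_rng(1)
true_W = rng.normal(size=(16, 16))
new = rng.normal(size=(100, 16))
old = new @ true_W                 # "yesterday's" space, by construction
W = align_to_previous_day(new, old)
```

Incremental fine-tuning avoids this extra mapping step entirely, which is consistent with the article's finding that it worked better in practice.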
Since 2019, GNN models such as GraphSAGE, LightGCN, and SR‑GNN have been evaluated, with SR‑GNN showing promising results.
Extended Vector Learning
Beyond product vectors (I), adding user vectors (U) enables U→I and I→U recall. Introducing query vectors (Q) further expands scenarios to Q→I, I→Q, Q→Q, etc., and later category (C) and topic (T) vectors broaden coverage.
A unified aggregation framework processes target‑source relationship tables, applying time decay, weight accumulation, noise filtering, attention, and clustering modules to produce vectors for any object in the same space.
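As a rough sketch of the time-decay and weight-accumulation modules, a user vector can be built as a decayed, weighted sum of the product vectors the user interacted with, then normalized into the shared space. The half-life parameter and function name are assumptions for illustration:

```python
import numpy as np

def aggregate_user_vector(item_vecs, ages_days, half_life_days=7.0):
    # Exponential time decay: an item viewed `half_life_days` ago
    # contributes half the weight of one viewed just now. Weights are
    # accumulated, then the result is L2-normalized so the user vector
    # lives in the same space as the product vectors.
    weights = 0.5 ** (np.asarray(ages_days) / half_life_days)
    v = weights @ item_vecs            # weighted sum over item vectors
    return v / (np.linalg.norm(v) + 1e-12)

items = np.array([[1.0, 0.0], [0.0, 1.0]])
user = aggregate_user_vector(items, ages_days=[0.0, 14.0])
```

The same template extends to queries, categories, or topics: only the source relationship table changes, which is the point of the unified framework described above.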
Related Technologies
Large‑scale similarity computation is handled offline with data partitioning and matrix operations, achieving billions of exact calculations in minutes. Approximate nearest‑neighbor search uses LSH and FAISS for online recall.
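The offline partitioning idea can be sketched as blocked exact top-k search: split the query side into blocks so only a (block × n) slice of the similarity matrix is ever materialized. This is a simplified single-machine illustration of the partitioning principle, not the distributed pipeline itself:

```python
import numpy as np

def topk_similar(vectors, k=5, block=1024):
    # Blocked exact cosine top-k: each iteration computes similarities
    # for one block of queries against the full catalog, keeping peak
    # memory proportional to block * n instead of n * n.
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    all_ids = []
    for start in range(0, len(v), block):
        sims = v[start:start + block] @ v.T
        rows = np.arange(sims.shape[0])
        sims[rows, start + rows] = -np.inf   # never return an item as its own neighbor
        all_ids.append(np.argpartition(-sims, k, axis=1)[:, :k])
    return np.vstack(all_ids)

rng = np.random.default_rng(2)
vecs = rng.normal(size=(50, 8))
vecs[1] = vecs[0] + 1e-6                     # plant a near-duplicate pair
neighbor_ids = topk_similar(vecs, k=5, block=16)
```

For online recall, an ANN index (e.g. FAISS, as the article notes) trades exactness for latency; the blocked exact computation remains useful for the offline batch case.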
An online vector storage and asynchronous aggregation service supports real‑time personalized re‑ranking, topic sorting, and activity page personalization.
Practical Applications
Search Scenario: Vector similarity powers discovery words (Q→Q), query-to-product recall (Q→I), and user-to-query recommendations (U→Q), combining offline and real-time vectors for robust ranking.
Recommendation Recall: Multiple user vectors (long-term, short-term, real-time, multi-interest, group) are selected according to display slots, improving diversity and relevance.
Purchase Prediction: Similarity among viewed products (I→I) distinguishes purposeful browsing from random exploration; combining current-day similarity with historical user interest (U→I) boosts conversion.
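The I→I signal for purchase prediction can be sketched as a session "focus" score: the mean pairwise cosine similarity among the products a user viewed. The scoring function below is an illustrative assumption of how such a feature might be computed, not the article's actual model input:

```python
import numpy as np

def session_focus(item_vecs):
    # Mean pairwise cosine similarity among viewed products: values
    # near 1 suggest purposeful browsing around a single need, values
    # near 0 (or below) suggest random exploration.
    v = item_vecs / np.linalg.norm(item_vecs, axis=1, keepdims=True)
    sims = v @ v.T
    n = len(v)
    return (sims.sum() - n) / (n * (n - 1))  # drop the n self-similarities

focused = session_focus(np.array([[1.0, 0.1], [1.0, 0.0], [0.9, 0.05]]))
scattered = session_focus(np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]))
```

A high focus score combined with strong historical U→I interest in the same products is the kind of compound signal the article credits with boosting conversion.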
Summary and Outlook
The vector system accelerates new feature rollout and often outperforms complex traditional methods, embodying Occam’s razor. Future work will address limitations of vector representations and explore new encoding techniques.
Yanxuan Tech Team
