Artificial Intelligence 32 min read

A Comprehensive Overview of Embedding Techniques for Recommendation Systems

This article systematically reviews mainstream embedding technologies—including matrix factorization, static and dynamic word embeddings, and graph‑based methods—explaining their principles, implementations, and practical applications in recommendation, advertising, and search systems.

DataFunTalk

Embedding, the process of mapping high‑dimensional sparse data to low‑dimensional dense vectors, has become a fundamental operation in deep learning for recommendation, advertising, and search. The article begins with several definitions of embedding drawn from mathematics, TensorFlow, and the literature, and adopts the TensorFlow one: an embedding is a mapping that turns discrete instances into continuous vectors.

In recommendation systems, embeddings are used to:

Replace sparse features with dense vectors in model layers (e.g., Wide&Deep, DeepFM).

Serve as pre‑trained feature vectors concatenated with other features (e.g., FNN).

Compute similarity between user and item vectors for recall (e.g., YouTube recommendation model).

Provide real‑time user/item vectors as inputs to downstream models (e.g., Airbnb).
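The recall use case reduces to nearest‑neighbor search in the embedding space: score every candidate item by its similarity to the user vector and keep the top results. A minimal sketch (the vectors and item names below are made up for illustration):

```python
import math

def cosine(u, v):
    # cosine similarity between two dense embedding vectors
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# hypothetical user and item embeddings (3-dimensional for readability)
user = [0.2, 0.8, 0.1]
items = {
    "item_a": [0.1, 0.9, 0.0],
    "item_b": [0.9, 0.1, 0.3],
}

# recall: rank items by similarity to the user vector
ranked = sorted(items, key=lambda i: cosine(user, items[i]), reverse=True)
print(ranked)
```

In production this brute-force scan is replaced by an approximate nearest-neighbor index, but the scoring function is the same.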

The article then outlines three major categories of embedding techniques.

1. Classical Matrix Factorization Methods

Singular Value Decomposition (SVD) factors any matrix into the product of three matrices (R = UΣVᵀ), enabling dimensionality reduction for large rating matrices. However, traditional SVD struggles with sparse matrices and is computationally expensive, which motivated variants such as FunkSVD, BiasSVD, and SVD++ that add bias terms and incorporate implicit feedback.
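A quick sketch of rank‑k truncation on a toy rating matrix (the matrix values are invented; a real system would use FunkSVD-style learned factors rather than filling unrated cells with zeros):

```python
import numpy as np

# toy user-item rating matrix (rows: users, cols: items); 0 = unrated
R = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [1.0, 0.0, 0.0, 4.0],
])

# full SVD: R = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# keep only the top-k singular values for a rank-k approximation
k = 2
R_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# user/item embeddings can be read off the truncated factors
user_emb = U[:, :k] * np.sqrt(s[:k])     # one k-dim vector per user
item_emb = Vt[:k, :].T * np.sqrt(s[:k])  # one k-dim vector per item
print(R_k.round(2))
```

Splitting √σ between the two factors is one common convention; the product of a user row and an item row reproduces the rank‑k approximation of the rating.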

2. Content‑Based Embedding Methods

These methods focus on textual data. Static vectors (Word2Vec, GloVe, FastText) produce fixed embeddings after training, while dynamic vectors (ELMo, GPT, BERT) generate context‑dependent embeddings using deep language models.

A simple illustration uses a co‑occurrence matrix built from three short sentences: "I play cricket", "I love cricket", and "I love football".
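As a sketch, the co‑occurrence counts for those three sentences can be built with a sliding window (the window size of 1 here is an assumption; GloVe-style methods typically use larger windows):

```python
from collections import defaultdict

# the three example sentences from the article
corpus = ["I play cricket", "I love cricket", "I love football"]

# co-occurrence counts within a symmetric window of 1 word
cooc = defaultdict(int)
for sent in corpus:
    words = sent.lower().split()
    for i, w in enumerate(words):
        for j in range(max(0, i - 1), min(len(words), i + 2)):
            if i != j:
                cooc[(w, words[j])] += 1

print(cooc[("i", "love")])  # "love" appears next to "i" in two sentences
```

Each row of the resulting matrix is already a crude word vector; GloVe learns dense embeddings whose dot products approximate the logs of these counts.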

Static Vectors

Word2Vec uses CBOW or Skip‑gram to learn word embeddings.

GloVe learns embeddings from global word‑co‑occurrence statistics.

FastText extends Word2Vec by incorporating character n‑grams.
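FastText's subword idea is easy to see in isolation: a word is padded with boundary markers and split into character n‑grams, and its vector is the sum of the n‑gram vectors. A sketch of the extraction step (the 3–6 range mirrors the FastText default):

```python
def char_ngrams(word, n_min=3, n_max=6):
    # FastText-style subword units: pad the word with boundary markers,
    # then extract all character n-grams of length n_min..n_max
    padded = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams

print(char_ngrams("where", 3, 3))
```

The boundary markers matter: the trigram "her" inside "where" is distinct from the whole word "<her>", so shared substrings help with rare and out‑of‑vocabulary words without collapsing distinct words together.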

Dynamic Vectors

ELMo employs bidirectional LSTMs to produce word representations.

GPT adopts a Transformer‑based autoregressive language model with pre‑training and fine‑tuning stages.

BERT uses a bidirectional Transformer with Masked Language Modeling and Next Sentence Prediction tasks.
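The Masked Language Modeling objective can be sketched in a few lines. This is a simplification: real BERT masks ~15% of tokens but replaces only 80% of those with [MASK], leaving 10% random and 10% unchanged, which this sketch omits:

```python
import random

def mask_tokens(tokens, mask_rate=0.15, seed=1):
    # simplified BERT-style masking: replace ~mask_rate of tokens with [MASK]
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            masked.append("[MASK]")
            labels.append(tok)   # the model must predict the original token
        else:
            masked.append(tok)
            labels.append(None)  # no loss on unmasked positions
    return masked, labels

tokens = "the quick brown fox jumps over the lazy dog".split()
masked, labels = mask_tokens(tokens)
print(masked)
```

The model is trained to recover the labels at masked positions from bidirectional context, which is what forces the learned representations to be context‑dependent.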

3. Graph‑Based Embedding Methods

Graph data is prevalent in industrial scenarios. Shallow graph models (DeepWalk, Node2vec, Metapath2vec) combine random walks with skip‑gram, while deep graph models (GCN, GraphSAGE) integrate graph convolutions with neural networks.

DeepWalk treats random walks as sentences and applies skip‑gram.

Node2vec introduces biased random walks controlled by parameters p (return) and q (in‑out) to balance homophily and structural equivalence.
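The p/q bias is easiest to see in the walk itself. A sketch for an unweighted, undirected graph (the graph and parameter values below are invented for illustration):

```python
import random

def node2vec_walk(graph, start, length, p=1.0, q=1.0, seed=42):
    # graph: dict node -> set of neighbors (undirected, unweighted assumed)
    rng = random.Random(seed)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(graph[cur])
        if not nbrs:
            break
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        # node2vec biased weights, based on distance from the previous node:
        weights = []
        for x in nbrs:
            if x == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif x in graph[prev]:
                weights.append(1.0)      # stay at distance 1 from prev (BFS-like)
            else:
                weights.append(1.0 / q)  # move outward (DFS-like)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk

g = {"a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b"}, "d": {"b"}}
print(node2vec_walk(g, "a", 5, p=0.5, q=2.0))
```

Small q keeps the walk exploring outward (capturing homophily, DFS‑like); small p encourages backtracking around the start node (capturing structural roles, BFS‑like). The walks are then fed to skip‑gram exactly as in DeepWalk.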

Metapath2vec extends to heterogeneous networks using meta‑path‑guided walks.

GCN defines convolution in the spectral domain and approximates it with Chebyshev polynomials.

GraphSAGE samples and aggregates neighbor features (mean, pooling, LSTM) to generate inductive embeddings.
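One GraphSAGE layer with the mean aggregator can be sketched as below. This is a simplified variant that sums the self and neighbor transforms rather than concatenating them as the original paper does, skips neighbor sampling, and uses random weights instead of trained ones:

```python
import numpy as np

def sage_mean_layer(features, graph, W_self, W_neigh):
    # one GraphSAGE layer with a mean aggregator:
    # h_v = ReLU(W_self @ x_v + W_neigh @ mean(x_u for u in N(v)))
    out = {}
    for v, nbrs in graph.items():
        neigh_mean = np.mean([features[u] for u in nbrs], axis=0)
        h = W_self @ features[v] + W_neigh @ neigh_mean
        out[v] = np.maximum(h, 0.0)  # ReLU
    return out

# toy undirected graph and 2-dim input features
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
features = {0: np.array([1.0, 0.0]),
            1: np.array([0.0, 1.0]),
            2: np.array([1.0, 1.0])}
rng = np.random.default_rng(0)
W_self = rng.normal(size=(2, 2))
W_neigh = rng.normal(size=(2, 2))
h = sage_mean_layer(features, graph, W_self, W_neigh)
print({v: vec.round(3) for v, vec in h.items()})
```

Because the layer only needs a node's features and its neighbors' features, it applies to nodes unseen during training; that is what makes GraphSAGE inductive rather than transductive.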

The article concludes that selecting an appropriate embedding method depends on data type and business requirements; static word vectors are suitable for simple text, dynamic models for context‑sensitive tasks, and graph embeddings for relational data. It also provides references for further reading.

Tags: natural language processing, embedding, matrix factorization, recommendation systems, graph neural networks
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
