Embedding Techniques and Practices in Tencent Mobile News Recommendation System

This article reviews the concept, history, and practical implementation of embeddings, including item, image, and user embeddings, and describes the vector‑based recall strategies (i2i, u2i, clustering, and deep‑learning models) used in Tencent's mobile news recommendation platform.

Big Data Technology & Architecture

In today's recommendation systems, embeddings are everywhere, and mastering them resolves one of the key difficulties of the whole pipeline. This article summarizes embedding practice in Tencent Mobile News recommendation and aims to be both engaging for general readers and useful to engineers.

What is an embedding? An embedding is a dense vector representation used in place of one‑hot encoding. Compared with one‑hot, an embedding is a smooth representation, while one‑hot can be seen as a max‑pooled version of an embedding.
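A minimal NumPy sketch of how the two encodings relate: multiplying a one-hot vector by an embedding matrix simply selects one row, which is why embedding layers are implemented as table lookups and why the dense vector can be far shorter than the vocabulary. The vocabulary size, dimension, and random matrix are illustrative placeholders, not values from the production system.

```python
import numpy as np

# Toy vocabulary of 5 tokens embedded into 3 dimensions (sizes are illustrative).
vocab_size, dim = 5, 3
rng = np.random.default_rng(0)
E = rng.normal(size=(vocab_size, dim))  # embedding matrix: one row per token

def one_hot(index, size):
    """Return a one-hot vector as a dense NumPy array."""
    v = np.zeros(size)
    v[index] = 1.0
    return v

token = 2
# Multiplying a one-hot vector by E selects exactly one row of E...
via_matmul = one_hot(token, vocab_size) @ E
# ...so in practice an embedding layer is implemented as a cheap row lookup.
via_lookup = E[token]

print(np.allclose(via_matmul, via_lookup))                 # True
print(one_hot(token, vocab_size).shape, via_lookup.shape)  # (5,) (3,)
```

With a real vocabulary of hundreds of thousands of tags or article IDs, the one-hot side grows with the vocabulary while the embedding side stays at a fixed, small dimension.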

Intuitively, an RGB color can be represented by a three‑dimensional vector whose components have clear physical meanings. A typical embedding, by contrast, is the weight matrix of the penultimate layer of a neural network: it starts from random initialization and is refined by optimization, so its individual dimensions usually have no direct physical interpretation.

Milestones in the development of embeddings: Hinton introduced the concept (distributed representations) in 1986; word2vec achieved the first industrial success; later models such as item2vec, Wide & Deep, and YouTube's deep recommendation model extended embeddings from words to arbitrary features; and FAISS solved the last mile of fast vector retrieval.

Embedding benefits:

Transforms natural language into computable numbers.

Replaces one‑hot, dramatically reducing feature dimensionality.

Replaces collaborative‑filtering matrices, greatly lowering computational complexity.

Item embedding – Items in Tencent Mobile News are image‑text pairs, so item vectors are obtained by concatenating text embeddings (derived from word2vec‑style models) and image embeddings (e.g., ResNet features, image captions, FaceNet for celebrity detection, OCR for comics, style transfer for age/gender cues).
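The concatenation step can be sketched as follows. The dimensions (100 for text, 2048 for pooled ResNet-50 features, 128 for a FaceNet-style face vector) are illustrative, the random vectors stand in for real model outputs, and normalizing each part before concatenating is an assumption rather than a documented detail of the Tencent pipeline; `item_vector` is a hypothetical helper.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit length so no single modality dominates by magnitude."""
    return v / (np.linalg.norm(v) + eps)

def item_vector(text_vec, image_vecs):
    """Concatenate a text embedding with one or more image-derived embeddings.

    Each part is L2-normalized first (an assumption, not a documented step).
    """
    parts = [l2_normalize(text_vec)] + [l2_normalize(v) for v in image_vecs]
    return np.concatenate(parts)

rng = np.random.default_rng(42)
text_vec = rng.normal(size=100)     # placeholder for a word2vec-style article vector
resnet_vec = rng.normal(size=2048)  # placeholder for pooled ResNet-50 features
face_vec = rng.normal(size=128)     # placeholder for a FaceNet-style face embedding

v = item_vector(text_vec, [resnet_vec, face_vec])
print(v.shape)  # (2276,)
```

Per-part normalization is one simple way to keep a 2048-dimensional image block from overwhelming a 100-dimensional text block in cosine similarity; per-part weights would be a natural knob to tune.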

Image embedding – Different CNN layers capture hierarchical visual features; lower layers learn generic edges, while higher layers become task‑specific. Pre‑trained low‑level weights are reused for new tasks, whereas high‑level features may be fine‑tuned.

User embedding – Users are embedded into the same vector space as items. Early versions used important profile features (tags, media IDs, categories, topics). Later, a DSSM model aligned user and item vectors, and the current approach employs BERT + LSTM on user behavior sequences.

Embedding‑based recall – After obtaining item and user vectors, various vector‑based recall methods are applied. Most practices use a single embedding; a few use multiple embeddings.

Basic i2i (item‑to‑item) recall uses fastText to train vectors such as item2vec, media2vec, tag2vec, loc2vec, and title2vec, and FAISS to retrieve nearest neighbors. Tag2vec, for example, represents an article by averaging the vectors of its top three tags. After FAISS retrieves candidate items (e.g., 1000 per article), a cosine similarity threshold (e.g., 0.6) filters the candidates, and additional signals (popularity, CTR, freshness) then re‑rank the results.
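The i2i flow can be sketched end to end in NumPy. Exact brute-force cosine search stands in for FAISS here (with L2-normalized vectors, a FAISS `IndexFlatIP` inner-product index returns the same ranking). Ranking tags by weight to choose the "top three", the 16-dimensional vectors, and the re-ranking weights are all assumptions, and `tag2vec`, `i2i_recall`, and `rerank` are hypothetical helper names.

```python
import numpy as np

def normalize_rows(m, eps=1e-12):
    """Scale each row to unit L2 norm so that dot products equal cosine similarity."""
    return m / (np.linalg.norm(m, axis=-1, keepdims=True) + eps)

def tag2vec(tag_vectors, tag_weights, top_n=3):
    """Average the vectors of an article's top_n tags, ranked by tag weight (assumed)."""
    order = np.argsort(tag_weights)[::-1][:top_n]
    return tag_vectors[order].mean(axis=0)

def i2i_recall(query_vec, item_matrix, k=1000, threshold=0.6):
    """Exact top-k cosine search (a stand-in for FAISS), then a similarity cutoff."""
    q = normalize_rows(query_vec[None, :])[0]
    sims = normalize_rows(item_matrix) @ q
    k = min(k, len(sims))
    top = np.argpartition(-sims, k - 1)[:k]  # unordered top-k
    top = top[np.argsort(-sims[top])]        # order by descending similarity
    return [(int(i), float(sims[i])) for i in top if sims[i] >= threshold]

def rerank(candidates, popularity, ctr, freshness, w=(0.5, 0.2, 0.2, 0.1)):
    """Hypothetical linear blend of similarity with popularity, CTR and freshness."""
    scored = [(i, w[0] * s + w[1] * popularity[i] + w[2] * ctr[i] + w[3] * freshness[i])
              for i, s in candidates]
    return sorted(scored, key=lambda pair: -pair[1])

rng = np.random.default_rng(7)
tags = rng.normal(size=(5, 16))                # vectors for 5 tags of one article
weights = np.array([0.9, 0.1, 0.7, 0.3, 0.8])  # tag importance scores
query = tag2vec(tags, weights)                 # mean of tags 0, 4 and 2

items = rng.normal(size=(5000, 16))                  # the candidate article pool
items[:3] = query + 0.05 * rng.normal(size=(3, 16))  # plant 3 near-duplicates
n = len(items)
candidates = i2i_recall(query, items)
ranked = rerank(candidates, rng.random(n), rng.random(n), rng.random(n))
```

Applying the similarity cutoff before re-ranking means popularity or freshness can reorder related articles but cannot pull in weakly related ones.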

Other recall types follow the same pattern but train different embeddings: tag2vec (word vectors), item2vec (article IDs), media2vec (author IDs), loc2vec (location names), title2vec (LSTM‑derived title vectors), doc2vec (BERT‑derived article body), entity2vec (knowledge‑graph TransE).
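For entity2vec, TransE models a knowledge-graph triple (head, relation, tail) as a translation, so a well-fitted triple satisfies h + r ≈ t. The sketch below shows the TransE scoring function and the margin ranking loss it is trained with; the dimension, the random vectors, and the hand-built "perfect" tail are illustrative only, and the helper names are hypothetical.

```python
import numpy as np

def transe_score(h, r, t, norm=1):
    """TransE plausibility of triple (h, r, t): the negative distance ||h + r - t||.
    Higher (closer to 0) means more plausible."""
    return -float(np.linalg.norm(h + r - t, ord=norm))

def margin_loss(pos, neg, margin=1.0):
    """Margin ranking loss: a true triple should outscore a corrupted one by `margin`."""
    return max(0.0, margin - transe_score(*pos) + transe_score(*neg))

rng = np.random.default_rng(0)
dim = 8
h = rng.normal(size=dim)          # head entity, e.g. a person (illustrative)
r = rng.normal(size=dim)          # relation, e.g. "member_of" (illustrative)
t = h + r                         # a tail that fits the translation exactly
t_corrupt = rng.normal(size=dim)  # a randomly corrupted tail

loss = margin_loss((h, r, t), (h, r, t_corrupt))
```

During training, gradients of this loss move entity and relation vectors so that true triples end up closer than corrupted ones; the learned entity vectors are then used for recall like any other item embedding.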

u2i (user‑to‑item) recall – Implementations include user2vec, word2vec personalization, cross‑tag, and DSSM personalization. User2vec computes similarity between user tag vectors and article tag vectors; DSSM aligns 64‑dimensional user and item vectors; cross‑tag aggregates multiple tag‑based user vectors.
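A minimal sketch of user2vec scoring, assuming the user's interest vector is the weighted average of their profile tag vectors and an article's vector is the mean of its tag vectors (both are assumptions about details the text does not spell out). DSSM personalization ranks the same way, by cosine similarity, except that the 64-dimensional user and item vectors come from the trained towers. Tag indices, weights, and vectors are made up.

```python
import numpy as np

def weighted_mean(vectors, weights):
    """Weighted average of row vectors."""
    w = np.asarray(weights, dtype=float)
    return (w[:, None] * vectors).sum(axis=0) / w.sum()

def cosine(a, b, eps=1e-12):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def user2vec_scores(tag_table, user_tags, user_weights, articles):
    """Cosine between the user's interest vector and each article's mean tag vector.

    tag_table: (num_tags, dim) tag2vec matrix; user_tags/user_weights: the profile;
    articles: list of tag-index lists, one per candidate article.
    """
    u = weighted_mean(tag_table[user_tags], user_weights)
    return [cosine(u, tag_table[tags].mean(axis=0)) for tags in articles]

rng = np.random.default_rng(3)
tag_table = rng.normal(size=(6, 32))  # placeholder vectors for 6 tags
scores = user2vec_scores(tag_table, user_tags=[0, 1], user_weights=[0.8, 0.2],
                         articles=[[0, 1], [4, 5]])
```

The first article shares the user's tags and scores high; the second uses unrelated tags and scores near zero.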

Advanced u2i strategies involve clustering users. Incremental clustering assigns new users to the nearest K‑means centroid; cluster‑level real‑time collaborative filtering or offline similarity scoring then produces recommendations. Group‑profile recall merges users in a cluster into a single profile for personalized recall.
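The cluster-level steps can be illustrated with plain Python. This is a simplified stand-in, not the production algorithm: "cluster-level collaborative filtering" is reduced to ranking items by how often a user's cluster-mates clicked them, and a group profile is the normalized sum of the members' tag weights. All user IDs, items, tags, and helper names are hypothetical.

```python
from collections import Counter, defaultdict

def cluster_item_counts(user_cluster, clicks):
    """Count clicks per cluster from a stream of (user, item) events."""
    counts = defaultdict(Counter)
    for user, item in clicks:
        counts[user_cluster[user]][item] += 1
    return counts

def cluster_recall(user, user_cluster, counts, seen, k=3):
    """Items most clicked by the user's cluster-mates, excluding items the user has seen."""
    already = seen.get(user, set())
    ranked = counts[user_cluster[user]].most_common()
    return [item for item, _ in ranked if item not in already][:k]

def group_profile(members, profiles):
    """Merge members' tag-weight profiles into one normalized cluster profile."""
    merged = Counter()
    for m in members:
        merged.update(profiles[m])
    total = sum(merged.values())
    return {tag: w / total for tag, w in merged.items()}

user_cluster = {"u1": 0, "u2": 0, "u3": 1}  # output of the clustering step
clicks = [("u1", "a"), ("u2", "a"), ("u2", "b"), ("u3", "c"), ("u1", "b"), ("u2", "d")]
counts = cluster_item_counts(user_cluster, clicks)
print(cluster_recall("u1", user_cluster, counts, seen={"u1": {"a", "b"}}))  # ['d']

profiles = {"u1": {"sports": 3.0, "tech": 1.0}, "u2": {"sports": 1.0, "finance": 3.0}}
print(group_profile(["u1", "u2"], profiles))  # {'sports': 0.5, 'tech': 0.125, 'finance': 0.375}
```

Because the counters are keyed by cluster rather than by user, they stay small and can be updated incrementally as click events stream in.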

LSTM‑based clustering feeds the embeddings of recently clicked articles into an LSTM to obtain user vectors, but its computational cost limits scalability. DSSM clustering projects user and item profiles into a shared 64‑dimensional space and follows the same cluster‑recall pipeline; it showed significant gains and is now in production.

bnb clustering concatenates multiple item embeddings (tag, topic, category) into a single vector, similar to Airbnb’s approach for sparse samples, yielding modest improvements.

Incremental clustering keeps cluster centroids stable for long periods while allowing users to migrate between clusters, preserving the semantic meaning of cluster labels for downstream ranking models.

Steps:

Pre‑cluster data with K‑means.

Store cluster centers C and labels L.

For a new point Xnew, compute distances to all C.

Assign Xnew to the nearest centroid, obtaining label Li.

Periodically recompute all centroids during low‑traffic windows to correct drift.
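The five steps can be sketched with a plain Lloyd's K-means in NumPy. To keep the demo deterministic, initial centroids are passed in explicitly (one seed point per blob); a real system would use random or k-means++ initialization. Warm-starting the periodic recomputation from the stored centroids is an assumption, chosen because it keeps cluster indices, and therefore label semantics, stable. `kmeans` and `assign` are hypothetical helpers, and the 2-D points stand in for user vectors.

```python
import numpy as np

def assign(X, C):
    """Steps 3-4: index of the nearest centroid (squared Euclidean) for each row of X."""
    d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

def kmeans(X, init, iters=20):
    """Plain Lloyd's K-means starting from the given centroids.
    Starting from stored centroids keeps cluster indices aligned across runs."""
    C = init.astype(float).copy()
    for _ in range(iters):
        labels = assign(X, C)
        for j in range(len(C)):
            members = X[labels == j]
            if len(members):  # leave an empty cluster's centroid unchanged
                C[j] = members.mean(axis=0)
    return C, assign(X, C)

# Steps 1-2: pre-cluster toy "user vectors" and store centers C and labels L.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc, 0.3, size=(50, 2)) for loc in ([0, 0], [5, 5], [0, 5])])
C, L = kmeans(X, init=X[[0, 50, 100]])  # one seed point per blob, for determinism

# Steps 3-4: a new user is labeled online; C is not modified.
x_new = np.array([[4.8, 5.1]])
label_new = int(assign(x_new, C)[0])

# Step 5: during a low-traffic window, recompute centroids warm-started from C.
C2, L2 = kmeans(np.vstack([X, x_new]), init=C)
```

Online assignment costs only one distance computation per centroid, while the expensive recomputation is deferred to off-peak hours.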

Dynamic rule clustering merges small clusters into the most similar larger ones based on user interest tags, iterating until cluster sizes are balanced; this method improves CTR by about 3%.
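A sketch of the merge loop, assuming each cluster is summarized by its size and an average interest-tag vector, that similarity is cosine similarity, and that "balanced" means every cluster has at least `min_size` users. The text does not specify these details, and `merge_small_clusters` is a hypothetical helper.

```python
import numpy as np

def cosine(a, b, eps=1e-12):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def merge_small_clusters(sizes, interests, min_size):
    """Fold the smallest cluster into its most similar (larger) cluster until every
    cluster has at least `min_size` users or only one cluster is left.

    sizes: {cluster_id: user count}; interests: {cluster_id: interest-tag vector}.
    Returns new sizes, new interest vectors, and a map old_id -> surviving id.
    """
    sizes = dict(sizes)
    interests = {c: np.asarray(v, dtype=float) for c, v in interests.items()}
    mapping = {c: c for c in sizes}
    while len(sizes) > 1:
        small = min(sizes, key=sizes.get)
        if sizes[small] >= min_size:
            break
        # `small` is the minimum, so every other cluster is at least as large
        others = [c for c in sizes if c != small]
        target = max(others, key=lambda c: cosine(interests[small], interests[c]))
        n_s, n_t = sizes[small], sizes[target]
        # size-weighted average keeps the merged interest vector representative
        interests[target] = (n_s * interests[small] + n_t * interests[target]) / (n_s + n_t)
        sizes[target] = n_s + n_t
        for old, cur in mapping.items():
            if cur == small:
                mapping[old] = target
        del sizes[small], interests[small]
    return sizes, interests, mapping

# interest-tag axes: [sports, tech, finance] (illustrative)
sizes = {"A": 500, "B": 450, "C": 20, "D": 15}
interests = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0],
             "C": [0.9, 0.1, 0.0], "D": [0.1, 0.9, 0.2]}
new_sizes, new_interests, mapping = merge_small_clusters(sizes, interests, min_size=100)
print(new_sizes)  # {'A': 520, 'B': 465}
print(mapping)    # {'A': 'A', 'B': 'B', 'C': 'A', 'D': 'B'}
```

The returned `mapping` lets downstream components translate old cluster labels to surviving ones, so users in a merged cluster keep receiving recommendations.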

Other embedding‑based recall algorithms employ DNNs such as CNN, attention, and YouTube‑style models. CNN fuses title, tags, and abstract; attention combines textual and visual information. The YouTube model treats every feature as an embedding, concatenates them, and feeds the long vector into a DNN. Initial experiments on news showed limited success due to the short lifespan of news items compared with videos.
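A forward-pass sketch of the YouTube-style idea: embed every feature, pool the variable-length click history by averaging, concatenate everything into one long vector, and pass it through an MLP whose output is scored against item embeddings. The weights are random and untrained, and the region and age-bucket features, the layer sizes, and the helper names are hypothetical examples.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16
item_emb = rng.normal(scale=0.1, size=(1000, DIM))  # article ID embeddings
region_emb = rng.normal(scale=0.1, size=(50, 8))    # hypothetical region feature
age_emb = rng.normal(scale=0.1, size=(10, 4))       # hypothetical age-bucket feature

W1 = rng.normal(scale=0.1, size=(DIM + 8 + 4, 64))
b1 = np.zeros(64)
W2 = rng.normal(scale=0.1, size=(64, DIM))
b2 = np.zeros(DIM)

def relu(x):
    return np.maximum(x, 0.0)

def user_tower(clicked_ids, region_id, age_bucket):
    """Embed every feature, concatenate into one long vector, run a small MLP."""
    history = item_emb[clicked_ids].mean(axis=0)  # variable-length history -> fixed size
    x = np.concatenate([history, region_emb[region_id], age_emb[age_bucket]])
    h = relu(x @ W1 + b1)
    return h @ W2 + b2                            # user vector in the item space

def recall(user_vec, k=10):
    """Score all items by dot product and return the top-k item indices."""
    scores = item_emb @ user_vec
    return np.argsort(-scores)[:k]

u = user_tower([3, 17, 256], region_id=5, age_bucket=2)
top = recall(u)
```

Every item ID needs a trained embedding row here, which is one way to see why short-lived news articles, with few clicks per ID before they go stale, are harder for this design than long-lived videos.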

Airbnb contributed innovations for sparse samples, notably cluster‑level embeddings and joint user‑item training, which inspired our dynamic rule clustering.

In feature engineering, discrete, continuous, and multi‑value features can be embedded either with pre‑trained vectors (learned from large samples, so the vectors are well trained) or with end‑to‑end embedding layers (gradients flow through the whole model in a unified way, but the many extra parameters converge slowly on small datasets).
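One common way to embed the three feature types is sketched below: discrete features by table lookup, continuous features by bucketing and then lookup, and multi-value features by mean pooling. The bucket boundaries, table sizes, feature examples, and helper names are hypothetical. With pre-trained vectors the tables would be loaded from separately trained embeddings and kept fixed; with end-to-end layers they would be trainable parameters updated together with the rest of the network.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8
cat_table = rng.normal(size=(100, DIM))   # discrete feature, e.g. category ID
bucket_table = rng.normal(size=(6, DIM))  # continuous feature after bucketing
tag_table = rng.normal(size=(500, DIM))   # multi-value feature, e.g. article tags

# Hypothetical bucket edges for a count-like feature (e.g. read counts): 6 buckets.
BOUNDARIES = np.array([10, 100, 1_000, 10_000, 100_000])

def embed_discrete(cat_id):
    return cat_table[cat_id]

def embed_continuous(value):
    # np.digitize maps the value to a bucket index in 0..len(BOUNDARIES)
    return bucket_table[np.digitize(value, BOUNDARIES)]

def embed_multi_value(tag_ids):
    # mean pooling turns a variable-length tag list into one fixed-size vector
    return tag_table[tag_ids].mean(axis=0)

features = np.concatenate([embed_discrete(7),
                           embed_continuous(2_500),
                           embed_multi_value([3, 42, 99])])
```

All three paths produce vectors of the same width, so they can be concatenated into a single input regardless of how many tags an article has.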

Optimizations to embedding computation are often tied to network‑architecture improvements; both aim to reduce the overall error of the model.

Embedding has drawbacks: difficulty preserving semantics during incremental updates, challenges handling multiple features simultaneously, and poor training on long‑tail data.

Recent research from Alibaba and Google proposes residual embeddings (center + residual vectors) and frequency‑aware encoding spaces to improve clustering density and allocate more code space to high‑frequency features.

Overall, embedding remains a powerful technique whose evolution in Tencent’s mobile news recommendation has progressed through stages from basic word2vec to sophisticated multi‑modal, clustered, and deep‑learning‑driven pipelines.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

clustering, recommendation system, FAISS, DSSM, Embedding, vector similarity, item2item, user2item
Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
