Artificial Intelligence · 12 min read

Content Understanding for Personalized Feed Recommendation: From Classification to Interest Graphs

This article explains how Tencent tackles content understanding in feed recommendation, evolving from traditional classification, keyword, and entity methods to a multi-layer interest graph that captures concepts and events. The graph preserves full context, supports reasoning about user intent, and improves online performance.

Qunar Tech Salon

This talk, presented by Tencent senior researcher Guo Weidong, outlines the challenges of content understanding in modern feed recommendation systems and introduces a multi‑layer interest graph solution.

Evolution of Content Understanding – The author reviews three eras: portal (1995‑2002) with manual classification, search/social (2003‑present) adding keywords and knowledge graphs, and the intelligent era (2012‑present) driven by deep learning embeddings, highlighting the limitations of each approach for recommendation.

Recommendation vs. Search – Search ranks documents based on the intersection of query terms, preserving full context, while recommendation ranks based on the union of user interest terms, often losing the contextual relationship between terms such as "Wang Baoqiang" and "Ma Rong".
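The intersection-versus-union distinction can be made concrete with a toy sketch (not Tencent's system; the documents and tags are invented for illustration). Search requires all query terms to co-occur in one document, while feed recommendation matches on the union of a user's interest tags, so the pairwise context between tags is lost:

```python
# Toy illustration: search scores a document on the INTERSECTION of query
# terms (both must co-occur), while recommendation matches on the UNION of
# user interest tags (any overlap counts), losing the pair context.

query = {"Wang Baoqiang", "Ma Rong"}        # search: terms arrive together
user_tags = {"Wang Baoqiang", "Ma Rong"}    # recommendation: independent tags

docs = {
    "d1": {"Wang Baoqiang", "Ma Rong", "divorce"},  # covers the full context
    "d2": {"Wang Baoqiang", "new movie"},           # mentions only one term
}

def search_match(doc_terms):
    # the whole query must be contained in the document's terms
    return query <= doc_terms

def feed_match(doc_terms):
    # any overlap with the union of user interests counts as a match
    return bool(user_tags & doc_terms)

print([d for d, t in docs.items() if search_match(t)])  # ['d1']
print([d for d, t in docs.items() if feed_match(t)])    # ['d1', 'd2']
```

Document `d2` is recalled by the feed even though it has nothing to do with the relationship between the two entities, which is exactly the lost-context problem the talk describes.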

Why Users Consume Content – Traditional methods answer "what is the article?" but ignore "why the user consumes it," necessitating inference of real consumption intent (e.g., brand preference, safety concerns).

Shortcomings of Traditional NLP – Classification, keyword extraction, entity recognition, LDA, and embedding each have drawbacks such as coarse granularity, ambiguity, lack of interpretability, or inability to capture user intent.

Interest Graph Architecture – The proposed graph consists of four layers: classification (PM‑defined taxonomy), concept (abstract ideas like "elderly‑friendly phone"), entity (knowledge‑graph entities), and event (specific happenings). This structure balances operational needs with reasoning about user intent.
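A minimal data-structure sketch of the four-layer tagging result described above (class names, field names, and the example tag values are all hypothetical, chosen only to mirror the talk's examples):

```python
# Hypothetical sketch of a four-layer interest graph tagging result:
# classification -> concept -> entity -> event, attached to one article.

from dataclasses import dataclass, field

@dataclass
class InterestTag:
    layer: str    # one of: "classification", "concept", "entity", "event"
    name: str
    score: float = 1.0

@dataclass
class ArticleProfile:
    doc_id: str
    tags: list = field(default_factory=list)

    def by_layer(self, layer):
        # retrieve tags at one granularity, e.g. all concept-level tags
        return [t for t in self.tags if t.layer == layer]

profile = ArticleProfile("doc_1", [
    InterestTag("classification", "digital/mobile phones", 0.9),
    InterestTag("concept", "elderly-friendly phone", 0.8),
    InterestTag("entity", "Redmi", 0.7),
    InterestTag("event", "Redmi launch event", 0.6),
])
print([t.name for t in profile.by_layer("concept")])
```

The point of keeping the layers separate is that operations (PM taxonomy) and intent reasoning (concepts, events) can be queried independently on the same profile.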

Concept Mining – Concepts are short phrases discovered via weak‑supervised learning on search click data, addressing cold‑start and granularity issues by leveraging user‑generated content.
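A drastically simplified sketch of the candidate step: real weak supervision bootstraps patterns from seed concepts over search click logs, but a common first heuristic (assumed here, not confirmed by the talk) is to keep "modifier + head" phrases whose head word belongs to a seed set:

```python
# Simplified concept-candidate mining from search queries: keep multi-word
# queries ending in a seed head word ("modifier + head" pattern). The seed
# heads and queries are toy values; real mining uses click-log supervision.

seed_heads = {"phone", "laptop"}

queries = [
    "elderly friendly phone",
    "best gaming laptop",
    "phone",             # no modifier: not a concept candidate
    "weather today",     # head word not in the seed set
]

def mine_concepts(queries, heads):
    candidates = set()
    for q in queries:
        words = q.split()
        # a concept needs at least one modifier plus a known head noun
        if len(words) >= 2 and words[-1] in heads:
            candidates.add(" ".join(words))
    return candidates

print(sorted(mine_concepts(queries, seed_heads)))
# ['best gaming laptop', 'elderly friendly phone']
```

Because the candidates come from real user queries, they naturally sit at the granularity users think in, which is the cold-start and granularity advantage the talk emphasizes.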

Hot Event Mining – Popular queries are detected using burst‑region detection (BRD) improved with DTW similarity, followed by topic clustering and event naming based on URL features.
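The two ingredients can be sketched in a few lines (assumptions: daily query counts; the talk's BRD is approximated here by a simple spike-over-baseline test, and the DTW refinement by the classic O(n·m) dynamic-programming distance):

```python
# Sketch of hot-event detection: (1) flag a query as bursting when its latest
# count far exceeds its recent average, (2) use dynamic time warping (DTW)
# to measure similarity between two query-frequency curves for clustering.

def is_bursting(series, window=3, ratio=3.0):
    # baseline = average of the `window` counts before the latest point
    base = sum(series[-window - 1:-1]) / window
    return base > 0 and series[-1] / base >= ratio

def dtw(a, b):
    # standard DP: d[i][j] = cost(i, j) + min of the three predecessors
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

q1 = [10, 12, 11, 95]   # sudden spike -> burst
q2 = [11, 10, 13, 90]   # similar spike shape -> small DTW distance
print(is_bursting(q1))                              # True
print(dtw(q1, q2) < dtw(q1, [50, 50, 50, 50]))      # similar curves are closer
```

Queries whose burst curves are close under DTW can then be clustered into one event and named, e.g. from shared features of their clicked URLs, as the talk outlines.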

Relation Modeling – Entity co‑occurrence and sequential search patterns are used to compute association scores, refined with embedding‑based pairwise training to handle sparse or unseen pairs.
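A minimal sketch of the co-occurrence half, using a PMI-style score over toy search sessions (the sessions are invented; the embedding-based pairwise refinement for sparse or unseen pairs is omitted here):

```python
# PMI-style association score between two entities from session-level
# co-occurrence counts. Pairs that never co-occur score -inf, which is the
# sparsity gap the talk's embedding refinement is meant to fill.

import math

sessions = [
    {"Wang Baoqiang", "Ma Rong"},
    {"Wang Baoqiang", "Ma Rong", "divorce"},
    {"Wang Baoqiang", "movie"},
    {"Ma Rong", "weibo"},
]

def pmi(a, b, sessions):
    n = len(sessions)
    ca = sum(a in s for s in sessions)
    cb = sum(b in s for s in sessions)
    cab = sum(a in s and b in s for s in sessions)
    if cab == 0:
        return float("-inf")   # never co-occurred: no evidence at all
    return math.log((cab / n) / ((ca / n) * (cb / n)))

print(pmi("Wang Baoqiang", "Ma Rong", sessions) >
      pmi("Wang Baoqiang", "weibo", sessions))   # True: observed vs unseen pair
```

Sequential search patterns (query A followed by query B in one session) can feed the same counting scheme, and an embedding model trained on observed pairs then generalizes scores to unseen ones.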

Content Understanding Modules – The system includes text classification (enhanced by user click clustering), keyword extraction (traditional features + GBRank + re‑ranking with relation embeddings), and semantic matching (recall‑plus‑ranking using both relational and semantic vectors, with coarse‑set pruning for efficiency).
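The recall-plus-ranking matcher can be sketched as two stages: coarse-set pruning restricts candidates cheaply (here by shared category, an assumption for illustration), then a vector similarity ranks what survives. The tag names and embeddings below are toy values:

```python
# Sketch of recall-then-rank tag matching: coarse recall by shared category
# (the pruning step), then ranking by cosine similarity of tag vectors.

import math

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0

# tag -> (coarse category, embedding); toy values for illustration
tags = {
    "elderly-friendly phone": ("digital", [0.9, 0.1, 0.2]),
    "gaming laptop":          ("digital", [0.2, 0.9, 0.1]),
    "hiking boots":           ("outdoor", [0.1, 0.1, 0.9]),
}

def match(query_tag, top_k=2):
    cat, vec = tags[query_tag]
    # coarse-set pruning: only score candidates in the same category
    pool = [(t, v) for t, (c, v) in tags.items() if c == cat and t != query_tag]
    ranked = sorted(pool, key=lambda tv: cosine(vec, tv[1]), reverse=True)
    return [t for t, _ in ranked[:top_k]]

print(match("elderly-friendly phone"))  # ['gaming laptop']
```

In the system described, both relational and semantic vectors would feed the ranking stage; the pruning step is what keeps the candidate set small enough for online serving.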

Online Results – Experiments show that adding concept and event layers to the traditional entity and classification tags yields a significant lift in online metrics compared to the baseline.

The presentation concludes with a thank‑you and contact information.

Tags: personalization, recommendation, AI, embedding, NLP, content understanding, interest graph
Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
