Artificial Intelligence 13 min read

Content Understanding for Personalized Recommendation: Interest Graph, Concept Mining, and Semantic Matching at Tencent

The article explains how Tencent addresses the limitations of traditional content understanding methods in personalized recommendation by introducing an interest‑graph framework that combines classification, concept, entity, and event layers, and details the associated mining, matching, and online evaluation techniques.

DataFunSummit

In modern feed recommendation, content understanding draws on two main sources: legacy techniques from the portal and search eras (classification, keyword extraction, knowledge graphs) and more recent deep‑learning techniques such as embeddings. Classification is too coarse and embeddings lack interpretability, so Tencent proposes an interest graph that is designed to address both problems.

Evolution of Content Understanding – The portal era (1995‑2002) used manual categorization; the search/social era (2003‑present) introduced keyword and knowledge‑graph techniques; the intelligent era (2012‑present) brought personalized recommendation driven by deep learning.

Recommendation vs. Search – Search ranks documents by the intersection of the query terms, so the full context of the query is preserved. Recommendation ranks by the union of a user's interest terms, which often loses the contextual relationship between those interests. Recommendation therefore needs a representation that preserves the complete user context.
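The difference between intersection and union matching can be illustrated with a toy example. This is a minimal sketch, assuming documents and interests are plain sets of terms; the corpus and the function names `search_match` and `recommend_match` are hypothetical and are not taken from the talk.

```python
# Hypothetical toy corpus: each document is represented by its set of terms.
DOCS = {
    "d1": {"suv", "safety", "crash", "test"},
    "d2": {"suv", "price", "discount"},
    "d3": {"child", "seat", "safety"},
}

def search_match(query_terms, docs):
    """Search-style matching: a document must contain every query term."""
    return sorted(d for d, terms in docs.items() if query_terms <= terms)

def recommend_match(interest_terms, docs):
    """Tag-style recommendation: a document matches if it shares any interest term."""
    return sorted(d for d, terms in docs.items() if interest_terms & terms)

print(search_match({"suv", "safety"}, DOCS))     # ['d1']
print(recommend_match({"suv", "safety"}, DOCS))  # ['d1', 'd2', 'd3']
```

With the same two terms, intersection matching returns only the document about SUV safety, while union matching also returns documents about SUV prices and child seats, where the link between the two interests is lost.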

User Consumption Motivation – Traditional methods answer "what the article is" but ignore "why a user consumes it". Understanding the underlying intent (e.g., brand preference, safety concerns) is essential for effective recommendation.

Limitations of Traditional NLP – Classification, keyword, entity, LDA, and embedding approaches each suffer from one or more of the following: coarse granularity, ambiguity, limited coverage, or lack of interpretability.

Interest‑Graph Framework – Consists of four layers: classification (a strict tree built by product managers), concept (clusters of entities that share attributes), entity (knowledge‑graph nodes), and event (specific happenings). Each layer addresses a different need, respectively: operational control, intent inference, recall, and precise content description.
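One way to picture the four layers is as typed nodes in a single graph. The sketch below is an illustration only: the `Layer`, `Node`, and `InterestGraph` names, the parent links between layers, and the example node names are all assumptions and do not describe Tencent's actual data model.

```python
from dataclasses import dataclass, field
from enum import Enum

class Layer(Enum):
    CLASSIFICATION = "classification"  # strict tree, maintained by product managers
    CONCEPT = "concept"                # clusters of entities sharing attributes
    ENTITY = "entity"                  # knowledge-graph nodes
    EVENT = "event"                    # specific happenings

@dataclass
class Node:
    name: str
    layer: Layer
    parents: set = field(default_factory=set)  # names of parent nodes (assumed link type)

class InterestGraph:
    def __init__(self):
        self.nodes = {}

    def add(self, name, layer, parents=()):
        node = self.nodes.setdefault(name, Node(name, layer))
        node.parents.update(parents)
        return node

    def ancestors(self, name):
        """Names of all nodes reachable by following parent links upward."""
        seen, stack = set(), list(self.nodes[name].parents)
        while stack:
            cur = stack.pop()
            if cur not in seen:
                seen.add(cur)
                if cur in self.nodes:
                    stack.extend(self.nodes[cur].parents)
        return seen

# Hypothetical example nodes, one per layer.
g = InterestGraph()
g.add("Auto", Layer.CLASSIFICATION)
g.add("fuel-efficient cars", Layer.CONCEPT, parents=["Auto"])
g.add("ExampleCar X1", Layer.ENTITY, parents=["fuel-efficient cars"])
g.add("ExampleCar X1 launch", Layer.EVENT, parents=["ExampleCar X1"])
print(sorted(g.ancestors("ExampleCar X1 launch")))
```

Walking upward from an event reaches the entity, concept, and category it belongs to, which is the kind of lookup a tagging or recall step could use.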

Concept Mining – Uses search click data for weak supervision, extracts entities from clicked pages, and computes co‑occurrence frequencies to build hierarchical relationships. Semi‑supervised learning mitigates the lack of labeled data and handles granularity by leveraging user‑generated content.
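The co‑occurrence step can be sketched as follows. This is a simplified illustration, assuming the click log is a list of (query, clicked page text) pairs, that the query itself is the candidate concept, and that entities are found by a naive dictionary lookup; the function names, the `min_count` threshold, and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

def extract_entities(text, entity_dict):
    """Naive dictionary lookup; a production system would use an entity-linking model."""
    lowered = text.lower()
    return {e for e in entity_dict if e.lower() in lowered}

def mine_concept_entity_edges(click_log, entity_dict, min_count=2):
    """Count how often each entity appears in pages clicked for a concept-like query,
    and keep (concept, entity) pairs seen at least min_count times."""
    cooc = Counter()
    for query, page_text in click_log:
        for entity in extract_entities(page_text, entity_dict):
            cooc[(query, entity)] += 1
    edges = defaultdict(list)
    for (concept, entity), n in cooc.items():
        if n >= min_count:
            edges[concept].append((entity, n))
    # Sort by descending count, then by name, for a stable result.
    return {c: sorted(es, key=lambda x: (-x[1], x[0])) for c, es in edges.items()}

# Hypothetical click log and entity dictionary.
CLICK_LOG = [
    ("fuel-efficient cars", "Review: the Alpha Sedan sips fuel"),
    ("fuel-efficient cars", "Alpha Sedan vs Beta Hatch: mileage compared"),
    ("fuel-efficient cars", "Why the Beta Hatch is cheap to run"),
    ("fuel-efficient cars", "Gamma Truck towing test"),
]
ENTITIES = {"Alpha Sedan", "Beta Hatch", "Gamma Truck"}

print(mine_concept_entity_edges(CLICK_LOG, ENTITIES))
# {'fuel-efficient cars': [('Alpha Sedan', 2), ('Beta Hatch', 2)]}
```

The click acts as the weak label: an entity that repeatedly appears in pages clicked for the same query is treated as a likely member of that concept, while a one‑off match is dropped.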

Hot Event Mining – Detects bursty queries using time‑series analysis (DTW similarity to a burst template) and clusters related queries into topics, then filters non‑event topics using URL‑based features.
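The burst‑detection idea can be sketched with a textbook dynamic time warping (DTW) distance. This is a minimal sketch under stated assumptions: the template shape, the min‑max normalization, and the `max_distance` threshold are hypothetical choices for illustration, not values from the talk.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]

def normalize(series):
    """Scale to [0, 1] so that only the shape of the curve matters."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0] * len(series)
    return [(x - lo) / (hi - lo) for x in series]

# Hypothetical burst shape: quiet, sudden spike, gradual decay.
BURST_TEMPLATE = [0.0, 0.0, 0.1, 1.0, 0.6, 0.3, 0.1]

def is_bursty(query_counts, template=BURST_TEMPLATE, max_distance=1.0):
    """A query is treated as bursty if its normalized curve is close to the template."""
    return dtw_distance(normalize(query_counts), template) <= max_distance

print(is_bursty([10, 12, 11, 15, 400, 250, 120, 40]))  # True
print(is_bursty([10, 20, 30, 40, 50, 60, 70, 80]))     # False
```

DTW tolerates bursts that rise or decay at slightly different speeds, which a point‑by‑point comparison would penalize. Queries flagged this way would then go on to the clustering and URL‑based filtering steps described above.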

Relation Modeling – Computes entity associations via co‑occurrence and negative sampling, training pairwise embeddings to capture latent relationships beyond direct co‑occurrence.
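The data‑preparation side of this step might look like the sketch below. It is an illustration under assumptions: negatives are drawn uniformly from entities that never co‑occur with the anchor, and the pairwise objective is a dot‑product hinge loss. Neither choice is confirmed by the talk, and all names and sample data are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations

def cooccurrence_pairs(docs):
    """Positive pairs: entities that appear in the same document."""
    positives = defaultdict(set)
    for entities in docs:
        for a, b in combinations(sorted(set(entities)), 2):
            positives[a].add(b)
            positives[b].add(a)
    return positives

def sample_triples(positives, vocab, num_neg=1, seed=0):
    """Build (anchor, positive, negative) triples; negatives never co-occur with the anchor."""
    rng = random.Random(seed)
    triples = []
    for anchor in sorted(positives):
        candidates = sorted(set(vocab) - positives[anchor] - {anchor})
        if not candidates:
            continue
        for pos in sorted(positives[anchor]):
            for _ in range(num_neg):
                triples.append((anchor, pos, rng.choice(candidates)))
    return triples

def pairwise_hinge_loss(vec, triple, margin=1.0):
    """max(0, margin - score(anchor, pos) + score(anchor, neg)), with dot-product scores."""
    a, p, n = (vec[x] for x in triple)
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))
    return max(0.0, margin - dot(a, p) + dot(a, n))

# Hypothetical entity lists extracted from three documents.
DOCS = [["A", "B"], ["B", "C"], ["D", "E"]]
VOCAB = ["A", "B", "C", "D", "E"]
POSITIVES = cooccurrence_pairs(DOCS)
TRIPLES = sample_triples(POSITIVES, VOCAB)
print(len(TRIPLES))  # 6
```

Minimizing such a loss over the triples pushes co‑occurring entities together and sampled negatives apart. Because A and C both co‑occur with B, their vectors can end up close even though A and C never appear together, which is the kind of latent relationship the paragraph refers to.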

Content Understanding Modules – Text classification refines the PM‑defined categories with user click clustering. Keyword extraction combines traditional feature engineering with GBRank and adds a re‑ranking layer that uses relational embeddings. Semantic matching follows a recall‑plus‑ranking design, using both relational and semantic vectors to retrieve and rank candidate concepts and events.
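The recall‑plus‑ranking pattern for semantic matching can be sketched in a few lines. This is a simplified illustration, assuming each candidate concept or event has one relational vector and one semantic vector and that both stages score by cosine similarity; the vectors, `k`, and the threshold are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def recall(doc_rel_vec, rel_vecs, k):
    """Stage 1: keep the k candidates closest to the document in relational-vector space."""
    ordered = sorted(rel_vecs, key=lambda c: -cosine(doc_rel_vec, rel_vecs[c]))
    return ordered[:k]

def rank(doc_sem_vec, recalled, sem_vecs, threshold=0.5):
    """Stage 2: re-score recalled candidates with semantic vectors and drop weak matches."""
    scored = [(c, cosine(doc_sem_vec, sem_vecs[c])) for c in recalled]
    return sorted([s for s in scored if s[1] >= threshold], key=lambda s: -s[1])

# Hypothetical candidate vectors for two concepts and one event.
REL = {"concept_a": [1.0, 0.0], "concept_b": [0.8, 0.6], "event_c": [0.0, 1.0]}
SEM = {"concept_a": [0.0, 1.0], "concept_b": [1.0, 0.0], "event_c": [1.0, 0.0]}

recalled = recall([1.0, 0.0], REL, k=2)
matches = rank([1.0, 0.0], recalled, SEM)
print(recalled)                   # ['concept_a', 'concept_b']
print([c for c, _ in matches])    # ['concept_b']
```

The cheap recall stage narrows the candidate set, and the ranking stage then removes candidates such as `concept_a` that are related to the document's entities but do not match its meaning.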

Online Results – Adding concept and event layers to the baseline (classification + entity) yields significant improvements in online metrics, demonstrating the effectiveness of the interest‑graph approach.

The presentation concludes with a thank‑you and invites the audience to join the DataFunTalk community for further discussion.

personalization, recommendation, Embedding, NLP, content understanding, interest graph
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.