Hot Topic Mining and Expansion Using User‑Behavior Graph Embedding for Recommendation Systems
This article surveys recent research on extracting and expanding hot topics from short texts by constructing user‑behavior graphs, applying graph‑embedding techniques, and leveraging multi‑task learning to improve recommendation relevance, timeliness, and cold‑start handling in large‑scale platforms.
The article begins by revisiting previous work on short‑text concept extraction and query expansion, highlighting the need for timely hot‑topic discovery in domains such as medical forums and social media, where outdated recommendations can mislead users.
It then discusses the motivation for combining short‑term user behavior with concept clustering to identify emerging topics, using examples from Weibo and e‑commerce search to illustrate the problem of stale recommendations.
Several representative papers are reviewed, including Tencent's KDD 2019 study on text conceptualization, Alibaba's KDD 2018 work on billion‑scale commodity embedding, and the 2020 UIUC‑Amazon "Octet" paper on self‑supervised taxonomy enrichment. The review covers graph‑embedding methods (DeepWalk, RGCN), side‑information integration, and multi‑task learning frameworks such as M2GRL.
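To make the DeepWalk idea concrete, here is a minimal sketch of its first stage: generating truncated random walks over a behavior graph. It is not the implementation from any of the reviewed papers. The adjacency dictionary, the item names, and the parameters `walk_length` and `walks_per_node` are illustrative assumptions, and neighbors are sampled uniformly for simplicity.

```python
import random

def random_walks(adj, walk_length=5, walks_per_node=2, seed=0):
    """Generate truncated uniform random walks (the DeepWalk sampling stage).

    adj maps each node to a list of its neighbors (hypothetical toy graph).
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj.get(walk[-1], [])
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# Hypothetical item graph: an edge means two items were clicked in one session.
adj = {
    "item_a": ["item_b", "item_c"],
    "item_b": ["item_a", "item_c"],
    "item_c": ["item_a", "item_b", "item_d"],
    "item_d": ["item_c"],
}
walks = random_walks(adj)
```

In DeepWalk, the resulting walks are treated as sentences and passed to a skip‑gram model, so that nodes appearing in similar walk contexts receive nearby embedding vectors.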
The proposed baseline constructs a bipartite graph from user clicks and queries, applies weighted PageRank with a time decay based on Newton's law of cooling to rank topic popularity, and uses LDA‑style clustering on TF‑IDF, doc2vec, and word2vec features to group concepts. The data cleaning steps are also described: removing clicks that last less than 2 s, filtering out hyper‑active users, and pruning items whose content is edited frequently.
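The ranking step can be sketched as follows. This is a minimal illustration under stated assumptions, not the article's actual implementation: the click tuple layout, the decay rate `alpha`, the damping factor, and the reading of the 2 s rule as a dwell‑time threshold are all assumed here. Edge weights decay exponentially with click age, following Newton's law of cooling, and PageRank is computed by plain power iteration.

```python
import math
from collections import defaultdict

def cooling_weight(age_hours, alpha=0.1):
    """Exponential decay in the style of Newton's law of cooling (alpha is assumed)."""
    return math.exp(-alpha * age_hours)

def build_graph(clicks, now_hours, min_dwell_s=2.0, alpha=0.1):
    """Build a weighted query-document bipartite graph.

    clicks: iterable of (query, doc, timestamp_hours, dwell_seconds); this
    layout is a hypothetical log format chosen for the example.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for query, doc, ts, dwell in clicks:
        if dwell < min_dwell_s:  # cleaning step: drop very short clicks
            continue
        w = cooling_weight(now_hours - ts, alpha)
        q, d = ("q", query), ("d", doc)
        graph[q][d] += w  # store both directions so every node has out-edges
        graph[d][q] += w
    return graph

def weighted_pagerank(graph, damping=0.85, iters=50):
    """Power-iteration PageRank where rank flows in proportion to edge weight."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    out_weight = {v: sum(graph[v].values()) for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            for u, w in graph[v].items():
                new[u] += damping * rank[v] * w / out_weight[v]
        rank = new
    return rank

clicks = [
    ("flu symptoms", "doc_a", 99.0, 30.0),  # recent click
    ("flu symptoms", "doc_b", 10.0, 30.0),  # old click, heavily decayed
    ("flu vaccine", "doc_a", 98.0, 12.0),
    ("flu vaccine", "doc_c", 99.5, 1.0),    # under 2 s dwell, filtered out
]
rank = weighted_pagerank(build_graph(clicks, now_hours=100.0))
```

In this toy log, `doc_a` ends up ranked above `doc_b` because its clicks are recent, while `doc_c` never enters the graph. Filtering hyper‑active users and frequently edited items would be additional passes over the click log before `build_graph` is called.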
Experimental results show that the graph‑based approach improves hot‑topic detection and recommendation quality, with visualizations of ranking improvements and the final expanded topic list. The article concludes that graph representation learning on user behavior is a powerful, industry‑adopted technique for timely, personalized content recommendation.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.