
Hotspot Mining and Event Extraction in Tencent Information Flow: Methods, Framework, and Applications

This article presents Tencent's research on hotspot mining and event extraction for information flow. It covers the challenges of timeliness, comprehensiveness, and heat rationality; the combined use of time‑series analysis, topic detection, clustering, and dynamic time warping (DTW); and the resulting framework and its applications to text, image, and video recommendation.

DataFunTalk

Introduction – Modern news and social media platforms prominently display hot topics, and the ability to quickly discover, organize, and personalize these hotspots is crucial for user experience.

Project Background – The authors identify four key problems: (1) timeliness – detecting events such as Kobe Bryant’s death soon after they break; (2) comprehensiveness – aggregating the diverse reports on one event across sites; (3) heat rationality – reconciling popularity signals that differ across data sources; and (4) distribution – handling the cold‑start problem for newly emerging events.

Related Research Methods

Event Extraction – Detect trigger words and classify event types (e.g., the trigger "die" for Kobe Bryant’s death), following ACE‑style annotation schemes.
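
As a rough illustration of trigger detection, the sketch below looks up tokens in a small trigger lexicon and returns the matching event type. The lexicon, the function name, and the dictionary lookup itself are assumptions made for illustration; the article does not describe the actual model, which would normally be a trained classifier rather than a word list.

```python
import re

# Hypothetical trigger lexicon: trigger word -> ACE-style event type.
TRIGGERS = {
    "die": "Life.Die",
    "dies": "Life.Die",
    "died": "Life.Die",
    "death": "Life.Die",
    "marry": "Life.Marry",
    "married": "Life.Marry",
    "attack": "Conflict.Attack",
    "attacked": "Conflict.Attack",
}

def detect_events(sentence):
    """Return (trigger, event_type) pairs found in the sentence, in order."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [(tok, TRIGGERS[tok]) for tok in tokens if tok in TRIGGERS]

print(detect_events("Kobe Bryant died in a helicopter crash"))
```

A lexicon lookup cannot disambiguate triggers by context, which is one reason production systems learn this step from annotated data.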

Topic Detection & Tracking (TDT) – Cluster queries and documents into topics using k‑means, DBSCAN, or hierarchical clustering over features enriched with semantic, entity, and event information.

Dynamic Time Warping (DTW) – Aligning time‑series of query frequencies to templates for hotspot identification.
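
A minimal sketch of DTW‑based template matching follows. The distance function is the textbook dynamic‑programming formulation; the burst template, the min‑max normalization, and the threshold are assumptions chosen for illustration and are not taken from the article.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW distance with absolute-difference cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def normalize(series):
    """Scale a series to [0, 1] so that shape, not raw volume, drives the match."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0] * len(series)
    return [(x - lo) / (hi - lo) for x in series]

# Hypothetical burst template: flat, sharp rise, gradual decay.
BURST_TEMPLATE = [0.0, 0.0, 0.1, 1.0, 0.7, 0.4, 0.2]

def looks_like_hotspot(series, threshold=1.0):
    """True when the normalized series is close to the burst template under DTW."""
    return dtw_distance(normalize(series), BURST_TEMPLATE) <= threshold

print(looks_like_hotspot([10, 12, 11, 300, 210, 120, 60, 40]))
```

Because DTW allows one point to align with several, a burst that rises or decays at a different speed than the template can still match, which a point‑by‑point distance would penalize.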

Hotspot Computation Framework – Consists of offline mining and online understanding. Offline mining ingests data from three Tencent platforms, performs topic extraction, topic fusion, and event splitting; online understanding matches new content to the event library.

Hotspot Mining Pipeline

Query‑Log Mining – Build time‑series from search queries, detect popular queries using DTW, and cluster similar queries into topics.
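
The first and last steps of this stage might look like the sketch below: building per‑query count series from a log, and grouping similar queries. The log format (time‑bucket index plus query string), the word‑level Jaccard similarity, and the single‑pass clustering are all assumptions for illustration; the detection step in between is left out here.

```python
from collections import Counter, defaultdict

def build_series(log, num_buckets):
    """log: iterable of (bucket_index, query). Returns {query: [count per bucket]}."""
    counts = defaultdict(Counter)
    for bucket, query in log:
        counts[query.strip().lower()][bucket] += 1
    return {q: [c[b] for b in range(num_buckets)] for q, c in counts.items()}

def jaccard(a, b):
    """Word-level Jaccard similarity between two query strings."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def cluster_queries(queries, threshold=0.5):
    """Single-pass clustering: join the first cluster whose seed query is similar enough."""
    clusters = []
    for q in queries:
        for cluster in clusters:
            if jaccard(q, cluster[0]) >= threshold:
                cluster.append(q)
                break
        else:
            clusters.append([q])
    return clusters

log = [(0, "kobe bryant"), (1, "Kobe Bryant"), (1, "kobe bryant crash"),
       (2, "kobe bryant"), (2, "weather today")]
series = build_series(log, num_buckets=3)
print(series)
print(cluster_queries(list(series)))
```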

Article Mining – Convert article content into keyword features, apply DTW‑based hotspot detection, then perform hierarchical clustering to form topics while filtering non‑hot content.
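
The hierarchical clustering step can be sketched as a naive average‑link agglomerative procedure over keyword sets. The Jaccard distance, the stopping threshold, and the sample keyword sets are assumptions for illustration; the article does not specify the features or linkage actually used, and the hot‑content filter is omitted.

```python
def jaccard_distance(a, b):
    """1 - Jaccard similarity between two keyword sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def average_link(c1, c2, docs):
    """Mean pairwise distance between the documents of two clusters."""
    total = sum(jaccard_distance(docs[i], docs[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerative(docs, max_distance=0.7):
    """docs: list of keyword sets. Returns clusters as sorted lists of doc indices."""
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_link(clusters[x], clusters[y], docs)
                if best is None or d < best[0]:
                    best = (d, x, y)
        if best[0] > max_distance:
            break  # the closest pair is too far apart, so stop merging
        _, x, y = best
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]
    return clusters

docs = [
    {"kobe", "bryant", "helicopter", "crash"},
    {"kobe", "bryant", "crash", "calabasas"},
    {"lakers", "kobe", "bryant", "tribute"},
    {"oscars", "best", "picture", "parasite"},
]
print(agglomerative(docs))
```

The distance threshold controls topic granularity: a lower value yields more, tighter topics, while a higher value merges loosely related articles.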

Topic Fusion – Merge topics from different sources and compute a unified heat score that combines consumption and production signals.
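
One way to picture the merge is a union‑find pass that joins topics whose keyword sets overlap enough. The topic record layout, the overlap measure, and the threshold are assumptions for illustration; heat scoring is not covered in this sketch.

```python
def fuse_topics(topics, min_overlap=0.5):
    """topics: list of dicts with 'source' (str) and 'keywords' (set).

    Topics are merged when the shared keywords cover at least `min_overlap`
    of the smaller keyword set. Returns one dict per fused topic.
    """
    parent = list(range(len(topics)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(topics)):
        for j in range(i + 1, len(topics)):
            a, b = topics[i]["keywords"], topics[j]["keywords"]
            smaller = min(len(a), len(b))
            if smaller and len(a & b) / smaller >= min_overlap:
                parent[find(j)] = find(i)

    fused = {}
    for i, topic in enumerate(topics):
        group = fused.setdefault(find(i), {"sources": set(), "keywords": set()})
        group["sources"].add(topic["source"])
        group["keywords"] |= topic["keywords"]
    return list(fused.values())

topics = [
    {"source": "search", "keywords": {"kobe", "bryant", "crash"}},
    {"source": "news", "keywords": {"kobe", "bryant", "helicopter", "crash"}},
    {"source": "news", "keywords": {"oscars", "parasite"}},
]
print(fuse_topics(topics))
```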

Event Splitting – When a topic reaches sufficient granularity, split it into distinct events using trigger‑argument analysis and seq2seq‑based naming.

Heat Calculation – Combine consumption heat (user searches), production heat (author activity), and global heat to assign final scores for recommendation.
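
The article does not give the scoring formula, so the following is only a sketch of one plausible shape: log‑damp each raw signal into [0, 1], then take a weighted sum. The weights, the caps, and the treatment of global heat as a third input signal are all assumptions.

```python
import math

# Hypothetical weights and caps; the article does not publish the real formula.
WEIGHTS = {"consumption": 0.5, "production": 0.3, "global": 0.2}
CAPS = {"consumption": 1_000_000, "production": 10_000, "global": 100_000}

def normalize(count, cap):
    """Log-damp a raw count into [0, 1] so a high-volume source cannot dominate."""
    return min(1.0, math.log1p(count) / math.log1p(cap))

def final_heat(consumption, production, global_signal):
    """Weighted sum of the three normalized signals, in [0, 1]."""
    signals = {
        "consumption": consumption,
        "production": production,
        "global": global_signal,
    }
    return sum(WEIGHTS[k] * normalize(signals[k], CAPS[k]) for k in WEIGHTS)

print(round(final_heat(consumption=50_000, production=300, global_signal=2_000), 3))
```

Normalizing each signal against its own cap is one way to address the heat rationality problem, since raw counts from a search log and from an article feed are not on the same scale.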

Applications

Image‑Text Hotspot – Use a dual‑tower and MatchPyramid model to match article titles with event names, assigning event and topic tags for recommendation.
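
To make the matching step concrete, the sketch below uses untrained stand‑ins for both models: a bag‑of‑words vector plays the role of each tower's embedding, and an exact‑match indicator matrix shows the kind of word‑by‑word interaction input that MatchPyramid consumes. The event library contents, function names, and threshold are hypothetical; the real system would use learned encoders and a trained matching network.

```python
import math
from collections import Counter

def encode(text):
    """Stand-in 'tower': bag-of-words counts. A trained tower would output a dense embedding."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def interaction_matrix(title, event_name):
    """Word-by-word exact-match matrix, the indicator form of a MatchPyramid input."""
    a, b = title.lower().split(), event_name.lower().split()
    return [[1 if x == y else 0 for y in b] for x in a]

def tag_article(title, event_library, threshold=0.3):
    """event_library: {event_name: topic_name}. Returns (event, topic) or None."""
    title_vec = encode(title)
    best_name, best_score = None, 0.0
    for name in event_library:
        score = cosine(title_vec, encode(name))
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None or best_score < threshold:
        return None
    return best_name, event_library[best_name]

library = {
    "kobe bryant dies in helicopter crash": "Kobe Bryant",
    "parasite wins best picture": "Oscars",
}
print(tag_article("Lakers mourn after Kobe Bryant helicopter crash", library))
```

Returning None below the threshold keeps unrelated articles untagged rather than forcing them onto the nearest event.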

Video & Short‑Video Hotspot – Transfer textual hotspot signals to video content, filter by popular keywords, and match video titles to event names for personalized feeds.

Conclusion – The presented framework improves hotspot relevance, timeliness, and personalization across multiple media formats, supporting Tencent’s information‑flow recommendation system.

Tags: information retrieval, NLP, time series analysis, topic detection, event extraction, hotspot mining