
Pretraining Models and Graph Neural Networks for Recommendation Systems

This talk explores the evolution, objectives, and core challenges of pretraining models, along with their application and service modes in recommendation scenarios. Detailed case studies of graph neural network pretraining illustrate how self-supervised learning and multi-domain data integration enrich user and item embeddings to improve recommendation performance.

DataFunTalk

Guest speaker: Dr. Song Chonggang, senior researcher at Tencent, presented a comprehensive overview of pretraining models and graph neural networks (GNNs) in recommendation systems.

1. Development History of Pretraining Models

As deep learning advanced rapidly, its reliance on labeled data limited the ability to leverage massive unlabeled data. Starting from the NNLM word-embedding concept in 2003, pretraining frameworks such as BERT (2019) and MAE (2021) have achieved significant success across NLP and CV. For GNNs, early self-supervised methods such as Node2Vec and Metapath2Vec evolved into autoencoder and contrastive-learning approaches, expanding the toolbox for graph pretraining.

2. Objectives of Pretraining

Break data silos to leverage global information for downstream tasks.

Integrate diverse task information into a unified representation space, reducing over‑fitting risk.

Provide rich signals for new users, new scenarios, and long‑tail items, alleviating sparse label problems.

The generic pretraining pipeline first gathers massive global data, learns self‑supervised user and item embeddings, incorporates cross‑domain auxiliary samples for pretraining, and finally fine‑tunes on target‑domain data.
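The four stages above can be sketched as a toy skeleton. This is only an illustration of the data flow; the function name, the co-occurrence update rule, and the step sizes are illustrative stand-ins, not details from the talk:

```python
import numpy as np

def pretrain_then_finetune(global_pairs, aux_pairs, target_pairs, dim=8):
    """Toy skeleton of the four-stage pipeline; real systems replace each
    step with large-scale self-supervised GNN training."""
    # 1) Gather massive global data: collect every entity id seen anywhere.
    entities = sorted({e for pair in global_pairs + aux_pairs + target_pairs
                       for e in pair})
    rng = np.random.default_rng(0)
    emb = {e: rng.normal(size=dim) for e in entities}

    # 2) Self-supervised objective (toy): pull co-occurring embeddings together,
    # 3) folding cross-domain auxiliary samples into the same pretraining pass.
    for u, i in global_pairs + aux_pairs:
        delta = emb[i] - emb[u]
        emb[u] += 0.1 * delta
        emb[i] -= 0.1 * delta

    # 4) Fine-tune on target-domain pairs with a smaller step size.
    for u, i in target_pairs:
        delta = emb[i] - emb[u]
        emb[u] += 0.01 * delta
        emb[i] -= 0.01 * delta
    return emb
```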

3. Core Issues in Pretraining

How to exploit global data via self‑supervised learning?

How to select network structures that best fit the target task (e.g., BERT for NLP, GNN variants for social/behavior graphs)?

How to design transferable architectures that generalize across domains (e.g., meta‑learning, contrastive learning)?

4. Pretraining Models in Recommendation Scenarios

Pretraining models can be classified by data organization (behavior sequences, behavior graphs) and by cross‑domain transfer methods (meta‑learning, multi‑task learning). GNNs are especially suitable because recommendation data naturally forms graph structures such as social networks and knowledge graphs.

Advantages of GNNs in recommendation:

Data fit: Graph structures align with social and knowledge graph data.

Rich information: Higher‑order relationships via PageRank‑based sampling, k‑Clique, centrality measures.

Universality: GNNs can be combined with traditional deep nets and node feature embeddings.
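The "higher-order relationships via PageRank-based sampling" point can be made concrete with personalized PageRank from a seed node; nodes with high scores are the multi-hop neighbours a GNN would sample. This is a generic sketch, not the speaker's production sampler:

```python
import numpy as np

def ppr_neighbor_scores(adj, seed, alpha=0.15, iters=50):
    """Personalized PageRank from `seed`: the score vector concentrates on
    higher-order neighbours. `adj` is a dense 0/1 adjacency matrix."""
    n = adj.shape[0]
    # Row-normalize to a transition matrix; nodes with no edges
    # simply contribute no outgoing mass in this sketch.
    deg = adj.sum(axis=1, keepdims=True)
    trans = np.divide(adj, deg, out=np.zeros_like(adj, dtype=float),
                      where=deg > 0)
    restart = np.zeros(n)
    restart[seed] = 1.0
    p = restart.copy()
    for _ in range(iters):
        # Restart at the seed with probability alpha, otherwise random-walk.
        p = alpha * restart + (1 - alpha) * trans.T @ p
    return p

def sample_top_k(adj, seed, k):
    """Pick the k highest-scoring neighbours (excluding the seed itself)."""
    scores = ppr_neighbor_scores(adj, seed)
    scores[seed] = -1.0
    return list(np.argsort(scores)[::-1][:k])
```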

5. Service Modes of Pretraining Models

Feature mode: Use pretrained embeddings to enrich downstream model features.

Recall mode: Directly compute similarity between user and item embeddings for candidate retrieval, supporting various business goals and cross-domain activation.

Sub-model mode: Embed the pretrained network as a sub-module within the downstream model, preserving pretrained parameters and abstraction capability.
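The sub-model mode can be sketched with a frozen pretrained tower feeding a trainable head; the class names and the toy linear head are illustrative assumptions, not the talk's architecture:

```python
import numpy as np

class PretrainedEncoder:
    """Stands in for a pretrained GNN tower embedded as a sub-module."""
    def __init__(self, w):
        self.w = w
        self.frozen = True  # pretrained parameters are preserved

    def __call__(self, x):
        return np.tanh(x @ self.w)

class DownstreamModel:
    def __init__(self, encoder, head_w):
        self.encoder = encoder  # frozen pretrained sub-module
        self.head_w = head_w    # only this part is updated in fine-tuning

    def predict(self, x):
        return self.encoder(x) @ self.head_w

    def train_step(self, x, y, lr=0.1):
        h = self.encoder(x)
        err = h @ self.head_w - y
        # The gradient flows only into the head; the encoder stays untouched,
        # preserving its pretrained abstraction capability.
        self.head_w -= lr * h.T @ err / len(x)
```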

Examples of feature integration:

Separate modeling of discrete and continuous features, concatenated at the final layer.

Convert embedding vectors into discrete features for cross‑feature interaction.

Use top‑K embedding IDs as discrete features.
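Two of the feature-integration ideas above can be sketched directly: bucketizing embedding dimensions into discrete ids for cross-feature interaction, and taking top-K indices as discrete features. One plausible reading of "top-K embedding IDs" is shown here (indices of the largest dimensions); the talk may instead mean top-K nearest cluster or item ids:

```python
import numpy as np

def embedding_to_discrete(vec, n_bins=4):
    """Bucketize each embedding dimension into an integer id so it can be
    crossed with other categorical features downstream."""
    edges = np.linspace(vec.min(), vec.max(), n_bins + 1)[1:-1]
    return np.digitize(vec, edges).tolist()

def top_k_ids(vec, k=3):
    """Use the indices of the k largest dimensions as discrete features."""
    return np.argsort(vec)[::-1][:k].tolist()
```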

Recall examples:

Interest recall by similarity of user‑item embeddings.

Cross‑domain interest recall using SimSvr nearest‑neighbor search between ad and public‑account embeddings.

Author cold‑start recall by matching user interests with newly‑published author embeddings.
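All three recall examples above reduce to nearest-neighbour search over embeddings. A brute-force cosine-similarity version is sketched below; production systems replace the linear scan with an approximate-nearest-neighbour service (the talk mentions SimSvr), and the function name here is illustrative:

```python
import numpy as np

def recall_top_k(user_vec, item_matrix, item_ids, k=2):
    """Return the k item ids whose embeddings are most cosine-similar
    to the user embedding (brute-force scan for illustration)."""
    u = user_vec / np.linalg.norm(user_vec)
    items = item_matrix / np.linalg.norm(item_matrix, axis=1, keepdims=True)
    scores = items @ u
    order = np.argsort(scores)[::-1][:k]
    return [item_ids[i] for i in order]
```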

6. Case Study 1 – Cross‑Domain Interest Recall GNN

In the early days of live streaming, many new users had little interaction with live rooms but rich activity in other domains (public accounts, short videos). A heterogeneous graph was built with user, live-room, and item nodes, with edges representing watch behavior, friendships, and side-item interactions. Metapaths were designed for both the user and item sides, aggregating information from behavior, social ties, and static attributes. After multi-path graph convolutions, embeddings were pooled, passed through dense layers, and used to compute a user-item (U-I) loss for recommendation.

To mitigate over‑fitting to the live‑stream domain, side‑item information from other domains was added to the graph, and a reconstruction loss weighted by learnable parameters (inspired by multi‑task learning) was introduced, allowing supervision from external behaviors.
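The talk does not specify the exact form of the learnable loss weights; one common multi-task scheme (homoscedastic-uncertainty weighting) is sketched here as an assumption, with each task loss scaled by a learnable `exp(-log_var)` term and regularized by `log_var` itself:

```python
import numpy as np

def combined_loss(ui_loss, recon_loss, log_var_ui, log_var_recon):
    """One possible learnable weighting of the U-I loss and the
    cross-domain reconstruction loss: scaling by exp(-log_var) lets the
    model down-weight a noisy task, while the +log_var term keeps the
    weights from collapsing to zero."""
    return (np.exp(-log_var_ui) * ui_loss + log_var_ui
            + np.exp(-log_var_recon) * recon_loss + log_var_recon)
```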

7. Case Study 2 – Multi‑Task GNN Feature Extraction

For advertising recommendation in subscription-account feeds, ad samples are extremely sparse while user reading behavior is abundant. A heterogeneous graph linking users, ads, and public-account articles was constructed. User-side metapaths captured ad interactions, friend behavior, and cross-domain article consumption; item-side metapaths aggregated similar ads and collaborative signals from user behavior.

After graph convolutions, user and item embeddings were concatenated, fed into dense layers, and used for a similarity-based U-I loss. Challenges included sparse ad behavior and seasonal ad placement, which create distribution gaps between the ad and public-account domains.

Optimizations applied:

Share base convolution parameters while splitting user features into two dense branches to produce shared and private embeddings.

Apply MMD loss to align the distributions of shared embeddings across domains.

Incorporate cross‑domain samples from public accounts into training, improving both offline and online performance.
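The MMD alignment step above can be sketched with a standard RBF-kernel squared-MMD estimate; the kernel choice and bandwidth here are generic assumptions rather than details from the talk:

```python
import numpy as np

def mmd_rbf(x, y, gamma=1.0):
    """Squared MMD with an RBF kernel between two sample sets: near zero
    when the two embedding distributions are aligned, larger when they
    differ. Used as an auxiliary loss on the shared embeddings."""
    def k(a, b):
        # Pairwise squared Euclidean distances, then the RBF kernel.
        d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d)
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()
```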

Overall, the presented GNN‑based pretraining frameworks demonstrate how self‑supervised learning on large‑scale graph data can bridge data silos, enrich user/item representations, and significantly boost recommendation quality.

Thank you for listening.

Tags: Embedding, recommendation systems, self-supervised learning, Graph Neural Networks, multi-domain, pretraining models
Written by DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.