Explicit Semantic Cross Feature Learning via Pre-trained Graph Neural Networks for CTR Prediction (PCF‑GNN)

PCF‑GNN builds a heterogeneous graph of feature nodes and learns edge statistics via pre‑training, enabling it to infer unseen cross‑features, reduce storage by over 50%, and consistently improve CTR prediction AUC compared to implicit and explicit baselines, with proven online gains.

Alimama Tech

May 27, 2021

In click‑through‑rate (CTR) prediction, modeling the interactions between features is key to improving model performance. For example, a user whose occupation is basketball player may frequently click on "Nike‑Air Jordan" items, while a programmer may prefer digital products; the interaction between user occupation and item features thus provides a strong signal for CTR estimation.

Existing methods for cross‑feature modeling fall into two categories: implicit semantic modeling (e.g., Wide&Deep, DeepFM, DCN) that learns interactions through network structures, and explicit semantic modeling that uses cross‑statistical features to directly capture historical interaction frequencies.

Explicit semantic modeling faces two major challenges: (1) poor generalization, because cross‑statistical features cannot supply values for feature pairs that never appeared in the historical data, and (2) large storage overhead, because a mapping table from feature pairs to statistics must be maintained, and that table grows with the Cartesian product of the feature vocabularies.

To address these issues, the paper proposes PCF‑GNN (Pre‑trained Graph Neural Network for Cross‑Feature learning). Features are represented as nodes in a heterogeneous graph, edges denote historical co‑occurrences, and edge attributes store the cross‑statistical values. By pre‑training a GNN to predict edge attributes, the model can infer statistics for unseen pairs (improving generalization) and eliminate the need to store the full mapping table (reducing storage).
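As a rough illustration of the graph described above, the following Python sketch builds feature nodes and co‑occurrence edges from a toy impression log. The log entries, the feature names, and the choice of empirical CTR as the stored edge statistic are assumptions made for this example, not details taken from the paper.

```python
from collections import defaultdict

# Hypothetical impression log: (user feature, item feature, clicked).
LOG = [
    ("occ=athlete", "brand=nike", 1),
    ("occ=athlete", "brand=nike", 1),
    ("occ=athlete", "brand=nike", 0),
    ("occ=programmer", "cat=digital", 1),
    ("occ=programmer", "cat=digital", 0),
    ("occ=programmer", "brand=nike", 0),
]

def build_graph(log):
    """Return (nodes, edges); each edge stores a co-occurrence count and an empirical CTR."""
    shows = defaultdict(int)
    clicks = defaultdict(int)
    for user_feat, item_feat, clicked in log:
        shows[(user_feat, item_feat)] += 1
        clicks[(user_feat, item_feat)] += clicked
    nodes = sorted({feat for pair in shows for feat in pair})
    edges = {
        pair: {"count": shows[pair], "ctr": clicks[pair] / shows[pair]}
        for pair in shows
    }
    return nodes, edges

nodes, edges = build_graph(LOG)

# A pair that never co-occurred has no edge, so a plain lookup table has no value for it.
unseen_pair_known = ("occ=athlete", "cat=digital") in edges
```

The last line evaluates to `False`: the (athlete, digital) pair never appeared in the log, which is the generalization gap that predicting edge attributes from node embeddings is meant to close.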

Pre‑training stage: The graph is built from historical click data, with edges connecting features that co‑occurred. Edge attributes are computed as the co‑occurrence count divided by the product of the two feature vocabularies. A pre‑training task predicts these edge attributes. The network consists of a Node Encoding Module (a multi‑relation GraphSAGE that encodes node neighborhoods) and a CrossNet that maps pairs of node embeddings to edge space, using either a simple dot product or a shallow neural layer. Training uses a weighted squared loss in which each edge's loss is weighted by its co‑occurrence frequency (with smoothing to avoid division by zero).
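The pre‑training objective can be sketched in a few lines of plain Python. This is a simplified reading, not the paper's implementation: the aggregator below is a single‑relation mean without learned weights (real GraphSAGE layers apply trainable projections), the sigmoid on top of the dot product is an assumption to keep predictions in (0, 1), and the `eps` term is one possible interpretation of the smoothing mentioned above.

```python
import math

def mean_aggregate(node, emb, neighbors):
    """Average a node's own embedding with the mean of its neighbors' embeddings."""
    own = emb[node]
    nbrs = neighbors.get(node, [])
    if not nbrs:
        return list(own)
    nbr_mean = [sum(emb[n][d] for n in nbrs) / len(nbrs) for d in range(len(own))]
    return [(o + m) / 2 for o, m in zip(own, nbr_mean)]

def cross_net(h_a, h_b):
    """Dot-product variant of CrossNet, squashed by a sigmoid (assumption)."""
    z = sum(x * y for x, y in zip(h_a, h_b))
    return 1.0 / (1.0 + math.exp(-z))

def weighted_squared_loss(preds, targets, counts, eps=1e-8):
    """Squared error per edge, weighted by co-occurrence count and normalized."""
    weighted = sum(c * (p - t) ** 2 for p, t, c in zip(preds, targets, counts))
    return weighted / (sum(counts) + eps)

# Toy two-node graph with hand-picked embeddings.
emb = {"u": [1.0, 0.0], "i": [0.0, 1.0]}
neighbors = {"u": ["i"], "i": ["u"]}
h_u = mean_aggregate("u", emb, neighbors)
h_i = mean_aggregate("i", emb, neighbors)
pred = cross_net(h_u, h_i)

# Two edges: the first is predicted exactly, the second is off by 1.0.
loss = weighted_squared_loss([0.5, 1.0], [0.5, 0.0], [3, 1])
```

Weighting by count means frequently observed pairs, whose statistics are more reliable, dominate the gradient, while rare pairs contribute less.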

Downstream stage: The pre‑trained PCF‑GNN provides estimated edge attributes that are concatenated with the embeddings of a standard Embedding&MLP CTR model. Two deployment strategies are described: (a) fixing the node‑encoding parameters and using only the pre‑computed node embeddings to save inference cost, and (b) fine‑tuning the entire PCF‑GNN together with the CTR model.
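A minimal sketch of strategy (a), assuming node embeddings have already been exported from the frozen node encoder: the estimated cross feature is computed from two node embeddings and appended to the flattened field embeddings before they enter the MLP. The embedding values, the feature names, and the sigmoid dot‑product estimator are hypothetical choices for this example.

```python
import math

def estimate_cross(h_a, h_b):
    """Assumed CrossNet: sigmoid of the dot product of two node embeddings."""
    z = sum(x * y for x, y in zip(h_a, h_b))
    return 1.0 / (1.0 + math.exp(-z))

def build_mlp_input(field_embeddings, cross_pairs, node_emb):
    """Concatenate the CTR model's field embeddings with estimated cross features."""
    x = [v for emb in field_embeddings for v in emb]
    x.extend(estimate_cross(node_emb[a], node_emb[b]) for a, b in cross_pairs)
    return x

# Pre-computed node embeddings from the frozen node encoder (toy values).
NODE_EMB = {
    "occ=athlete": [0.9, 0.1],
    "cat=digital": [-0.4, 0.8],
}

# Embeddings of the CTR model's own input fields for one request (toy values).
fields = [[0.2, -0.1], [0.05, 0.3]]

x = build_mlp_input(fields, [("occ=athlete", "cat=digital")], NODE_EMB)
```

Because the estimate depends only on the two node embeddings, a value is available even when the pair never co‑occurred in the training log, as long as each feature has an embedding.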

Experiments: Evaluations are conducted on public MovieLens data and an internal Alibaba dataset, using AUC as the metric. Baselines include implicit models (Wide&Deep, DeepFM, AutoInt, FiBiNet), a GNN‑based implicit model (Fi‑GNN), and graph pre‑training models (GraphSAGE, PGCN). PCF‑GNN consistently achieves higher AUC than all baselines.

Generalization evaluation: On a test set containing new cross‑feature pairs, PCF‑GNN attains higher coverage and a larger AUC improvement over the original DNN, demonstrating superior generalization.

Storage evaluation: Compared with storing raw cross‑statistical features, PCF‑GNN reduces storage by more than 50% because only node embeddings (whose size is linear in the number of features) need to be kept. Further compression can be achieved with hash embeddings.
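The storage argument comes down to simple arithmetic, sketched below. All numbers are hypothetical and chosen only to show the shape of the comparison. The sketch also assumes one stored value per observed pair (ignoring key storage) and treats hash embeddings as a cap on the number of embedding rows.

```python
def table_values(num_observed_pairs):
    """Pair-to-statistic table: one stored value per observed feature pair."""
    return num_observed_pairs

def embedding_values(num_nodes, dim, hash_buckets=None):
    """Node embeddings: one row per feature, or per hash bucket if hashing is used."""
    rows = num_nodes if hash_buckets is None else min(num_nodes, hash_buckets)
    return rows * dim

# Hypothetical sizes, not figures from the paper.
observed_pairs = 100_000
num_nodes = 5_000
dim = 8

full = embedding_values(num_nodes, dim)
hashed = embedding_values(num_nodes, dim, hash_buckets=2_000)
saving = 1 - full / table_values(observed_pairs)
```

With these toy sizes the embeddings hold 40,000 values against 100,000 table entries, and hashing to 2,000 buckets shrinks that to 16,000. The table grows with the number of observed pairs, while the embeddings grow only with the number of features.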

Online A/B test: Deploying PCF‑GNN in a production CTR model yields measurable lift in key business metrics, confirming its practical effectiveness.

Conclusion: By leveraging a pre‑trained graph neural network, PCF‑GNN captures explicit cross‑semantic information while cutting storage by more than half, making it a practical approach for large‑scale recommendation systems.
