Decoupled Graph Neural Networks for Large-Scale E-commerce Retrieval
Decoupled Graph Neural Networks (DC‑GNN) improve large‑scale e-commerce ad recall by separating graph processing from CTR prediction, using multi‑task pretraining (edge prediction + contrastive learning), efficient deep linear aggregation, and a dual‑tower CTR model, achieving higher efficiency and performance on billions‑scale data.
1. Abstract
In large‑scale advertising recall, Graph Neural Networks (GNNs) are state‑of‑the‑art due to their strong topology feature extraction and relational reasoning abilities. However, billions of items and hundreds of billions of interactions make traditional GNN‑based recall inefficient, forcing the use of shallow models that limit expressive power. To improve both training efficiency and model capacity, we propose Decoupled Graph Neural Networks (DC‑GNN), which consist of three stages: pre‑training, deep aggregation, and dual‑tower CTR estimation. The pre‑training stage combines supervised edge prediction with self‑supervised multi‑view contrastive learning to capture node attribute information and enhance robustness. The deep aggregation stage employs heterogeneous linear graph operators to efficiently mine higher‑order structures, enriching node embeddings. Finally, the dual‑tower CTR stage uses the learned embeddings to predict scores for ad recall. By decoupling CTR estimation from graph operations, training complexity becomes independent of graph size, yielding significant gains in both efficiency and performance on industrial‑scale datasets.
2. Background
Modern e‑commerce platforms rely on search, advertising, and recommendation systems to surface relevant items from billions of candidates. The recall stage must quickly retrieve a subset of relevant ads, where CTR estimation plays a crucial role. Existing GNN‑based recall solutions face two challenges: (1) the massive scale of items and edges leads to exponential growth of computation with depth, and (2) limited training efficiency forces shallow architectures, restricting the amount of neighbor information each node can aggregate.
3. Method
3.1 Graph Pre‑training – Multi‑task learning combines edge prediction (predicting whether a query and an ad are linked) with contrastive learning across multiple sub‑graph views. Two types of hard negative samples are explored: controllable k‑hop negatives and structural negatives, which encourage the GNN to focus on node attributes rather than solely on graph topology.
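The combined objective can be sketched as a weighted sum of an edge-prediction loss and an InfoNCE-style contrastive loss over two sub-graph views. This is a minimal NumPy illustration, not the paper's implementation: the function names, the dot-product link scorer, the in-batch-negative contrastive form, and the weighting factor `alpha` are all assumptions for exposition.

```python
import numpy as np

def edge_prediction_loss(q, a, labels):
    """Supervised task (assumed form): binary cross-entropy on the dot
    product of query and ad embeddings, predicting whether an edge exists."""
    logits = np.sum(q * a, axis=1)
    p = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-9
    return -np.mean(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))

def info_nce_loss(z1, z2, tau=0.2):
    """Self-supervised task (assumed form): the same node under two
    sub-graph views is the positive pair; other nodes in the batch
    serve as in-batch negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                      # (n, n) cross-view similarities
    logsumexp = np.log(np.sum(np.exp(sim), axis=1))
    return np.mean(logsumexp - np.diag(sim))   # -log softmax of the diagonal

def pretrain_loss(q, a, labels, z1, z2, alpha=0.5):
    """Multi-task objective: edge prediction plus weighted contrastive term."""
    return edge_prediction_loss(q, a, labels) + alpha * info_nce_loss(z1, z2)
```

Hard negatives (k-hop and structural) would enter by replacing the in-batch negatives with mined samples; that mining step is omitted here.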
3.2 Deep Aggregation – Heterogeneous linear graph operators enable linear‑time propagation across many layers, allowing deep exploration of graph structure while preserving locality to mitigate over‑smoothing. Different relation sub‑graphs (query, user, ad) are aggregated with varying hop orders.
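Because the propagation operators are linear (no per-layer nonlinearity), multi-hop aggregation reduces to repeated sparse matrix products that can be precomputed offline, which is what keeps deep propagation cheap. A small sketch of this idea, assuming symmetric adjacency normalization and hop-concatenation; the exact heterogeneous operators and hop counts per relation sub-graph are assumptions:

```python
import numpy as np

def normalized_adj(A):
    """Symmetrically normalize D^-1/2 (A + I) D^-1/2 with self-loops."""
    A = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def deep_linear_aggregate(subgraphs, X, hops):
    """Propagate node features k hops on each relation sub-graph
    (e.g. query/user/ad views with different hop orders), then
    concatenate the original and propagated features per node."""
    outputs = [X]
    for A, k in zip(subgraphs, hops):
        A_hat = normalized_adj(A)
        H = X
        for _ in range(k):
            H = A_hat @ H          # linear: A_hat^k @ X, precomputable offline
        outputs.append(H)
    return np.concatenate(outputs, axis=1)
```

Keeping lower-hop outputs in the concatenation preserves locality, which is one way to mitigate the over-smoothing that deep propagation otherwise causes.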
3.3 Dual‑Tower CTR Estimation – The embeddings from the first two stages feed into a dual‑tower CTR model (query‑user tower and ad tower) to produce recall scores.
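A dual-tower scorer maps the two sides into a shared space and scores pairs by inner product, so ad-tower outputs can be precomputed and retrieved with approximate nearest-neighbor search at serving time. The sketch below assumes simple ReLU MLP towers with L2-normalized outputs; the tower architectures and normalization are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

def tower(x, weights):
    """Hypothetical MLP tower: ReLU hidden layers, linear output,
    L2-normalized so scores are cosine-like."""
    h = x
    for W in weights[:-1]:
        h = np.maximum(h @ W, 0.0)
    h = h @ weights[-1]
    return h / (np.linalg.norm(h, axis=1, keepdims=True) + 1e-9)

def recall_scores(query_user_feats, ad_feats, qu_weights, ad_weights):
    """Score every (query-user, ad) pair by the inner product of the
    query-user tower and the ad tower outputs. In production the ad
    tower would be precomputed and scored via ANN search rather than
    a dense matrix product."""
    u = tower(query_user_feats, qu_weights)
    v = tower(ad_feats, ad_weights)
    return u @ v.T                 # (num_queries, num_ads) recall scores
```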
4. Experimental Analysis
Experiments on a large‑scale Alibaba dataset show that DC‑GNN outperforms state‑of‑the‑art baselines in AUC and Hit‑Rate@K. Ablation studies confirm that both the edge‑prediction and contrastive‑learning tasks contribute positively, and that deeper aggregation layers improve performance up to a point while maintaining linear training cost.
5. Conclusion
DC‑GNN decouples graph processing from CTR prediction, achieving higher training efficiency and stronger expressive power for massive e‑commerce ad recall. The approach demonstrates that multi‑task pre‑training and efficient deep aggregation are key to scaling GNNs in industrial retrieval systems.
Alimama Tech
Official Alimama tech channel, showcasing all of Alimama's technical innovations.