Scalable Multi-View Ad Retrieval (SMAD): A Graph-Based Framework for E-commerce Advertising
SMAD is a scalable graph‑based ad retrieval framework for e‑commerce search. It builds a heterogeneous Query‑Item‑Ad graph, learns multi‑view embeddings with a parallel deep neural network (PNN) and an attention mechanism, and uses category‑aware sampling for efficient distributed training. The authors report gains in offline relevance and in online CTR, RPM, and PVR.
Graph models capture relational information in data and use it to enrich representations, and they are widely used in both research and industry. In 2019, Alibaba’s advertising team open‑sourced Euler, a large‑scale distributed deep graph learning platform that has since been widely adopted (about 2.7K GitHub stars). Building on Euler, the authors iterated on several algorithm modules and present SMAD, a scalable multi‑view ad matching engine for the ad retrieval stage of e‑commerce search.
In e‑commerce search, a query retrieves both ads and organic items. To balance efficiency and effectiveness, a two‑stage pipeline is commonly used: a lightweight ad retrieval module followed by a more complex ranking module. SMAD replaces the retrieval module with a graph‑embedding approach: a heterogeneous Query‑Item‑Ad graph is constructed from user behavior, click, and textual similarity edges, and node representations are learned via a parallel deep neural network (PNN). Approximate nearest neighbor (ANN) search then retrieves relevant ads.
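The final retrieval step can be illustrated with a minimal sketch. It assumes query and ad embeddings have already been learned and live in the same vector space, and it uses exact brute-force cosine search as a stand-in for a real ANN index. The function names, the toy vectors, and the choice of cosine similarity are illustrative assumptions, not details from the SMAD paper.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_ads(query_vec, ad_vecs, k=2):
    """Return the ids of the k ads most similar to the query.

    Exact search over every ad: a production system would replace this
    loop with an approximate nearest neighbor index.
    """
    scored = ((cosine(query_vec, vec), ad_id) for ad_id, vec in ad_vecs.items())
    return [ad_id for _, ad_id in heapq.nlargest(k, scored)]


# Hypothetical toy embeddings, for illustration only.
ad_vecs = {
    "ad_shoes": [0.9, 0.1, 0.0],
    "ad_boots": [0.8, 0.3, 0.1],
    "ad_phone": [0.0, 0.2, 0.9],
}
top_ads = retrieve_ads([1.0, 0.2, 0.0], ad_vecs, k=2)
print(top_ads)
```

An ANN index trades a small amount of recall for much lower latency, which matters once the ad inventory is too large to scan per query.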
The key innovations are:
Category‑tree‑constrained graph sampling and partitioning: nodes are assigned to leaf categories, enabling sub‑graph extraction that respects user intent and reduces communication/computation costs.
A Parallel Deep Neural Network (PNN) that learns separate DNN branches for each view (click, text similarity, co‑bid) and fuses them with an attention mechanism.
An efficient distributed deployment that splits the massive graph into independent sub‑graphs, allowing parallel training on dozens of machines.
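Two of the ideas above can be sketched in a few lines: splitting the graph into per-category sub-graphs, and fusing per-view embeddings with attention weights. This is a simplified sketch under stated assumptions. Dropping edges that cross leaf categories and scoring each view by a dot product with a context vector are illustrative choices, and in a trained model the context vector and the view embeddings would be learned rather than hard-coded.

```python
import math
from collections import defaultdict


def partition_by_leaf_category(edges, leaf_category):
    """Group edges into independent sub-graphs, one per leaf category.

    Assumption: an edge whose endpoints fall in different leaf
    categories is dropped, so sub-graphs share no edges.
    """
    subgraphs = defaultdict(list)
    for src, dst in edges:
        if leaf_category[src] == leaf_category[dst]:
            subgraphs[leaf_category[src]].append((src, dst))
    return dict(subgraphs)


def attention_fuse(view_embeddings, context):
    """Fuse per-view embeddings with softmax attention weights.

    Each view is scored by its dot product with `context` (a stand-in
    for a learned attention vector); the fused embedding is the
    weighted sum of the views.
    """
    scores = [sum(a * b for a, b in zip(v, context)) for v in view_embeddings]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(view_embeddings[0])
    fused = [sum(w * v[i] for w, v in zip(weights, view_embeddings)) for i in range(dim)]
    return fused, weights


# Hypothetical toy graph: two queries and two ads in two leaf categories.
leaf_category = {"q1": "shoes", "ad1": "shoes", "q2": "phones", "ad2": "phones"}
edges = [("q1", "ad1"), ("q2", "ad2"), ("q1", "ad2")]
subgraphs = partition_by_leaf_category(edges, leaf_category)

# One embedding per view (click, text similarity, co-bid) for a single node.
views = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fused, weights = attention_fuse(views, context=[1.0, 0.0])
print(subgraphs, fused, weights)
```

Because each sub-graph is self-contained, workers can train on different categories without exchanging neighbor information, which is the source of the communication savings described above.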
SMAD’s training pipeline uses Euler for storage and training, Faiss‑style ANN for retrieval, and standard deep‑learning settings (Adam optimizer, batch size 512, learning rate 0.001) on a 50‑node Alibaba Cloud cluster.
Extensive offline experiments on a manually labeled query‑ad dataset (20k query‑ad and 20k item‑ad pairs) show that SMAD outperforms baselines such as SimRank++, BKR, DSSM, Search2vec, and several SMAD variants (random walk, no‑attribute, no‑attention) in macro‑NDCG. Online A/B tests in Taobao’s ad retrieval system demonstrate significant gains in CTR (+5% to +5.5%), RPM, and PVR compared with the same baselines.
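The offline metric can be made concrete with a short sketch. The summary does not define macro‑NDCG, so this assumes the common reading: compute NDCG separately for each query from the graded relevance labels of its ranked results, then take the unweighted mean over queries. The exponential gain `2**g - 1` and the log2 discount are standard choices but are assumptions here, as are the function names and toy labels.

```python
import math


def dcg(gains):
    """Discounted cumulative gain for relevance labels listed in ranked order."""
    return sum((2 ** g - 1) / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg(gains):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


def macro_ndcg(ranked_gains_per_query):
    """Unweighted mean of per-query NDCG (assumed meaning of 'macro')."""
    per_query = [ndcg(g) for g in ranked_gains_per_query.values()]
    return sum(per_query) / len(per_query)


# Hypothetical labels: relevance grades of retrieved ads, in ranked order.
labels = {
    "q1": [2, 1, 0],  # already ideally ordered
    "q2": [0, 1],     # the relevant ad is ranked second
}
score = macro_ndcg(labels)
print(round(score, 4))
```

Averaging per query gives every query equal weight regardless of how many ads were labeled for it, so a few heavily labeled queries cannot dominate the score.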
The study concludes that modeling massive e‑commerce advertising data as a multi‑view heterogeneous graph, combined with category‑aware sampling and a parallel attention‑based DNN, yields substantial improvements in both offline relevance metrics and online revenue.
Alimama Tech
Official Alimama tech channel, showcasing all of Alimama's technical innovations.