Scalable Multi-View Ad Retrieval (SMAD): A Graph-Based Framework for E-commerce Advertising

SMAD is a scalable graph‑based ad retrieval framework for e‑commerce search that builds a heterogeneous Query‑Item‑Ad graph, learns multi‑view embeddings with a parallel deep neural network and attention, employs category‑aware sampling for efficient distributed training, and delivers significant gains in offline relevance and online CTR, RPM, and PVR.

Alimama Tech

Dec 15, 2021

Graph models capture relational information in data and use it to enrich representations, and they are widely used in both research and industry. Alibaba’s advertising team open‑sourced Euler, a large‑scale distributed deep graph learning platform, in 2019, and it has since been widely adopted (GitHub ★2.7K). Building on Euler, the team iterated on several algorithm modules and presents SMAD, a scalable multi‑view ad matching engine for the ad retrieval stage of e‑commerce search.

In e‑commerce search, a query retrieves both ads and organic items. To balance efficiency and effectiveness, a two‑stage pipeline is commonly used: a lightweight ad retrieval module followed by a more complex ranking module. SMAD replaces the retrieval module with a graph‑embedding approach: a heterogeneous Query‑Item‑Ad graph is constructed from user behavior, click, and textual similarity edges, and node representations are learned via a parallel deep neural network (PNN). Approximate nearest neighbor (ANN) search then retrieves relevant ads.
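The final retrieval step can be illustrated with a small sketch. It assumes query and ad embeddings have already been learned, and it uses exact brute-force cosine similarity as a stand-in for the ANN index a production system would use. The function names and toy vectors are hypothetical and are not taken from SMAD or Euler.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve_top_k(query_vec, ad_vecs, k):
    """Return the ids of the k ads most similar to the query, best first.

    Exact search over every ad; an ANN index trades a little accuracy
    for much lower latency on large ad corpora.
    """
    scored = ((cosine(query_vec, vec), ad_id) for ad_id, vec in ad_vecs.items())
    return [ad_id for _, ad_id in heapq.nlargest(k, scored)]


# Toy embeddings, made up for illustration only.
ad_vecs = {
    "ad_running_shoes": [0.9, 0.1, 0.0],
    "ad_trail_shoes": [0.8, 0.3, 0.1],
    "ad_coffee_maker": [0.0, 0.2, 0.9],
}
query_vec = [1.0, 0.2, 0.0]
top_ads = retrieve_top_k(query_vec, ad_vecs, k=2)
print(top_ads)
```

Because queries, items, and ads share one embedding space, the same lookup works whether the search key is a query vector or an item vector.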

The key innovations are:

Category‑tree‑constrained graph sampling and partitioning: nodes are assigned to leaf categories, enabling sub‑graph extraction that respects user intent and reduces communication/computation costs.

A Parallel Deep Neural Network (PNN) that learns separate DNN branches for each view (click, text similarity, co‑bid) and fuses them with an attention mechanism.

An efficient distributed deployment that splits the massive graph into independent sub‑graphs, allowing parallel training on dozens of machines.
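A rough sketch of how these ideas might fit together follows. It makes several assumptions: each node carries exactly one leaf category, edges that cross categories are simply dropped when partitioning, and attention is a softmax over dot products with a learned vector. The article does not specify these details, so every function name and the toy data are hypothetical.

```python
import math
import random
from collections import defaultdict


def partition_by_leaf_category(edges, leaf_category):
    """Group edges into per-category sub-graphs (adjacency lists).

    Assumption: an edge is kept only when both endpoints share a leaf
    category, so each sub-graph can be trained independently.
    """
    subgraphs = defaultdict(lambda: defaultdict(list))
    for src, dst in edges:
        cat = leaf_category[src]
        if cat == leaf_category[dst]:
            subgraphs[cat][src].append(dst)
            subgraphs[cat][dst].append(src)
    return subgraphs


def sample_neighbors(subgraph, node, k, rng):
    """Sample up to k neighbors of `node` from its own category sub-graph."""
    neighbors = subgraph.get(node, [])
    return rng.sample(neighbors, min(k, len(neighbors)))


def attention_fuse(view_embeddings, attention_vector):
    """Fuse per-view embeddings using softmax attention weights."""
    scores = [sum(a * x for a, x in zip(attention_vector, emb))
              for emb in view_embeddings]
    max_score = max(scores)  # subtract the max for numerical stability
    exp_scores = [math.exp(s - max_score) for s in scores]
    total = sum(exp_scores)
    weights = [e / total for e in exp_scores]
    dim = len(view_embeddings[0])
    fused = [sum(w * emb[i] for w, emb in zip(weights, view_embeddings))
             for i in range(dim)]
    return fused, weights


# Toy graph, made up for illustration only.
edges = [("q_shoes", "ad_sneaker"), ("q_shoes", "item_boot"),
         ("q_shoes", "ad_blender"), ("q_kitchen", "ad_blender")]
leaf_category = {"q_shoes": "footwear", "ad_sneaker": "footwear",
                 "item_boot": "footwear", "ad_blender": "kitchen",
                 "q_kitchen": "kitchen"}

subgraphs = partition_by_leaf_category(edges, leaf_category)
sampled = sample_neighbors(subgraphs["footwear"], "q_shoes", 5, random.Random(0))

# One embedding per view (for example click, text similarity, co-bid).
fused, weights = attention_fuse([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0])
```

In this toy setup the cross-category edge between `q_shoes` and `ad_blender` is discarded, so the footwear and kitchen sub-graphs share no edges and could be placed on different machines.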

SMAD’s training pipeline uses Euler for storage and training, Faiss‑style ANN for retrieval, and standard deep‑learning settings (Adam optimizer, batch size 512, learning rate 0.001) on a 50‑node Alibaba Cloud cluster.
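To make the optimizer settings concrete, here is a minimal sketch of a single Adam update in plain Python. The batch size, learning rate, and machine count come from the text above. The `TrainConfig` class is hypothetical, and the `beta1`, `beta2`, and `epsilon` values are the commonly used Adam defaults, assumed here because the article does not state them.

```python
import math
from dataclasses import dataclass


@dataclass
class TrainConfig:
    # Values stated in the article.
    batch_size: int = 512
    learning_rate: float = 0.001
    num_machines: int = 50
    # Assumed Adam defaults (not stated in the article).
    beta1: float = 0.9
    beta2: float = 0.999
    epsilon: float = 1e-8


def adam_step(param, grad, m, v, t, cfg):
    """Apply one Adam update to a scalar parameter; t is the 1-based step count."""
    m = cfg.beta1 * m + (1 - cfg.beta1) * grad          # first-moment estimate
    v = cfg.beta2 * v + (1 - cfg.beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - cfg.beta1 ** t)                    # bias correction
    v_hat = v / (1 - cfg.beta2 ** t)
    param = param - cfg.learning_rate * m_hat / (math.sqrt(v_hat) + cfg.epsilon)
    return param, m, v


cfg = TrainConfig()
param, m, v = adam_step(param=1.0, grad=0.5, m=0.0, v=0.0, t=1, cfg=cfg)
```

On the very first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly one learning rate regardless of the gradient's magnitude.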

Offline experiments on a manually labeled dataset (20k query‑ad and 20k item‑ad pairs) show that SMAD achieves higher macro‑NDCG than the baselines SimRank++, BKR, DSSM, and Search2vec, and also outperforms several SMAD variants (random walk, no‑attribute, no‑attention). Online A/B tests in Taobao’s ad retrieval system show significant gains in CTR (+5% to +5.5%), RPM, and PVR over the same baselines.
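As a reference for the offline metric, the sketch below computes NDCG per query and then averages across queries. This reading of "macro‑NDCG" and the exponential gain `2^rel - 1` are assumptions, since the article does not give the exact formula. The relevance labels are made up.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with exponential gain and log2 discount."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances))


def ndcg(relevances, k=None):
    """NDCG of one ranked list; `relevances` is ordered as the system ranked it."""
    ranked = relevances[:k] if k else relevances
    ideal = sorted(relevances, reverse=True)[:len(ranked)]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


def macro_ndcg(per_query_relevances, k=None):
    """Average the per-query NDCG so every query counts equally."""
    scores = [ndcg(rels, k) for rels in per_query_relevances.values()]
    return sum(scores) / len(scores)


# Toy labels: graded relevance of retrieved ads, in ranked order.
labels = {
    "q1": [2, 1, 0],  # already in ideal order
    "q2": [0, 1],     # the relevant ad is ranked second
}
score = macro_ndcg(labels)
```

Averaging per query keeps a handful of high-traffic queries with many labeled pairs from dominating the score.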

The study concludes that modeling massive e‑commerce advertising data as a multi‑view heterogeneous graph, combined with category‑aware sampling and a parallel attention‑based DNN, yields substantial improvements in both offline relevance metrics and online revenue.
