How Mixed‑Curvature Graph Embeddings Boost E‑commerce Ad Retrieval

This article presents AMCAD, an adaptive mixed‑curvature graph embedding system that models heterogeneous e‑commerce search ad graphs in non‑Euclidean spaces, detailing its sample construction, three‑stage model architecture, offline and online experiments, and demonstrating significant performance gains over Euclidean baselines.

Alimama Tech
Alimama Tech
Alimama Tech
How Mixed‑Curvature Graph Embeddings Boost E‑commerce Ad Retrieval

Abstract

Graph embedding traditionally operates in Euclidean space, which limits its ability to represent hierarchical (tree‑like) and cyclic (ring‑like) structures common in large‑scale heterogeneous graphs. AMCAD (Adaptive Mixed‑Curvature Representation for Advertisement Retrieval) addresses this limitation by encoding nodes into a Cartesian product of multiple constant‑curvature spaces (e.g., spherical, hyperbolic, Euclidean) and learning the optimal curvature configuration end‑to‑end. The system is built on the open‑source CurvLearn framework ( https://github.com/alibaba/Curvature-Learning-Framework) and has been deployed in Alibaba’s Taobao search advertising pipeline.

Background

Taobao search advertising graphs contain three node types (Query, Item, Ad) and several edge types (Click, Co‑click, Semantic, Co‑bidding). Queries exhibit strong hierarchical relations (category trees), while Items/Ads form dense co‑click clusters with pronounced cyclic patterns. Modeling such mixed topology with a single Euclidean embedding leads to significant precision loss.

Example of Taobao search ad graph with tree and ring structures
Example of Taobao search ad graph with tree and ring structures

System Design

Sample Construction

For each user session, a query followed by a sequence of clicks generates a set of nodes and edges:

Click edges connect the query to each clicked Item/Ad.

Co‑click edges connect Items/Ads that appear together in the same session.

Semantic edges are added between queries with high textual similarity.

Co‑bidding edges link Ads that share bidding keywords.

The resulting graph contains three node types and multiple heterogeneous edge types. Training samples consist of one positive node pair sampled via predefined metapath walks and several negative pairs generated by random walks (both simple and hard negatives).

Construction process of Taobao search ad graph
Construction process of Taobao search ad graph

Model Design

Each node of type T with d raw features is first embedded into a dense Euclidean vector v_i. The vector is then mapped to a constant‑curvature space with curvature c_i using an exponential map: z_i = exp_map(v_i, c_i) The set {z_i} for all features forms a Cartesian product of mixed‑curvature spaces. A graph neural network (GNN) aggregates neighbor information in this product space. For a node u at GNN layer l with neighbors N(u):

h_u^{(l)} = σ\Big(\sum_{v\in N(u)} W^{(l)} \cdot z_v^{(l-1)}\Big)

A curvature‑fusion module then combines the representations from different curvature components using attention weights α_i that are learned per edge type: h_u^{fusion} = \sum_i α_i \cdot z_{u,i} The model consists of three stages:

Node‑level Adaptive Mixed‑Curvature Encoder (learns optimal curvatures for each feature).

Edge‑level Space Projection (maps node pairs of a given edge type into a shared mixed‑curvature space).

Subspace‑distance Combination with attention‑weighted curvature mixing (produces a final similarity score).

The similarity between a query node q and an item/ad node a is computed as a weighted sum of Riemannian distances across subspaces:

sim(q,a) = \sum_i α_i \cdot d_{c_i}(z_q^{(i)}, z_a^{(i)})

Training minimizes a triplet loss:

L = \max\big(0, m + sim(q, a_{neg}) - sim(q, a_{pos})\big)

where m is the margin. Optimization uses a Riemannian SGD variant provided by CurvLearn.

Online Retrieval

Fast nearest‑neighbor search in mixed‑curvature space is enabled by CurvFaiss, an extension of FAISS that supports Riemannian metrics. The online architecture uses a two‑layer inverted index:

Layer 1 expands the user trigger (current query + recent clicked items) into a richer set of query and item signals.

Layer 2 retrieves candidate ads from the mixed‑curvature index based on the expanded signals.

This design improves recall depth and result diversity while keeping latency within production constraints.

Two‑layer online retrieval architecture
Two‑layer online retrieval architecture

Experimental Evaluation

Offline Experiments

Using one day of Taobao search logs, AMCAD was compared against three baseline families:

Pure Euclidean GNN.

Single‑curvature GNN (either spherical or hyperbolic).

Existing mixed‑curvature methods.

Metrics included AUC, HitRate, and nDCG. AMCAD achieved average improvements of >20 % over the Euclidean baseline and outperformed both single‑curvature and prior mixed‑curvature approaches.

Offline experiment results
Offline experiment results

An ablation study confirmed the contribution of each module; the node‑level mixed‑curvature encoder provided the largest gain.

Ablation results
Ablation results

Online Experiments

Latency tests showed that the two‑layer retrieval system scales to ten times the baseline query‑per‑second load while latency grows by less than a factor of two, confirming suitability for billion‑scale traffic.

System latency under load
System latency under load

A seven‑day A/B test compared AMCAD against the Euclidean production system. Results showed statistically significant lifts in click‑through rate (CTR) and advertising revenue, with the most pronounced gains for high‑frequency (head) queries.

Online A/B test results
Online A/B test results
Stratified online results
Stratified online results

Conclusion and Outlook

Mixed‑curvature representations capture both hierarchical and cyclic patterns in heterogeneous graphs more effectively than pure Euclidean embeddings. Because curvature handling is model‑agnostic, the approach can be integrated into any graph‑learning architecture and is applicable beyond advertisement retrieval to other industrial graph‑based services.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

e‑commerceDeep Learninggraph embeddingheterogeneous graphadvertisement retrievalmixed curvature
Alimama Tech
Written by

Alimama Tech

Official Alimama tech channel, showcasing all of Alimama's technical innovations.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.