Alimama Team Papers Selected for The Web Conference 2022 – Summaries of Five AI Research Works
The Alimama technical team secured five paper acceptances at The Web Conference 2022, presenting advances in unbiased delayed‑feedback conversion modeling, uncertainty‑regularized knowledge‑distilled CVR debiasing, feature‑aware probability calibration, coordinated two‑stage ad auctions, and scalable decoupled graph neural networks for large‑scale e‑commerce retrieval.
Recently, the review results of the 31st International World Wide Web Conference (The Web Conference, formerly WWW) were announced, and the Alimama technical team has had five papers accepted.
The conference series, first held in 1994 and renamed The Web Conference in 2018, is a top-tier academic venue for research on the Web and its systems and applications. This year’s edition, hosted by Lyon, France, will be held online from April 25 to 29; it received 1,822 long‑paper submissions, with a 17.7% acceptance rate.
1. Asymptotically Unbiased Estimation for Delayed Feedback Modeling via Label Correction (DEFUSE / Bi‑DEFUSE)
In advertising scenarios, CVR (conversion rate) prediction serves both ranking and bidding strategies. Under online streaming training, conversions often arrive long after the click; this delayed feedback means many samples observed as negative are in fact conversions that have not yet been attributed. Existing importance‑sampling methods mistakenly treat such false negatives as true negatives, introducing bias. The proposed DEFUSE method classifies observed samples into four types and applies importance sampling at this finer granularity, achieving asymptotically unbiased delayed‑feedback modeling. For short attribution windows, Bi‑DEFUSE splits CVR prediction into two sub‑tasks trained jointly with multi‑task learning. Offline experiments show superior performance, and the method is now deployed in Alimama’s search advertising.
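The importance‑sampling correction at the heart of this line of work can be sketched as a per‑sample reweighted log loss. The weights `w_pos` and `w_neg` below are placeholders for whatever correction terms a given method derives (DEFUSE derives finer‑grained weights from its four sample types); this is a generic illustration, not the paper’s exact loss:

```python
import numpy as np

def importance_weighted_logloss(p, labels, w_pos, w_neg):
    """Importance-sampling-weighted log loss (illustrative sketch).

    p       : predicted conversion probabilities on the observed (biased) stream
    labels  : observed 0/1 conversion labels, possibly containing false negatives
    w_pos / w_neg : per-sample importance weights correcting for delayed feedback,
        e.g. derived from the estimated probability that an observed negative is
        actually a conversion that has not arrived yet (assumed given here).
    """
    eps = 1e-7
    p = np.clip(p, eps, 1 - eps)
    loss = -(labels * w_pos * np.log(p) + (1 - labels) * w_neg * np.log(1 - p))
    return loss.mean()
```

With unit weights this reduces to the ordinary log loss; delayed‑feedback methods differ mainly in how the weights are estimated.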
2. UKD: Debiasing Conversion Rate Estimation via Uncertainty‑regularized Knowledge Distillation
Traditional CVR models suffer from sample selection bias because they are trained on click space but serve in exposure space. UKD extracts knowledge from unclicked samples using a teacher model that generates pseudo‑conversion labels, and a student model that learns from both clicked and unclicked data with uncertainty regularization to mitigate label noise. Experiments on multiple advertising scenarios demonstrate significant improvements over prior debiasing methods, with notable gains in CVR and CPA metrics in online tests.
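The teacher–student scheme can be sketched as a loss that mixes ground‑truth labels on clicked samples with teacher pseudo‑labels on unclicked ones, down‑weighted where the teacher is uncertain. The names and the exact weighting form below are illustrative assumptions, not UKD’s precise formulation:

```python
import numpy as np

def distillation_loss(student_p, clicked_mask, true_labels, teacher_p, uncertainty):
    """Uncertainty-regularized distillation loss (illustrative sketch).

    Clicked samples use ground-truth conversion labels; unclicked samples use
    the teacher's pseudo-labels, weighted by (1 - uncertainty) so that noisy
    pseudo-labels contribute less. How `uncertainty` is estimated is left open
    here (an assumption of this sketch).
    """
    eps = 1e-7
    sp = np.clip(student_p, eps, 1 - eps)
    # cross-entropy against hard labels (clicked space)
    ce_true = -(true_labels * np.log(sp) + (1 - true_labels) * np.log(1 - sp))
    # cross-entropy against soft teacher labels (unclicked space)
    ce_soft = -(teacher_p * np.log(sp) + (1 - teacher_p) * np.log(1 - sp))
    weight = 1.0 - uncertainty  # trust confident pseudo-labels more
    loss = np.where(clicked_mask, ce_true, weight * ce_soft)
    return loss.mean()
```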
3. MBCT: Tree‑Based Feature‑Aware Binning for Individual Uncertainty Calibration
Probability estimation (e.g., CTR, CVR) is critical in advertising. Existing calibration methods either ignore feature information or apply the same correction to all samples in a bin. MBCT introduces a Feature‑Aware Binning framework using multiple boosting calibration trees that learn error patterns in feature space, enabling per‑sample calibration. A new metric, MVCE (Multi‑view Calibration Error), is proposed for comprehensive evaluation. Experiments on three datasets and deployment in Alimama’s display advertising show clear advantages in both calibration error and ranking performance.
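A much‑simplified sketch of feature‑aware binning: instead of binning on the predicted score alone, samples are bucketed by a feature and each bucket gets its own correction. MBCT learns the binning with boosting trees over many features; the single‑feature quantile binning below is only meant to show why feature‑aware bins can fix errors that score‑only binning cannot:

```python
import numpy as np

def feature_aware_binning_calibration(preds, labels, feature, n_bins=10):
    """Per-bin multiplicative calibration with bins cut on a feature
    (a single-feature, non-tree sketch of the feature-aware idea)."""
    # quantile edges over the chosen feature
    edges = np.quantile(feature, np.linspace(0.0, 1.0, n_bins + 1))
    bin_idx = np.clip(np.searchsorted(edges, feature, side="right") - 1,
                      0, n_bins - 1)
    calibrated = preds.astype(float).copy()
    for b in range(n_bins):
        mask = bin_idx == b
        if not mask.any():
            continue
        # correction ratio: observed rate / predicted rate within the bin
        ratio = labels[mask].mean() / max(preds[mask].mean(), 1e-7)
        calibrated[mask] = np.clip(preds[mask] * ratio, 0.0, 1.0)
    return calibrated
```

In practice the correction would be learned on held‑out data and refined iteratively (boosting); this sketch applies a single pass for clarity.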
4. On Designing a Two‑stage Auction for Online Advertising
Industrial online advertising systems often use a two‑stage auction: a lightweight pre‑auction followed by a more expensive final auction. The paper studies the interaction between the two stages and proposes a coordinated two‑stage auction that links pre‑auction selection scores (PAS) with a generalized second‑price mechanism in the final stage, preserving incentive compatibility. An approximate solution for the NP‑hard pre‑auction selection problem is presented, and experiments on public and industrial datasets show significant improvements in social welfare and revenue.
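The final‑stage pricing referenced above follows the textbook generalized second‑price rule. The sketch below implements that standard rule (ranking by bid × pCTR and charging each winner the minimum bid needed to keep its slot), not the paper’s coordinated PAS mechanism:

```python
def gsp_allocate_and_price(bids, pctrs, n_slots):
    """Standard generalized second-price (GSP) auction (textbook form).

    Candidates are ranked by expected cost-per-mille score bid * pCTR; winner i
    at position k pays the next-ranked candidate's score divided by i's own
    pCTR. A zero reserve price is assumed when no competitor remains.
    """
    ranked = sorted(range(len(bids)),
                    key=lambda i: bids[i] * pctrs[i], reverse=True)
    winners = ranked[:n_slots]
    prices = []
    for pos, i in enumerate(winners):
        if pos + 1 < len(ranked):
            j = ranked[pos + 1]
            prices.append(bids[j] * pctrs[j] / pctrs[i])
        else:
            prices.append(0.0)  # assumed reserve price of 0
    return winners, prices
```

The paper’s contribution is upstream of this rule: choosing the pre‑auction candidate set so that the final‑stage outcome stays incentive compatible while maximizing welfare, a selection problem it shows to be NP‑hard and solves approximately.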
5. DC‑GNN: Decoupled Graph Neural Networks for Improving and Accelerating Large‑Scale E‑commerce Retrieval
Graph Neural Networks (GNNs) are powerful for large‑scale ad retrieval but suffer from low training efficiency due to billions of nodes and edges. DC‑GNN decouples graph pre‑training, deep aggregation, and a dual‑tower CTR prediction stage. Pre‑training combines supervised link prediction with self‑supervised multi‑view contrastive learning. Deep aggregation uses heterogeneous linear diffusion operators to capture high‑order structures efficiently. The decoupled design makes training complexity independent of graph size, achieving both higher model performance and faster training on industrial datasets.
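The decoupling idea resembles SGC/SIGN‑style precomputation: multi‑hop aggregation is run once offline with linear diffusion operators, and the downstream dual‑tower model then trains on fixed per‑node vectors, so its cost no longer scales with graph size. The row‑normalized operator and hop concatenation below are assumptions for illustration, not DC‑GNN’s exact heterogeneous operators:

```python
import numpy as np

def precompute_diffusion(adj, features, n_hops):
    """Offline linear diffusion over a graph (illustrative sketch).

    adj      : dense (n, n) adjacency matrix (a toy stand-in for the sparse,
               heterogeneous graphs used in production)
    features : (n, d) node feature matrix
    Returns a (n, d * (n_hops + 1)) matrix concatenating each hop's view,
    which a downstream model can consume as fixed input vectors.
    """
    deg = adj.sum(axis=1, keepdims=True)
    norm_adj = adj / np.maximum(deg, 1.0)  # row-normalized operator (assumption)
    views = [features]
    h = features
    for _ in range(n_hops):
        h = norm_adj @ h            # one more hop of linear aggregation
        views.append(h)
    return np.concatenate(views, axis=1)
```

Because this step is linear and label‑free, it can be computed once per graph snapshot and reused across training runs, which is where the training‑time savings come from.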
Paper downloads:
- DEFUSE / Bi‑DEFUSE: https://arxiv.org/abs/2201.08024
- Two‑stage auction: https://arxiv.org/abs/2111.05555
Alimama Tech
Official Alimama tech channel, showcasing all of Alimama's technical innovations.