13 KDD'26 Papers from Taobao: Scaling Laws, World Models and New AI Paradigms
The article highlights thirteen Taobao‑group papers accepted at KDD 2026, covering large‑model scaling laws, end‑to‑end generative recommendation, CTR prediction, interactive recommendation agents, LLM‑based pricing, robust auto‑bidding, two‑stage auctions, generative world models, multi‑attribution conversion, uplift modeling and long‑term causal estimation for e‑commerce systems.
Overview
KDD 2026, the premier ACM SIGKDD conference on knowledge discovery and data mining, will be held in Jeju, South Korea (August 9‑13). Across the four core tracks (Research, Applied Data Science (ADS), Dataset & Benchmark, and AI for Science), the Taobao group has had thirteen papers accepted, showcasing how large‑model, reinforcement‑learning and world‑model techniques are reshaping e‑commerce intelligence.
1. Architecture Innovation for Large‑Model E‑commerce
TaoSR1: E‑commerce Relevance Search with Deep Reasoning and Low‑Latency Deployment
Track: ADS
Paper link: https://arxiv.org/abs/2508.12365
Pain point: Traditional BERT lacks complex reasoning; existing LLM solutions rely on knowledge distillation and cannot be deployed online.
Solution: The TaoSR1 framework introduces a three‑stage optimization paradigm: (1) CoT‑guided supervised fine‑tuning to boost deep reasoning; (2) Pass@N‑based offline sampling combined with Direct Preference Optimization (DPO) to improve generation quality; (3) Difficulty‑aware dynamic sampling merged with Group‑Relative Policy Optimization (GRPO) to mitigate hallucination. Two novel deployment tricks—post‑CoT processing and cumulative‑probability‑based relevance tiering—enable efficient online inference.
Results: TaoSR1 outperforms strong baselines on challenging offline benchmarks and achieves large gains in side‑by‑side human blind tests, providing an industrial‑grade RL‑aligned CoT inference pattern for classification tasks.
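To make stage (2) more concrete, here is a minimal sketch of the standard DPO loss for a single preference pair. It assumes sequence-level log-probabilities have already been computed for the policy and a frozen reference model. The function name `dpo_loss`, the value `beta=0.1` and the toy numbers are illustrative assumptions, not details taken from TaoSR1.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair (sequence-level log-probs)."""
    # Implicit reward margin: how much more strongly the policy prefers the
    # chosen response over the rejected one than the reference model does.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    z = beta * margin
    # -log(sigmoid(z)), written in a numerically stable form.
    return math.log1p(math.exp(-z)) if z > 0 else -z + math.log1p(math.exp(z))


# Toy log-probabilities (assumed values, for illustration only).
better = dpo_loss(-4.0, -9.0, -5.0, -8.0)  # policy widens the gap in the right direction
worse = dpo_loss(-9.0, -4.0, -8.0, -5.0)   # policy leans toward the rejected response
```

The loss falls below log 2 when the policy separates the pair more than the reference does, and rises above it otherwise. In a Pass@N setup, the chosen and rejected responses would typically come from the N offline samples drawn for each query.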
AIGQ: End‑to‑End Hybrid Generative Architecture for E‑commerce Query Recommendation
Track: ADS
Paper link: https://arxiv.org/abs/2603.19710
Pain point: Traditional query recommendation suffers from shallow semantics, cold‑start, and lack of novelty.
Solution: AIGQ introduces three innovations: (1) Interest‑aware List‑level Supervised Fine‑Tuning (IL‑SFT) that aggregates session‑level behavior and re‑ranks with fine‑grained intent modeling; (2) Interest‑aware List‑level GRPO (IL‑GRPO) with a dual‑reward mechanism that jointly optimizes per‑query relevance and list‑level properties, using online CTR as a reward signal; (3) A hybrid offline‑online architecture where AIGQ‑Direct generates personalized candidates and AIGQ‑Think provides reasoning, satisfying millisecond‑level latency.
Results: Offline, AIGQ surpasses vector retrieval, Qwen‑3‑30B, Gemini 3 Pro and GPT‑5.1 on Cate HR@30 and Query HR@30. In a 30‑day online A/B test, it lifts transaction count (+10.31 %), GMV (+10.68 %), UCTR (+7.42 %) and 7‑day retention (+3.73 %).
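The group-relative part of GRPO can be sketched in a few lines: several candidate lists are sampled for the same context, each gets a scalar reward, and the rewards are standardized within the group. The `dual_reward` blend below, its weights and all numbers are hypothetical stand-ins for the dual-reward mechanism; the paper's actual reward design is not given in this summary.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def dual_reward(query_relevance, list_score, w_query=0.5, w_list=0.5):
    """Hypothetical blend of mean per-query relevance and one list-level score."""
    return w_query * mean(query_relevance) + w_list * list_score


# Three candidate query lists sampled for the same user context (toy numbers).
group = [
    dual_reward([0.9, 0.8, 0.7], list_score=0.6),
    dual_reward([0.5, 0.4, 0.6], list_score=0.7),
    dual_reward([0.2, 0.3, 0.1], list_score=0.2),
]
advantages = group_relative_advantages(group)
```

Because advantages are computed relative to the group mean, no separate value network is needed; lists that beat their siblings are reinforced and the rest are pushed down.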
2. Scaling Laws and Unified Modeling for CTR Prediction
FAT: From Scaling Law to Structured Expressivity for Industrial CTR
Track: ADS
Paper link: https://arxiv.org/abs/2511.12081
Pain point: GPU capacity grows but CTR models see diminishing returns; fragmented transformer structures waste compute and cannot exploit heterogeneous feature fields.
Solution: FAT redesigns two components: a field‑aware attention that assigns each field its own projection matrix and lightweight gating to control cross‑field flow; and a base‑combination hyper‑network that synthesizes field‑specific parameters from shared bases, cutting parameter count by >99 % without extra inference cost. New fields are added via a light routing module, avoiding full retraining.
Results: Offline AUC improves significantly; scaling to 2B parameters still yields gains. Deployed at Alimama, FAT contributes +8 % CTR and raises model FLOPs utilization (MFU) from single digits to >30 %.
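The base-combination idea can be illustrated with a toy example: instead of storing one full projection matrix per field, each field stores a few mixing coefficients over a small set of shared base matrices. This is a sketch under stated assumptions; the field names, the 2x2 bases and the sizes passed to `param_counts` are made up, and the paper's >99 % figure depends on its real configuration, which is not given here.

```python
def combine_bases(coeffs, bases):
    """Build one field-specific matrix as a weighted sum of shared base matrices."""
    rows, cols = len(bases[0]), len(bases[0][0])
    return [
        [sum(c * base[i][j] for c, base in zip(coeffs, bases)) for j in range(cols)]
        for i in range(rows)
    ]


def param_counts(num_fields, dim, num_bases):
    """Parameters for independent d x d matrices vs. shared bases plus coefficients."""
    independent = num_fields * dim * dim
    shared = num_bases * dim * dim + num_fields * num_bases
    return independent, shared


# Two shared 2x2 bases and two hypothetical feature fields.
bases = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
field_coeffs = {"user_id": [1.0, 0.0], "item_price": [0.5, 0.5]}
projections = {f: combine_bases(c, bases) for f, c in field_coeffs.items()}
```

With 100 fields, dimension 64 and 4 bases, `param_counts(100, 64, 4)` returns 409,600 independent parameters against 16,784 shared ones. Since the combined matrices can be precomputed once after training, this kind of sharing need not add work at inference time.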
EST: Efficiently Scalable Transformer for CTR
Track: ADS
Paper link: https://arxiv.org/pdf/2602.10811
Pain point: CTR models must scale like LLMs but face strict latency and compute limits; existing methods aggregate behavior early, losing global modeling power.
Solution: EST introduces (1) Lightweight Cross‑Attention (LCA) that treats non‑behavior features as queries to attend only to the most relevant behavior tokens; (2) Content Sparse Attention (CSA) that leverages similarity priors of multimodal content to sparsify attention. This reduces redundant self‑attention while preserving high‑value interactions.
Results: EST exhibits power‑law scaling on real Taobao promotion data, delivering +1.22 % CTR and +3.27 % RPM in online A/B tests.
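The cross-attention idea behind LCA can be sketched as follows: a non-behavior feature acts as the query, and only the highest-scoring behavior tokens take part in the softmax. Selecting the top k tokens by scaled dot product is an assumption made for illustration; the summary does not say how EST picks the "most relevant" tokens, and the vectors below are toy values.

```python
import math


def topk_cross_attention(query, behavior_tokens, k=2):
    """Attend from one query vector to its k highest-scoring behavior tokens."""
    dim = len(query)
    scores = [
        sum(q * t for q, t in zip(query, tok)) / math.sqrt(dim)
        for tok in behavior_tokens
    ]
    # Sparsification step: keep only the indices of the k largest scores.
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the kept scores only (shifted by the max for stability).
    top = max(scores[i] for i in keep)
    weights = [math.exp(scores[i] - top) for i in keep]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted sum of the kept behavior tokens.
    out = [sum(w * behavior_tokens[i][d] for w, i in zip(weights, keep)) for d in range(dim)]
    return out, keep


context_query = [1.0, 0.0]  # stands in for an embedded non-behavior feature
behaviors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
attended, kept = topk_cross_attention(context_query, behaviors, k=2)
```

Here only tokens 0 and 2 survive the selection, so the softmax and weighted sum scale with k rather than with the full behavior sequence length.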
3. AI‑Driven Next‑Generation Commercial Decision Systems
RecBot: Interactive Recommendation Flow with Dual Agents
Track: Research
Paper link: https://arxiv.org/abs/2509.21317
Pain point: Conventional recommendation relies on passive signals (likes, clicks) and cannot capture true user intent, leading to preference distortion and filter bubbles.
Solution: RecBot proposes an Interactive Recommendation Flow (IRF) that lets users issue natural‑language commands to steer recommendations. It consists of a Parser agent that converts commands into structured feedback and constraints, and a Planner agent that composes tool chains to execute the intent. A teacher‑student distillation transfers strong model capabilities to a lightweight student for industrial deployment.
Results: RecBot achieves the best performance on single‑round, multi‑round, and drift scenarios in both public and private datasets. A three‑month online A/B test shows a 0.71 % reduction in negative feedback, 1.44 % increase in click‑category diversity, and 1.40 % GMV uplift.
AIGP: LLM‑Based Long‑Term Value Alignment for E‑commerce Pricing
Track: ADS
Paper link: https://arxiv.org/abs/2605.XXXXX (placeholder)
Pain point: Large‑scale pricing models are black‑box and ignore unstructured product text; generic LLMs lack domain knowledge and long‑term reward supervision.
Solution: AIGP builds an LLM agent that generates explainable discount decisions via chain‑of‑thought reasoning (the “clear” half of the design) and aligns them with long‑term value using an offline‑RL‑driven Long‑Term Value Estimator (LTVE) and DPO preference alignment (the “far” half).
Results: Offline and online experiments show AIGP improves 14‑day GMV by +13 %, ROI by +8 %, and stabilizes pricing, especially in cold‑start scenarios.
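One plausible way to connect a value estimator to DPO is to score candidate decisions and turn the ranking into (chosen, rejected) pairs. The sketch below shows that step only. `toy_ltve` is a hypothetical stand-in that peaks at a 20 % discount; it is not the paper's LTVE, and the pairing rule and `min_gap` threshold are assumptions.

```python
from itertools import combinations


def build_preference_pairs(candidates, value_fn, min_gap=0.0):
    """Turn scored candidates into (chosen, rejected) pairs for preference training."""
    scored = [(value_fn(c), c) for c in candidates]
    pairs = []
    for (va, a), (vb, b) in combinations(scored, 2):
        if abs(va - vb) <= min_gap:
            continue  # skip near-ties: they carry little preference signal
        pairs.append((a, b) if va > vb else (b, a))
    return pairs


def toy_ltve(discount):
    """Hypothetical long-term value estimate, highest at a 20% discount."""
    return -(discount - 0.20) ** 2


pairs = build_preference_pairs([0.05, 0.20, 0.40], toy_ltve, min_gap=1e-3)
```

Each resulting pair places the discount with the higher estimated long-term value first, which is the format a DPO-style trainer expects.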
LOGIC: Goal‑Guided Generative Auto‑Bidding
Track: Research
Paper link: https://arxiv.org/abs/2507.XXXXX (placeholder)
Pain point: Auto‑bidding suffers from offline distribution bias and lacks proactive exploration; generative methods mimic behavior and can collapse on out‑of‑distribution actions.
Solution: LOGIC shifts focus from fragile action space to robust Return‑to‑Goal (RTG) space. An integrated critic actively guides high‑value goal generation, while a backward‑consistency‑regularization (BCR) module checks mathematical consistency of RTG sequences. This yields proactive, stable bidding strategies.
Results: On the AuctionNet benchmark and online A/B tests, LOGIC scores 38.3 (vs. GAVE 37.2) and improves ROI and spend efficiency.
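A consistency check on an RTG sequence might look like the sketch below. It assumes the sequence obeys the recursion familiar from Decision-Transformer-style models, where each value equals the current step's reward plus the next value. Whether LOGIC's BCR module uses exactly this constraint is not stated in this summary, so both functions are illustrative.

```python
def returns_to_go(rewards):
    """Suffix sums: rtg[t] = rewards[t] + rewards[t + 1] + ... + rewards[-1]."""
    rtg, running = [], 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    return rtg[::-1]


def consistency_penalty(rtg, rewards):
    """Mean squared violation of rtg[t] - rtg[t + 1] == rewards[t].

    The value after the final step is treated as 0.
    """
    padded = list(rtg) + [0.0]
    errors = [(padded[t] - padded[t + 1] - rewards[t]) ** 2 for t in range(len(rewards))]
    return sum(errors) / len(errors)


rewards = [1.0, 2.0, 3.0]
exact = returns_to_go(rewards)                               # [6.0, 5.0, 3.0]
drifted_penalty = consistency_penalty([6.0, 4.0, 3.0], rewards)
```

An exact sequence incurs zero penalty, while a generated sequence that drifts from the recursion is penalized in proportion to the squared gap, which is the kind of signal a regularizer could add to the training loss.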
Two‑Stage Auctions with Bid Refinement
Track: Research
Paper link: https://arxiv.org/abs/2506.XXXXX (placeholder)
Pain point: Standard two‑stage auctions collect a single bid per candidate, preventing bid updates that reflect real‑time quality predictions, and risk strategic over‑bidding in the first stage.
Solution: The authors propose a bid‑refinement mechanism with an entry‑fee to reflect the value of reaching the second stage, and a discounted dynamic auction that balances incentive compatibility with individual rationality.
Results: The mechanism achieves near‑incentive‑compatibility while improving overall auction efficiency, providing a practical design for industrial deployment.
Physics‑Informed Generative World Models for Real‑Time Bidding
Track: Research
Paper link: https://arxiv.org/abs/2504.XXXXX (placeholder)
Pain point: Existing RTB simulators assume deterministic point predictions and cannot capture heteroscedastic variance or structural coupling of feedback variables.
Solution: The paper derives Poisson‑lognormal and Tweedie‑lognormal laws for bidding feedback, introduces a zero‑inflated generalized Beta‑II distribution (ZI‑GB2) for efficient sampling, and employs a normalizing‑flow copula to model non‑Gaussian cost‑value coupling.
Results: On large production data, ZI‑GB2 reduces SMAPE and CRPS, while the Copula achieves lower NLL than Gaussian, Student‑t, or Gumbel alternatives, confirming that physical priors unlock large‑model potential.
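Sampling from a zero-inflated GB2 distribution is cheap because a GB2 variable can be built from two Gamma draws. The sketch below uses that standard construction with Python's `random` module; the parameter values and the reading of a zero as "no feedback" are assumptions for illustration, and the paper's own sampler may differ.

```python
import random


def sample_zi_gb2(rng, zero_prob, a, b, p, q):
    """Draw one sample from a zero-inflated GB2(a, b, p, q) distribution."""
    if rng.random() < zero_prob:
        return 0.0  # structural zero, e.g. a step with no feedback
    # The ratio of independent Gamma(p) and Gamma(q) draws is beta-prime(p, q);
    # raising it to the power 1/a and scaling by b gives a GB2 draw.
    g1 = rng.gammavariate(p, 1.0)
    g2 = rng.gammavariate(q, 1.0)
    return b * (g1 / g2) ** (1.0 / a)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [
    sample_zi_gb2(rng, zero_prob=0.3, a=2.0, b=1.0, p=2.0, q=3.0)
    for _ in range(10_000)
]
zero_share = sum(s == 0.0 for s in samples) / len(samples)
```

The share of exact zeros should sit close to `zero_prob`, while the positive draws carry the heavy right tail that point predictions cannot express.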
4. Multi‑Dimensional Causal Inference and Incremental Optimization
MAC & PyMAL: Multi‑Attribution CVR Benchmark and Library
Track: Dataset and Benchmark
Open‑source: https://github.com/alimama-tech/PyMAL
Pain point: Public CVR datasets provide only a single attribution label, ignoring the complex conversion journey.
Solution: MAC releases click‑level data with Last‑click, First‑click, Linear and DDA labels, and PyMAL implements baseline algorithms for fair comparison. The MoAE model uses a Mixture‑of‑Experts to learn multi‑attribution knowledge with asymmetric knowledge transfer to the primary CVR task.
Results: MoAE improves GAUC by up to 0.39 pp across all attribution mechanisms; auxiliary attribution tasks boost performance especially for long conversion paths.
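The three rule-based attribution labels differ only in how one conversion's credit is split across the preceding clicks. A minimal sketch follows, assuming each click in the path has a unique identifier. The function is illustrative and is not part of the PyMAL API; DDA is omitted because its credits are learned from data rather than fixed by a rule.

```python
def attribution_credits(click_path, rule):
    """Split one conversion's credit across the clicks that preceded it."""
    n = len(click_path)
    if n == 0:
        return {}
    if rule == "last_click":
        weights = [0.0] * (n - 1) + [1.0]
    elif rule == "first_click":
        weights = [1.0] + [0.0] * (n - 1)
    elif rule == "linear":
        weights = [1.0 / n] * n
    else:
        raise ValueError(f"unsupported rule: {rule}")
    return dict(zip(click_path, weights))


path = ["c1", "c2", "c3", "c4"]  # clicks in time order, ending in a conversion
labels = {rule: attribution_credits(path, rule) for rule in ("last_click", "first_click", "linear")}
```

The same click therefore receives a different CVR label under each mechanism, which is why a single-attribution dataset hides most of the conversion journey.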
CanniUplift: Global Alignment and Redemption‑Based Denoising for Incentive Modeling
Track: ADS
Pain point: Traditional uplift assumes independent incentive effects, leading to over‑estimation when multiple sellers compete.
Solution: The framework aligns seller‑level uplift with platform‑wide GMV changes and decomposes redemption behavior to separate genuine incentive‑driven conversions from natural demand.
Results: Outperforms strong baselines on platform‑level revenue alignment, seller‑level AUUC/QINI, and online A/B tests.
M‑DLRI: Scalable Joint Uplift Modeling for Multi‑Lever Marketing
Track: ADS
Pain point: Jointly optimizing discounts, subsidies and commission rates creates combinatorial explosion and data sparsity.
Solution: M‑DLRI introduces a multi‑lever encoding layer, CP low‑rank factorization, and log‑additive gating, together with monotonicity constraints to ensure incentives never reduce response.
Results: On synthetic and real Alimama data, Root‑PEHE drops from 1.145 (XTNet) to 0.854; online A/B shows +0.86 % ROI lift.
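CP low-rank factorization addresses the combinatorial explosion by never storing the full grid of lever combinations. Each lever keeps a small factor matrix, and any entry is rebuilt as a sum over ranks of products of factor rows. The sketch below is a generic CP reconstruction with toy factors for three levers; the sizes and values are assumptions and do not reflect M‑DLRI's actual architecture.

```python
def cp_entry(factors, index):
    """Rebuild one tensor entry: sum over ranks of the product of factor rows."""
    rank = len(factors[0][0])
    total = 0.0
    for r in range(rank):
        term = 1.0
        for factor, i in zip(factors, index):
            term *= factor[i][r]
        total += term
    return total


def cp_param_counts(level_sizes, rank):
    """Full-tensor entries vs. CP parameters for the same grid of lever levels."""
    full = 1
    for size in level_sizes:
        full *= size
    return full, rank * sum(level_sizes)


# Rank-2 toy factors for three levers with two levels each.
discount = [[1.0, 0.0], [0.5, 1.0]]
subsidy = [[2.0, 1.0], [1.0, 3.0]]
commission = [[1.0, 1.0], [0.0, 2.0]]
response = cp_entry([discount, subsidy, commission], (1, 1, 1))
```

For three levers with ten levels each, `cp_param_counts([10, 10, 10], 4)` gives 1,000 full-tensor entries against 120 CP parameters. The gap widens with every added lever, and combinations never seen in the data still receive a value through the shared factors.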
STEAL: Short‑Term Experiments for Long‑Term Causal Effect Estimation
Track: Research
Pain point: Long‑term metrics are unobservable in short‑term A/B tests, and distribution shift between historical and online data hampers estimation.
Solution: STEAL generates pseudo‑policy labels from heterogeneous short‑term behavior, separates causal effect of policy on short‑term outcomes, and applies an interaction attention mechanism to model heterogeneous long‑term effects.
Results: In simulations, STEAL markedly reduces PEHE; in real A/B tests it outperforms baselines, delivering more reliable long‑term impact estimates.
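For reference, root-PEHE (precision in estimation of heterogeneous effects) is the root mean squared error between true and predicted individual-level effects. A minimal sketch with toy inputs:

```python
import math


def root_pehe(true_effects, predicted_effects):
    """Root of the mean squared error between true and predicted individual effects."""
    if len(true_effects) != len(predicted_effects):
        raise ValueError("inputs must have the same length")
    squared = [(t - p) ** 2 for t, p in zip(true_effects, predicted_effects)]
    return math.sqrt(sum(squared) / len(squared))


score = root_pehe([0.0, 0.0], [3.0, 4.0])  # toy effects, not data from the paper
```

Because the metric needs the true effect for every unit, it can only be computed where ground truth is known, which is why it is reported on simulations and not on live A/B traffic.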
These thirteen papers collectively illustrate how scaling laws, generative world models, causal inference and multi‑objective optimization are forging new paradigms for intelligent e‑commerce systems.
Alimama Tech
Official Alimama tech channel, showcasing all of Alimama's technical innovations.
