ICLR 2026: Kuaishou Tech Team’s Cutting‑Edge AI Research Highlights

This article reviews ten Kuaishou‑authored papers accepted at ICLR 2026, covering front‑door causal attribution, visual table retrieval, denoising rerankers, difficulty‑adaptive multimodal reasoning, diffusion code infilling, group‑relative ranking optimization, generative ordinal regression, multimodal video retrieval, an e‑commerce dialogue benchmark, and a new LLM creativity evaluator, together with their reported experimental gains.

ALM-MTA: Front‑Door Causal Multi‑Touch Attribution for Creator‑Ecosystem Optimization

Large‑scale recommendation systems lack precise labels and contain unobserved confounders, making back‑door adjustment ineffective for multi‑touch attribution. ALM‑MTA introduces an adversarially learned mediator that serves as a proxy for the outcome, enabling front‑door identification. A contrastive learning module constrains the marginalized front‑door probability on tightly matched “consumption‑post” sample pairs, addressing positivity violations in massive intervention spaces. Evaluation uses a non‑RCT bucket protocol that estimates uplift and computes AUUC at the intervention‑cluster level. In a production system with 4 billion daily active users and 300 billion samples, ALM‑MTA yields a 0.04 % increase in DAU, a 0.6 % rise in daily active creators, and a 670 % boost in exposure efficiency. AUUC improves up to 0.070 over the previous state‑of‑the‑art across all propensity buckets, and post‑prediction AUC rises by 40 %.
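The identification result ALM‑MTA relies on is Pearl's standard front‑door formula, P(y | do(x)) = Σ_m P(m | x) Σ_x′ P(y | m, x′) P(x′). The sketch below estimates it from observational counts on toy discrete data; it illustrates only the formula, not the paper's adversarially learned mediator or contrastive matching.

```python
from collections import Counter

def front_door(samples, x, y=1):
    """Estimate P(y | do(x)) from observational (x, m, y) tuples
    via the front-door formula, treating m as the mediator."""
    n = len(samples)
    px = Counter(s[0] for s in samples)          # counts for P(x')
    pm_given_x, nx = Counter(), Counter()        # counts for P(m | x)
    for sx, sm, _ in samples:
        pm_given_x[(sx, sm)] += 1
        nx[sx] += 1
    nyx, nmx = Counter(), Counter()              # counts for P(y | m, x')
    for sx, sm, sy in samples:
        nmx[(sm, sx)] += 1
        if sy == y:
            nyx[(sm, sx)] += 1
    total = 0.0
    for m in {s[1] for s in samples}:
        p_m = pm_given_x[(x, m)] / nx[x] if nx[x] else 0.0
        inner = sum((nyx[(m, xp)] / nmx[(m, xp)]) * (cxp / n)
                    for xp, cxp in px.items() if nmx[(m, xp)])
        total += p_m * inner                     # P(m|x) * Σ_x' P(y|m,x')P(x')
    return total
```

On data where treatment and mediator are perfectly coupled, the estimate reduces to the marginal outcome rate under the intervened treatment's mediator, which is the intuition the contrastive "consumption‑post" pairing exploits at scale.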

Paper: https://openreview.net/pdf?id=3r68a6GOpg

Project: https://github.com/logwhistle/ALM-MTA

TaR‑ViR: Multimodal Table Retrieval in the Open World

Traditional table retrieval flattens tables into linear text, discarding structural cues such as merged cells, irregular alignments, and embedded images, which degrades performance. TaR‑ViR defines a new benchmark that treats tables as images and reformulates retrieval as a multimodal task. Experiments show that removing the fragile text‑conversion step improves retrieval accuracy, demonstrating the advantage of visual representations for preserving table structure.

Paper: https://openreview.net/forum?id=4QPgqdQmYn

DNR: Denoising Neural Reranker for Recommender Systems

In two‑stage industrial recommender pipelines, recall scores from the first stage contain rich information that is under‑utilized by existing rerankers. The authors analyze scoring behaviors across stages and model the rerank problem as noise reduction on recall scores. DNR couples a denoising reranker with a noise‑generation module, decomposing the loss into three sub‑objectives: (1) denoising recall scores via sample augmentation, (2) adversarial sample exploration, and (3) aligning the generated recall‑score distribution. Extensive experiments on three public datasets and an industrial system confirm DNR’s superiority over naive baselines and existing SOTA rerankers.
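A minimal sketch of the three‑part objective, using my own simplified stand‑ins (squared error for denoising, Gaussian perturbation for noise generation, moment matching for alignment) rather than the paper's exact losses:

```python
import math
import random

def denoise_loss(pred, label):
    # (1) the reranker should recover clean relevance from noisy recall scores
    return sum((p - l) ** 2 for p, l in zip(pred, label)) / len(pred)

def adversarial_scores(recall, sigma, rng):
    # (2) a noise generator perturbs recall scores to create hard samples
    return [r + rng.gauss(0.0, sigma) for r in recall]

def alignment_loss(generated, real):
    # (3) keep the generated noisy-score distribution close to observed
    #     recall scores, here by matching mean and standard deviation
    def moments(xs):
        m = sum(xs) / len(xs)
        v = sum((x - m) ** 2 for x in xs) / len(xs)
        return m, math.sqrt(v)
    mg, sg = moments(generated)
    mr, sr = moments(real)
    return (mg - mr) ** 2 + (sg - sr) ** 2
```

In training, the total loss would be a weighted sum of the three terms, with the generator and reranker updated jointly.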

Paper: https://openreview.net/pdf?id=JlwYkFm91F

DIVA‑GRPO: Difficulty‑Adaptive Variant Advantage for Multimodal Reasoning

Group‑Relative Policy Optimization (GRPO) improves multimodal large‑language‑model reasoning but suffers from sparse rewards and advantage vanishing when tasks are too easy or too hard. DIVA‑GRPO dynamically assesses problem difficulty, samples variants at appropriate difficulty levels, and computes advantages with difficulty‑weighted normalization across local (per‑problem) and global (problem‑plus‑variant) groups. Experiments on six reasoning benchmarks demonstrate faster training convergence and higher inference performance than prior methods.
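The core issue is that when every rollout in a group succeeds (or fails), within‑group advantages collapse to zero. A toy sketch of the idea, with a hypothetical blending weight of my own choosing (the paper's exact normalization differs): use the local pass rate as a difficulty proxy and shift weight to the global problem‑plus‑variant group when the local group is degenerate.

```python
def diva_advantages(local_rewards, global_rewards):
    """Blend z-scored advantages from the local (per-problem) group and the
    global (problem-plus-variant) group, weighted by estimated difficulty."""
    def z(group, x):
        m = sum(group) / len(group)
        s = (sum((v - m) ** 2 for v in group) / len(group)) ** 0.5
        return (x - m) / s if s > 0 else 0.0

    # difficulty proxy: fraction of rollouts solved in the local group
    pass_rate = sum(local_rewards) / len(local_rewards)
    # near 0 or 1, local advantages vanish, so lean on the global group
    w_local = 4 * pass_rate * (1 - pass_rate)
    return [w_local * z(local_rewards, r)
            + (1 - w_local) * z(global_rewards, r)
            for r in local_rewards]
```

When the local group is all-correct, the advantage signal comes entirely from the global group instead of vanishing.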

Paper: https://openreview.net/pdf?id=qKXYEg00eH

DreamOn: Diffusion Language Models for Code Infilling Beyond Fixed‑Size Canvas

Diffusion Language Models (DLMs) enable flexible, non‑autoregressive generation but require a fixed‑length mask, limiting code‑infilling when the desired length differs. DreamOn introduces two length‑control states that let the model autonomously expand or shrink its output length during diffusion, requiring only a minimal modification to the training objective and no architectural changes. Built on Dream‑Coder‑7B, DreamOn matches SOTA autoregressive models on HumanEval‑Infilling and SantaCoder‑FIM benchmarks and reaches oracle‑length performance, removing a major deployment obstacle for DLMs.
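A toy illustration (my own simplification, not DreamOn's actual implementation) of the two length‑control states: besides resolving a mask slot to a real token, the model may emit an expand state that splits one slot into two, or a delete state that removes it, so the canvas grows or shrinks during denoising.

```python
MASK, EXPAND, DELETE = "<mask>", "<expand>", "<delete>"

def apply_step(canvas, predictions):
    """One denoising step: consume a prediction per MASK slot, in order,
    letting the canvas length change via the two control states."""
    out, it = [], iter(predictions)
    for tok in canvas:
        if tok != MASK:
            out.append(tok)
            continue
        p = next(it)
        if p == EXPAND:
            out.extend([MASK, MASK])   # slot splits: canvas grows by one
        elif p == DELETE:
            pass                       # slot removed: canvas shrinks by one
        else:
            out.append(p)              # slot resolved to a real token
    return out
```

Iterating such steps lets the infilled region converge to whatever length the code actually needs, which is why DreamOn can reach oracle‑length performance without knowing the target length in advance.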

Paper: https://arxiv.org/pdf/2602.01326

Project: https://github.com/DreamLM/DreamOn

GoalRank: Group‑Relative Optimization for a Large Ranking Model

The Generator‑Evaluator (G‑E) two‑stage ranking paradigm, even when extended to multiple generators (MG‑E), shows diminishing returns as candidate list size grows. The authors prove that a sufficiently large pure generator can approximate the optimal ranking strategy more closely than any finite G‑E/MG‑E system. GoalRank trains a single powerful generator using Group‑Relative Optimization (GRO): a reward model trained on real user feedback defines a reference strategy, and the generator minimizes KL divergence to this reference. Experiments on public benchmarks and a short‑video platform with over 5 billion daily active users demonstrate significant offline and online gains, including higher user dwell time and watch duration.
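An illustrative sketch of the GRO objective over a group of candidate lists: a softmax over reward‑model scores defines the reference strategy, and the generator is penalized by KL divergence from its own list distribution to that reference. The temperature and KL direction here are my assumptions, not the paper's specification.

```python
import math

def softmax(xs, tau=1.0):
    m = max(xs)
    es = [math.exp((x - m) / tau) for x in xs]
    z = sum(es)
    return [e / z for e in es]

def gro_loss(generator_logits, rewards, tau=1.0):
    """KL(generator || reference), where the reference strategy is a
    softmax over reward-model scores for the candidate lists."""
    ref = softmax(rewards, tau)
    gen = softmax(generator_logits)
    return sum(g * math.log(g / r) for g, r in zip(gen, ref) if g > 0)
```

The loss is zero exactly when the generator already ranks lists in proportion to their rewards, and grows as it diverges from the reward‑defined reference.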

Paper: https://openreview.net/pdf?id=gTMzRm8fb0

GoR: A Unified Generative Framework for Ordinal Regression

Ordinal regression traditionally relies on discretization, which suffers from ambiguous boundaries and fixed bucket rigidity. GoR reframes numeric prediction as an autoregressive token‑generation task, emitting a sequence of “additive‑semantic” tokens terminated by a dynamic <EOS>. This yields interpretable, step‑wise refinement and eliminates fixed‑bucket constraints. The authors derive a bias‑variance bound for MSE and propose the Coverage–Distinctiveness Index (CoDi) to balance bias and variance when constructing token vocabularies. Evaluated on 15 benchmarks across five domains, GoR sets new SOTA, confirming the theoretical and practical advantages of the generative paradigm.
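A toy sketch of the decoding side (my own reading of the scheme, with a hypothetical signed‑increment vocabulary): each emitted token contributes an additive refinement to the running prediction, and a dynamic `<EOS>` stops the sequence once it is precise enough.

```python
def decode(tokens):
    """Sum additive-semantic tokens until <EOS>, e.g.
    ["+100", "+20", "+3", "<EOS>"] refines 0 -> 100 -> 120 -> 123."""
    total = 0.0
    for t in tokens:
        if t == "<EOS>":
            break
        total += float(t)
    return total
```

Because the model chooses when to emit `<EOS>`, prediction granularity adapts per sample instead of being fixed by a bucket scheme, which is the source of the interpretable step‑wise refinement.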

Paper: https://openreview.net/pdf?id=ys80cc2N5M

OmniCVR: A Benchmark for Omni‑Composed Video Retrieval with Vision, Audio, and Text

Existing video‑retrieval benchmarks focus on visual‑text alignment and ignore audio cues such as speech, music, and environmental sounds, which are essential for comprehensive video understanding. OmniCVR introduces a large‑scale, fully‑multimodal benchmark that fuses vision, audio, and text queries. The dataset is built via an automated pipeline that performs content‑aware segmentation, multimodal annotation, and dual verification by large‑language models and human experts. It defines three query types—visual‑only, audio‑only, and fused multimodal—with fused queries dominating. The authors also present AudioVLM2Vec, an audio‑aware model that achieves SOTA performance on OmniCVR, highlighting current limitations of multimodal retrieval systems in audio reasoning.
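A minimal sketch of fused‑query retrieval (illustrative only, not AudioVLM2Vec): embed each available modality, average into a single query vector, and rank videos by cosine similarity.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def fuse(*embeddings):
    """Average per-modality embeddings (vision, audio, text) into one
    normalized query vector."""
    dim = len(embeddings[0])
    return normalize([sum(e[i] for e in embeddings) / len(embeddings)
                      for i in range(dim)])

def rank(query, videos):
    """videos: list of (id, embedding); return ids best-first."""
    def cos(a, b):
        return sum(x * y for x, y in zip(normalize(a), normalize(b)))
    return sorted(videos, key=lambda kv: -cos(query, kv[1]))
```

The benchmark's point is that if the audio embedding carries no real signal, such a fused query degrades to visual‑text retrieval, which is exactly the failure mode OmniCVR is designed to expose.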

Paper: https://openreview.net/pdf?id=KxxR7emO5K

Mix‑Ecom: Mixed‑Type E‑Commerce Dialogues with Complex Domain Rules

Mix‑Ecom is a corpus of 4,799 real‑world customer‑service dialogues covering four dialogue types (QA, recommendation, task‑oriented, chit‑chat), three e‑commerce task categories (pre‑sale, logistics, post‑sale), and 82 domain rules. Baselines reveal that current agents struggle with mixed‑type dialogues and rule‑heavy scenarios, often hallucinating. The dataset and a proposed dynamic framework aim to benchmark and improve agent capabilities.

Paper: https://arxiv.org/pdf/2509.23836

CreataSet and CrEval: Evaluating Text Creativity Across Diverse Domains

Assessing creativity in large language models traditionally relies on costly human judgments. Existing automatic metrics lack generalization or alignment with human perception. The authors introduce a pairwise‑comparison framework that incorporates shared context instructions to improve consistency. CreataSet contains over 100 k human‑annotated samples and more than 1 M synthetic instruction‑response pairs spanning multiple creative tasks. Training an LLM‑based evaluator, CrEval, on CreataSet yields significantly higher agreement with human judgments than prior methods. Experiments also show that combining human and synthetic data is essential for robust evaluator training.
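The evaluation protocol reduces to a simple measurement: for each pair of responses judged under the same shared context instruction, check whether the evaluator's pick matches the human pick. A trivial sketch:

```python
def agreement(evaluator_picks, human_picks):
    """Fraction of pairwise comparisons where the evaluator's choice of the
    more creative response matches the human annotator's choice."""
    hits = sum(e == h for e, h in zip(evaluator_picks, human_picks))
    return hits / len(human_picks)
```

CrEval's reported gain is precisely a higher value of this agreement score against human pairwise judgments than prior automatic metrics achieve.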

Paper: https://arxiv.org/pdf/2505.19236

Tags: Artificial Intelligence · Recommendation Systems · Diffusion Models · Multimodal Learning · Kuaishou · Causal Attribution · ICLR 2026 · Ordinal Regression
Written by Kuaishou Tech

Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.
