8 Kuaishou Papers Spotlighted at ICML 2025: Multimodal AI, Causal Inference and More

Kuaishou has had eight cutting‑edge papers accepted at the International Conference on Machine Learning 2025, covering breakthroughs in multimodal emotion modeling, monotonic probability learning, causal effect generalization, cascade ranking, multimodal LLM alignment, interleaved image‑text generation, ultra‑low‑rate image compression, and visual autoregressive super‑resolution, with links to each work.

Kuaishou Tech

Kuaishou announced that eight high‑impact papers were selected for the 42nd International Conference on Machine Learning (ICML 2025), highlighting the company’s continued innovation in artificial intelligence.

Paper 01: MODA – Modular Duplex Attention for Understanding Multimodal Perception, Cognition, and Emotion

Link: https://openreview.net/pdf?id=9hd5WA6QCn

The authors introduce a modular duplex attention mechanism to build a multimodal large model (MODA) that integrates perception, cognition, and emotion capabilities, achieving significant performance gains across 21 benchmarks in six task categories and earning a Spotlight (top 2.6%).

Paper 02: Learning Monotonic Probabilities with a Generative Cost Model

Link: https://arxiv.org/pdf/2506.03542

A generative cost model (GCM) is proposed to reformulate strict monotonic probability learning as a partial‑order problem between observable reward and latent cost variables, with an implicit variant (IGCM) for hidden monotonicity; experiments on synthetic and public datasets show superior performance over existing monotonic modeling techniques.
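As an illustrative sketch of the underlying idea (not the paper's implementation; the Gaussian cost model and the function name below are assumptions for illustration): if the predicted probability is modeled as the chance that a latent cost variable falls below an observable reward, monotonicity in the reward is guaranteed by construction, because a cumulative distribution function is nondecreasing.

```python
import math

def monotonic_prob(reward: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(C <= reward) for a latent Gaussian cost C ~ N(mu, sigma^2).

    Because a CDF is nondecreasing, this probability is monotonic
    in the reward by construction -- no monotonicity constraints
    on network weights are needed.
    """
    return 0.5 * (1.0 + math.erf((reward - mu) / (sigma * math.sqrt(2.0))))

# Larger reward never yields a smaller probability:
probs = [monotonic_prob(r) for r in (-2.0, -1.0, 0.0, 1.0, 2.0)]
assert all(a <= b for a, b in zip(probs, probs[1:]))
```

In the paper's framing, the reward would come from a learned model of the observable signal; the toy closed-form Gaussian here only demonstrates why the partial-order formulation makes monotonicity automatic.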

Paper 03: Generalizing Treatment Effects from Randomized Controlled Trials across Environments

The paper presents a Two‑Stage Doubly Robust (2SDR) estimator that relaxes separating‑set assumptions, enabling unbiased causal effect generalization when the set is observable in either source or target environments, with theoretical guarantees and extensive empirical validation.

Paper 04: Learning Cascade Ranking as One Network

Link: https://arxiv.org/abs/2503.09492

LCRON introduces a new surrogate loss that directly optimizes the lower bound of ground‑truth item survival probability across cascade ranking stages, aligning training objectives with system‑level goals and delivering significant improvements in both benchmark and industrial settings.
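To make "survival across cascade stages" concrete, here is a hedged toy sketch (illustrative only; LCRON's actual surrogate loss is a differentiable, trainable objective, which this closed-form bound is not): an item reaches the final result only if it passes every stage of the cascade, and a simple distribution-free lower bound on that joint probability follows from the Fréchet inequality.

```python
def survival_lower_bound(stage_pass_probs):
    """Distribution-free lower bound on P(item passes every stage).

    By the Frechet inequality:
        P(all k stages pass) >= max(0, sum(p_i) - (k - 1)).
    Under independence the joint probability would instead be the
    product of the per-stage probabilities, which is always >= this bound.
    """
    k = len(stage_pass_probs)
    return max(0.0, sum(stage_pass_probs) - (k - 1))

# A hypothetical three-stage cascade (recall -> pre-rank -> rank):
bound = survival_lower_bound([0.9, 0.8, 0.95])  # 0.65
```

Optimizing a lower bound of this kind, rather than each stage's loss in isolation, is what aligns per-stage training with the end-to-end goal of the cascade.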

Paper 05: MM‑RLHF – The Next Step Forward in Multimodal LLM Alignment

Link: https://arxiv.org/abs/2502.10391

The authors release the MM‑RLHF dataset (120k human‑annotated preference pairs) and propose a critique‑based reward model and dynamic reward scaling, achieving notable gains in dialogue quality (19.5%) and safety (60%) for multimodal large language models.

Paper 06: Orthus – Autoregressive Interleaved Image‑Text Generation with Modality‑Specific Heads

Link: https://arxiv.org/abs/2412.00127

Orthus combines a differentiable visual embedding module with a unified autoregressive Transformer and separate modality‑specific heads, overcoming information loss in vector‑quantized models and noise in diffusion‑based hybrids, surpassing SOTA on visual understanding and image‑text generation tasks.

Paper 07: Ultra Low‑rate Image Compression with Semantic Residual Coding and Compression‑aware Diffusion

Link: https://arxiv.org/abs/2505.08281

ResULIC introduces semantic residual coding and a compression‑aware diffusion model, achieving ultra‑low‑bit rates with high fidelity and outperforming existing SOTA methods in both objective metrics and visual quality.

Paper 08: VARSR – Visual Autoregressive Modeling for Image Super‑Resolution

Link: https://arxiv.org/abs/2501.18993

VARSR proposes a next‑scale prediction framework with prefix tokens, scale‑aligned rotary positional encoding, and a diffusion refiner, delivering superior realism‑fidelity trade‑offs and efficiency compared to diffusion‑based super‑resolution approaches.

These works collectively demonstrate Kuaishou’s breadth in AI research, spanning multimodal understanding, causal inference, ranking systems, and advanced image generation techniques.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: machine learning, AI, ranking, multimodal, causal inference, super-resolution
Written by Kuaishou Tech, the official Kuaishou tech account providing real-time updates on the latest Kuaishou technology practices.