8 Kuaishou Papers Spotlighted at ICML 2025: Multimodal AI, Causal Inference and More
Kuaishou has had eight cutting‑edge papers accepted at the International Conference on Machine Learning 2025, covering breakthroughs in multimodal emotion modeling, monotonic probability learning, causal effect generalization, cascade ranking, multimodal LLM alignment, autoregressive interleaved image‑text generation, ultra‑low‑rate image compression, and visual autoregressive super‑resolution, with links to each work and accompanying code repositories.
Kuaishou announced that eight high‑impact papers were selected for the 42nd International Conference on Machine Learning (ICML 2025), highlighting the company’s continued innovation in artificial intelligence.
Paper 01: MODA – Modular Duplex Attention for Understanding Multimodal Perception, Cognition, and Emotion
Link: https://openreview.net/pdf?id=9hd5WA6QCn
The authors introduce a modular duplex attention mechanism to build a multimodal large model (MODA) that integrates perception, cognition, and emotion capabilities, achieving significant performance gains across 21 benchmarks in six task categories and earning a Spotlight (top 2.6%).
Paper 02: Learning Monotonic Probabilities with a Generative Cost Model
Link: https://arxiv.org/pdf/2506.03542
A generative cost model (GCM) is proposed to reformulate strict monotonic probability learning as a partial‑order problem between observable reward and latent cost variables, with an implicit variant (IGCM) for hidden monotonicity; experiments on synthetic and public datasets show superior performance over existing monotonic modeling techniques.
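The core trick that makes a "generative cost" formulation monotone can be illustrated with a tiny sketch: if the positive label fires whenever an observable reward exceeds a latent cost, the predicted probability is a CDF of the reward, and a CDF is non‑decreasing by construction. The function below uses an illustrative Gaussian cost with fixed parameters; it is not the paper's learned GCM, just the underlying idea.

```python
import math

def monotone_probability(reward, mu, sigma):
    """P(y=1) = P(latent cost <= reward), with cost ~ Normal(mu, sigma).

    A CDF is non-decreasing, so this probability is strictly monotone
    in the reward by construction. Illustrative fixed parameters only;
    the paper learns the cost distribution from data.
    """
    z = (reward - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

In the learned version, `mu` and `sigma` would be outputs of a network conditioned on the non‑monotone features, while monotonicity in the reward variable is guaranteed for free by the CDF structure.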
Paper 03: Generalizing Treatment Effects from Randomized Controlled Trials across Environments
The paper presents a Two‑Stage Doubly Robust (2SDR) estimator that relaxes separating‑set assumptions, enabling unbiased causal effect generalization when the set is observable in either source or target environments, with theoretical guarantees and extensive empirical validation.
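For context, a standard (single‑environment) doubly robust estimator combines an outcome model with inverse propensity weighting, and stays consistent if either component is correctly specified. The sketch below shows that textbook augmented‑IPW form; the paper's 2SDR adds a second stage to transport effects across environments, which is not reproduced here.

```python
def aipw_ate(y, t, e, m1, m0):
    """Textbook augmented-IPW (doubly robust) ATE estimate.

    y  : observed outcomes
    t  : binary treatment indicators
    e  : propensity scores P(t=1 | x)
    m1 : outcome-model predictions under treatment
    m0 : outcome-model predictions under control

    Consistent if either the propensity model or the outcome models
    are correct. Illustrative baseline, not the paper's 2SDR estimator.
    """
    total = 0.0
    for yi, ti, ei, m1i, m0i in zip(y, t, e, m1, m0):
        total += (m1i - m0i
                  + ti * (yi - m1i) / ei
                  - (1 - ti) * (yi - m0i) / (1 - ei))
    return total / len(y)
```

When the outcome models fit perfectly, the correction terms vanish and the estimate reduces to the average predicted treatment effect.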
Paper 04: Learning Cascade Ranking as One Network
Link: https://arxiv.org/abs/2503.09492
LCRON introduces a new surrogate loss that directly optimizes the lower bound of ground‑truth item survival probability across cascade ranking stages, aligning training objectives with system‑level goals and delivering significant improvements in both benchmark and industrial settings.
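The hard event "item survives a stage's top‑k cut" is non‑differentiable, so end‑to‑end cascade training typically relies on a smooth relaxation. The sketch below replaces the indicator with a sigmoid of the margin over the k‑th score; it is a common generic relaxation shown for intuition, not LCRON's actual surrogate loss.

```python
import math

def soft_survival(scores, k, tau=1.0):
    """Differentiable proxy for P(item survives a top-k cut).

    Replaces the hard indicator [score_i >= k-th largest score] with a
    sigmoid of the margin at temperature tau. A generic relaxation for
    illustration; per-stage proxies could be multiplied to approximate
    end-to-end survival through a cascade.
    """
    threshold = sorted(scores, reverse=True)[k - 1]
    return [1.0 / (1.0 + math.exp(-(s - threshold) / tau))
            for s in scores]
```

Smaller `tau` sharpens the proxy toward the hard cut; larger `tau` yields smoother gradients for the earlier stages.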
Paper 05: MM‑RLHF – The Next Step Forward in Multimodal LLM Alignment
Link: https://arxiv.org/abs/2502.10391
The authors release the MM‑RLHF dataset (120k human‑annotated preference pairs) and propose a critique‑based reward model and dynamic reward scaling, achieving notable gains in dialogue quality (19.5%) and safety (60%) for multimodal large language models.
Paper 06: Orthus – Autoregressive Interleaved Image‑Text Generation with Modality‑Specific Heads
Link: https://arxiv.org/abs/2412.00127
Orthus combines a differentiable visual embedding module with a unified autoregressive Transformer and separate modality‑specific heads, overcoming information loss in vector‑quantized models and noise in diffusion‑based hybrids, surpassing SOTA on visual understanding and image‑text generation tasks.
Paper 07: Ultra Low‑rate Image Compression with Semantic Residual Coding and Compression‑aware Diffusion
Link: https://arxiv.org/abs/2505.08281
ResULIC introduces semantic residual coding and a compression‑aware diffusion model, achieving ultra‑low‑bit rates with high fidelity and outperforming existing SOTA methods in both objective metrics and visual quality.
Paper 08: VARSR – Visual Autoregressive Modeling for Image Super‑Resolution
Link: https://arxiv.org/abs/2501.18993
VARSR proposes a next‑scale prediction framework with prefix tokens, scale‑aligned rotary positional encoding, and a diffusion refiner, delivering superior realism‑fidelity trade‑offs and efficiency compared to diffusion‑based super‑resolution approaches.
These works collectively demonstrate Kuaishou’s breadth in AI research, spanning multimodal understanding, causal inference, ranking systems, and advanced image generation techniques.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Kuaishou Tech
Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.
