Insights from NeurIPS 2025: Modeling Distributions and Venturing Beyond Them
The report summarizes NeurIPS 2025 in San Diego, highlighting NIRC's four accepted papers, three of which (on noise‑robust 3D human pose estimation, LVLM video‑anomaly understanding, and hand‑object reconstruction) were presented as posters, and discusses broader industry trends such as feed‑forward generation and large‑scale pre‑training showcased by leading AI companies.
NeurIPS 2025 Overview
From December 2–7, 2025, the Conference on Neural Information Processing Systems (NeurIPS) was held in San Diego, USA. The conference received 21,575 valid submissions and accepted 5,290 papers, yielding an overall acceptance rate of 24.52%.
NIRC Contributions
The Network Intelligence Research Center (NIRC) had four papers accepted, with three presented as posters by Liu Xingyu:
Unified 2D-3D Discrete Priors for Noise‑Robust and Calibration‑Free Multiview 3D Human Pose Estimation – The UniCodebook paper addresses the lack of robustness to input noise in transformer‑based multiview 3D pose methods. It introduces a unified 2D‑3D discrete codebook (UniCodebook) and a discrete‑continuous space attention (DCSA) mechanism, achieving higher accuracy while significantly improving noise resistance without requiring camera calibration. Experiments on three mainstream datasets show performance surpassing existing methods.
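The summary doesn't reproduce the paper's codebook design, but the core mechanism it names, snapping continuous pose features to entries of a learned discrete codebook, is standard vector quantization. Below is a minimal PyTorch sketch of that lookup with a straight‑through gradient; the shapes and names (`quantize`, `codebook`) are hypothetical illustrations, not the authors' code.

```python
import torch

# Illustrative VQ-style codebook lookup, not the UniCodebook implementation.
# `codebook` holds K learned discrete entries of dimension D; each continuous
# pose feature is snapped to its nearest entry.
def quantize(features: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    # features: (N, D) continuous 2D/3D pose features; codebook: (K, D)
    dists = torch.cdist(features, codebook)   # (N, K) pairwise distances
    idx = dists.argmin(dim=1)                 # nearest discrete code per feature
    quantized = codebook[idx]                 # (N, D) snapped features
    # Straight-through estimator: forward pass uses the discrete code,
    # backward pass routes gradients around the non-differentiable argmin.
    return features + (quantized - features).detach()

codebook = torch.randn(512, 64, requires_grad=True)  # hypothetical K=512, D=64
feats = torch.randn(17, 64)                          # e.g., features for 17 joints
print(quantize(feats, codebook).shape)               # torch.Size([17, 64])
```

Quantizing into a shared discrete space is one reason such methods tolerate input noise: small perturbations of a feature usually still map to the same codebook entry.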
Do LVLMs Truly Understand Video Anomalies? Revealing Hallucination via Co‑Occurrence Patterns – The authors find that large vision‑language models (LVLMs) are often misled by visual‑text co‑occurrence patterns learned during training, producing hallucinated anomalies in normal scenes. The work systematically diagnoses this failure mode and proposes VAD‑DPO, a preference‑based semantic contrast optimization that markedly improves scene understanding and the reliability of anomaly detection.
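The summary doesn't spell out VAD‑DPO's objective; as background, preference‑based methods in this family typically build on the Direct Preference Optimization (DPO) loss, sketched below. The function and values are hypothetical; only the standard DPO formula is assumed.

```python
import torch
import torch.nn.functional as F

# Standard DPO objective (the paper's exact loss may differ). Inputs are summed
# log-probabilities of a preferred response y_w (e.g., a faithful description of
# a normal scene) and a dispreferred response y_l (e.g., a hallucinated anomaly),
# under the trained policy and a frozen reference model.
def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    ratio_w = policy_logp_w - ref_logp_w   # policy-vs-reference log-ratio, preferred
    ratio_l = policy_logp_l - ref_logp_l   # same for the dispreferred response
    # Push the preferred response's log-ratio above the dispreferred one's.
    return -F.logsigmoid(beta * (ratio_w - ratio_l)).mean()

# Hypothetical single-pair batch of log-probabilities.
loss = dpo_loss(torch.tensor([-12.3]), torch.tensor([-10.1]),
                torch.tensor([-11.8]), torch.tensor([-10.5]))
print(loss.item())
```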
Generalizable Hand‑Object Modeling from Monocular RGB Images via 3D Gaussians – To avoid the high cost of dense 3D annotations or pre‑scanned object models, the paper presents HOGS, a hand‑object interaction reconstruction framework based on 3D Gaussian Splatting (3DGS). HOGS represents the hand and object with adaptive 3D Gaussian primitives, achieving high‑quality hand‑object modeling from single‑view RGB images while generalizing well to unseen environments and complex motions.
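For readers new to 3DGS, the sketch below shows the usual parametrization of a single 3D Gaussian primitive and how its covariance is assembled as R S Sᵀ Rᵀ from a rotation and per‑axis scales. This is generic 3DGS background with hypothetical values, not the HOGS implementation.

```python
import torch

# One 3D Gaussian primitive as commonly parametrized in 3DGS pipelines:
# a mean, a rotation (quaternion), per-axis scales, and an opacity.
def quat_to_rotmat(q: torch.Tensor) -> torch.Tensor:
    w, x, y, z = torch.nn.functional.normalize(q, dim=0)  # unit quaternion
    return torch.stack([
        torch.stack([1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)]),
        torch.stack([2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)]),
        torch.stack([2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]),
    ])

mean    = torch.zeros(3)                      # center of the Gaussian
quat    = torch.tensor([1.0, 0.0, 0.0, 0.0])  # identity rotation
scale   = torch.tensor([0.02, 0.01, 0.01])    # anisotropic per-axis extent
opacity = torch.sigmoid(torch.tensor(0.5))    # alpha in (0, 1)

R = quat_to_rotmat(quat)
S = torch.diag(scale)
cov = R @ S @ S.T @ R.T                       # full 3x3 covariance: R S S^T R^T
print(cov, opacity)
```

Because each primitive is explicit and differentiable, a reconstruction method can grow, prune, and reshape Gaussians per scene, which is what makes the representation adaptive.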
Poster Discussions
In the poster area, a large‑scale pre‑training method for 3D scene reconstruction based on feed‑forward generation sparked discussion about the trade‑off between inference efficiency and structural fidelity, generalization to out‑of‑domain scenes, and the feasibility of unified 2D‑3D representation learning.
Industry Exhibition Highlights
Several leading AI companies showcased their latest visual technologies. Tesla demonstrated an end‑to‑end visual perception solution for autonomous driving and embodied intelligence, emphasizing large‑scale data‑driven industrial vision. Meta highlighted advances in general‑purpose visual perception, including SAM3 and its 3D extension SAM3D, illustrating the accelerating convergence of academic research and industry practice in foundational vision models and 3D perception.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Network Intelligence Research Center (NIRC)
NIRC is based on the National Key Laboratory of Network and Switching Technology at Beijing University of Posts and Telecommunications. It has built a technology matrix across four AI domains—intelligent cloud networking, natural language processing, computer vision, and machine learning systems—dedicated to solving real‑world problems, creating top‑tier systems, publishing high‑impact papers, and contributing significantly to the rapid advancement of China's network technology.