Mar 30, 2026 · Artificial Intelligence

Proactive Interaction for Video Multimodal Models: MMDuet2 & ProactiveVideoQA

This article surveys two ICLR 2026 papers, ProactiveVideoQA and MMDuet2. It covers how video multimodal large language models can autonomously decide when to respond, the PAUC metric for jointly evaluating response timeliness and accuracy, a reinforcement-learning training pipeline that does not require precise timestamps, and experimental findings on data construction, frame-sampling density, and state-of-the-art performance.
