AIWalker
Mar 19, 2026 · Artificial Intelligence
Vision‑R1 Multimodal Reasoning Model Delivers Human‑Level Logic and Near‑OpenAI O1 Accuracy
Vision‑R1 is a 7B multimodal large language model that combines a 200K-sample chain-of-thought (CoT) dataset built without human annotation, Modality Bridging, and Progressive Thinking Suppression Training to address data scarcity and over‑thinking. It reaches 73.5% accuracy on MathVista, within 0.4 percentage points of OpenAI’s O1.
benchmark performance · chain of thought · large language models
12 min read
