AntTech
Mar 4, 2026 · Artificial Intelligence
Zooming Without Zooming: One‑Pass Fine‑Grained Vision for Multimodal LLMs
A new Region‑to‑Image Distillation (R2I) approach lets multimodal large language models perceive tiny visual details in a single forward pass, eliminating costly tool calls while achieving state‑of‑the‑art accuracy on the ZoomBench fine‑grained benchmark.
Model Efficiency · ZoomBench · fine-grained perception
