AntTech
Mar 4, 2026 · Artificial Intelligence

Zooming Without Zooming: One‑Pass Fine‑Grained Vision for Multimodal LLMs

A new Region‑to‑Image Distillation (R2I) approach lets multimodal large language models perceive tiny visual details in a single forward pass, eliminating costly tool calls while achieving state‑of‑the‑art accuracy on the ZoomBench fine‑grained benchmark.

Model Efficiency · ZoomBench · fine-grained perception