vivo Internet Technology
Aug 25, 2025 · Artificial Intelligence

How DiMo-GUI Boosts Multimodal LLMs for GUI Grounding Without Training

DiMo-GUI is a plug-and-play framework that improves the ability of multimodal large language models to locate GUI elements. It combines a hierarchical dynamic visual reasoning loop with modality-aware optimization, and achieves up to double the performance on high-resolution GUI benchmarks without any additional training data.

GUI grounding · dynamic visual reasoning · modality-aware optimization
7 min read