Amap Tech
Apr 21, 2025 · Artificial Intelligence
Lenna: Language‑Enhanced Reasoning Detection Assistant and a Chain‑of‑Thought Image Editing Framework Using Multimodal Large Language Models
At ICASSP 2025, Gaode had two papers accepted. The first presents Lenna, a language-enhanced reasoning detection assistant that adds a DET token to a multimodal LLM and reports state-of-the-art accuracy on the RefCOCO benchmarks. The second presents a chain-of-thought image-editing framework that converts a complex prompt into segmentation masks and inpainting prompts for diffusion-based inpainting, and reports better results than existing methods.
AI · Chain-of-Thought · ICASSP