AgenticIR: An Agentic System for Restoring Images with Complex Degradations
AgenticIR combines visual language models and large language models in a multi‑stage reasoning workflow—perception, planning, execution, reflection, and adjustment—to evaluate, plan, and iteratively apply specialized restoration tools, achieving superior results on complexly degraded images compared to baseline methods.
Research Motivation
Current state‑of‑the‑art image‑processing algorithms are largely end‑to‑end learned and, while successful on specific tasks, generalize poorly and offer little interpretability. Real‑world image restoration involves virtually unlimited combinations of degradations, requiring a system that can assess image quality, understand degradation types, and make dynamic, human‑like decisions. The authors therefore propose a paradigm that explicitly models cognitive behaviors (evaluation, planning, execution, reflection, and adjustment) to approach general image restoration.
Key Behavioral Patterns
Human image restoration follows five steps: evaluate the image quality and identify degradations (e.g., noise, rain, low resolution); plan a sequence of tool invocations; execute the plan; reflect on the results; and adjust the plan if needed. An illustrative example shows a complex‑degraded image being evaluated, a plan of rain removal → denoising → super‑resolution being executed, and iterative reflection‑adjust cycles leading to a satisfactory result.
Required Capabilities
Ability to evaluate image quality beyond a single score, including identifying degradation sources and severity.
Contextual reasoning to make dynamic decisions in diverse, complex scenarios.
Domain knowledge of image restoration to generate effective tool sequences, e.g., ordering tools so that one step does not undermine the next.
These capabilities are obtained by leveraging recent visual‑language models (VLMs) and large language models (LLMs). VLMs provide natural‑language descriptions of image quality, while LLMs handle complex reasoning and planning. Fine‑tuning and prompting are used to inject domain knowledge.
Methodology
Research Platform
The authors construct a platform that simulates real‑world conditions. Complex degradations are modeled as combinations of 2–3 single degradations (e.g., rain+fog, low‑light+defocus+JPEG artifacts). For each single degradation, 3–6 recent deep‑learning tools are selected, yielding a diverse toolbox for AgenticIR to orchestrate.
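The degradation pipeline can be sketched as follows. This is a minimal illustration, not the paper's implementation: the operator names are assumptions, and strings stand in for image arrays so the composition logic is easy to follow.

```python
import random

# Hypothetical single-degradation operators; real ones would transform image tensors.
SINGLE_DEGRADATIONS = {
    "rain":       lambda img: img + "+rain",
    "fog":        lambda img: img + "+fog",
    "noise":      lambda img: img + "+noise",
    "low_light":  lambda img: img + "+low_light",
    "defocus":    lambda img: img + "+defocus",
    "jpeg":       lambda img: img + "+jpeg",
}

def make_complex_degradation(image, rng=random):
    """Compose 2-3 randomly chosen single degradations, as the platform does."""
    k = rng.choice([2, 3])
    chosen = rng.sample(list(SINGLE_DEGRADATIONS), k)
    for name in chosen:
        image = SINGLE_DEGRADATIONS[name](image)
    return image, chosen
```

Because the degradations are applied in a random order, the same set (e.g., rain+fog) can yield different corrupted images, which is part of what makes restoration planning nontrivial.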
Reasoning Workflow
AgenticIR follows a five‑stage workflow:
Perception: A VLM (DepictQA) analyzes the input image, evaluates its quality, and enumerates the degradations present.
Planning: An LLM generates a restoration plan, ordering tool calls for each identified degradation.
Execution: The plan is carried out by invoking the selected tools.
Reflection: After each tool execution, the VLM assesses whether the step succeeded.
Adjustment: If a step fails, the LLM revises the plan and the cycle repeats.
Through repeated execute‑reflect‑adjust loops, the system converges to a high‑quality restored image.
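The five stages can be sketched as a control loop with the VLM and LLM abstracted as plain callables. All names here are illustrative assumptions, not the paper's API:

```python
from typing import Callable

def agentic_restore(image,
                    perceive: Callable,   # VLM: image -> list of degradation names
                    plan: Callable,       # LLM: degradations -> ordered tool names
                    execute: Callable,    # (tool name, image) -> processed image
                    reflect: Callable,    # VLM: image -> True if the step succeeded
                    adjust: Callable,     # LLM: (plan, failed tool) -> revised plan
                    max_rounds: int = 5):
    """Perceive once, then loop plan -> execute -> reflect -> adjust."""
    degradations = perceive(image)
    schedule = plan(degradations)
    current = image
    for _ in range(max_rounds):
        current, failed = image, None
        for tool in schedule:
            candidate = execute(tool, current)
            if reflect(candidate):
                current = candidate   # step succeeded, keep the result
            else:
                failed = tool         # step failed, stop and replan
                break
        if failed is None:
            return current            # every step passed reflection
        schedule = adjust(schedule, failed)
    return current                    # best effort after max_rounds
```

In this sketch a failed step restarts execution from the original image under a revised schedule; a real system might instead resume from the last accepted intermediate result.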
Acquiring Domain Knowledge
To endow the LLM with restoration expertise, the authors employ self‑exploration. For complex‑degraded images, they exhaustively enumerate tool sequences, let the VLM evaluate the outcomes, and record success rates. The LLM then summarizes these statistics into concise knowledge, which is consulted during planning and adjustment, enabling informed decisions without human‑annotated supervision.
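The enumerate-and-score part of self‑exploration can be sketched like this. The helper names (tools_for, judge, etc.) are hypothetical, and for brevity each degradation's first tool stands in for its full toolbox:

```python
from itertools import permutations
from collections import defaultdict

def explore_tool_orders(degradations, tools_for, apply_tool, judge, image):
    """Try every ordering of per-degradation tools on an image and
    tally which orderings the VLM evaluator judges successful."""
    stats = defaultdict(lambda: [0, 0])   # ordering -> [successes, trials]
    for order in permutations(degradations):
        restored = image
        for deg in order:
            restored = apply_tool(tools_for[deg][0], restored)
        stats[order][1] += 1
        if judge(restored):
            stats[order][0] += 1
    # Success rate per ordering; the LLM would then summarize these statistics.
    return {order: s / t for order, (s, t) in stats.items()}
```

In the actual system the per-ordering statistics would be aggregated over many images and tool choices before the LLM distills them into reusable planning knowledge.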
Experimental Results
Effectiveness of Individual Modules
DepictQA fine‑tuning: On a test set, the fine‑tuned DepictQA accurately identifies which degradations are present, demonstrating its suitability for the evaluation stage.
Self‑exploration knowledge: The distilled knowledge improves planning quality; plans guided by it outperform unguided random plans.
Ablation of reflection and adjustment: Removing either mechanism degrades performance across all metrics, confirming their critical role in the workflow.
Comparison with Baselines
AgenticIR is contrasted with a naive baseline that randomly selects tools after DepictQA identifies degradations, and with an all‑in‑one restoration model. Quantitative results (see figures) indicate that AgenticIR achieves higher PSNR/SSIM and better visual fidelity on real‑world complexly degraded images.
Qualitative examples illustrate the system handling a low‑quality screen‑capture image (motion blur → defocus → low‑light) and an underwater image (defocus → haze → motion blur), successfully decomposing the tasks and applying appropriate tools in a sensible order.
Conclusion
The work introduces AgenticIR, an agentic framework that mimics human image‑restoration behavior by integrating VLMs and LLMs to orchestrate existing restoration tools. The paradigm demonstrates that explicit cognitive modeling—evaluation, planning, execution, reflection, and adjustment—can substantially improve restoration quality on complex degradations, moving toward more general and intelligible image‑processing intelligence.