How TIR‑Agent Turns Image‑Restoration Tools into a Learnable Decision‑Making Agent

The paper introduces TIR‑Agent, an image‑restoration agent that learns a tool‑calling policy via supervised fine‑tuning and reinforcement learning, addressing exploration stagnation and multi‑objective reward imbalance, and demonstrates over 2.5× faster inference and superior multi‑metric performance on synthetic and real degradation datasets.


Introduction

Image restoration in the wild often involves several degradations at once (noise, blur, compression artifacts, exposure errors) that no single All‑in‑One network captures well. Agent‑based pipelines split the problem into a sequence of tool calls (denoise, deblur, upscale, etc.), but existing agents follow hand‑crafted schedules or exhaustive tool traversal, which yields low efficiency and no learned decision‑making. The paper "TIR‑Agent" (arXiv:2603.27742) models the choice of which tool to call, and in which order, as a learnable policy trained in two stages: Supervised Fine‑Tuning (SFT) followed by Reinforcement Learning (RL). This turns the agent from a brute‑force searcher into a learned decision‑maker that dynamically decides when to denoise, deblur, or upscale, achieving a more than 2.5× inference speed‑up.
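Below is a minimal sketch of the tool‑calling loop this implies, assuming a hypothetical `policy.next_action` interface and illustrative tool names; the paper's actual policy is a model trained with SFT and then RL, not the simple object shown here.

```python
# Minimal sketch of an agentic restoration loop: the policy inspects the
# current image and the call history, picks the next tool (or stops), and the
# chosen tool is applied. Tool names and the `next_action` interface are
# illustrative assumptions, not the paper's exact API.
from typing import Callable, Dict, List

def run_agent(image, policy, tools: Dict[str, Callable], max_steps: int = 5):
    history: List[str] = []
    for _ in range(max_steps):
        action = policy.next_action(image, history)  # e.g. "denoise", "deblur", "upscale", or "stop"
        if action == "stop" or action not in tools:
            break
        image = tools[action](image)                 # invoke the selected restoration tool
        history.append(action)
    return image, history
```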

Challenges

Two fundamental problems hinder existing agents:

Fixed heuristic schedules prevent the discovery of optimal repair paths.

Exhaustive tool traversal incurs high computational overhead.

Empirical observations after SFT reveal exploration stagnation (the policy repeatedly follows the same tool sequence) and metric bias (static weighted sums of PSNR, SSIM, CLIP‑IQA, MUSIQ cause reward hacking and unstable training).

Method

Exploration‑Driven Data Perturbation (EDP)

To break the single‑trajectory bias, the SFT dataset is augmented in two ways:

Randomly reorder the restoration steps (e.g., upscale → denoise → deblur) to expose the model to diverse scheduling patterns.

With a preset probability, replace the output of a chosen tool with the output of an alternative tool for the same sub‑task, forcing the model to learn to select among multiple candidates.

This yields a distribution of trajectories rather than a deterministic path, encouraging the policy to explore many possible schedules during RL.
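A minimal sketch of these two perturbations is shown below, assuming an SFT trajectory is represented as a list of (sub‑task, chosen tool) pairs and that a mapping of candidate tools per sub‑task is available; the data format and probability value are assumptions, not taken from the paper.

```python
import random
from typing import Dict, List, Tuple

def perturb_trajectory(
    trajectory: List[Tuple[str, str]],     # e.g. [("denoise", "tool_A"), ("deblur", "tool_B")]
    alternatives: Dict[str, List[str]],    # sub-task -> candidate tools for that sub-task
    swap_prob: float = 0.3,                # preset replacement probability (illustrative value)
) -> List[Tuple[str, str]]:
    # 1) Randomly reorder the restoration steps to expose diverse schedules.
    steps = trajectory[:]
    random.shuffle(steps)

    # 2) With a preset probability, replace the chosen tool with an
    #    alternative tool that handles the same sub-task.
    perturbed = []
    for sub_task, tool in steps:
        candidates = [t for t in alternatives.get(sub_task, []) if t != tool]
        if candidates and random.random() < swap_prob:
            tool = random.choice(candidates)
        perturbed.append((sub_task, tool))
    return perturbed
```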

Multi‑Dimensional Adaptive Reward (MAR)

Instead of a static linear combination of metrics, MAR adjusts each metric’s weight w_i = f(current_i, avg_i) at every RL step: if the current value of metric i falls below its running average, w_i is increased; otherwise it is decreased. The dynamic weighting automatically balances reference‑based metrics (PSNR, SSIM) against perception‑based metrics (CLIP‑IQA, MUSIQ), preventing reward hacking and stabilising multi‑objective optimization.
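The sketch below illustrates one way such an update could look, with a multiplicative step and renormalization that are assumptions layered on top of the qualitative rule stated in the paper (raise w_i when metric i is below its running average, lower it otherwise).

```python
from typing import Dict

def update_weights(
    weights: Dict[str, float],       # current weight per metric, e.g. {"psnr": 0.25, "ssim": 0.25, ...}
    current: Dict[str, float],       # (normalized) metric values for the current rollout
    running_avg: Dict[str, float],   # running average of each (normalized) metric
    step: float = 0.1,               # illustrative step size, not from the paper
) -> Dict[str, float]:
    new_w = {}
    for name, w in weights.items():
        if current[name] < running_avg[name]:
            new_w[name] = w * (1.0 + step)   # under-performing metric gets more weight
        else:
            new_w[name] = w * (1.0 - step)   # over-performing metric gets less weight
    total = sum(new_w.values())
    return {k: v / total for k, v in new_w.items()}  # keep weights summing to 1

def reward(weights: Dict[str, float], current: Dict[str, float]) -> float:
    # Scalar RL reward as the adaptively weighted sum of (normalized) metrics.
    return sum(weights[k] * current[k] for k in weights)
```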

Global Tool‑Calling Pool

Training requires millions of tool invocations. A shared asynchronous service pools all tool calls, dispatches requests from different trajectories in parallel, and centrally schedules GPU resources. This high‑throughput, service‑oriented interface eliminates long queues and reduces execution errors, making large‑scale RL feasible.
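A minimal asyncio sketch of such a shared pool is given below; the queue/worker layout and method names are illustrative assumptions, and the paper's infrastructure additionally handles centralized GPU scheduling and error handling at scale.

```python
import asyncio
from typing import Awaitable, Callable, Dict

class ToolPool:
    """Shared asynchronous service that funnels tool calls through one queue."""

    def __init__(self, tools: Dict[str, Callable[..., Awaitable]], num_workers: int = 8):
        self.tools = tools
        self.num_workers = num_workers
        self.queue: asyncio.Queue = asyncio.Queue()

    async def start(self):
        # Spawn worker tasks; must be called inside a running event loop.
        self._workers = [asyncio.create_task(self._worker()) for _ in range(self.num_workers)]

    async def _worker(self):
        while True:
            tool_name, args, fut = await self.queue.get()
            try:
                fut.set_result(await self.tools[tool_name](*args))
            except Exception as exc:          # surface tool failures to the caller
                fut.set_exception(exc)
            finally:
                self.queue.task_done()

    async def call(self, tool_name: str, *args):
        # Submit a request from any trajectory and await its result.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((tool_name, args, fut))
        return await fut
```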

Experiments

Evaluation is performed on two benchmark suites:

MiO‑100: a synthetic dataset with combinatorial degradations.

FoundIR: a real‑world dataset containing blur, noise, and JPEG artifacts.

Baselines include state‑of‑the‑art All‑in‑One models, a training‑free agent, and a leading closed‑source multimodal model.

On MiO‑100, TIR‑Agent attains the best score on all 15 metric‑degradation combinations and yields nearly 3× inference acceleration compared with the training‑free baseline (Group A).

Out‑of‑distribution tests with unseen degradation combinations (D=2 and D≥3) show TIR‑Agent winning 11 out of 12 metric comparisons, demonstrating strong generalisation.

On the real‑world Blur+Noise and Blur+Noise+JPEG subsets, TIR‑Agent surpasses the closed‑source model on most perceptual metrics (CLIP‑IQA, MUSIQ) while remaining competitive on PSNR/SSIM.

Policy analysis (Figures 4‑8 in the paper) reveals that TIR‑Agent selects tools adaptively, avoiding the coarse‑grained bias of other agents that over‑use generic denoise or deblur modules.

Conclusion

The work answers the core question “how should an image‑restoration agent learn to make decisions?” by:

Introducing EDP to endow the policy with exploration capability.

Applying MAR to dynamically balance multiple quality metrics.

Providing a high‑throughput global tool‑calling infrastructure.

These three components jointly improve restoration quality, inference speed, and generalisation to unseen degradation scenarios, and constitute a transferable paradigm for any multi‑step decision‑plus‑tool problem (e.g., video restoration, medical‑image enhancement, generative content pipelines).
