iDetex: The Winning AI Model Transforming Image Quality Assessment

iDetex, the champion solution of the ICCV 2025 MIPI Detailed Image Quality Assessment Challenge, introduces a novel multimodal LLM-driven framework that precisely locates, describes, and grades image distortions, outperforming traditional IQA models and enabling practical deployments across video, live streaming, e‑commerce, and image‑processing pipelines.

Tencent Technical Engineering

Introduction

In the ICCV 2025 MIPI Detailed Image Quality Assessment Challenge, the IH‑VQA team from WeChat Test Center won the championship with their novel iDetex solution, setting a new industry standard for fine‑grained image quality evaluation and driving practical deployments in video, short‑video, live streaming, and e‑commerce services.

Task Background

Image Quality Assessment (IQA) seeks to build models that reflect human visual system perception. Traditional IQA models provide only a single overall score, lacking interpretability and fine‑grained analysis of distortion types, locations, and their impact on visual perception.

To advance IQA toward explainable intelligence, the Douyin Multimedia Quality Lab and the Basic Experience Algorithm team co‑organized a Detailed Image Quality Assessment track at the fourth ICCV MIPI Workshop, encouraging the use of multimodal large language models (MLLMs) for precise distortion localization, multi‑dimensional perception, and causal reasoning.

Dataset and Competition

The competition used the ViDA‑UGC dataset, which consists of two parts: metadata (11,058 images with overall quality grades, resolution, and detailed distortion annotations) and instruction fine‑tuning data (about 534K entries). The instruction data covers three dimensions: Description, Perception, and Grounding.

Perception: 2,567 multiple‑choice questions, evaluated by Perception Accuracy.

Grounding: two sub‑tasks (distortion bounding‑box detection and region‑wise distortion identification), evaluated by mAP.

Description: a four‑step answer format consisting of a brief description, distortion localization and impact analysis, overall quality analysis, and a final quality grade.

iDetex Architecture

The iDetex pipeline first extracts visual tokens with a visual encoder, then feeds these tokens together with a system prompt into a large language model. Guided by the prompt, the LLM performs a chain‑of‑thought reasoning process: (1) brief image description, (2) distortion localization and detailed analysis, (3) identification of key distortions affecting overall perception, and (4) generation of an overall quality rating. The detected distortions are visualized on the original image for user inspection.

Grounding Enhancement – Spatial Perturbation

To improve robustness in distortion localization, random cropping and horizontal flipping are applied. Bounding‑box coordinates undergo corresponding affine transformations to keep annotations valid, enriching spatial diversity and encouraging the model to focus on intrinsic distortion patterns rather than absolute positions.
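The coordinate bookkeeping behind this augmentation can be sketched as follows. This is a minimal illustration, not the team's implementation: it assumes boxes are stored as (x1, y1, x2, y2) pixel tuples, and the function names, the crop scale range, and the flip probability are all hypothetical choices.

```python
import random

# Assumed box format: (x1, y1, x2, y2) in pixels, origin at the top-left corner.

def hflip_boxes(boxes, width):
    """Mirror boxes across the vertical centre line of an image `width` pixels wide."""
    return [(width - x2, y1, width - x1, y2) for x1, y1, x2, y2 in boxes]

def crop_boxes(boxes, crop, min_side=1.0):
    """Shift boxes into the crop window's frame and clip them to the window.

    Boxes whose clipped width or height falls below `min_side` are dropped,
    since the distortion they mark is no longer visible in the crop.
    """
    cx1, cy1, cx2, cy2 = crop
    kept = []
    for x1, y1, x2, y2 in boxes:
        nx1, ny1 = max(x1, cx1) - cx1, max(y1, cy1) - cy1
        nx2, ny2 = min(x2, cx2) - cx1, min(y2, cy2) - cy1
        if nx2 - nx1 >= min_side and ny2 - ny1 >= min_side:
            kept.append((nx1, ny1, nx2, ny2))
    return kept

def random_perturb(boxes, width, height, rng, min_scale=0.7, flip_prob=0.5):
    """Sample a random crop and an optional flip; return (crop, flipped, new_boxes).

    `min_scale` and `flip_prob` are illustrative values, not from the article.
    The same crop and flip must also be applied to the image pixels.
    """
    cw = int(width * rng.uniform(min_scale, 1.0))
    ch = int(height * rng.uniform(min_scale, 1.0))
    cx1 = rng.randint(0, width - cw)
    cy1 = rng.randint(0, height - ch)
    crop = (cx1, cy1, cx1 + cw, cy1 + ch)
    new_boxes = crop_boxes(boxes, crop)
    flipped = rng.random() < flip_prob
    if flipped:
        new_boxes = hflip_boxes(new_boxes, cw)
    return crop, flipped, new_boxes
```

Cropping is a translation followed by clipping, and flipping is a reflection in x, so both keep boxes axis‑aligned and no rotation handling is needed.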

Perception Enhancement – Query Style Alignment

The perception task is multiple‑choice. By analyzing the style of test‑set questions and generating training questions with matching style using metadata, the model’s query distribution aligns with the test distribution, reducing confusion caused by wording differences and improving accuracy.
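As a rough illustration of generating style‑matched questions from metadata, the sketch below builds one multiple‑choice item per annotation. The template wording, the record fields ("region", "distortion"), and the function name are hypothetical stand‑ins; the actual test‑set phrasing is not given in the article.

```python
import random
import string

# Hypothetical template standing in for the wording observed in the test set.
TEMPLATE = (
    "{question}\n{options}\n"
    "Answer with the option's letter from the given choices directly."
)

def build_question(record, distortion_types, rng, n_options=4):
    """Turn one metadata record into a multiple-choice training item.

    `record` is assumed to look like {"region": "face", "distortion": "blur"}.
    Distractors are drawn from the other known distortion types.
    """
    correct = record["distortion"]
    pool = [d for d in distortion_types if d != correct]
    options = rng.sample(pool, n_options - 1) + [correct]
    rng.shuffle(options)
    letters = string.ascii_uppercase[:n_options]
    option_lines = "\n".join(f"{l}. {o}" for l, o in zip(letters, options))
    question = f"Which distortion affects the {record['region']} in this image?"
    return {
        "prompt": TEMPLATE.format(question=question, options=option_lines),
        "answer": letters[options.index(correct)],
    }
```

Keeping the wording in a single template makes it straightforward to swap in phrasing that mirrors the test questions once their style has been analyzed.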

Description Enhancement – Fine‑Grained Scoring

The original description task combined distortion localization, key‑distortion identification, and overall quality assessment, causing task interference. We decoupled the overall quality evaluation by reusing the Perception prompt, while keeping the original prompts for distortion questions. Additionally, we refined the quality label granularity from a 5‑level scale (bad, poor, fair, good, excellent) to a 10‑level scale (a‑j) using a linear mapping, then mapped predictions back to the original 5‑level scale for compatibility.
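One way to picture the label refinement is shown below. The article does not specify the exact mapping, so this sketch makes several assumptions: each image has a continuous quality score normalized to [0, 1], the ten levels are equal‑width bins with "a" as the worst, and each pair of adjacent letters folds back into one of the five original grades.

```python
LEVELS_5 = ["bad", "poor", "fair", "good", "excellent"]
LEVELS_10 = "abcdefghij"  # assumed ordering: "a" is worst, "j" is best

def score_to_level10(score, lo=0.0, hi=1.0):
    """Linearly bin a continuous score in [lo, hi] into ten equal-width levels."""
    frac = (score - lo) / (hi - lo)
    idx = int(frac * len(LEVELS_10))
    idx = max(0, min(idx, len(LEVELS_10) - 1))  # clamp so that score == hi maps to "j"
    return LEVELS_10[idx]

def level10_to_level5(letter):
    """Fold a predicted 10-level letter back into the original 5-level grade."""
    return LEVELS_5[LEVELS_10.index(letter) // 2]
```

For example, under these assumptions a score of 0.55 becomes level "f" during training, and a prediction of "f" is reported as "fair" at evaluation time.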

Data Mixing & Global Augmentation

Rather than training separate models for each sub‑task, we performed joint multi‑task fine‑tuning. Spatial‑perturbation data replaced 15‑45% of original grounding data, query‑style aligned data fully replaced original perception data, and fine‑grained description data replaced the original description data. This mixed dataset, combined with a strong visual encoder (e.g., InternVL3) and higher‑resolution inputs (up to 2048×2048), yielded superior performance across all metrics.
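The mixing recipe described above could be assembled roughly as follows. This is a sketch under assumptions: samples are plain Python objects, each perturbed grounding sample corresponds index‑by‑index to an original one, and the default ratio of 0.3 is just an arbitrary point inside the 15‑45% range mentioned in the text. The function names are hypothetical.

```python
import random

def replace_fraction(original, perturbed, ratio, rng):
    """Swap a `ratio` share of original samples for their perturbed versions.

    Assumes perturbed[i] is the augmented counterpart of original[i].
    """
    if len(original) != len(perturbed):
        raise ValueError("original and perturbed must be aligned index by index")
    k = round(len(original) * ratio)
    chosen = set(rng.sample(range(len(original)), k))
    return [perturbed[i] if i in chosen else original[i]
            for i in range(len(original))]

def build_mixture(grounding, grounding_perturbed, perception_aligned,
                  description_fine, rng, ratio=0.3):
    """Build one shuffled multi-task training set.

    Grounding data is partially replaced; the perception and description
    inputs are expected to be the already-replaced (aligned / fine-grained)
    versions, matching the recipe in the text.
    """
    mixed = replace_fraction(grounding, grounding_perturbed, ratio, rng)
    mixed += list(perception_aligned) + list(description_fine)
    rng.shuffle(mixed)
    return mixed
```

Replacing samples rather than appending them keeps the grounding share of the mixture constant, so the ratio changes spatial diversity without shifting the balance between tasks.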

Business Deployment

Whereas traditional IQA models output a single score with, at most, a brief description, iDetex produces a diagnostic report: precise distortion types, localized bounding boxes, impact analysis, and a comprehensive quality rating. This multi‑dimensional insight turns a bare score into actionable guidance.

Applications include:

Content creation (image, short video, live streaming): Automatic feedback such as “face region blurred” or “dark segment noisy” helps creators improve cover images and video quality.

Quality loss analysis: Pinpoint where quality degradation occurs in the pipeline (capture, transcoding, transmission) and quantify issues such as loss of edge sharpness.

E‑commerce: Real‑time inspection of product images detects issues (blurred edges, low brightness) and guides merchants to correct them before upload.

Results and Awards

The IH‑VQA team achieved first place, leading in Perception Accuracy (+4%), Region mAP (+4%), Distortion mAP (+6%), and Image Quality Accuracy (+2%). Their solution has been accepted as a paper at the ICCV 2025 Workshop.

Acknowledgements

Team members: Sun Jianhui, Shao Tao, Yue Xinli, Xie Yuhui, Zhao Zhaoran.

