ICCV 2025 MIPI Workshop Launches ViDA-UGC: A New UGC Image Quality Assessment Challenge

The ICCV MIPI workshop introduces the ViDA-UGC challenge: a richly annotated UGC image quality dataset; a benchmark suite covering degradation detection, region perception, and quality description; detailed evaluation metrics and submission formats; prize information; and open participation for researchers worldwide.

AIWalker
Submission deadline: 2025‑08‑05

Competition Overview

The ViDA‑UGC competition at ICCV MIPI introduces the ViDA‑UGC dataset and the ViDA‑UGC‑Bench benchmark to enable fine‑grained image quality analysis for user‑generated content (UGC). Participants submit a single multimodal large‑language model (MLLM) method that performs three coordinated tasks: degradation localization, fine‑grained perception, and reasoning‑based quality description.

Dataset Construction

ViDA‑UGC consists of two JSON‑based components:

Metadata (train_metadata): 11,058 UGC images, each annotated with degradation-region bounding boxes, degradation types, and a MOS score provided by five annotators. After cleaning, the full dataset contains 11,534 images with an average of 3.6 degradation boxes per image.

Instruction‑tuned data (combined.json): ~534 K entries split into three parts—ViDA‑Description, ViDA‑Perception, and ViDA‑Grounding.

Metadata example:

{
    "image": "KONIQ10k_8930347179.jpg",
    "mos": 2.8,
    "level": "fair",
    "width": 512,
    "height": 384,
    "distortions": [
        {
            "id": "1",
            "distortion": "Low clarity",
            "coordinates": [10,245,988,930],
            "region_quality_scores": "The quality is fair.",
            "region_importance_score": "This region is essential to the overall quality.",
            "position": "the bird perched on the wire",
            "severity": "Moderate",
            "perception impact": "The low clarity distortion reduces the sharpness and detail of the bird's features...",
            "visual manifestation": "The bird appears less sharp, with slightly blurred edges..."
        },
        {
            "id": "2",
            "distortion": "Edge aliasing effect",
            "coordinates": [484,445,580,581],
            "region_quality_scores": "The quality is fair.",
            "region_importance_score": "This region is important to the overall quality.",
            "position": "The body of the bird and the wire it is perched on",
            "severity": "Moderate",
            "perception impact": "The edge aliasing effect reduces the clarity and smoothness of the bird's body...",
            "visual manifestation": "The edge aliasing effect causes jagged and stair‑step patterns..."
        }
    ]
}
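Assuming the metadata is distributed as a JSON list of records in this shape (the loading path and list layout are assumptions, not a documented schema), a minimal sketch for loading it and tallying degradation boxes per image:

```python
import json

def load_metadata(path):
    """Load a ViDA-UGC metadata file, assumed to be a JSON list of records."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def box_stats(records):
    """Return (number of images, average degradation boxes per image)."""
    total_boxes = sum(len(r.get("distortions", [])) for r in records)
    return len(records), total_boxes / max(len(records), 1)

# In-memory record mirroring the sample above (trimmed to the counted fields):
records = [{"image": "KONIQ10k_8930347179.jpg", "mos": 2.8,
            "distortions": [{"id": "1"}, {"id": "2"}]}]
n, avg = box_stats(records)
# n == 1, avg == 2.0 for this single two-box record
```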

Instruction‑Tuned Data Details

ViDA‑Description : For each degradation box, GPT receives the four low‑level attributes (position, severity, impact, significance) and generates a detailed textual description.

ViDA‑Grounding : The same attributes and bounding‑box coordinates are transformed into a localization dataset.

ViDA‑Perception : Attributes are used to create multiple‑choice and visual‑question‑answer items that probe perception accuracy.

Benchmark Composition

From the full dataset, 476 images are selected for ViDA‑UGC‑Bench, covering all ten UGC distortion types. The benchmark provides:

476 quality‑analysis samples (ViDA‑Description).

2,567 multiple‑choice questions (ViDA‑Perception).

3,106 grounding entries (ViDA‑Grounding).

Evaluation Metrics

The final score is the arithmetic mean of six metrics:

Region mAP (localization of degradation regions).

Distortion mAP (detection of degradation types).

Perception Accuracy (multiple‑choice QA).

Description mAP (description‑level detection).

Key Distortion Accuracy (ACC₀.₅ for critical degradations).

Image Quality Accuracy (overall quality label).

Final Score = (Region mAP + Distortion mAP + Perception Accuracy + Description mAP + Key Distortion Accuracy + Image Quality Accuracy) / 6
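The averaging above is straightforward to express in code; the dictionary keys here are illustrative names, not part of an official scoring script:

```python
def final_score(metrics):
    """Arithmetic mean of the six challenge metrics (each on a 0-1 scale)."""
    keys = ["region_map", "distortion_map", "perception_acc",
            "description_map", "key_distortion_acc", "image_quality_acc"]
    return sum(metrics[k] for k in keys) / len(keys)

# Hypothetical metric values for illustration:
score = final_score({"region_map": 0.42, "distortion_map": 0.51,
                     "perception_acc": 0.78, "description_map": 0.39,
                     "key_distortion_acc": 0.63, "image_quality_acc": 0.70})
# score is the mean of the six values
```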

Submission Formats

Distortion Detection (multi‑box)

{
    "image": "LIVEfb_AVA__776348.jpg",
    "pred_distortions": [
        {"distortion": "Edge aliasing effect", "pred_coords": [189,174,456,450]},
        {"distortion": "Overexposure", "pred_coords": [269,273,368,350]}
    ]
}
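A small helper can serialize model predictions into the detection format shown above; the field names are taken directly from the sample, but the helper itself is only a sketch, not an official submission tool:

```python
import json

def to_submission(image_name, boxes):
    """Serialize predicted boxes into the challenge's detection format.

    boxes: list of (distortion_label, [x1, y1, x2, y2]) tuples.
    """
    return {
        "image": image_name,
        "pred_distortions": [
            {"distortion": label, "pred_coords": coords}
            for label, coords in boxes
        ],
    }

entry = to_submission("LIVEfb_AVA__776348.jpg",
                      [("Edge aliasing effect", [189, 174, 456, 450]),
                       ("Overexposure", [269, 273, 368, 350])])
print(json.dumps(entry, indent=4))
```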

Region Perception uses the same JSON structure as distortion detection. A Perception QA (multiple‑choice) entry looks like this:

{
    "id": 0,
    "concern": "position",
    "question": "Which part of the image is without any distortion?",
    "image": "Portrait_v0300fg10000c4f4ck3c77u12leur4lg.png",
    "candidates": ["The wooden cutting board","The gloved hand","The text overlay in the upper‑left area of the image","The background region in the upper‑left corner"],
    "pred_ans": "A"
}
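Resolving the letter in pred_ans back to the candidate text can be done with a one-liner. This assumes candidates are indexed in order A, B, C, D, which matches the sample entry but is an assumption about the official format:

```python
def answer_text(item):
    """Resolve a letter answer ('A'..'D') to its candidate string.

    Assumes candidates[0] corresponds to 'A', candidates[1] to 'B', and so on.
    """
    idx = ord(item["pred_ans"]) - ord("A")
    return item["candidates"][idx]

# Shortened candidates for illustration:
item = {"candidates": ["The wooden cutting board", "The gloved hand",
                       "The text overlay", "The background region"],
        "pred_ans": "A"}
# answer_text(item) -> "The wooden cutting board"
```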

Each ViDA‑Description output entry must contain three prediction components in addition to the image metadata: the quality description (pred_ans), the distortion lists (all_distortions and key_distortions), and the overall quality label (image_quality):

{
    "id": 0,
    "image": "Portrait_v0300fg10000c4f4ck3c77u12leur4lg.png",
    "width": 720,
    "height": 1280,
    "pred_ans": "The image shows a block of cheese... The quality of the image is fair.",
    "all_distortions": [{"distortion": "low clarity", "severity": "moderate", "coordinates": [199,10,922,320]}],
    "key_distortions": [{"distortion": "low clarity", "severity": "moderate", "coordinates": [199,10,922,320]}],
    "image_quality": "fair"
}

Data Generation Pipeline

1. Sampling: Images are drawn from multiple UGC sources. Low‑level feature vectors are computed for each image, and an improved MILP (Mixed‑Integer Linear Programming) sampler selects images to achieve a uniform feature distribution, balancing high‑ and low‑quality samples.

2. Human Annotation: Five annotators label degradation bounding boxes and assign MOS scores.

3. GPT Attribute Generation: For each box, GPT receives the four attributes (position, severity, impact, significance) and produces a natural‑language description (ViDA‑Description).

4. Grounding Construction: The same attribute set is converted into a structured localization file (ViDA‑Grounding).

5. Perception Question Design: Using the attribute information, a template generates multiple‑choice and visual‑question‑answer items (ViDA‑Perception).

6. Quality Control: A professional team reviews all generated content to mitigate GPT‑4o bias and ensure consistency.

Participation Rules (Technical)

Each team submits one MLLM‑based method; API‑only solutions are not allowed.

All code must be locally reproducible, and any external data usage must be disclosed.

Only the three output formats described above are accepted.

Key Statistics

Total cleaned images: 11,534.

Average degradations per image: 3.6.

Benchmark images: 476 (covering all 10 UGC distortion types).

Instruction‑tuned entries: ~534 K (ViDA‑Description, ViDA‑Perception, ViDA‑Grounding).

Resources

Competition website: https://www.codabench.org/competitions/8156/

Contact: [email protected], [email protected]
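Returning to step 1 of the data generation pipeline: the MILP sampler selects images whose low‑level feature vectors spread uniformly over the feature space. As a rough, dependency‑free illustration of that objective (a greedy farthest‑point stand‑in, not the challenge's actual MILP formulation):

```python
import math

def greedy_uniform_sample(features, k):
    """Pick k items whose feature vectors are maximally spread out.

    A greedy farthest-point proxy for MILP-based uniform sampling:
    repeatedly add the item farthest from everything chosen so far.
    """
    chosen = [0]  # seed with the first item
    while len(chosen) < k:
        best, best_d = None, -1.0
        for i in range(len(features)):
            if i in chosen:
                continue
            # Distance from candidate i to its nearest already-chosen item
            d = min(math.dist(features[i], features[j]) for j in chosen)
            if d > best_d:
                best, best_d = i, d
        chosen.append(best)
    return chosen

# Toy 2-D "feature vectors": two near-duplicates and two distant points
feats = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.0, 1.0)]
picked = greedy_uniform_sample(feats, 3)
# picked == [0, 2, 3]: the near-duplicate at index 1 is skipped
```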

Tags: benchmark, image quality assessment, UGC, dataset, multimodal LLM, MIPI, ICCV
Written by

AIWalker

Focused on computer vision, image processing, color science, and AI algorithms; sharing hardcore tech, engineering practice, and deep insights as a diligent AI technology practitioner.
