Join the CVPR 2025 NTIRE AI-Generated Image Quality Challenge: Dual Tracks, Big Prizes, and the EvalMuse Dataset
The CVPR 2025 NTIRE workshop launches an AI-generated image quality assessment competition featuring two tracks—fine‑grained text‑image matching and structural issue detection—supported by the large‑scale EvalMuse dataset, detailed evaluation metrics, baseline code, and a prize pool of up to $10,000.
EvalMuse Dataset
EvalMuse‑40K contains 40 000 image‑text pairs built from 2 000 real user prompts from DiffusionDB and 2 000 synthetic prompts covering variations in object count, color, material, environment, and action. Twenty diffusion models generated the images. Annotation proceeded in three stages (pre‑annotation, formal annotation, re‑annotation) to produce reliable fine‑grained scores.
Training set: 30 000 pairs with prompt‑level and element‑level alignment scores.
Validation set: ~10 000 pairs.
Test set: ~5 000 pairs.
Each entry provides the prompt, an element list, a fine‑grained matching score (1‑5), and for the structural track a MOS score (1‑5) plus optional bounding‑box annotations (rectangle or polygon) for identified structural issues.
Track 1 – Fine‑Grained Text‑Image Matching
Output a normalized matching score (1‑5) for the whole prompt and a binary hit/miss label for each element (1 = present, 0 = absent). Example: prompt “musician plays guitar aerial shot” with elements {musician, plays, guitar, aerial shot} → score 4.2, element labels [1,1,1,0].
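Concretely, a Track 1 prediction for that example could be held in a plain dict like the sketch below; the field names here are illustrative only, and the exact submission format is defined by the baseline repository:

```python
# One prediction entry for the example prompt. The key names
# ("prompt", "total_score", "element_score") are illustrative,
# not the official schema -- consult the baseline repo for that.
prediction = {
    "prompt": "musician plays guitar aerial shot",
    "total_score": 4.2,            # prompt-level matching score (1-5)
    "element_score": {
        "musician": 1,             # element present
        "plays": 1,
        "guitar": 1,
        "aerial shot": 0,          # element judged absent
    },
}
```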
Evaluation uses Pearson Linear Correlation Coefficient (PLCC) and Spearman Rank Correlation Coefficient (SRCC) on prompt‑level scores and accuracy on element‑level hits. Final score is:
Final_Score = PLCC / 4 + SRCC / 4 + acc / 2
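The final score can be computed with SciPy's correlation functions; below is a minimal sketch (variable names are illustrative and this is not the official evaluation script):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def final_score(pred_scores, gt_scores, pred_labels, gt_labels):
    """Final_Score = PLCC / 4 + SRCC / 4 + acc / 2.

    pred_scores / gt_scores: per-prompt matching scores (1-5).
    pred_labels / gt_labels: flat sequences of binary element hits.
    """
    plcc, _ = pearsonr(pred_scores, gt_scores)     # linear correlation
    srcc, _ = spearmanr(pred_scores, gt_scores)    # rank correlation
    acc = np.mean(np.asarray(pred_labels) == np.asarray(gt_labels))
    return plcc / 4 + srcc / 4 + acc / 2

# Toy usage with made-up predictions and ground truth:
score = final_score([4.2, 3.1, 2.0], [4.0, 3.5, 1.8], [1, 1, 0, 1], [1, 1, 1, 1])
```

Perfect prompt-level correlation and perfect element accuracy yield the maximum score of 1.0.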
A baseline implementation (FGA-BLIP2) is available in the EvalMuse repository:
https://github.com/DYEvaLab/EvalMuse
Track 2 – Structural Issue Detection
For each image, predict a mean opinion score (MOS, 1-5). If structural problems exist, provide up to three bounding-box annotations, each with a bbox_type field (1 = rectangle, 2 = polygon) and a list of vertex coordinates. Results are saved as a Python dict serialized to a .pkl file, with keys matching the image filenames.
Annotation schema (JSON example):
{
  "img_001.png": {
    "prompt_en": "...",
    "mos": 3,
    "bbox_info": [
      {"bbox_type": 1, "bbox": [x1, y1, x2, y2]},
      {"bbox_type": 2, "bbox": [x1, y1, x2, y2, x3, y3, ...]}
    ]
  },
  ...
}

Baseline code is available at:
https://github.com/DYEvaLab/EvalMuse-Structure
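Following the annotation schema above, a submission file can be assembled and serialized with Python's standard pickle module; a minimal sketch in which the filename, prompt, and coordinate values are placeholders:

```python
import pickle

# Keys are image filenames; values follow the Track 2 annotation schema.
results = {
    "img_001.png": {
        "prompt_en": "a cat riding a bicycle",  # placeholder prompt
        "mos": 3,                               # structural MOS (1-5)
        "bbox_info": [
            # rectangle: [x1, y1, x2, y2]
            {"bbox_type": 1, "bbox": [34, 50, 120, 180]},
            # polygon: flat list of vertex coordinates [x1, y1, x2, y2, ...]
            {"bbox_type": 2, "bbox": [10, 10, 60, 12, 55, 70]},
        ],
    },
}

# Serialize the dict to a .pkl file for submission.
with open("output.pkl", "wb") as f:
    pickle.dump(results, f)
```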
Submission
Submissions are made through the CodaLab competition pages:
Matching track: https://codalab.lisn.upsaclay.fr/competitions/21220
Structure track: https://codalab.lisn.upsaclay.fr/competitions/21269
Methods must be locally reproducible; additional data may be used only if disclosed.
Resources
Paper: https://arxiv.org/abs/2412.18150
Project page: https://shh-han.github.io/EvalMuse-project/
GitHub repository: https://github.com/DYEvaLab/EvalMuse
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
AIWalker
AIWalker focuses on computer vision, image processing, color science, and AI algorithms, sharing hands-on engineering practice and in-depth technical insights.
