Join the CVPR 2025 NTIRE AI-Generated Image Quality Challenge: Dual Tracks, Big Prizes, and the EvalMuse Dataset
The CVPR 2025 NTIRE workshop launches an AI-generated image quality assessment competition featuring two tracks—fine‑grained text‑image matching and structural issue detection—supported by the large‑scale EvalMuse dataset, detailed evaluation metrics, baseline code, and a prize pool of up to $10,000.
EvalMuse Dataset
EvalMuse‑40K contains 40 000 image‑text pairs built from 2 000 real user prompts from DiffusionDB and 2 000 synthetic prompts covering variations in object count, color, material, environment, and action. Twenty diffusion models generated the images. Annotation proceeded in three stages (pre‑annotation, formal annotation, re‑annotation) to produce reliable fine‑grained scores.
Training set: 30 000 pairs with prompt‑level and element‑level alignment scores.
Validation set: ~10 000 pairs.
Test set: ~5 000 pairs.
Each entry provides the prompt, an element list, a fine‑grained matching score (1‑5), and for the structural track a MOS score (1‑5) plus optional bounding‑box annotations (rectangle or polygon) for identified structural issues.
Track 1 – Fine‑Grained Text‑Image Matching
Output a normalized matching score (1‑5) for the whole prompt and a binary hit/miss label for each element (1 = present, 0 = absent). Example: prompt “musician plays guitar aerial shot” with elements {musician, plays, guitar, aerial shot} → score 4.2, element labels [1,1,1,0].
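Concretely, a Track 1 prediction for that example could be held in a plain dict like the sketch below; the field names here are illustrative only, and the exact submission format is defined by the baseline repository:

```python
# One prediction entry for the example prompt. The key names
# ("prompt", "total_score", "element_score") are illustrative,
# not the official schema -- consult the baseline repo for that.
prediction = {
    "prompt": "musician plays guitar aerial shot",
    "total_score": 4.2,            # prompt-level matching score (1-5)
    "element_score": {
        "musician": 1,             # element present
        "plays": 1,
        "guitar": 1,
        "aerial shot": 0,          # element judged absent
    },
}
```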
Evaluation uses Pearson Linear Correlation Coefficient (PLCC) and Spearman Rank Correlation Coefficient (SRCC) on prompt‑level scores and accuracy on element‑level hits. Final score is:
Final_Score = PLCC / 4 + SRCC / 4 + acc / 2
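The final score can be computed with SciPy's correlation functions; below is a minimal sketch (variable names are illustrative and this is not the official evaluation script):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def final_score(pred_scores, gt_scores, pred_labels, gt_labels):
    """Final_Score = PLCC / 4 + SRCC / 4 + acc / 2.

    pred_scores / gt_scores: per-prompt matching scores (1-5).
    pred_labels / gt_labels: flat sequences of binary element hits.
    """
    plcc, _ = pearsonr(pred_scores, gt_scores)     # linear correlation
    srcc, _ = spearmanr(pred_scores, gt_scores)    # rank correlation
    acc = np.mean(np.asarray(pred_labels) == np.asarray(gt_labels))
    return plcc / 4 + srcc / 4 + acc / 2

# Toy usage with made-up predictions and ground truth:
score = final_score([4.2, 3.1, 2.0], [4.0, 3.5, 1.8], [1, 1, 0, 1], [1, 1, 1, 1])
```

Perfect prompt-level correlation and perfect element accuracy yield the maximum score of 1.0.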
A baseline implementation (FGA-BLIP2) is available in the EvalMuse repository:
https://github.com/DYEvaLab/EvalMuse
Track 2 – Structural Issue Detection
For each image, predict a mean opinion score (MOS, 1-5). If structural problems exist, provide up to three bounding-box annotations, each with a bbox_type field (1 = rectangle, 2 = polygon) and a list of vertex coordinates. Results are saved as a Python dict serialized to a .pkl file, with keys matching the image filenames.
Annotation schema (JSON example):
{
  "img_001.png": {
    "prompt_en": "...",
    "mos": 3,
    "bbox_info": [
      {"bbox_type": 1, "bbox": [x1, y1, x2, y2]},
      {"bbox_type": 2, "bbox": [x1, y1, x2, y2, x3, y3, ...]}
    ]
  },
  ...
}

Baseline code is available at:
https://github.com/DYEvaLab/EvalMuse-Structure
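Following the annotation schema above, a submission file can be assembled and serialized with Python's standard pickle module; a minimal sketch in which the filename, prompt, and coordinate values are placeholders:

```python
import pickle

# Keys are image filenames; values follow the Track 2 annotation schema.
results = {
    "img_001.png": {
        "prompt_en": "a cat riding a bicycle",  # placeholder prompt
        "mos": 3,                               # structural MOS (1-5)
        "bbox_info": [
            # rectangle: [x1, y1, x2, y2]
            {"bbox_type": 1, "bbox": [34, 50, 120, 180]},
            # polygon: flat list of vertex coordinates [x1, y1, x2, y2, ...]
            {"bbox_type": 2, "bbox": [10, 10, 60, 12, 55, 70]},
        ],
    },
}

# Serialize the dict to a .pkl file for submission.
with open("output.pkl", "wb") as f:
    pickle.dump(results, f)
```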
Submission
Submissions are made through the CodaLab competition pages:
Matching track: https://codalab.lisn.upsaclay.fr/competitions/21220
Structure track: https://codalab.lisn.upsaclay.fr/competitions/21269
Methods must be locally reproducible; additional data may be used only if disclosed.
Resources
Paper: https://arxiv.org/abs/2412.18150
Project page: https://shh-han.github.io/EvalMuse-project/
GitHub repository: https://github.com/DYEvaLab/EvalMuse
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
AIWalker
AIWalker focuses on computer vision, image processing, color science, and AI algorithms, sharing hands-on engineering practice and in-depth technical insights.
