
Reference‑Guided Image Synthesis Assessment (RISA): Unsupervised Training for Single‑Image Quality Evaluation

The paper presents RISA, a reference‑guided image synthesis assessment model that learns to score the quality of a single generated image without any human‑labeled data. It does so by leveraging the outputs of intermediate GAN checkpoints, pixel‑wise interpolation, multiple binary classifiers, and contrastive learning, producing scores that align well with human perception; the work was accepted as an oral presentation at AAAI 2022.

Kuaishou Tech

Existing evaluation methods for image generation assess a model's overall output distribution rather than the quality of an individual generated image in reference‑guided tasks. The authors address this gap with Reference‑guided Image Synthesis Assessment (RISA), which requires no manually annotated training data and aligns closely with human subjective judgments.

RISA’s contributions include: (1) using images generated by intermediate GAN models as training data, labeling them by the number of training iterations; (2) enhancing label granularity with pixel‑wise interpolation and multiple binary classifiers; and (3) introducing an unsupervised contrastive loss to capture style similarity between reference and generated images.
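The second contribution — enriching label granularity — can be illustrated with a minimal numpy sketch. The idea is that blending two checkpoint outputs pixel‑wise, with the quality label blended by the same coefficient, yields a continuous spectrum of labeled samples between discrete checkpoints. Function and variable names here are hypothetical, not from the paper's code:

```python
import numpy as np

def interpolated_sample(img_early, label_early, img_late, label_late, alpha):
    """Pixel-wise linear interpolation between the outputs of two GAN
    checkpoints, with the iteration-based quality label interpolated by
    the same coefficient alpha (illustrative sketch, not the paper's code)."""
    img = alpha * img_late + (1.0 - alpha) * img_early
    label = alpha * label_late + (1.0 - alpha) * label_early
    return img, label

# Sweep alpha to create a dense spectrum of training labels between
# two checkpoints whose outputs barely differ visually.
early, late = np.zeros((8, 8, 3)), np.ones((8, 8, 3))
samples = [interpolated_sample(early, 0.2, late, 0.8, a)
           for a in np.linspace(0.0, 1.0, 5)]
```

With `alpha = 0.5` the blended image sits halfway between the two checkpoints and its label is the midpoint of the two checkpoint labels.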

The overall framework is simple: a shared style extractor processes both the reference and generated images, their feature vectors are compared via L1 distance, and the result is fed to several binary classifiers whose averaged output yields a quality score. Training data are sourced from GAN intermediate checkpoints; early‑stage images receive iteration‑based labels directly, while later‑stage images are enriched through linear interpolation between models selected around the FID elbow point. This creates a continuous spectrum of quality labels even when later training stages show little visual change.
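The scoring path described above can be sketched in a few lines of numpy. This is a toy stand‑in, not the paper's implementation: the "style extractor" is a fixed random projection, and the dimensions, weights, and function names are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: D-dimensional style vectors, K binary classifiers.
D, K = 64, 10
IMG_SHAPE = (32, 32, 3)

# Toy stand-in for the shared style network: a fixed random projection.
P = rng.normal(size=(D, int(np.prod(IMG_SHAPE)))) / np.sqrt(np.prod(IMG_SHAPE))
W = rng.normal(scale=0.1, size=(K, D))  # one linear head per binary classifier
b = np.zeros(K)

def style_extractor(image):
    return P @ image.ravel()

def risa_score(reference, generated):
    """Score one generated image against its reference, in [0, 1]."""
    diff = np.abs(style_extractor(reference) - style_extractor(generated))  # L1 distance, element-wise
    logits = W @ diff + b                  # K binary classifiers
    probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid per classifier
    return float(probs.mean())             # averaged output = quality score

ref = rng.random(IMG_SHAPE)
gen = rng.random(IMG_SHAPE)
score = risa_score(ref, gen)  # always in [0, 1]
```

Note that with these untrained zero biases, an identical reference/generated pair yields a zero feature difference and hence a score of exactly 0.5; it is the training losses (including the upper‑bound loss discussed below) that push matched pairs toward a score of 1.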

The loss function combines three components: a weak‑supervision loss fitting the reference‑generated pair with its quality label, an unsupervised contrastive loss encouraging style similarity, and an upper‑bound loss forcing perfectly matched style pairs to receive the maximum score of 1. Converting the quality prediction to a set of binary classification tasks, rather than direct regression, markedly improves performance.
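The binary‑decomposition trick mentioned in the last sentence can be made concrete with a short sketch: a continuous quality label in [0, 1] is converted into K binary targets ("is quality above threshold i?"), and averaging the K binary outputs recovers a score on the same scale. The threshold placement below is an assumption for illustration, not taken from the paper:

```python
import numpy as np

K = 10  # number of binary classifiers (hypothetical choice)

def label_to_binary_targets(q):
    """Turn a quality label q in [0, 1] into K binary targets:
    classifier i answers 'is the quality above threshold (i + 0.5) / K?'."""
    thresholds = (np.arange(K) + 0.5) / K  # 0.05, 0.15, ..., 0.95
    return (q > thresholds).astype(float)

def targets_to_score(t):
    """Averaging the K binary outputs maps back to a score in [0, 1]."""
    return float(np.mean(t))

# A label of 0.73 exceeds the first 7 thresholds, so the recovered score is 0.7.
t = label_to_binary_targets(0.73)
```

Each classifier then only has to solve an easier yes/no problem at its own threshold, which is the intuition behind why this decomposition outperforms direct regression of the score.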

Experiments on four generative models across five datasets show that RISA's scores rank generated images from low to high quality in a way that matches visual inspection, and extensive human preference tests demonstrate a high correlation between RISA scores and subjective judgments, surpassing existing reference and no‑reference IQA methods. RISA also exhibits strong cross‑architecture transferability.

Ablation studies confirm the importance of multiple binary classifiers, pixel‑wise interpolation, and each loss term, highlighting their contributions to the overall performance of RISA.

Tags: AI · GAN · image quality assessment · unsupervised learning · reference-guided synthesis · RISA
Written by

Kuaishou Tech

Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.
