How DiffPathV2 Achieves Zero‑Shot Image Anomaly Detection with 94.9% AUROC

This article breaks down the ICCV 2025 paper "Zero‑Shot Image Anomaly Detection Using Generative Foundation Models," explaining how DiffPathV2 leverages diffusion model denoising trajectories, six‑dimensional score errors, and SSIM weighting to detect out‑of‑distribution images without any task‑specific training, achieving state‑of‑the‑art AUROC scores across multiple benchmarks.

AI Frontier Lectures

Paper Information

Title: Zero‑Shot Image Anomaly Detection Using Generative Foundation Models

Authors: Lemar Abdi, Amaan Valiuddin, Francisco Caetano, Christiaan Viviers, Fons van der Sommen

Motivation for Zero‑Shot Anomaly Detection

Traditional anomaly detectors must be retrained for each new dataset and often generalize poorly to out‑of‑distribution (OOD) data from domains they were not trained on. The paper instead proposes using a pretrained generative foundation model, specifically a denoising diffusion model (DDM), whose denoising trajectories provide a universal cue for distinguishing normal from abnormal images.

DiffPathV2 Overview

DiffPathV2 extends the original DiffPath method. The pipeline consists of three steps:

Noise the input image to each diffusion timestep and use a pretrained diffusion model to predict the added noise, which is equivalent (up to a timestep‑dependent scale) to the score.

Compute the mean‑squared error (MSE) between the predicted noise and the true noise, then analyze the temporal evolution of this error.

Weight the error map with a structural similarity (SSIM) map so that regions with larger structural differences receive higher emphasis.

DiffPathV2 framework diagram
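The first two steps can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a linear DDPM noise schedule and the standard forward process with freshly sampled Gaussian noise at each chosen timestep. A hypothetical `eps_model(x_t, t)` callable stands in for a real pretrained network. The helper names `linear_alpha_bar` and `noise_error_maps` are illustrative.

```python
import numpy as np

def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear DDPM schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def noise_error_maps(x0, eps_model, timesteps, alpha_bar, rng):
    """Per-timestep squared-error maps between the true and predicted noise.

    eps_model(x_t, t) is a HYPOTHETICAL stand-in for a pretrained
    noise-prediction network; it must return an array shaped like x_t.
    """
    maps = []
    for t in timesteps:
        eps = rng.standard_normal(x0.shape)                 # true noise
        x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
        eps_hat = eps_model(x_t, t)                          # predicted noise
        maps.append((eps_hat - eps) ** 2)                    # per-pixel squared error
    return np.stack(maps)                                    # shape (T, H, W)

# Usage with a placeholder "model" that always predicts zero noise.
rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(32, 32))                   # image scaled to [-1, 1]
alpha_bar = linear_alpha_bar()
dummy_model = lambda x_t, t: np.zeros_like(x_t)              # not a real network
err = noise_error_maps(x0, dummy_model, range(0, 1000, 100), alpha_bar, rng)
print(err.shape)  # (10, 32, 32)
```

The sketch keeps per-pixel error maps rather than one scalar per timestep. This leaves room for the regional SSIM weighting in step 3. Averaging each map gives the per-timestep MSE.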

Key Innovation 1 – Score Error

Instead of using the raw diffusion score, DiffPathV2 measures the MSE between the predicted and true noise. According to the paper, this error signal is more informative than the raw score: anomalous images yield larger errors, especially when their semantic content is complex.
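The math below writes this out under the standard DDPM parameterization. That parameterization is an assumption, since the article does not give the exact formula. The first line is the noised input and the second is the per‑timestep score error for an image with d pixel values. The last line recalls why the noise prediction can stand in for the score.

```latex
x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)

e_t(x_0) = \frac{1}{d}\,\bigl\lVert \epsilon_\theta(x_t, t) - \epsilon \bigr\rVert_2^2

s_\theta(x_t, t) \approx -\frac{\epsilon_\theta(x_t, t)}{\sqrt{1-\bar{\alpha}_t}}
```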

Key Innovation 2 – Six‑Dimensional Score

The error statistics are aggregated into a six‑dimensional vector:

First three dimensions: the L1, L2, and L3 norms of the error at each timestep, summed across timesteps (overall magnitude).

Last three dimensions: the L1, L2, and L3 norms of the temporal derivative of the error, summed across timesteps (trend).
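Below is a minimal NumPy sketch of the six‑dimensional aggregation. It assumes the error is stored as a (T, H, W) stack of per‑timestep error maps in timestep order. It also assumes the temporal derivative is approximated by a first‑order finite difference between consecutive timesteps. The function name `six_dim_features` is hypothetical.

```python
import numpy as np

def six_dim_features(error_maps):
    """Six-dimensional summary of a (T, H, W) stack of error maps.

    Dims 0-2: L1, L2, L3 norms of the error at each timestep, summed over t.
    Dims 3-5: the same norms of the temporal derivative, which is ASSUMED
    here to be a first-order finite difference along the timestep axis.
    """
    flat = error_maps.reshape(error_maps.shape[0], -1)   # (T, H*W)
    deriv = np.diff(flat, axis=0)                        # (T-1, H*W)
    feats = []
    for signal in (flat, deriv):                         # magnitude, then trend
        for p in (1, 2, 3):
            feats.append(np.linalg.norm(signal, ord=p, axis=1).sum())
    return np.array(feats)

# Constant error over time: magnitude dims are nonzero, trend dims are zero.
feats = six_dim_features(np.ones((5, 4, 4)))
print(feats)  # first two entries are 80 and 20; the last three are 0
```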

Key Innovation 3 – SSIM‑Based Regional Weighting

SSIM is computed between the original image and the image reconstructed from the predicted noise. Its complement (1 – SSIM) serves as a per‑region weight that amplifies areas with structural discrepancies before the six‑dimensional score is computed.
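The weighting can be sketched as follows. All of the following are illustrative assumptions:

- Images are single‑channel arrays scaled to [-1, 1].
- The reconstruction is the standard DDPM estimate x0_hat = (x_t - sqrt(1 - alpha_bar_t) * eps_hat) / sqrt(alpha_bar_t).
- SSIM is a simplified per‑pixel version with a uniform 7×7 window and reflect padding. Library implementations such as scikit‑image's `structural_similarity` differ in windowing and border handling.

The function names are hypothetical.

```python
import numpy as np

def reconstruct_x0(x_t, eps_hat, alpha_bar_t):
    """Standard DDPM estimate of the clean image from x_t and predicted noise."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)

def box_mean(img, k=7):
    """Local mean over a k x k window; reflect padding keeps the input shape."""
    pad = k // 2
    padded = np.pad(img, pad, mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return windows.mean(axis=(-2, -1))

def ssim_map(x, y, data_range=2.0, k=7):
    """Simplified per-pixel SSIM with a uniform window (K1=0.01, K2=0.03)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = box_mean(x, k), box_mean(y, k)
    vx = box_mean(x * x, k) - mx ** 2                    # local variance of x
    vy = box_mean(y * y, k) - my ** 2                    # local variance of y
    cxy = box_mean(x * y, k) - mx * my                   # local covariance
    num = (2 * mx * my + c1) * (2 * cxy + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return num / den

def ssim_weighted_errors(error_maps, x0, x0_hat, data_range=2.0):
    """Scale every (T, H, W) error map by the dissimilarity map 1 - SSIM."""
    weight = 1.0 - ssim_map(x0, x0_hat, data_range)      # ~0 where structure matches
    return error_maps * weight[None, :, :]
```

With a perfect reconstruction the weight is close to zero everywhere, so only structurally altered regions contribute to the six‑dimensional score.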

Experimental Results

Evaluated with AUROC on five benchmarks (CIFAR‑10, CIFAR‑100, SVHN, CelebA, Textures), DiffPathV2 reaches an average AUROC of 94.9%, surpassing prior methods.

Main experimental results table
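AUROC can be read as the probability that a randomly chosen anomalous image receives a higher anomaly score than a randomly chosen normal one. The sketch below computes it directly from that definition, with ties counting as one half. It uses O(n·m) memory, so for large evaluations a library routine such as scikit‑learn's `roc_auc_score` is more practical.

```python
import numpy as np

def auroc(scores_normal, scores_anomalous):
    """AUROC as P(anomalous score > normal score), ties counted as 1/2.

    Equivalent to the Mann-Whitney U statistic divided by n_normal * n_anomalous.
    """
    s_n = np.asarray(scores_normal, dtype=float)[:, None]      # column of normals
    s_a = np.asarray(scores_anomalous, dtype=float)[None, :]   # row of anomalies
    wins = (s_a > s_n).sum() + 0.5 * (s_a == s_n).sum()        # compare all pairs
    return wins / (s_n.size * s_a.size)

print(auroc([0.1, 0.2, 0.3], [0.25, 0.9]))  # 5 of 6 pairs ranked correctly, ~0.833
```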

Ablation studies demonstrate that removing any of the three innovations (score error, six‑dimensional aggregation, SSIM weighting) degrades performance, confirming their individual contributions.

Ablation study results table

Anomaly‑score histograms show a clear separation between normal (blue) and abnormal (orange) samples after applying DiffPathV2.

Anomaly score histogram
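The article does not specify how the six‑dimensional vector is collapsed into the scalar anomaly score shown in the histograms. One simple option is to fit a Gaussian to the features of in‑distribution training images and score new images by their squared Mahalanobis distance to it. That option is assumed here purely for illustration. `GaussianScorer` is a hypothetical name, and other density estimators could be swapped in.

```python
import numpy as np

class GaussianScorer:
    """ASSUMED scoring rule: Gaussian fit on in-distribution 6-D features,
    squared Mahalanobis distance as the anomaly score (larger = more anomalous)."""

    def __init__(self, ridge=1e-6):
        self.ridge = ridge                                   # keeps covariance invertible

    def fit(self, feats):
        feats = np.asarray(feats, dtype=float)               # (n_samples, 6)
        self.mean = feats.mean(axis=0)
        cov = np.cov(feats, rowvar=False) + self.ridge * np.eye(feats.shape[1])
        self.precision = np.linalg.inv(cov)
        return self

    def score(self, feats):
        d = np.atleast_2d(np.asarray(feats, dtype=float)) - self.mean
        return np.einsum("ij,jk,ik->i", d, self.precision, d)  # one score per row

# Usage with synthetic stand-in features (not real DiffPathV2 outputs).
rng = np.random.default_rng(0)
scorer = GaussianScorer().fit(rng.standard_normal((500, 6)))
print(scorer.score([np.zeros(6), np.full(6, 10.0)]))  # first score small, second large
```

Mahalanobis distance fits here because the six features live on very different scales, for example sums of L1 versus L3 norms. The fitted covariance normalizes those scales without hand‑tuned weights.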

Pre‑Training Data Insight

Contrary to the assumption that larger, more diverse datasets always yield better representations, a model pretrained on CelebA (about 200 k faces) outperforms one pretrained on ImageNet (14 M images) for anomaly detection. The authors attribute this to the higher structural consistency of faces, which makes the diffusion trajectory more sensitive to subtle perturbations.

Pre‑training dataset comparison table

Why the Paper Matters

True zero‑shot capability: a diffusion model pretrained on a single dataset can detect anomalies in many unseen domains.

Theory‑driven design: The error‑based score and SSIM weighting are grounded in the properties of diffusion denoising trajectories.

State‑of‑the‑art performance: Consistently superior AUROC on both near‑OOD (CIFAR‑10 vs. CIFAR‑100) and far‑OOD scenarios.

Practical deployment: No additional fine‑tuning or retraining is required; any pretrained diffusion model can be plugged in.

Future Directions

Potential extensions include scaling to larger generative models or exploring alternative regional weighting schemes to further close gaps on challenging datasets such as Textures.

