How DiffPathV2 Achieves Zero‑Shot Image Anomaly Detection with 94.9% AUROC

This article breaks down the ICCV 2025 paper "Zero‑Shot Image Anomaly Detection Using Generative Foundation Models." It explains how DiffPathV2 combines diffusion‑model denoising trajectories, a six‑dimensional score‑error statistic, and SSIM weighting to detect out‑of‑distribution images without any task‑specific training, reaching state‑of‑the‑art AUROC across multiple benchmarks.

AI Frontier Lectures

Nov 4, 2025

Paper Information

Title: Zero‑Shot Image Anomaly Detection Using Generative Foundation Models

Authors: Lemar Abdi, Amaan Valiuddin, Francisco Caetano, Christiaan Viviers, Fons van der Sommen

Motivation for Zero‑Shot Anomaly Detection

Traditional anomaly detectors typically must be retrained for each new dataset and generalize poorly to data outside their training distribution. The paper instead proposes using a pretrained generative foundation model, specifically a denoising diffusion model (DDM), whose denoising trajectories provide a general cue for distinguishing normal from abnormal images.

DiffPathV2 Overview

DiffPathV2 extends the original DiffPath method. The pipeline consists of three steps:

1. Use a pretrained diffusion model to predict the noise (score) for an input image at every diffusion timestep.

2. Compute the mean‑squared error (MSE) between the predicted noise and the true noise, then analyze the temporal evolution of this error.

3. Weight the error map with a structural similarity (SSIM) map so that regions with larger structural differences receive higher emphasis.

Key Innovation 1 – Score Error

Instead of using the raw diffusion score, DiffPathV2 measures the MSE between the predicted and true noise. This error signal is more informative than the score alone: anomalous images tend to produce larger errors, especially in complex semantic contexts.
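
As a rough illustration of the score error described above, the sketch below noises an image with the standard DDPM forward process and records the MSE between the injected noise and a model's estimate at each timestep. It is not the authors' code: the linear beta schedule, the `predict_noise(x_t, t)` interface, and the `toy_predictor` stand‑in for a pretrained diffusion model are all assumptions made for the example, and the paper may obtain the noisy samples differently.

```python
import numpy as np

def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear DDPM beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def score_error_trajectory(x0, predict_noise, alpha_bar, rng):
    """Return the per-timestep MSE between predicted and injected noise.

    x0            : image array scaled to roughly [-1, 1]
    predict_noise : callable (x_t, t) -> noise estimate; a hypothetical
                    interface standing in for a pretrained diffusion model
    """
    errors = np.empty(len(alpha_bar))
    for t, a_bar in enumerate(alpha_bar):
        eps = rng.standard_normal(x0.shape)                     # injected ("true") noise
        x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps  # forward noising
        eps_hat = predict_noise(x_t, t)
        errors[t] = np.mean((eps_hat - eps) ** 2)
    return errors

def toy_predictor(x_t, t):
    """Placeholder model (assumption): returns the noisy input as its noise estimate."""
    return x_t

rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar(num_steps=10)
image = rng.uniform(-1.0, 1.0, size=(3, 8, 8))
errors = score_error_trajectory(image, toy_predictor, alpha_bar, rng)
print(errors.shape)  # (10,)
```

With a real pretrained model in place of `toy_predictor`, the resulting sequence is the error trajectory that the later aggregation steps consume.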

Key Innovation 2 – Six‑Dimensional Score

The error statistics are aggregated into a six‑dimensional vector:

First three dimensions: sums of the 1st, 2nd, and 3rd order norms of the error across timesteps (overall magnitude).

Last three dimensions: sums of the 1st, 2nd, and 3rd order norms of the temporal derivative of the error (trend).
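
The aggregation above can be sketched in a few lines of NumPy. This is one possible reading, not the paper's exact definition: the "p‑th order norm" is taken to be the Lp norm of the error at each timestep, and the temporal derivative is approximated by a finite difference between consecutive timesteps. The function name `six_dim_statistic` is illustrative.

```python
import numpy as np

def six_dim_statistic(errors):
    """Aggregate a per-timestep error sequence into six numbers.

    errors: array of shape (T, ...); the first axis is time.
    Assumptions: "p-th order norm" = Lp norm over the non-time axes,
    temporal derivative = finite difference along the time axis.
    """
    e = np.asarray(errors, dtype=float)
    e = e.reshape(e.shape[0], -1)      # flatten to (T, D)
    de = np.diff(e, axis=0)            # finite-difference derivative, (T-1, D)

    def summed_norms(seq):
        # Sum over timesteps of the L1, L2, and L3 norms of each row.
        return [np.sum(np.linalg.norm(seq, ord=p, axis=1)) for p in (1, 2, 3)]

    return np.array(summed_norms(e) + summed_norms(de))

# Toy trajectory: three timesteps, two error components each.
trajectory = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]])
features = six_dim_statistic(trajectory)
print(features.round(3))
```

The first three entries summarize how large the error is overall, and the last three summarize how quickly it changes from one timestep to the next.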

Key Innovation 3 – SSIM‑Based Regional Weighting

SSIM is computed between the original image and the image reconstructed from the predicted noise. The complement (1 – SSIM) serves as a weight, amplifying regions with structural discrepancies before applying the six‑dimensional score.
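
The weighting step can be illustrated with a simplified SSIM. The sketch below is an assumption‑laden stand‑in rather than the paper's implementation: it handles grayscale images only, computes SSIM on non‑overlapping patches instead of the usual sliding window, and requires image sides divisible by the patch size. The function names are hypothetical.

```python
import numpy as np

def patch_ssim_map(a, b, patch=4, data_range=1.0):
    """SSIM on non-overlapping patches of two grayscale images.

    Simplification (assumption): standard SSIM uses a sliding window;
    non-overlapping patches keep this sketch short.
    """
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    h, w = a.shape

    def blocks(img):
        # (H, W) -> (H/patch, W/patch, patch*patch)
        return (img.reshape(h // patch, patch, w // patch, patch)
                   .transpose(0, 2, 1, 3)
                   .reshape(h // patch, w // patch, -1))

    pa, pb = blocks(a), blocks(b)
    mu_a, mu_b = pa.mean(-1), pb.mean(-1)
    var_a, var_b = pa.var(-1), pb.var(-1)
    cov = ((pa - mu_a[..., None]) * (pb - mu_b[..., None])).mean(-1)
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def ssim_weighted_error(error_map, original, reconstruction, patch=4):
    """Weight a per-pixel error map by (1 - SSIM), upsampled to pixel resolution."""
    weight = 1.0 - patch_ssim_map(original, reconstruction, patch)
    weight = np.kron(weight, np.ones((patch, patch)))  # patch grid -> pixel grid
    return weight * error_map

rng = np.random.default_rng(0)
original = rng.uniform(0.0, 1.0, size=(8, 8))
reconstruction = original.copy()
reconstruction[:4, :4] = 1.0 - original[:4, :4]   # corrupt only the top-left patch
error_map = np.ones((8, 8))                       # uniform error, for illustration
weighted = ssim_weighted_error(error_map, original, reconstruction)
```

In this toy case the weighted map is essentially zero wherever the reconstruction matches the original and positive only in the corrupted patch, which is the emphasis effect the SSIM weighting is meant to provide.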

Experimental Results

Across five benchmarks (CIFAR‑10, CIFAR‑100, SVHN, CelebA, Textures), DiffPathV2 reaches an average AUROC of 94.9%, surpassing prior methods.
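
For readers unfamiliar with the metric, AUROC is the probability that a randomly chosen anomalous sample receives a higher anomaly score than a randomly chosen normal one. The pure‑Python sketch below computes it by direct pairwise comparison; the scores are made‑up numbers for illustration, not results from the paper.

```python
def auroc(scores_normal, scores_anomalous):
    """Fraction of (anomalous, normal) pairs ranked correctly; ties count one half.

    This pairwise statistic equals the area under the ROC curve.
    """
    wins = 0.0
    for a in scores_anomalous:
        for n in scores_normal:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(scores_anomalous) * len(scores_normal))

# Hypothetical anomaly scores (higher = more anomalous).
normal = [0.1, 0.2, 0.3, 0.4]
anomalous = [0.35, 0.8, 0.9]
print(auroc(normal, anomalous))  # 11 of the 12 pairs are ranked correctly
```

A value of 1.0 means perfect separation and 0.5 means chance‑level ranking. The quadratic loop is fine for a demonstration, while a rank‑based implementation is preferable for large test sets.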

Ablation studies demonstrate that removing any of the three innovations (score error, six‑dimensional aggregation, SSIM weighting) degrades performance, confirming their individual contributions.

Anomaly‑score histograms show a clear separation between normal (blue) and abnormal (orange) samples after applying DiffPathV2.

Pre‑Training Data Insight

Contrary to the assumption that larger, more diverse datasets always yield better representations, a model pretrained on CelebA (about 200k face images) outperforms one pretrained on ImageNet (14M images) for anomaly detection. A likely explanation is the higher structural consistency of faces, which makes the diffusion trajectory more sensitive to subtle perturbations.

Why the Paper Matters

True zero‑shot capability: A single training dataset suffices to detect anomalies in many unseen domains.

Theory‑driven design: The error‑based score and SSIM weighting are grounded in the properties of diffusion denoising trajectories.

State‑of‑the‑art performance: Consistently superior AUROC on both near‑OOD (CIFAR‑10 vs. CIFAR‑100) and far‑OOD scenarios.

Practical deployment: No additional fine‑tuning or retraining is required; any pretrained diffusion model can be plugged in.

Future Directions

Potential extensions include scaling to larger generative models or exploring alternative regional weighting schemes to further close gaps on challenging datasets such as Textures.
