How DiffPathV2 Achieves Zero‑Shot Image Anomaly Detection with 94.9% AUROC
This article breaks down the ICCV 2025 paper "Zero‑Shot Image Anomaly Detection Using Generative Foundation Models," explaining how DiffPathV2 leverages diffusion model denoising trajectories, six‑dimensional score errors, and SSIM weighting to detect out‑of‑distribution images without any task‑specific training, achieving state‑of‑the‑art AUROC scores across multiple benchmarks.
Paper Information
Title: Zero‑Shot Image Anomaly Detection Using Generative Foundation Models
Authors: Lemar Abdi, Amaan Valiuddin, Francisco Caetano, Christiaan Viviers, Fons van der Sommen
Motivation for Zero‑Shot Anomaly Detection
Traditional anomaly detectors must be retrained for each new dataset and generalize poorly to data outside their training distribution. The paper instead proposes using a pretrained generative foundation model, specifically a denoising diffusion model (DDM), whose denoising trajectories provide a universal cue for distinguishing normal from abnormal images.
DiffPathV2 Overview
DiffPathV2 extends the original DiffPath method. The pipeline consists of three steps:
Use a pretrained diffusion model to predict the noise (score) for an input image at every diffusion timestep.
Compute the mean‑squared error (MSE) between the predicted noise and the true noise, then analyze the temporal evolution of this error.
Weight the error map with a structural similarity (SSIM) map so that regions with larger structural differences receive higher emphasis.
Key Innovation 1 – Score Error
Instead of using the raw diffusion score, DiffPathV2 measures the MSE between the predicted and true noise at each timestep. This error is a more informative signal than the raw score: anomalous images produce larger errors, especially in complex semantic contexts.
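To make the error signal concrete, here is a minimal sketch of a per-timestep score error. It is not the authors' implementation. It assumes the standard DDPM forward process with a linear beta schedule, and `predict_noise` is a hypothetical callable standing in for the pretrained diffusion model. The toy model used in the demo is only there so the snippet runs without any weights.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    # Assumed linear beta schedule; returns the cumulative product of (1 - beta_t).
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def score_error_trajectory(x0, predict_noise, alpha_bar, timesteps, rng):
    # Returns one MSE value per timestep for a single image x0.
    errors = []
    for t in timesteps:
        eps = rng.standard_normal(x0.shape)  # the "true" noise added at step t
        x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
        eps_hat = predict_noise(x_t, t)      # hypothetical model call
        errors.append(np.mean((eps_hat - eps) ** 2))
    return np.asarray(errors)

alpha_bar = make_alpha_bar()
rng = np.random.default_rng(0)
timesteps = [100, 300, 500]

# Toy stand-in for a pretrained model: it behaves as if every clean image were
# all zeros, so it attributes the entire noisy input to noise.
def toy_predict_noise(x_t, t):
    return x_t / np.sqrt(1.0 - alpha_bar[t])

familiar = np.zeros((3, 8, 8))    # matches what the toy model expects
unfamiliar = np.ones((3, 8, 8))   # deviates from it
err_familiar = score_error_trajectory(familiar, toy_predict_noise, alpha_bar, timesteps, rng)
err_unfamiliar = score_error_trajectory(unfamiliar, toy_predict_noise, alpha_bar, timesteps, rng)
print(err_familiar, err_unfamiliar)
```

In this toy setup the image the model "expects" yields an error of essentially zero at every timestep, while the unexpected image yields a strictly larger error, which is the behavior the score error relies on.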
Key Innovation 2 – Six‑Dimensional Score
The error statistics are aggregated into a six‑dimensional vector:
First three dimensions: sums of the 1st, 2nd, and 3rd order norms of the error across timesteps (overall magnitude).
Last three dimensions: sums of the 1st, 2nd, and 3rd order norms of the temporal derivative of the error (trend).
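The aggregation above can be sketched as follows. This is one plausible reading, not the paper's exact formula: the "p‑th order norm" is taken as the sum of absolute values raised to the power p, and the temporal derivative is approximated by a finite difference between consecutive timesteps. The function name `six_dim_statistics` is made up for illustration.

```python
import numpy as np

def six_dim_statistics(errors):
    # errors: array of shape (T, ...) holding the error at each of T timesteps,
    # ordered by timestep. Trailing dimensions (e.g. pixels) are flattened.
    e = np.asarray(errors, dtype=float)
    e = e.reshape(e.shape[0], -1)
    de = np.diff(e, axis=0)  # finite-difference stand-in for the temporal derivative

    def power_sums(a):
        # Sum of |a|^p over timesteps and elements, for p = 1, 2, 3.
        return [np.sum(np.abs(a) ** p) for p in (1, 2, 3)]

    # First three entries: magnitude. Last three entries: trend.
    return np.array(power_sums(e) + power_sums(de))

stats = six_dim_statistics([1.0, 2.0, 4.0])
print(stats)  # [ 7. 21. 73.  3.  5.  9.]
```

For the trajectory 1, 2, 4 the magnitude entries are 1+2+4, 1+4+16 and 1+8+64, and the trend entries come from the differences 1 and 2. How the resulting vector is turned into a single anomaly score is not covered in this summary.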
Key Innovation 3 – SSIM‑Based Regional Weighting
SSIM is computed between the original image and the image reconstructed from the predicted noise. The complement (1 – SSIM) serves as a weight, amplifying regions with structural discrepancies before applying the six‑dimensional score.
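A rough sketch of the weighting step, under several assumptions: images are 2‑D grayscale arrays in the range [0, 1], local statistics use a uniform 7×7 window with reflect padding, and the SSIM constants are the common defaults (0.01 and 0.03). The helper names are hypothetical, and the paper's SSIM configuration may differ.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_ssim_map(x, y, window=7, data_range=1.0):
    # Per-pixel SSIM between two 2-D images; window must be odd.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    pad = window // 2

    def local_mean(a):
        padded = np.pad(a, pad, mode="reflect")
        return sliding_window_view(padded, (window, window)).mean(axis=(-2, -1))

    mu_x, mu_y = local_mean(x), local_mean(y)
    var_x = local_mean(x * x) - mu_x ** 2
    var_y = local_mean(y * y) - mu_y ** 2
    cov = local_mean(x * y) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den

def ssim_weighted_error(error_map, original, reconstruction):
    # Weight = 1 - SSIM, so structurally dissimilar regions count more.
    weights = 1.0 - local_ssim_map(original, reconstruction)
    return weights * error_map

original = np.linspace(0.0, 1.0, 256).reshape(16, 16)
reconstruction = original.copy()
reconstruction[4:8, 4:8] = 0.0        # simulate a badly reconstructed patch
error_map = np.ones_like(original)    # placeholder per-pixel error
weighted = ssim_weighted_error(error_map, original, reconstruction)
print(weighted[5, 5], weighted[14, 14])
```

Pixels near the corrupted patch receive a noticeably positive weight, while pixels whose neighborhood is reconstructed perfectly receive a weight of essentially zero, so their error contributes almost nothing to the final statistics.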
Experimental Results
Evaluated with AUROC on five benchmarks (CIFAR‑10, CIFAR‑100, SVHN, CelebA, Textures), DiffPathV2 reaches an average AUROC of 94.9%, surpassing prior methods.
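For readers unfamiliar with the metric, AUROC can be read as the probability that a randomly chosen anomalous sample receives a higher anomaly score than a randomly chosen normal one. The sketch below computes it by direct pairwise comparison, with ties counted as half. The scores are made up for illustration and are not results from the paper.

```python
import numpy as np

def auroc(scores_normal, scores_anomalous):
    # Fraction of (normal, anomalous) pairs ranked correctly; ties count as half.
    n = np.asarray(scores_normal, dtype=float)[:, None]
    a = np.asarray(scores_anomalous, dtype=float)[None, :]
    wins = (a > n).sum() + 0.5 * (a == n).sum()
    return wins / (n.size * a.size)

perfect = auroc([0.1, 0.2, 0.3], [0.4, 0.5])
partial = auroc([0.1, 0.4], [0.3, 0.5])
print(perfect, partial)  # 1.0 0.75
```

A value of 1.0 means the two score distributions are perfectly separated, and 0.5 corresponds to chance. The pairwise form is quadratic in the number of samples, so a rank-based implementation is preferable for large test sets.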
Ablation studies demonstrate that removing any of the three innovations (score error, six‑dimensional aggregation, SSIM weighting) degrades performance, confirming their individual contributions.
Anomaly‑score histograms show a clear separation between normal (blue) and abnormal (orange) samples after applying DiffPathV2.
Pre‑Training Data Insight
Contrary to the assumption that larger, more diverse datasets always yield better representations, a model pretrained on CelebA (roughly 200 k faces) outperforms one pretrained on ImageNet (14 M images) for anomaly detection. The higher structural consistency of faces makes the diffusion trajectory more sensitive to subtle perturbations.
Why the Paper Matters
True zero‑shot capability: A single training dataset suffices to detect anomalies in many unseen domains.
Theory‑driven design: The error‑based score and SSIM weighting are grounded in the properties of diffusion denoising trajectories.
State‑of‑the‑art performance: Consistently superior AUROC on both near‑OOD (CIFAR‑10 vs. CIFAR‑100) and far‑OOD scenarios.
Practical deployment: No additional fine‑tuning or retraining is required; any pretrained diffusion model can be plugged in.
Future Directions
Potential extensions include scaling to larger generative models or exploring alternative regional weighting schemes to further close gaps on challenging datasets such as Textures.