Introducing ICONIC-444: A 3.1M Industrial Image Dataset Redefining OOD Detection

This article presents ICONIC-444, a 3.1‑million‑image, 444‑class industrial dataset designed for out‑of‑distribution (OOD) detection. It explains the dataset's realistic acquisition process, hierarchical OOD categories, and benchmark tasks, then reviews an evaluation of 22 state‑of‑the‑art OOD methods that shows how dataset characteristics influence algorithm performance.

AI Frontier Lectures

01 The Old Problem and New Solution for OOD Detection

Out‑of‑distribution (OOD) detection is essential for safety‑critical AI applications such as autonomous driving, medical diagnosis, and industrial quality inspection. Most existing benchmarks repurpose natural‑image datasets such as CIFAR and ImageNet, and as a result they suffer from unrealistic scenarios, vague OOD definitions, data contamination, and limited scale.

02 ICONIC‑444: A Real‑World Benchmark for OOD Research

ICONIC‑444 (Image Classification and OOD Detection with Numerous Intricate Complexities) is a large‑scale industrial image dataset containing over 3.1 million RGB images across 444 categories. Images were captured with a prototype food‑sorting machine under controlled lighting and a uniform blue background. Rigorous cleaning removes blurry, duplicate, or mis‑labeled samples.

Industrial sorting machine prototype

2.1 Industrial Origin Guarantees Realism

All images originate from a dedicated food‑sorting line, so OOD samples correspond to real contaminants on a production line. The uniform backdrop and fixed camera angles eliminate background noise, forcing models to focus on object features.

2.2 Hierarchical OOD Categories

Near‑OOD: Semantically close objects (e.g., other nuts when the ID task is almond classification).

Far‑OOD: Moderately distant objects (e.g., non‑food items such as glass shards).

Extreme‑OOD: Completely unrelated images sourced from external datasets like ImageNet or iNaturalist.

Synthetic‑OOD: Artificial patterns, solid colors, noise, or geometric shapes.
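Because results are reported per OOD category, it can help to tag every OOD sample with its category and aggregate metrics per tag. The following is a minimal sketch, not code from the ICONIC‑444 repository: the names `OODCategory` and `mean_by_category` are hypothetical, and the numbers in `example_results` are made‑up placeholders rather than results from the paper.

```python
from collections import defaultdict
from enum import Enum
from statistics import mean


class OODCategory(Enum):
    """The four OOD levels described above (names are illustrative)."""
    NEAR = "near-ood"
    FAR = "far-ood"
    EXTREME = "extreme-ood"
    SYNTHETIC = "synthetic-ood"


def mean_by_category(results):
    """Average a metric per OOD category.

    `results` is an iterable of (OODCategory, metric_value) pairs,
    for example one FPR95 value per OOD test set.
    """
    grouped = defaultdict(list)
    for category, value in results:
        grouped[category].append(value)
    return {category: mean(values) for category, values in grouped.items()}


# Placeholder values for illustration only, NOT numbers from the paper.
example_results = [
    (OODCategory.NEAR, 0.40),
    (OODCategory.NEAR, 0.60),
    (OODCategory.SYNTHETIC, 0.0),
]
summary = mean_by_category(example_results)
for category, value in summary.items():
    print(f"{category.value}: {value:.2f}")
```

Keeping the category as an explicit tag makes it easy to see whether a method fails mainly on Near‑OOD samples or across the board.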

2.3 Four Benchmark Tasks

Almond: Fine‑grained classification of 7 almond variants.

Wheat: Highly fine‑grained classification of 12 wheat varieties.

Kernels: Medium‑granularity classification of 29 seed and grain types.

Food‑grade: Large‑scale, coarse‑grained classification of 324 food categories.
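The four tasks differ mainly in the number of in‑distribution (ID) classes and in granularity. As a rough sketch, they could be captured in a small configuration like the one below. The class counts come from the list above; the `BenchmarkTask` type, its field names, and the task identifiers are assumptions for illustration and are not taken from the official code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkTask:
    """Hypothetical description of one ICONIC-444 benchmark task."""
    name: str
    num_id_classes: int  # size of the classifier's output layer
    granularity: str


TASKS = [
    BenchmarkTask("almond", 7, "fine-grained"),
    BenchmarkTask("wheat", 12, "highly fine-grained"),
    BenchmarkTask("kernels", 29, "medium-granularity"),
    BenchmarkTask("food-grade", 324, "coarse-grained"),
]


def chance_accuracy(task: BenchmarkTask) -> float:
    """Accuracy of uniform random guessing, assuming balanced classes."""
    return 1.0 / task.num_id_classes


for task in TASKS:
    print(f"{task.name}: {task.num_id_classes} ID classes, "
          f"chance accuracy {chance_accuracy(task):.4f}")
```

Classes outside a task's ID set are what make up its OOD test data, so a configuration like this would typically be paired with per‑task lists of OOD classes.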

Benchmark tasks overview

03 Benchmark Experiments: How Do SOTA Methods Perform?

Twenty‑two post‑hoc OOD detection methods were evaluated on the four ICONIC‑444 tasks. No single method dominates across all tasks and OOD types. Feature‑space approaches (GRAM, ViM, KNN, ATS) consistently outperform confidence‑based methods (MSP, MLS) and logit‑adjustment techniques (ASH, DICE).

The authors attribute this to ICONIC‑444’s clean, low‑variance feature space, where distance‑based metrics are more effective, whereas on noisy, diverse datasets like ImageNet, logit‑based adjustments tend to work better.
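To make the contrast between the method families concrete, here is a minimal NumPy sketch of two confidence‑based scores (MSP and MLS) and one feature‑space score (KNN). It follows the commonly published definitions of these scores and is not the evaluation code used in the paper; the function names, the default `k`, and the toy arrays are assumptions. In every case a higher score means "more in‑distribution".

```python
import numpy as np


def msp_score(logits):
    """Maximum softmax probability per sample."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.max(axis=1)


def mls_score(logits):
    """Maximum logit per sample."""
    return logits.max(axis=1)


def knn_score(train_feats, test_feats, k=1):
    """Negative distance to the k-th nearest ID training feature.

    Features are L2-normalized first; rows must be non-zero vectors.
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    train = normalize(train_feats)
    test = normalize(test_feats)
    # Pairwise Euclidean distances, shape (num_test, num_train).
    dists = np.linalg.norm(test[:, None, :] - train[None, :, :], axis=2)
    kth = np.sort(dists, axis=1)[:, k - 1]
    return -kth


# Toy data: the first test sample resembles the training features, the second does not.
train_feats = np.array([[1.0, 0.0], [0.9, 0.1]])
test_feats = np.array([[1.0, 0.05], [0.0, 1.0]])
logits = np.array([[5.0, 0.0, 0.0], [1.0, 1.0, 1.0]])

print("MSP:", msp_score(logits))
print("MLS:", mls_score(logits))
print("KNN:", knn_score(train_feats, test_feats, k=1))
```

MSP and MLS only look at the classifier output, while the KNN score compares penultimate‑layer features with stored training features, which is where a clean, low‑variance feature space would be expected to help.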

FPR95/FPR99 comparison across methods

3.1 No Universal Champion

Feature‑space methods achieve lower false‑positive rates at 95% and 99% true‑positive rate (FPR95, FPR99) than confidence‑based or logit‑adjusted methods, but performance varies with OOD difficulty.
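For readers unfamiliar with the metric, FPR95 and FPR99 can be computed from per‑sample scores as sketched below. This assumes the convention that ID is the positive class and that higher scores mean "more in‑distribution"; some toolkits use the opposite convention, and the paper's exact implementation may differ. The function name `fpr_at_tpr` and the toy scores are illustrative assumptions.

```python
import numpy as np


def fpr_at_tpr(id_scores, ood_scores, tpr=0.95):
    """Fraction of OOD samples accepted as ID when `tpr` of ID samples are accepted.

    A sample is accepted as ID when its score is >= the threshold.
    """
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    # Threshold below which only (1 - tpr) of the ID scores fall.
    threshold = np.quantile(id_scores, 1.0 - tpr)
    return float(np.mean(ood_scores >= threshold))


# Toy scores for illustration only: ID scores 0..99, OOD scores -50..49.
id_scores = np.arange(100)
ood_scores = np.arange(-50, 50)

fpr95 = fpr_at_tpr(id_scores, ood_scores, tpr=0.95)
fpr99 = fpr_at_tpr(id_scores, ood_scores, tpr=0.99)
print(f"FPR95 = {fpr95:.2f}, FPR99 = {fpr99:.2f}")
```

FPR99 is the stricter of the two: keeping 99% of ID samples requires a lower threshold, so at least as many OOD samples slip through as at 95%.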

3.2 Hard Cases for Current Methods

Even the best methods struggle with Near‑OOD and Far‑OOD samples, exhibiting high false‑positive rates. Examples include rye flakes and glass shards both being misclassified as almond shells, which highlights how difficult it is to distinguish subtle texture and shape cues.

Hard OOD samples that confuse state‑of‑the‑art methods

Paper: https://arxiv.org/abs/2601.10802

Dataset and code: https://github.com/gkrumpl/iconic-444
