Introducing ICONIC-444: A 3.1M Industrial Image Dataset Redefining OOD Detection

This article presents ICONIC‑444, a 3.1‑million‑image, 444‑class industrial dataset designed for out‑of‑distribution (OOD) detection. It describes the dataset's realistic acquisition process, hierarchical OOD categories, and benchmark tasks, and summarizes an evaluation of 22 state‑of‑the‑art OOD methods that shows how dataset characteristics influence algorithm performance.

AI Frontier Lectures

Jan 21, 2026

01 The Old Problem and New Solution for OOD Detection

Out‑of‑distribution (OOD) detection is essential for safety‑critical AI applications such as autonomous driving, medical diagnosis, and industrial quality inspection. Existing benchmarks rely on repurposed natural‑image datasets (CIFAR, ImageNet) that suffer from unrealistic scenarios, vague OOD definitions, data contamination, and limited scale.

02 ICONIC‑444: A Real‑World Benchmark for OOD Research

ICONIC‑444 (Image Classification and OOD Detection with Numerous Intricate Complexities) is a large‑scale industrial image dataset containing over 3.1 million RGB images across 444 categories. The images were captured with a prototype food‑sorting machine under controlled lighting against a uniform blue background, and a rigorous cleaning pass removed blurry, duplicate, and mislabeled samples.

2.1 Industrial Origin Guarantees Realism

All images originate from a dedicated food‑sorting line, so OOD samples correspond to real contaminants on a production line. The uniform backdrop and fixed camera angles eliminate background noise, forcing models to focus on object features.

2.2 Hierarchical OOD Categories

Near‑OOD: Semantically close objects (e.g., other nuts when the ID task is almond classification).

Far‑OOD: Moderately distant objects (e.g., non‑food items such as glass shards).

Extreme‑OOD: Completely unrelated images sourced from external datasets like ImageNet or iNaturalist.

Synthetic‑OOD: Artificial patterns, solid colors, noise, or geometric shapes.

2.3 Four Benchmark Tasks

Almond: Fine‑grained classification of 7 almond variants.

Wheat: Highly fine‑grained classification of 12 wheat varieties.

Kernels: Medium‑granularity classification of 29 seed and grain types.

Food‑grade: Large‑scale, coarse‑grained classification of 324 food categories.
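For readers who want to script experiments over the four tasks, the list above can be captured in a small registry. This is only an illustrative sketch: the class counts come from the list above, while the `BenchmarkTask` type, its field names, and the `get_task` helper are hypothetical and are not taken from the ICONIC‑444 code release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkTask:
    """One in-distribution (ID) classification task (hypothetical structure)."""
    name: str
    granularity: str
    num_classes: int


# Class counts as listed in the article; everything else is illustrative.
TASKS = [
    BenchmarkTask("Almond", "fine-grained", 7),
    BenchmarkTask("Wheat", "highly fine-grained", 12),
    BenchmarkTask("Kernels", "medium-granularity", 29),
    BenchmarkTask("Food-grade", "coarse-grained", 324),
]


def get_task(name: str) -> BenchmarkTask:
    """Look up a task by name, ignoring case."""
    for task in TASKS:
        if task.name.lower() == name.lower():
            return task
    raise KeyError(f"unknown task: {name}")
```

A call such as `get_task("wheat")` returns the Wheat entry, whose `num_classes` field gives the size of the classifier head needed for that task.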

03 Benchmark Experiments: How Do SOTA Methods Perform?

Twenty‑two post‑hoc OOD detection methods were evaluated on the four ICONIC‑444 tasks. No single method dominates across all tasks and OOD types. Feature‑space approaches (GRAM, ViM, KNN, ATS) consistently outperform confidence‑based methods (MSP, MLS) and logit‑adjustment techniques (ASH, DICE).
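To make the confidence‑based baselines concrete, here is a minimal sketch of two such scores computed from a classifier's logits. It assumes MSP means maximum softmax probability and MLS means the maximum‑logit score, which is the usual reading of those abbreviations; the function names and toy logits are illustrative, not taken from the paper's code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_score(logits):
    """Maximum softmax probability; higher suggests in-distribution."""
    return max(softmax(logits))


def mls_score(logits):
    """Maximum raw logit; higher suggests in-distribution."""
    return max(logits)


# Toy logits (assumed values): one peaked prediction, one nearly flat.
confident = [8.0, 0.5, 0.1]
uncertain = [1.1, 1.0, 0.9]
print(msp_score(confident) > msp_score(uncertain))  # peaked logits score higher
```

Both scores need only the model's output layer, which is why they count as post‑hoc methods: no retraining is involved, and a sample is flagged as OOD when its score falls below a chosen threshold.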

The authors attribute this to ICONIC‑444’s clean, low‑variance feature space, where distance‑based metrics are more effective, whereas on noisy, diverse datasets like ImageNet, logit‑based adjustments tend to work better.
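The intuition behind distance‑based scoring can be sketched with a k‑nearest‑neighbor score in feature space. This is a simplified illustration in the spirit of KNN‑style detectors, not the benchmarked implementation: it assumes features are L2‑normalized and scores a sample by its Euclidean distance to the k‑th closest in‑distribution feature. The `id_bank` vectors are made‑up toy values.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def knn_ood_score(feature, id_bank, k=1):
    """Distance to the k-th nearest ID feature; larger suggests OOD."""
    if not 1 <= k <= len(id_bank):
        raise ValueError("k must be between 1 and the size of the ID bank")
    query = l2_normalize(feature)
    dists = sorted(math.dist(query, l2_normalize(ref)) for ref in id_bank)
    return dists[k - 1]


# Toy 2-D features (assumed values): a tight ID cluster and two queries.
id_bank = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05]]
near_query = [1.0, 0.02]   # lies inside the ID cluster
far_query = [0.0, 1.0]     # points in a different direction
print(knn_ood_score(near_query, id_bank) < knn_ood_score(far_query, id_bank))
```

When ID features form tight, low‑variance clusters, as the authors describe for ICONIC‑444, the gap between these two distances tends to be large, which is what makes a simple distance threshold effective.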

3.1 No Universal Champion

Feature‑space methods achieve lower false‑positive rates than confidence‑based or logit‑adjusted methods, measured at 95% and 99% true‑positive rates (FPR95 and FPR99), but performance varies with OOD difficulty.
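As a reference for how such numbers are typically obtained, the sketch below computes the false‑positive rate at a fixed true‑positive rate and reports it per OOD tier. It assumes the common convention that in‑distribution samples are the positive class and that a higher score means "more in‑distribution"; the paper's exact evaluation protocol may differ, and the function names and toy scores are illustrative.

```python
import math


def fpr_at_tpr(id_scores, ood_scores, tpr=0.95):
    """Fraction of OOD samples accepted when `tpr` of ID samples are accepted."""
    if not id_scores or not ood_scores:
        raise ValueError("both score lists must be non-empty")
    ranked = sorted(id_scores, reverse=True)
    # Number of top-ranked ID samples that must pass; the small epsilon
    # guards against floating-point error in tpr * len(ranked).
    n_keep = max(1, math.ceil(tpr * len(ranked) - 1e-9))
    threshold = ranked[n_keep - 1]
    accepted = sum(1 for s in ood_scores if s >= threshold)
    return accepted / len(ood_scores)


def fpr_by_tier(id_scores, ood_scores_by_tier, tpr=0.95):
    """Apply the same ID-derived threshold to each OOD tier separately."""
    return {tier: fpr_at_tpr(id_scores, scores, tpr)
            for tier, scores in ood_scores_by_tier.items()}


# Toy scores (assumed values): 20 ID samples and two small OOD tiers.
id_scores = [i / 20 for i in range(1, 21)]
tiers = {
    "near": [0.02, 0.08, 0.5, 0.9],
    "far": [0.01, 0.03, 0.04, 0.6],
}
print(fpr_by_tier(id_scores, tiers))
```

Reporting the metric per tier rather than pooled makes the difficulty gradient visible: a detector can look strong on average while still accepting a large share of the harder samples.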

3.2 Hard Cases for Current Methods

Even the best methods struggle with Near‑OOD and Far‑OOD samples and exhibit high false‑positive rates on them. Examples include rye flakes and glass shards both being misclassified as almond shells, which highlights how hard it is to separate classes that differ only in subtle texture and shape cues.

Figure: Hard OOD samples that confuse state‑of‑the‑art methods

Paper: https://arxiv.org/abs/2601.10802

Dataset and code: https://github.com/gkrumpl/iconic-444
