Introducing ICONIC-444: A 3.1M Industrial Image Dataset Redefining OOD Detection
This article presents ICONIC‑444, a 3.1‑million‑image, 444‑class industrial dataset designed for out‑of‑distribution (OOD) detection. It covers the dataset's realistic acquisition process, its hierarchical OOD categories, and its benchmark tasks, then reviews an evaluation of 22 state‑of‑the‑art OOD methods that shows how dataset characteristics influence algorithm performance.
01 The Old Problem and New Solution for OOD Detection
Out‑of‑distribution (OOD) detection is essential for safety‑critical AI applications such as autonomous driving, medical diagnosis, and industrial quality inspection. Existing benchmarks rely on repurposed natural‑image datasets (CIFAR, ImageNet) that suffer from unrealistic scenarios, vague OOD definitions, data contamination, and limited scale.
02 ICONIC‑444: A Real‑World Benchmark for OOD Research
ICONIC‑444 (Image Classification and OOD Detection with Numerous Intricate Complexities) is a large‑scale industrial image dataset containing over 3.1 million RGB images across 444 categories. Images were captured with a prototype food‑sorting machine under controlled lighting and a uniform blue background. Rigorous cleaning removes blurry, duplicate, or mis‑labeled samples.
2.1 Industrial Origin Guarantees Realism
All images originate from a dedicated food‑sorting line, so OOD samples correspond to real contaminants on a production line. The uniform backdrop and fixed camera angles eliminate background noise, forcing models to focus on object features.
2.2 Hierarchical OOD Categories
Near‑OOD : Semantically close objects (e.g., other nuts when the ID task is almond classification).
Far‑OOD : Moderately distant objects (e.g., non‑food items such as glass shards).
Extreme‑OOD : Completely unrelated images sourced from external datasets like ImageNet or iNaturalist.
Synthetic‑OOD : Artificial patterns, solid colors, noise, or geometric shapes.
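To make the Synthetic‑OOD category concrete, here is a minimal sketch of how such samples could be generated. It is not the dataset authors' generation code: the function name `make_synthetic_ood`, the three `kind` values, and the 64‑pixel default size are all assumptions chosen for illustration.

```python
import numpy as np

def make_synthetic_ood(kind, size=64, seed=0):
    """Return one (size, size, 3) uint8 image of a synthetic pattern.

    Hypothetical helper for illustration; `kind` is one of
    "solid", "noise", or "circle".
    """
    rng = np.random.default_rng(seed)
    if kind == "solid":
        # One random RGB color repeated over the whole image.
        color = rng.integers(0, 256, size=3, dtype=np.uint8)
        return np.broadcast_to(color, (size, size, 3)).copy()
    if kind == "noise":
        # Independent uniform noise per pixel and channel.
        return rng.integers(0, 256, size=(size, size, 3), dtype=np.uint8)
    if kind == "circle":
        # White filled circle centered on a black background.
        yy, xx = np.mgrid[:size, :size]
        mask = (yy - size / 2) ** 2 + (xx - size / 2) ** 2 <= (size / 4) ** 2
        img = np.zeros((size, size, 3), dtype=np.uint8)
        img[mask] = 255
        return img
    raise ValueError(f"unknown kind: {kind}")
```

Samples like these carry no object semantics at all, so a detector that fails on them likely has problems beyond fine‑grained discrimination.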
2.3 Four Benchmark Tasks
Almond : Fine‑grained classification of 7 almond variants.
Wheat : Highly fine‑grained classification of 12 wheat varieties.
Kernels : Medium‑granularity classification of 29 seed and grain types.
Food‑grade : Large‑scale, coarse‑grained classification of 324 food categories.
03 Benchmark Experiments: How Do SOTA Methods Perform?
Twenty‑two post‑hoc OOD detection methods were evaluated on the four ICONIC‑444 tasks. No single method dominates across all tasks and OOD types. As a group, however, feature‑space approaches (GRAM, ViM, KNN, ATS) consistently outperform both confidence‑based methods (MSP, MLS) and logit‑adjustment techniques (ASH, DICE).
The authors attribute this to ICONIC‑444’s clean, low‑variance feature space, where distance‑based metrics are more effective, whereas on noisy, diverse datasets like ImageNet, logit‑based adjustments tend to work better.
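The difference between the method families is easier to see in code. The sketch below gives simplified NumPy versions of two confidence‑based scores (maximum softmax probability and maximum logit) and one distance‑based score (k‑th nearest neighbor on normalized features). These are illustrative reductions, not the implementations used in the benchmark; the function names, the default `k=5`, and the convention that a higher score means "more in‑distribution" are assumptions.

```python
import numpy as np

def msp_score(logits):
    """Maximum softmax probability per sample (higher = more ID)."""
    z = logits - logits.max(axis=1, keepdims=True)  # stabilize exp
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return probs.max(axis=1)

def mls_score(logits):
    """Maximum raw logit per sample (higher = more ID)."""
    return logits.max(axis=1)

def knn_score(features, train_features, k=5, eps=1e-12):
    """Negative distance to the k-th nearest training feature.

    Features are L2-normalized first; a sample far from all
    training features gets a low (more OOD) score.
    """
    def l2_normalize(x):
        return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

    f = l2_normalize(features)
    t = l2_normalize(train_features)
    # Pairwise Euclidean distances, shape (n_test, n_train).
    dists = np.linalg.norm(f[:, None, :] - t[None, :, :], axis=2)
    kth = np.sort(dists, axis=1)[:, k - 1]
    return -kth
```

MSP and MLS only look at the classifier's output layer, whereas the KNN score depends on how tightly the training features cluster, which is where a clean, low‑variance feature space would be expected to help.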
3.1 No Universal Champion
Feature‑space methods achieve lower false‑positive rates (FPR95, FPR99) than confidence‑based or logit‑adjusted methods, but performance varies with OOD difficulty.
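For readers unfamiliar with the metric, FPR95 is commonly computed as the fraction of OOD samples that are still accepted as in‑distribution when the threshold is set so that 95% of ID samples are accepted. A minimal sketch under that convention follows; the function name `fpr_at_tpr` is hypothetical, scores are assumed to be higher for ID samples, and the benchmark's own evaluation code may differ in details such as tie handling.

```python
import numpy as np

def fpr_at_tpr(id_scores, ood_scores, tpr=0.95):
    """Fraction of OOD samples accepted as ID at a given ID acceptance rate.

    Assumes higher score = more in-distribution.
    """
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    # Threshold below which the lowest (1 - tpr) share of ID scores falls.
    threshold = np.quantile(id_scores, 1.0 - tpr)
    # OOD samples at or above the threshold are false positives.
    return float(np.mean(ood_scores >= threshold))
```

Calling the same function with `tpr=0.99` gives FPR99, a stricter operating point that requires accepting nearly all ID samples and therefore typically lets more OOD samples through.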
3.2 Hard Cases for Current Methods
Even the best methods struggle with Near‑OOD and Far‑OOD samples, where false‑positive rates remain high. Examples include rye flakes and glass shards both being classified as almond shells, which highlights how hard it is to distinguish subtle texture and shape cues.
Paper: https://arxiv.org/abs/2601.10802
Dataset and code: https://github.com/gkrumpl/iconic-444
