Tagged articles

ImageNet

18 articles · Page 1 of 1

May 30, 2026 · Artificial Intelligence

Fei‑Fei Li’s Team Unveils GPIC: A 100‑Million‑Pair Image‑Text Corpus to Supersede ImageNet

The article explains why ImageNet has become obsolete for visual generation and introduces the newly released GPIC dataset of 100 million image‑text pairs totaling 28 trillion pixels. It describes the four‑stage construction pipeline, the new FD‑DINOv2 evaluation metric, and a reference baseline model, and positions GPIC as the next common benchmark for the field.

AI evaluation · FD-DINOv2 · Fei-Fei Li

0 likes · 10 min read

AI Explorer

Apr 24, 2026 · Artificial Intelligence

Google’s ‘Banana’ Model Redefines Visual Transformers with Dynamic Sparse Attention

Google’s newly unveiled “Banana” visual Transformer introduces dynamic sparse attention that cuts inference cost 3‑5×, reduces memory by 70%, and improves ImageNet accuracy, while demonstrating real‑world gains in autonomous driving, medical imaging, and satellite analysis.

Dynamic Sparse Attention · Google · ImageNet

0 likes · 6 min read

Machine Heart

Apr 24, 2026 · Artificial Intelligence

Generating High‑Resolution Images with Only 64 Tokens: How MacTok Overcomes Posterior Collapse

MacTok introduces semantic masking and dual‑space alignment to prevent posterior collapse in continuous image tokenizers, enabling high‑quality generation with just 64‑128 tokens and achieving strong gFID scores on ImageNet at 256×256 and 512×512 resolutions.

ImageNet · MacTok · continuous tokenizer

0 likes · 9 min read

AIWalker

Mar 21, 2026 · Artificial Intelligence

Re‑annotating ImageNet: 1.28 M Images Gain Multi‑Labels, Boosting COCO mAP by 4 Points

A Rochester research team automatically relabeled the entire 1.28 M‑image ImageNet training set with multi‑labels using self‑supervised object discovery and a lightweight region classifier, resulting in a pretrained model that raises COCO mAP by 4.2 points and VOC mAP by 2.3 points.

ImageNet · dataset relabeling · model performance

0 likes · 6 min read

AIWalker

Mar 4, 2026 · Artificial Intelligence

Drifting Models Enable One‑Step Generation, Shattering Speed Records

The paper introduces Drifting Models, a new generative paradigm that moves the distribution evolution to the training phase, achieving true one‑step (1‑NFE) generation with state‑of‑the‑art ImageNet FID scores of 1.54 in latent space and 1.61 in pixel space, while eliminating the need for distillation or classifier‑free guidance.

Drifting Models · ImageNet · One-step Generation

0 likes · 24 min read

Machine Learning Algorithms & Natural Language Processing

Feb 14, 2026 · Artificial Intelligence

Latent Forcing: Reordering Diffusion Steps Boosts Pixel‑Level Image Quality

The new Latent Forcing technique from Fei‑Fei Li’s team reorders the diffusion trajectory: it first generates a latent structural sketch and then refines pixel details. This restores the efficiency of latent‑space models while preserving 100% pixel fidelity, and it achieves state‑of‑the‑art FID scores on ImageNet‑256.

AI research · Diffusion Models · ImageNet

0 likes · 6 min read

AI Frontier Lectures

Feb 3, 2026 · Artificial Intelligence

Pixel Mean Flow: One‑Step Diffusion Beats Multi‑Step Models on ImageNet

The Pixel Mean Flow (pMF) method eliminates multi‑step sampling and latent‑space encoding, generating high‑quality images in a single step and achieving state‑of‑the‑art FID scores on ImageNet while drastically reducing computational cost.

Diffusion Models · ImageNet · perceptual loss

0 likes · 7 min read

Data Party THU

Nov 11, 2025 · Artificial Intelligence

Why Early Adversarial Attacks Still Beat Modern Ones: A Fair Transferability Study

This paper systematically evaluates 23 transferable adversarial attacks and 11 defenses on ImageNet, revealing that early methods like DI outperform many newer attacks when hyper‑parameters are fairly matched, that diffusion‑based defenses give a false sense of security, and that higher transferability often comes at the cost of reduced stealthiness.

ImageNet · adversarial attacks · deep learning security

0 likes · 8 min read

AI Frontier Lectures

Oct 29, 2025 · Artificial Intelligence

Why Early DI Attacks Outperform Modern Methods: A Systematic Study of Transferable Adversarial Images

This paper systematically evaluates 23 transferable adversarial attacks and 11 defenses on ImageNet, revealing that early DI attacks surpass newer methods when hyper‑parameters are fairly set, diffusion defenses offer false security, and higher transferability often reduces stealthiness, urging fair benchmarking and comprehensive metrics.

ImageNet · adversarial attacks · deep learning robustness

0 likes · 7 min read

Data Party THU

Sep 20, 2025 · Artificial Intelligence

How Mamba-Adaptor Revives State‑Space Models for Vision Tasks

The Mamba-Adaptor introduces a dual‑module adapter that overcomes causal computation limits, long‑range memory decay, and spatial structure loss in state‑space models, delivering state‑of‑the‑art results on ImageNet, COCO, and various downstream visual tasks with minimal overhead.

Adapter · COCO · ImageNet

0 likes · 8 min read

AI Frontier Lectures

May 27, 2025 · Artificial Intelligence

Can One-Step Generative Modeling Beat Multi-Step Diffusion? Inside MeanFlow

The article presents MeanFlow, a novel one‑step generative modeling framework that replaces instantaneous velocity with an average‑velocity field, achieving a record‑low FID of 3.43 on ImageNet 256×256 with a single function evaluation and outperforming both prior single‑step and multi‑step diffusion models.

AI research · FID · ImageNet

0 likes · 7 min read

DataFunTalk

Jun 14, 2024 · Artificial Intelligence

Midjourney’s Diverse Data Sources: Public Datasets, Academic Research, Partner and Proprietary Data

Midjourney enhances its AI models by integrating a wide range of data sources—including public datasets like ImageNet and COCO, academic research from top conferences, partner collaborations, and its own proprietary data—while continuously updating and managing these datasets for quality, privacy, and security.

AI training · Bright Data · COCO

0 likes · 9 min read

DataFunSummit

Jun 13, 2024 · Artificial Intelligence

Midjourney’s Data Sources: Public Datasets, Academic Research, Partner Data, and Proprietary Data

Midjourney leverages a wide range of data sources—including public datasets like ImageNet and COCO, academic research from top conferences and journals, partner collaborations, and its own proprietary data—augmented by real‑time feeds from Bright Data, to continuously improve and expand its AI models.

AI · Bright Data · COCO

0 likes · 13 min read

Meituan Technology Team

Mar 24, 2022 · Artificial Intelligence

Twins: Efficient Visual Attention Models for Vision Transformers

The Twins series, a collaboration between Meituan and the University of Adelaide, introduces conditional positional encoding and spatially separable self‑attention to improve efficiency and performance of vision transformers, achieving state‑of‑the‑art results on ImageNet, ADE20K, COCO and high‑precision map segmentation.

ADE20K · COCO · Conditional Positional Encoding

0 likes · 20 min read

JD Cloud Developers

Mar 21, 2022 · Artificial Intelligence

ViTAEv2 Breaks ImageNet Real Record with 91.2% Accuracy – How a 600M‑Parameter Model Redefines Few‑Shot Learning

JD Research Institute and the University of Sydney introduced ViTAEv2, a 600‑million‑parameter deep learning model that achieved a world‑leading 91.2% top‑1 accuracy on ImageNet Real without external data, demonstrating strong few‑shot learning, reducing labeling costs, and promising advances across many computer‑vision tasks.

AI model · ImageNet · ViTAEv2

0 likes · 4 min read

Tencent Tech

Aug 26, 2020 · Artificial Intelligence

How Tencent Engineers Shattered the 128‑GPU ImageNet Training Record in 2m31s

Tencent engineers broke the world record for ImageNet training, finishing in just 2 minutes 31 seconds on 128 V100 GPUs. The article details the optimizations behind the result, including a new Light distributed training framework, single‑machine speed boosts, multi‑machine communication enhancements, and advanced batch convergence techniques, which together dramatically cut training time while maintaining high accuracy.

GPU · ImageNet · Tencent Cloud

0 likes · 9 min read

21CTO

Sep 14, 2018 · Artificial Intelligence

From Stanford to Google: How Fei‑Fei Li Built ImageNet and Shaped AI

Fei‑Fei Li, the pioneering AI researcher and former Google Cloud AI lead, rose from humble beginnings in China to create the ImageNet dataset, drive breakthroughs in computer vision, and now returns to Stanford, illustrating how curiosity and perseverance can transform both academia and industry.

Fei-Fei Li · Google AI · ImageNet

0 likes · 12 min read

Tencent Architect

Jul 30, 2018 · Artificial Intelligence

Four‑Minute ImageNet Training: Tencent’s AI Platform Sets a New World Record

Tencent’s intelligent machine‑learning platform set a world record by training AlexNet in 4 minutes and ResNet‑50 in 6.6 minutes on ImageNet. It used large batch sizes, mixed precision, LARS optimization, hierarchical synchronization, gradient fusion, and pipelined I/O to overcome accuracy and scalability challenges.

AI acceleration · ImageNet · deep learning

0 likes · 24 min read
