ViTAEv2 Breaks ImageNet Real Record with 91.2% Accuracy – How a 600M‑Parameter Model Redefines Few‑Shot Learning

JD Research Institute and the University of Sydney introduced ViTAEv2, a 600‑million‑parameter deep learning model that achieved a world‑leading 91.2% top‑1 accuracy on ImageNet Real without external data. The model also shows strong few‑shot learning ability, which can reduce labeling costs and may benefit many other computer‑vision tasks.

JD Cloud Developers

JD Research Institute, in collaboration with the University of Sydney, announced ViTAEv2, a very large deep learning model designed to scale further, perform better, and adapt to a broader range of visual tasks.

ViTAEv2, with 600 million parameters and no reliance on external data, achieved a world‑leading 91.2% top‑1 accuracy on the ImageNet Real classification benchmark, setting a new record in image‑classification technology.
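As a rough illustration of the metric, the sketch below computes top‑1 accuracy in plain Python. It assumes each image may carry a set of accepted labels and that images with an empty set are skipped; both are assumptions made for this example, since the article does not describe the exact evaluation protocol. The function name `top1_accuracy` and the toy scores are hypothetical.

```python
from typing import Sequence, Set


def top1_accuracy(scores: Sequence[Sequence[float]],
                  valid_labels: Sequence[Set[int]]) -> float:
    """Fraction of images whose highest-scoring class is an accepted label."""
    correct = 0
    counted = 0
    for row, accepted in zip(scores, valid_labels):
        if not accepted:
            # Assumption: images without any accepted label are excluded.
            continue
        # Index of the class with the highest score (the top-1 prediction).
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += predicted in accepted
        counted += 1
    return correct / counted if counted else 0.0


# Toy data: 4 images, 3 classes (illustrative values only).
scores = [
    [0.1, 0.7, 0.2],  # top-1 prediction: class 1
    [0.5, 0.3, 0.2],  # top-1 prediction: class 0
    [0.2, 0.2, 0.6],  # top-1 prediction: class 2
    [0.3, 0.4, 0.3],  # top-1 prediction: class 1
]
labels = [{1}, {1, 2}, {0, 2}, set()]

accuracy = top1_accuracy(scores, labels)
print(f"top-1 accuracy: {accuracy:.3f}")
```

With a single accepted label per image, this reduces to ordinary top‑1 accuracy.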

ImageNet, one of the largest public image‑classification datasets, has long served as a key benchmark for measuring progress in computer vision and attracts entries from top tech companies and leading universities.

Computer vision, a core AI technology, aims to give machines the ability to observe, perceive, and understand images; image classification is its fundamental task. ViTAEv2 follows the pre‑training and fine‑tuning paradigm: it builds inductive bias into a large model and pairs it with pre‑training and transfer‑learning algorithms chosen to match.

The researchers also examined ViTAEv2’s few‑shot learning capability by fine‑tuning the model on 1%, 10%, and 100% of the data. Even with only 10% of the data, the large model outperformed smaller models trained on the full dataset, which points to strong representation ability, learning ability, and sample efficiency.
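One way to build such 1%, 10%, and 100% fine‑tuning subsets is sketched below. The article does not say how the subsets were drawn, so the per‑class (stratified) sampling, the fixed seed, and the helper name `stratified_subset` are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_subset(labels, fraction, seed=0):
    """Return sorted sample indices covering `fraction` of every class."""
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    chosen = []
    for label in sorted(by_class):
        indices = by_class[label]
        # Keep at least one example per class so no class disappears.
        keep = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, keep))
    return sorted(chosen)


# Toy label list: 10 classes with 100 examples each (illustrative only).
labels = [i % 10 for i in range(1000)]

subsets = {f: stratified_subset(labels, f) for f in (0.01, 0.10, 1.00)}
for fraction, indices in subsets.items():
    print(f"{fraction:.0%} of the data -> {len(indices)} examples")
```

Sampling within each class keeps the label distribution of the small subsets close to that of the full set, so results at different data fractions stay comparable.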

These results suggest that ViTAEv2 can help with low‑resource or zero‑resource tasks, lower data‑annotation costs, shorten algorithm development cycles, simplify model deployment, and support next‑generation automated learning technologies.

Looking ahead, ViTAEv2 is expected to drive progress in a range of visual tasks such as semantic segmentation, object detection, pose estimation, and video object segmentation. Future research will focus on improving performance and on reducing training and inference costs through better training methods and model architecture design.


Tags: Computer Vision, Deep Learning, AI model, Few‑Shot Learning, ImageNet, ViTAEv2