2020 Computer Vision Breakthroughs: Self‑Supervised Learning, Transformer Attention Modeling, and Neural Radiance Fields
The talk reviews three major 2020 advances in computer vision—self‑supervised learning surpassing supervised pre‑training, the successful adoption of Transformer‑based attention models for detection and classification, and the emergence of Neural Radiance Fields for view synthesis—while highlighting related research from Microsoft Research Asia and the broader community.
In this presentation, Hu Han (MSRA) and editor Zhu Yushi introduce the most impactful computer‑vision research of 2020, focusing on three breakthroughs: self‑supervised learning, Transformer‑based attention modeling, and Neural Radiance Fields (NeRF).
1. Self‑Supervised Learning – 2020 saw self‑supervised methods (MoCo, SimCLR) outperform supervised pre‑training on downstream tasks for the first time, a milestone for the field. The importance of self‑supervision is motivated by Yann LeCun's "cake" analogy and by parallels with how human infants learn. The talk reviews the traditional supervised pre‑training + fine‑tuning paradigm, then the newer self‑supervised pre‑training + fine‑tuning paradigm, exemplified by MoCo's gains across seven downstream tasks.
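MoCo and SimCLR both build on a contrastive objective: embeddings of two augmented views of the same image are pulled together, while other images in the batch serve as negatives. A minimal NumPy sketch of this InfoNCE-style loss (the function name and batch layout are illustrative, not the papers' exact implementations):

```python
import numpy as np

def info_nce_loss(queries, keys, temperature=0.1):
    """Contrastive loss over a batch: row i of `queries` and row i of
    `keys` embed two augmented views of the same image."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature               # (N, N) pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    # positives sit on the diagonal; everything else is a negative
    return float(-np.log(np.diag(probs)).mean())
```

MoCo additionally keeps a large queue of negative keys produced by a momentum encoder, while SimCLR relies on large in-batch negatives; the loss itself is the shared core.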
Subsequent developments include PIC (a single‑branch unsupervised feature learner) and PixPro (pixel‑level self‑supervision), which improve dense prediction tasks such as object detection and segmentation. PixPro introduces a pixel propagation module (smoothing each pixel's feature with those of similar pixels) and can drop the pixel‑level contrastive branch entirely, achieving notable gains on Pascal VOC and other dense‑prediction benchmarks.
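The core idea of pixel-level self-supervision can be sketched as a consistency loss between per-pixel features of two augmented views: pixels that originate from the same location in the source image should produce similar features. This is a simplified illustration, not PixPro's exact formulation (which adds the propagation module on one branch):

```python
import numpy as np

def pixel_consistency_loss(feat_a, feat_b, matches):
    """feat_a, feat_b: (H*W, C) per-pixel features from two augmented
    views; `matches` pairs indices that map to the same source pixel."""
    a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    cos = np.array([a[i] @ b[j] for i, j in matches])
    return float((1.0 - cos).mean())  # 0 when matched pixels agree exactly
```

Because the supervision signal is per-pixel rather than per-image, the learned features transfer better to dense tasks such as detection and segmentation.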
2. Transformer Attention Modeling in Vision – Transformers, long dominant in NLP, were successfully applied to vision in 2020 through works such as DETR (end‑to‑end object detection) and the Vision Transformer (ViT) for image classification. RelationNet++ uses a Transformer decoder to fuse multiple object‑representation schemes, reaching 52.7 mAP on COCO. The talk also surveys earlier attention‑based work such as non‑local networks (NLNet) and recent advances that replace or complement convolutions with attention mechanisms for pixel–pixel, object–object, and object–pixel relationships.
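All of these models share the same primitive: scaled dot-product attention, in which every query attends to every key and returns a weighted mixture of the values. A minimal single-head sketch (variable names illustrative):

```python
import numpy as np

def attention(Q, K, V):
    """Single-head scaled dot-product attention: softmax(QK^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # (n_q, n_k) similarities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                            # mixture of values
```

In ViT the queries, keys, and values come from image patches treated as tokens; in DETR's decoder, learned object queries attend to encoded image features, which is how attention naturally expresses the object–pixel relationships the talk discusses.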
Additional topics cover video‑based pre‑training, multimodal self‑supervision, and the unification of CV and NLP modeling via Transformers, highlighting the shift toward a common modeling framework across modalities.
3. Neural Radiance Fields (NeRF) – NeRF is presented as a landmark achievement for low‑level vision, enabling high‑quality view synthesis by representing scenes as continuous radiance fields.
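NeRF represents a scene as a function mapping a 3D position and viewing direction to a density and color, and renders a pixel by compositing samples along the camera ray with the standard volume-rendering quadrature. A sketch of that compositing step for one ray (variable names illustrative; the density/color network itself is omitted):

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Composite a pixel color from samples along one camera ray.
    sigmas: (N,) volume densities; colors: (N, 3) RGB per sample;
    deltas: (N,) spacing between consecutive samples."""
    alpha = 1.0 - np.exp(-sigmas * deltas)  # opacity of each segment
    # transmittance: probability the ray reaches sample i unoccluded
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1])))
    weights = trans * alpha
    return (weights[:, None] * colors).sum(axis=0)
```

Because this rendering is differentiable, the scene representation can be optimized directly from posed photographs, which is what makes the high-quality view synthesis possible.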
The presentation concludes that computer vision is entering an era dominated by self‑supervised and Transformer‑based attention models, which are poised to become a unified modeling paradigm for both vision and language tasks.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.