Is CLIP Obsolete? LeCun and Xie's New Multimodal Model Beats Language Supervision
A recent study by LeCun, Xie, and collaborators shows that large-scale visual self-supervised learning (Web-SSL) can match or surpass CLIP on diverse VQA tasks without any language supervision, provided that model size and training data volume are scaled up.
