Tagged articles

visual foundation model

4 articles

Jul 15, 2026 · Artificial Intelligence

DeepMind’s Video Generation Model Becomes a General Visual Intelligence – He Kaiming’s Involvement

GenCeption repurposes a 140‑billion‑parameter text‑to‑video diffusion model into a single‑step feed‑forward visual system that handles depth, segmentation, pose and other tasks via text prompts, achieves state‑of‑the‑art results with far fewer training frames, and demonstrates strong out‑of‑domain generalisation using synthetic data.

GenCeption · multitask vision · synthetic data

10 min read

DataFunTalk

Sep 29, 2025 · Artificial Intelligence

How Glint-MVT Powers City‑Scale Multimodal AI: Insights from a Tech VP

In an interview ahead of the DACon conference, Dr. Feng Ziyong explains how Glint‑MVT and new data‑synthesis techniques close distribution gaps, improve compositional understanding, and enable retrieval over billions of items within seconds for city‑scale surveillance, while balancing model efficiency against effectiveness.

Embedding Retrieval · Model distillation · city surveillance

11 min read

AI Algorithm Path

Aug 16, 2025 · Artificial Intelligence

Meta Unveils DINOv3: A Universal Self‑Supervised Visual AI for All Image Tasks

Meta's DINOv3 is a 7‑billion‑parameter self‑supervised visual foundation model trained on 1.7 billion Instagram images without any labels. It introduces dense feature extraction, Gram Anchoring to prevent feature collapse, high‑resolution adaptation, and multi‑student distillation, which together enable out‑of‑the‑box performance on segmentation, depth estimation, 3D matching, and tracking while surpassing prior models such as DINOv2, CLIP, and SAM.

DINOv3 · Gram Anchoring · Large‑Scale Training

8 min read

AIWalker

May 12, 2025 · Artificial Intelligence

DefMamba: A Deformable Multi‑Scale Visual Foundation Model that Boosts Vision Tasks

DefMamba introduces a multi‑scale backbone, deformable Mamba modules, and a dynamic scanning strategy to preserve image spatial structure, achieving state‑of‑the‑art performance on image classification, object detection, and semantic segmentation benchmarks.

DefMamba · Semantic Segmentation · computer vision

23 min read
