Machine Heart
May 15, 2026 · Artificial Intelligence
How X2SAM Empowers Multimodal Models to Segment Images and Videos at Pixel Level
X2SAM is a unified multimodal large model that combines image and video segmentation with language and visual prompts, introduces a Mask Memory for temporal consistency, defines a new V‑VGD task, and achieves state‑of‑the‑art results while cutting training cost by over 30%.
Tags: V-VGD · X2SAM · computer vision
