Tagged articles

mask memory

1 articles · Page 1 of 1

May 15, 2026 · Artificial Intelligence

How X2SAM Empowers Multimodal Models to Segment Images and Videos at Pixel Level

X2SAM is a unified multimodal large model that combines image and video segmentation with language and visual prompts, introduces a Mask Memory for temporal consistency, defines a new V‑VGD task, and achieves state‑of‑the‑art results while cutting training cost by over 30%.

Large Language ModelV-VGDX2SAM

0 likes · 9 min read