Tagged articles
Meituan Technology Team
Jun 3, 2021 · Artificial Intelligence

VisTR: End-to-End Video Instance Segmentation with Transformers

VisTR recasts video instance segmentation as an end-to-end sequence-to-sequence prediction task. It combines a CNN backbone, a Transformer encoder-decoder with instance queries, and Hungarian matching to jointly predict masks, classes, and tracks across frames. The article reports state-of-the-art accuracy (40.1 AP) at 57.7 FPS on YouTube-VIS.
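As a rough illustration of the matching step mentioned in the summary, the sketch below pairs predicted instance sequences with ground-truth sequences by minimizing a total cost over the whole clip. It is not VisTR's implementation: the cost function (L1 distance between box centres, summed over frames), the track representation, and the names `sequence_cost` and `match_instances` are all assumptions made for the example. A brute-force search over permutations stands in for the Hungarian algorithm, which finds the same minimum-cost assignment in polynomial time.

```python
from itertools import permutations


def sequence_cost(pred_track, gt_track):
    """Hypothetical cost: per-frame L1 distance between (x, y) box centres, summed over frames."""
    return sum(
        abs(px - gx) + abs(py - gy)
        for (px, py), (gx, gy) in zip(pred_track, gt_track)
    )


def match_instances(pred_tracks, gt_tracks):
    """Return (assignment, total_cost).

    assignment[j] is the index of the prediction matched to ground-truth track j.
    Brute force over permutations; only practical for a handful of instances.
    """
    n_pred, n_gt = len(pred_tracks), len(gt_tracks)
    if n_pred < n_gt:
        raise ValueError("need at least as many predictions as ground-truth tracks")

    # cost[i][j]: cost of assigning prediction i to ground-truth track j
    cost = [[sequence_cost(p, g) for g in gt_tracks] for p in pred_tracks]

    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n_pred), n_gt):
        total = sum(cost[i][j] for j, i in enumerate(perm))
        if total < best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total


if __name__ == "__main__":
    # Three predicted tracks and two ground-truth tracks over a two-frame clip.
    preds = [
        [(0.9, 0.9), (0.8, 0.8)],
        [(0.1, 0.1), (0.2, 0.2)],
        [(0.5, 0.5), (0.5, 0.5)],
    ]
    gts = [
        [(0.1, 0.1), (0.2, 0.2)],
        [(0.9, 0.9), (0.8, 0.8)],
    ]
    print(match_instances(preds, gts))
```

Because each track is matched as a whole rather than frame by frame, an instance keeps one identity across the clip, which is what lets segmentation and tracking fall out of a single assignment. Predictions left unmatched (index 2 in the example data) would be treated as background in a set-prediction loss.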

Transformer · Video Instance Segmentation · VisTR
21 min read