Jan 13, 2025 · Artificial Intelligence

Multi-View Transformer (MVFormer) Sets New Top‑1 Accuracy Records in Classification, Detection, and Segmentation

The paper proposes MVFormer, a Vision Transformer that combines a Multi-View Normalization (MVN) module with a Multi-View Token Mixer (MVTM) to diversify feature learning. It reports state-of-the-art Top-1 accuracy of 83.4% to 84.6% on ImageNet-1K, along with stronger results on COCO detection and ADE20K segmentation, at comparable or lower parameter counts and MACs.

Deep Learning · Multi-View Normalization · Token Mixer

