AIWalker
Jan 13, 2025 · Artificial Intelligence
Multi-View Transformer (MVFormer) Sets New State-of-the-Art Results in Classification, Detection, and Segmentation
The paper proposes MVFormer, a Vision Transformer that combines a Multi‑View Normalization (MVN) module with a Multi‑View Token Mixer (MVTM) to diversify feature learning. It reaches state‑of‑the‑art Top‑1 accuracy of 83.4% to 84.6% on ImageNet‑1K and delivers superior performance on COCO detection and ADE20K segmentation, while using comparable or fewer parameters and MACs.
Multi-View Normalization · Token Mixer · computer vision
25 min read
