Jan 11, 2025 · Artificial Intelligence

CAS-ViT: The Fastest, Strongest Vision Transformer for Mobile Image Classification & Detection

CAS‑ViT introduces a convolutional additive self‑attention mechanism that dramatically reduces the computational cost of Vision Transformers, achieving state‑of‑the‑art accuracy on image classification, object detection, and segmentation while being deployable on mobile devices.

Efficient Models · Self-Attention · Vision Transformer

19 min read
