DataFunTalk
Dec 17, 2022 · Artificial Intelligence
Efficient Spatiotemporal Self‑Attention Transformer (Patch Shift Transformer) for Video Action Recognition
This article introduces Patch Shift Transformer, a lightweight spatiotemporal self‑attention transformer that achieves competitive video action recognition performance on Kinetics‑400, Sth‑v1/v2, and Diving48 without adding computational cost or parameters. It covers the model's design, experiments, and speed advantages.
ECCV 2022 · patch shift · spatiotemporal modeling