Tagged articles
4 articles
DataFunSummit
Jun 23, 2023 · Artificial Intelligence

Frontiers of Video Action Recognition: Concepts, Algorithms, and Applications

This article introduces video action recognition: its basic definition, downstream tasks, and major algorithmic families (CNN‑based, Vision‑Transformer, self‑supervised, and multimodal approaches). It also discusses practical deployment scenarios and open challenges in the field.

CNN · multimodal models · self-supervised learning
16 min read
DataFunTalk
Dec 17, 2022 · Artificial Intelligence

Efficient Spatiotemporal Self‑Attention Transformer (Patch Shift Transformer) for Video Action Recognition

This article introduces a lightweight spatiotemporal self‑attention transformer, called Patch Shift Transformer, which achieves competitive video action recognition performance on datasets such as Kinetics‑400, Sth‑v1/v2, and Diving48 without increasing computational cost or parameters, and details its design, experiments, and speed advantages.

ECCV 2022 · Transformer · patch shift
5 min read
Alibaba Cloud Developer
Jun 28, 2021 · Artificial Intelligence

How Alibaba Cloud’s MMAI Team Dominated CVPR2021 Video Action Challenges

Alibaba Cloud’s Multimedia AI team won five first‑place titles and one runner‑up across six major video‑action challenges at CVPR2021, showcasing advanced transformer‑CNN hybrids, self‑supervised initialization, and spatio‑temporal relation modeling that now power their multimedia AI cloud products.

Alibaba Cloud · CVPR2021 · multimedia AI
14 min read
NetEase Media Technology Team
Jul 24, 2020 · Artificial Intelligence

Survey of Video Action Recognition Algorithms: 3D and 2D Convolutional Networks and Pre‑training

This survey reviews video action recognition algorithms. It compares 3D convolutional networks, which model spatial and temporal cues jointly but are computationally heavy, with 2D‑based approaches such as TSM and TIN, which embed temporal shifts efficiently. It also emphasizes that large‑scale pre‑training markedly improves performance even when labeled data is limited.

2D convolutional networks · 3D convolutional networks · Computer Vision
13 min read