FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance

FancyVideo is an open‑source, UNet‑based video generation model that supports arbitrary resolutions, aspect ratios, styles, and motion dynamics. It introduces a Cross‑frame Textual Guidance Module (CTGM) with temporal injectors, refiners, and boosters, achieves state‑of‑the‑art results on multiple benchmarks, and enables applications such as video extension, backtracking, and frame interpolation.

AI research · UNet · cross-frame guidance

0 likes · 6 min read

AntTech

Dec 20, 2022 · Artificial Intelligence

Towards Smooth Video Composition: A New Benchmark for GAN‑Based Video Generation

Researchers from multiple institutions propose a GAN‑based video generation framework that explicitly models short‑, medium‑, and long‑range temporal relations, introduces B‑spline motion embeddings and temporal shift modules, and demonstrates substantial quality improvements across several video datasets.

B-spline · GAN · StyleGAN-V

0 likes · 7 min read

Kuaishou Audio & Video Technology

Apr 22, 2022 · Artificial Intelligence

How Temporal Residual Modeling Boosts Video Super‑Resolution Performance

This article introduces a novel video super‑resolution framework that unifies low‑ and high‑resolution temporal modeling using adjacent‑frame residual maps, achieving state‑of‑the‑art results on multiple benchmarks while maintaining high speed and flexibility.

residual maps · temporal modeling · video super-resolution

0 likes · 14 min read

NetEase Media Technology Team

Jul 24, 2020 · Artificial Intelligence

Survey of Video Action Recognition Algorithms: 3D and 2D Convolutional Networks and Pre‑training

This survey reviews video action recognition algorithms. It compares 3D convolutional networks, which jointly model spatial and temporal cues but are computationally heavy, with 2D‑based approaches such as TSM and TIN, which model temporal information efficiently through temporal shifts. It also emphasizes how large‑scale pre‑training markedly improves performance despite limited labeled data.

2D convolutional networks · 3D convolutional networks · Computer Vision

0 likes · 13 min read
