
Survey of AIGC Video Generation Algorithms

Since 2023, AI‑generated video research has expanded across six algorithmic categories—text‑to‑video, image‑to‑video, editing, style transfer, human motion, and long‑video generation—highlighting works such as CogVideo, Imagen Video, MagicVideo, ControlVideo, DCTNet, NUWA‑XL and OpenAI’s Sora, while analysis shows short‑clip diffusion models excel, editing remains costly, style transfer is efficient, and truly long, temporally consistent videos remain an open challenge.

DaTaobao Tech

Since 2023, AIGC (AI‑generated content) has expanded from image synthesis to video generation. The video domain remains comparatively underexplored, prompting many researchers to propose new algorithms.

Algorithm categories include:

Text‑to‑Video (e.g., CogVideo, Imagen Video, Text2Video‑Zero)

Image‑to‑Video (e.g., MagicVideo, AnimateDiff)

Video Editing (e.g., ControlVideo, Video‑P2P, Pix2Video)

Video Style Transfer (e.g., Rerender A Video, DCTNet)

Human Motion Generation (e.g., Follow Your Pose, DreamPose, MagicDance)

Long‑Video Generation (e.g., NUWA‑XL, Gen‑L‑Video, Sora)

Each entry lists the institution, release date and a brief description. Representative works are:

CogVideo (Tsinghua, 2022‑05‑29): two‑stage transformer (generation + frame‑interpolation) for text‑to‑video.

Imagen Video (Google, 2022‑10‑05): temporal extension of Imagen (closed‑source).

Text2Video‑Zero (Picsart AI Research, 2023‑03‑23): uses cross‑frame attention and saliency detection for zero‑shot video generation (a cross‑frame attention sketch follows this list).

MagicVideo (ByteDance, 2023‑05‑11): extends Stable Diffusion to video by adding temporal information.

AnimateDiff (Shanghai AI Lab, 2023‑07‑11): trains a plug‑in motion module on top of frozen image diffusion models (see the temporal‑layer sketch after this list).

VideoCrafter1 (Tencent AI Lab, 2023‑10‑30): spatial‑temporal attention diffusion model.

ControlVideo (Huawei, 2023‑05‑22): training‑free controllable text‑to‑video generation.

Rerender A Video (NTU, 2023‑12‑17): SD + ControlNet with cross‑frame attention for video style transfer.

DCTNet (Alibaba DAMO, 2022‑07‑06): GAN‑based video stylization supporting seven styles.

NUWA‑XL (Microsoft Research Asia, 2023‑03‑22): diffusion‑over‑diffusion for extremely long video generation (see the coarse‑to‑fine sketch after this list).

Sora (OpenAI, 2024‑02): proprietary text‑to‑video model producing high‑quality videos up to a minute long.
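Several of the entries above (Text2Video‑Zero, VideoCrafter1, Rerender A Video) lean on cross‑frame attention to keep appearance consistent across frames. The snippet below is a minimal PyTorch sketch of that idea rather than code from any of the papers: each frame's self‑attention is redirected so that its keys and values come from a shared anchor frame (frame 0 here). The class name `CrossFrameAttention` and the tensor layout are illustrative assumptions.

```python
# Minimal sketch of cross-frame attention (illustrative, not official code).
# Every frame attends to the keys/values of a shared anchor frame (frame 0),
# which keeps appearance consistent across the generated clip.
import torch
import torch.nn.functional as F
from torch import nn


class CrossFrameAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.heads = heads
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.to_out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, tokens, dim) -- latent tokens for every frame
        b, f, n, d = x.shape
        q = self.to_q(x)                              # queries come from each frame
        anchor = x[:, :1].expand(-1, f, -1, -1)       # keys/values come from frame 0
        k, v = self.to_k(anchor), self.to_v(anchor)

        def split(t):  # (b, f, n, d) -> (b*f, heads, n, d/heads)
            return t.reshape(b * f, n, self.heads, d // self.heads).transpose(1, 2)

        out = F.scaled_dot_product_attention(split(q), split(k), split(v))
        out = out.transpose(1, 2).reshape(b, f, n, d)
        return self.to_out(out)


# Usage: 4 frames, each with 256 latent tokens of width 320.
frames = torch.randn(1, 4, 256, 320)
print(CrossFrameAttention(dim=320)(frames).shape)  # torch.Size([1, 4, 256, 320])
```

In training‑free methods this substitution is made inside the self‑attention layers of an existing image diffusion UNet at inference time, which is what lets approaches like Text2Video‑Zero produce video without any video training data.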
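MagicVideo and AnimateDiff take a different route: they keep a pretrained image diffusion backbone and add a temporal ("motion") layer that attends along the frame axis. The sketch below illustrates that pattern under assumed tensor shapes; `TemporalLayer` is a stand‑in, not either paper's actual architecture. Zero‑initializing the output projection makes the new layer start as an identity, so the pretrained image model is undisturbed when temporal training begins.

```python
# Sketch of a temporal ("motion") layer added to a frozen image diffusion UNet,
# in the spirit of MagicVideo / AnimateDiff (illustrative, not official code).
import torch
from torch import nn


class TemporalLayer(nn.Module):
    """Self-attention over the frame axis, applied per spatial location."""

    def __init__(self, channels: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)
        # Zero-init the output projection so the layer starts as an identity
        # mapping and does not disturb the pretrained image backbone.
        nn.init.zeros_(self.attn.out_proj.weight)
        nn.init.zeros_(self.attn.out_proj.bias)

    def forward(self, x: torch.Tensor, num_frames: int) -> torch.Tensor:
        # x: (batch * frames, channels, H, W) -- the layout image UNets already use
        bf, c, h, w = x.shape
        b = bf // num_frames
        # Regroup tokens so attention runs along the frame axis only.
        seq = x.reshape(b, num_frames, c, h * w).permute(0, 3, 1, 2)  # (b, hw, f, c)
        seq = seq.reshape(b * h * w, num_frames, c)
        normed = self.norm(seq)
        out, _ = self.attn(normed, normed, normed)
        out = out.reshape(b, h * w, num_frames, c).permute(0, 2, 3, 1).reshape(bf, c, h, w)
        return x + out  # residual: pretrained spatial features pass through unchanged


# Usage: 2 videos x 8 frames of 64-channel, 32x32 feature maps.
feats = torch.randn(2 * 8, 64, 32, 32)
print(TemporalLayer(64)(feats, num_frames=8).shape)  # torch.Size([16, 64, 32, 32])
```

AnimateDiff trains only these inserted layers on video data, which is why the resulting motion module can be reused with different personalized image models.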
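NUWA‑XL's "diffusion over diffusion" approach is coarse‑to‑fine: a global model first produces sparse keyframes covering the whole timeline, then a local model repeatedly fills in frames between neighbouring keyframes, conditioned on both endpoints. The pseudocode below sketches only that control flow; `generate_keyframes` and `infill_between` are hypothetical stand‑ins for the global and local diffusion models.

```python
# Coarse-to-fine long-video generation in the "diffusion over diffusion" spirit
# of NUWA-XL. Illustrative control flow only: generate_keyframes and
# infill_between are hypothetical stand-ins for the global and local models.
from typing import List


def generate_keyframes(prompt: str, num_keyframes: int) -> List[str]:
    """Global model: sparse keyframes spanning the whole timeline."""
    return [f"keyframe<{prompt}:{i}>" for i in range(num_keyframes)]


def infill_between(start: str, end: str, num_new: int) -> List[str]:
    """Local model: new frames between two keyframes, conditioned on both ends."""
    return [f"infill<{start}|{end}:{i}>" for i in range(num_new)]


def generate_long_video(prompt: str, levels: int = 2, num_keyframes: int = 4) -> List[str]:
    frames = generate_keyframes(prompt, num_keyframes)
    for _ in range(levels):
        refined = []
        # At each level, insert new frames between every neighbouring pair,
        # so the total length grows multiplicatively per level.
        for a, b in zip(frames, frames[1:]):
            refined.append(a)
            refined.extend(infill_between(a, b, num_new=2))
        refined.append(frames[-1])
        frames = refined
    return frames


print(len(generate_long_video("a hike through a forest")))  # 4 -> 10 -> 28 frames
```

Because every level multiplies the frame count, a few levels of infilling turn a handful of keyframes into a very long sequence, and segments at the same level are independent of one another, so they can be generated in parallel.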

Analysis:

• Text/Image‑to‑Video methods (e.g., AnimateDiff, VideoCrafter, Stable Video Diffusion) produce high‑quality short clips (typically ≤2 s).

• Video‑editing approaches (ControlVideo, Video‑P2P) offer flexible attribute manipulation but are computationally heavy and limited in duration.

• Style‑transfer methods (Rerender A Video, DCTNet) are faster and require less memory; DCTNet delivers stable results with low GPU demand.

• Human motion generation (Animate Anyone, DreaMoving, MagicDance) shows promise for interactive applications, though most top‑performing models are not open‑source.

• Long‑video generation remains challenging; Sora demonstrates the potential but is not publicly released, and most diffusion‑based methods still generate only a few seconds of video.

Overall, the field is rapidly evolving, with diffusion models and transformer‑based architectures driving most advances. Future work will likely focus on extending video length, improving temporal consistency, and releasing more open‑source implementations.

Tags: AI, video generation, AIGC, diffusion models, image-to-video, text-to-video, video editing