How EasyAnimate v3 Generates High‑Resolution Videos with Diffusion Transformers
EasyAnimate v3, an open‑source video generation system from Alibaba Cloud's AI platform PAI, combines a Diffusion Transformer backbone, a Hybrid Motion Module, and Slice VAE to support image‑to‑video generation, text‑to‑video generation, and unlimited‑length video continuation at up to 720p (960×960) and 144 frames per clip, on GPUs with as little as 12 GB of memory.
EasyAnimate v3 Highlights
Alibaba Cloud's AI platform PAI has released EasyAnimate v3, an open‑source video generation system built on a Diffusion Transformer (DiT) backbone with a T5 text encoder. Its main capabilities:
Generates videos from a single image or from an image plus a text prompt.
Generates videos from two images used as the start and end frames.
Supports resolutions up to 720p (960×960) and clips of up to 144 frames.
Runs on as little as 12 GB of VRAM (e.g., an RTX 3060 12 GB).
Supports unlimited‑length video continuation.
Model Architecture
The core model adopts the Diffusion Transformer (DiT) architecture with T5 as the text encoder. It incorporates a Hybrid Motion Module: even layers add temporal‑aware attention to learn sequence information, while odd layers apply global attention over space‑time to enlarge the receptive field.
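The alternation between the two attention types can be illustrated with a minimal sketch. It only lists which tokens attend to each other in a given layer. The function name `attention_groups`, the zero‑based layer index, and the frame‑major token layout are assumptions made for illustration, not details taken from the EasyAnimate code.

```python
def attention_groups(num_frames, tokens_per_frame, layer_index):
    """Return groups of token ids that attend to each other in one layer.

    Tokens are numbered frame by frame: id = frame * tokens_per_frame + position.
    Assumption: layers are counted from 0, so layer 0 is an "even" layer.
    """
    ids = [
        [frame * tokens_per_frame + pos for pos in range(tokens_per_frame)]
        for frame in range(num_frames)
    ]
    if layer_index % 2 == 0:
        # Temporal attention: each spatial position attends only to the
        # same position in the other frames.
        return [
            [ids[frame][pos] for frame in range(num_frames)]
            for pos in range(tokens_per_frame)
        ]
    # Global attention: every token attends to every token in space and time.
    return [[token for frame in ids for token in frame]]


temporal = attention_groups(num_frames=2, tokens_per_frame=3, layer_index=0)
global_ = attention_groups(num_frames=2, tokens_per_frame=3, layer_index=1)
print(temporal)  # [[0, 3], [1, 4], [2, 5]]
print(global_)   # [[0, 1, 2, 3, 4, 5]]
```

In this toy layout the temporal layers attend within groups whose size equals the number of frames, while the global layers attend over one group that covers the whole clip, which is where the larger receptive field comes from.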
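Inspired by U‑ViT, the network adds long skip connections that pass through a zero‑initialized linear layer. Because that layer outputs zeros at the start of training, the new path initially leaves the network's output unchanged, so it can be added to a pre‑trained DiT without disturbing what the model has already learned.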
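Why zero initialization makes the skip path safe to add can be shown with a small sketch. The wiring used here (concatenate the hidden state with the skip input, project with a linear layer, add the result back to the hidden state) is one plausible arrangement assumed for illustration, and the names `linear` and `skip_block` are hypothetical.

```python
def linear(weights, bias, x):
    """Plain linear layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def skip_block(hidden, skip, weights, bias):
    """Assumed wiring: out = hidden + Linear(concat(hidden, skip))."""
    fused = linear(weights, bias, hidden + skip)  # list concatenation
    return [h + f for h, f in zip(hidden, fused)]


dim = 3
zero_w = [[0.0] * (2 * dim) for _ in range(dim)]  # zero-initialized weights
zero_b = [0.0] * dim                              # zero-initialized bias

hidden = [0.5, -1.0, 2.0]  # activations from the pre-trained network
skip = [3.0, 4.0, 5.0]     # activations carried over from an earlier layer

out = skip_block(hidden, skip, zero_w, zero_b)
print(out == hidden)  # True: the new path is a no-op until it is trained
```

Once training updates the weights away from zero, the skip input starts to contribute, so the module can learn gradually on top of the pre‑trained behavior.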
Slice VAE
Slice VAE provides a 1/4 temporal compression rate and supports different processing strategies for video frames versus images. For video‑frame input (e.g., 512×512×8), it compresses to a 64×64×2 latent; for a single image (512×512) it compresses to 64×64×1.
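The shape arithmetic in these examples can be written as a small helper. This is a sketch only: the spatial factor of 8 is inferred from the 512 to 64 reduction in the examples, the function name is hypothetical, and the handling of frame counts that are not multiples of 4 is an assumption (the sketch simply rejects them).

```python
SPATIAL_FACTOR = 8   # inferred from 512 -> 64 in the examples
TEMPORAL_FACTOR = 4  # the stated 1/4 temporal compression


def slice_vae_latent_shape(height, width, num_frames=1):
    """Return (latent_height, latent_width, latent_frames) for an input."""
    if num_frames == 1:
        # A single image keeps one latent frame.
        latent_frames = 1
    elif num_frames % TEMPORAL_FACTOR == 0:
        latent_frames = num_frames // TEMPORAL_FACTOR
    else:
        # Assumption: other frame counts are out of scope for this sketch.
        raise ValueError("num_frames must be 1 or a multiple of 4")
    return (height // SPATIAL_FACTOR, width // SPATIAL_FACTOR, latent_frames)


print(slice_vae_latent_shape(512, 512, 8))  # (64, 64, 2)
print(slice_vae_latent_shape(512, 512))     # (64, 64, 1)
```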
Image‑to‑Video Pipeline
Both the region to be reconstructed and the reference image are encoded by Slice VAE, concatenated with a randomly initialized latent, and fed into DiT for noise prediction. On the conditioning side, a CLIP image encoder encodes the input image; its output is projected and concatenated with the T5‑encoded text, and DiT attends to the combined representation through cross‑attention.
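The conditioning step can be sketched at the level of token shapes. Everything specific here is assumed for illustration: the toy dimensions, the bias‑free projection, the text‑first ordering, and the helper names `project` and `build_condition`. The real encoders are replaced by hard‑coded token lists.

```python
def project(tokens, weights):
    """Apply a bias-free linear map to every token: (n, d_in) -> (n, d_out)."""
    return [
        [sum(w * v for w, v in zip(row, token)) for row in weights]
        for token in tokens
    ]


def build_condition(text_tokens, image_tokens, proj_weights):
    """Project image tokens to the text width, then concatenate along the
    sequence axis (assumed order: text tokens first, image tokens after)."""
    return text_tokens + project(image_tokens, proj_weights)


# Stand-ins for encoder outputs: 3 text tokens of width 2 (T5 side),
# 2 image tokens of width 4 (CLIP side).
text_tokens = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
image_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
proj_weights = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]  # maps width 4 -> 2

condition = build_condition(text_tokens, image_tokens, proj_weights)
print(len(condition), len(condition[0]))  # 5 2
```

The projection exists because the two encoders produce tokens of different widths; after it, the cross‑attention layers see a single sequence and need no separate path for image features.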
Project resources include the GitHub repository, the technical report (arXiv:2405.18991), and a quick‑start page on the PAI console.
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
