How Alibaba Cloud’s Open‑Source Wan 2.1 Sets New Benchmarks in Video Generation
Alibaba Cloud’s newly open‑sourced visual generation model Wan 2.1 achieves a VBench score of 86.22%, outperforming leading video models. Its 1.3‑billion‑parameter variant runs on consumer‑grade GPUs with only about 8.2 GB of VRAM, and the model supports multiple video‑creation tasks, marking a significant step for open‑source video AI.
Release Overview
On 2025‑02‑25 Alibaba Cloud released the open‑source visual generation foundation model Wan 2.1. The source code, model weights and inference scripts are hosted at https://github.com/Wan-Video/Wan2.1.
Model Scale and Benchmark
Wan 2.1 is offered in a professional version with 14 billion parameters. On the VBench benchmark it achieves an overall score of 86.22%, outperforming Sora, Luma and Pika across motion quality, visual fidelity, style consistency and multi‑object handling.
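To see why parameter count matters for hardware, the sketch below estimates the memory needed just to hold each variant's weights. It assumes 16‑bit (bfloat16/float16) weights at 2 bytes per parameter and ignores activations, the text encoder and the VAE, so it is a rough lower bound rather than a measured figure; the helper name is hypothetical.

```python
# Rough lower bound on GPU memory needed to hold model weights only.
# Assumption: 16-bit weights (bfloat16/float16), i.e. 2 bytes per parameter.
# Activations, the text encoder and the VAE are NOT included.

BYTES_PER_PARAM_16BIT = 2
GIB = 1024 ** 3


def weight_memory_gib(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_16BIT) -> float:
    """Return the memory, in GiB, needed to store `num_params` parameters."""
    return num_params * bytes_per_param / GIB


if __name__ == "__main__":
    for name, params in [("14B", 14e9), ("1.3B", 1.3e9)]:
        print(f"{name}: {weight_memory_gib(params):.1f} GiB of weights")
```

Under these assumptions the 14B weights alone come to roughly 26 GiB, more than the 24 GB on an RTX 4090, while the 1.3B weights need about 2.4 GiB. That gap helps explain why the smaller variant is the one quoted for consumer GPUs.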
Hardware Requirements
The 1.3‑billion‑parameter variant can generate 480p video using at most 8.2 GB of VRAM, so it runs on consumer‑grade GPUs. On an NVIDIA RTX 4090 it produces a 5‑second 480p clip in approximately 4 minutes without quantization or other optimizations.
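The timing figure above translates into a simple throughput estimate. The sketch assumes the clip has 81 frames (a commonly used Wan 2.1 setting at 16 fps, roughly 5 seconds); treat that frame count as an assumption and substitute your own settings.

```python
# Throughput estimate for the RTX 4090 figure: ~4 minutes per 5-second 480p clip.
# Assumption: the clip has 81 frames (16 fps); change NUM_FRAMES if yours differs.

WALL_CLOCK_S = 4 * 60  # approximately 4 minutes, as quoted
CLIP_S = 5
NUM_FRAMES = 81        # assumed frame count


def slowdown_factor(wall_clock_s: float, clip_s: float) -> float:
    """Seconds of generation time per second of output video."""
    return wall_clock_s / clip_s


def seconds_per_frame(wall_clock_s: float, num_frames: int) -> float:
    """Average generation time per output frame."""
    return wall_clock_s / num_frames


if __name__ == "__main__":
    print(f"{slowdown_factor(WALL_CLOCK_S, CLIP_S):.0f}x slower than real time")  # 48x
    print(f"{seconds_per_frame(WALL_CLOCK_S, NUM_FRAMES):.2f} s per frame")       # 2.96
```

In other words, the quoted setup spends about 48 seconds of compute per second of video, or just under 3 seconds per frame.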
Supported Tasks
Wan 2.1 provides a unified interface for:
Text‑to‑video
Image‑to‑video
Video editing (in‑place modification)
Text‑to‑image
Video‑to‑audio generation
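One way to picture a single entry point serving all five tasks is a request type plus a per‑task table of required inputs. This sketch is hypothetical: the task strings, `GenerationRequest`, `missing_inputs` and the assumed minimum inputs per task are illustrative and do not reflect how the Wan 2.1 repository actually exposes these tasks.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Hypothetical minimum inputs per task; names are illustrative, not Wan 2.1's API.
REQUIRED_INPUTS = {
    "text-to-video": {"prompt"},
    "image-to-video": {"image"},           # a text prompt is usually supplied too
    "video-editing": {"video", "prompt"},  # assumed: the edit is described in text
    "text-to-image": {"prompt"},
    "video-to-audio": {"video"},
}


@dataclass
class GenerationRequest:
    task: str
    prompt: Optional[str] = None
    image: Optional[str] = None  # e.g. a path to the conditioning image
    video: Optional[str] = None  # e.g. a path to the source video


def missing_inputs(req: GenerationRequest) -> Set[str]:
    """Return the required inputs that the request does not provide."""
    if req.task not in REQUIRED_INPUTS:
        raise ValueError(f"unknown task: {req.task!r}")
    provided = {
        name for name in ("prompt", "image", "video")
        if getattr(req, name) is not None
    }
    return REQUIRED_INPUTS[req.task] - provided


if __name__ == "__main__":
    req = GenerationRequest(task="video-editing", video="clip.mp4")
    print(missing_inputs(req))  # {'prompt'}
```

Validating inputs up front keeps the dispatch logic in one place, so adding a task means adding one table entry rather than a new entry point.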
Architecture
The model builds on the DiT (Diffusion Transformer) backbone and is trained with the Flow Matching paradigm using a linear noise trajectory. A causal 3D VAE encodes video frames into a latent space, and a feature‑cache mechanism lets it encode and decode videos of arbitrary length.
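The two mechanisms in the paragraph above can be illustrated with a small, dependency‑free Python sketch; neither part is Wan 2.1's actual code. Part (a) assumes the common rectified‑flow convention (t = 0 is clean data, t = 1 is pure noise, velocity target noise - x0); Wan 2.1's exact time parameterization and schedule may differ. Part (b) reduces the causal 3D convolution to a single temporal axis with a hypothetical `causal_conv` helper, to show why caching the last k-1 input frames lets an encoder process a long video chunk by chunk and still match a single full pass.

```python
import random

# (a) Flow Matching on a linear noise trajectory.
# Assumed convention (rectified-flow style): t = 0 is clean data, t = 1 is
# pure noise, and the network regresses the constant velocity noise - x0.

def interpolate(x0, noise, t):
    """Point on the straight path x_t = (1 - t) * x0 + t * noise."""
    return [(1 - t) * a + t * b for a, b in zip(x0, noise)]


def velocity_target(x0, noise):
    """Training target: the time derivative of x_t, i.e. noise - x0."""
    return [b - a for a, b in zip(x0, noise)]


# (b) Causal temporal convolution with a feature cache.
# A 1-D stand-in for the time axis of a causal 3D conv: output frame i sees
# only input frames i-k+1 .. i (zeros before the first frame).

def causal_conv(frames, kernel, cache=None):
    """Causally convolve `frames`; `cache` holds the previous k-1 inputs.

    Returns (outputs, new_cache) so a long video can be processed in chunks.
    """
    k = len(kernel)
    history = list(cache) if cache is not None else [0.0] * (k - 1)
    padded = history + list(frames)
    outputs = [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(frames))
    ]
    new_cache = padded[len(padded) - (k - 1):]  # last k-1 inputs seen so far
    return outputs, new_cache


if __name__ == "__main__":
    random.seed(0)
    x0 = [1.0, -2.0, 0.5]
    noise = [random.gauss(0.0, 1.0) for _ in x0]
    t = 0.3
    xt = interpolate(x0, noise, t)
    v = velocity_target(x0, noise)
    # The path is a straight line, so x0 = x_t - t * v (up to float rounding).
    print([round(a - t * b, 6) for a, b in zip(xt, v)])

    frames = [1, 2, 3, 4, 5, 6, 7]
    kernel = [0.25, 0.5, 1.0]
    full, _ = causal_conv(frames, kernel)
    first, cache = causal_conv(frames[:4], kernel)
    rest, _ = causal_conv(frames[4:], kernel, cache)
    print(full == first + rest)  # chunked result matches one full pass
```

Because every output frame depends only on the current and previous k-1 inputs, memory per chunk stays constant regardless of total video length, which is the property the feature cache exploits.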
Training Pipeline
Data preparation follows a four‑step cleaning pipeline: (1) collection of large‑scale image and video datasets, (2) duplicate detection, (3) quality filtering using automated metrics, and (4) final curation for diversity. Training uses distributed strategies that reduce per‑GPU memory consumption and increase throughput, including Fully Sharded Data Parallel (FSDP), RingAttention and Ulysses sequence parallelism.
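Both halves of the paragraph above can be sketched in Python. The cleaning pass follows the four listed steps, but the sample fields (`content`, `quality`, `category`), the SHA‑256 exact‑match deduplication and the per‑category cap are assumptions; a production pipeline would more likely use perceptual or embedding‑based near‑duplicate detection. The second function simulates, in one process and without real communication, the all‑to‑all re‑sharding at the core of DeepSpeed‑Ulysses sequence parallelism. The helper names are hypothetical and not taken from Wan 2.1's training code.

```python
import hashlib
from typing import Dict, List

# Hypothetical four-step cleaning pass. Each sample is a dict with raw bytes,
# an automated quality score in [0, 1] and a category label (assumed fields).

def clean(samples: List[dict], min_quality: float, max_per_category: int) -> List[dict]:
    seen_hashes = set()
    per_category: Dict[str, int] = {}
    kept = []
    for s in samples:  # step 1: `samples` is the collected pool
        digest = hashlib.sha256(s["content"]).hexdigest()
        if digest in seen_hashes:  # step 2: drop exact duplicates
            continue
        seen_hashes.add(digest)
        if s["quality"] < min_quality:  # step 3: automated quality filter
            continue
        count = per_category.get(s["category"], 0)
        if count >= max_per_category:  # step 4: cap each category for diversity
            continue
        per_category[s["category"]] = count + 1
        kept.append(s)
    return kept


# Ulysses-style sequence parallelism, simulated in a single process.
# Before: rank r holds a slice of the sequence for every attention head.
# After the all-to-all: rank r holds the full sequence for a subset of heads,
# so it can run ordinary attention on those heads locally.

def ulysses_all_to_all(seq_shards: List[List[list]]) -> List[Dict[int, list]]:
    """seq_shards[r][h] = tokens that rank r holds for head h."""
    num_ranks = len(seq_shards)
    num_heads = len(seq_shards[0])
    if num_heads % num_ranks != 0:
        raise ValueError("number of heads must be divisible by number of ranks")
    heads_per_rank = num_heads // num_ranks
    head_shards = []
    for r in range(num_ranks):
        owned = range(r * heads_per_rank, (r + 1) * heads_per_rank)
        # For each owned head, concatenate every rank's slice in rank order.
        head_shards.append(
            {h: [tok for src in range(num_ranks) for tok in seq_shards[src][h]] for h in owned}
        )
    return head_shards
```

For example, with 2 ranks, 4 heads and 6 tokens, rank 1 starts with tokens 3 to 5 for all four heads and ends with tokens 0 to 5 for heads 2 and 3.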
Open‑Source Impact
By releasing the code and weights, Alibaba Cloud invites researchers and developers to reproduce, extend, and integrate Wan 2.1 into downstream applications, facilitating further research on multimodal video generation.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
AI Product Manager Community
A cutting‑edge think tank for AI product innovators, focusing on AI technology, product design, and business insights. It offers deep analysis of industry trends, dissects AI product design cases, and uncovers market potential and business models.
