How the New 14B First‑and‑Last‑Frame Video Model Generates Custom 720p Clips from Two Images

The open‑sourced 14‑billion‑parameter Tongyi Wanxiang video model generates high‑quality 720p videos that connect a user‑provided start image to a user‑provided end image. It offers controllable, personalized video generation with prompt‑driven camera motion, and is available through the official website, GitHub, Hugging Face, and ModelScope.

Alibaba Cloud Developer

Model Release

Last night, the 14‑billion‑parameter Tongyi Wanxiang first‑and‑last‑frame video model (Wan2.1‑FLF2V‑14B) was open‑sourced.

Key Capabilities

Described as the industry’s first open‑source first‑and‑last‑frame video model at this scale, it generates 720p videos that transition smoothly from a user‑provided start image to a user‑provided end image. This makes effects such as time‑lapse and transformation sequences more controllable and customizable.

Users simply upload two pictures, and the model can produce complex, personalized videos, including subject‑specific visual effects and scene‑changing camera motions.

By entering a prompt, users can further control camera movements like rotation, pan, and zoom, ensuring visual richness while keeping consistency with the reference images.

Access and Deployment

The model can be tried for free on the official website, or downloaded from GitHub, Hugging Face, or ModelScope for local deployment and further development.
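
For local deployment, a minimal sketch of fetching the weights from Hugging Face might look like the following. It assumes the `huggingface_hub` package is installed and uses the Hugging Face repository URL listed under Links. The helper names and the local directory are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlparse


def repo_id_from_url(url):
    """Extract the 'owner/name' repository id from a Hugging Face model URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"not a model URL: {url}")
    return "/".join(parts[:2])


def download_weights(url, local_dir):
    """Download every file in the repository into local_dir."""
    # Imported lazily so repo_id_from_url works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id_from_url(url), local_dir=local_dir)


MODEL_URL = "https://huggingface.co/Wan-AI/Wan2.1-FLF2V-14B-720P"
print(repo_id_from_url(MODEL_URL))  # Wan-AI/Wan2.1-FLF2V-14B-720P

# Large download; uncomment to run. The target directory is an assumption.
# download_weights(MODEL_URL, "./Wan2.1-FLF2V-14B-720P")
```

Inference scripts and hardware requirements are not covered in this article; the GitHub repository is the place to check for them.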

Technical Details

The model builds on the Wan2.1 text‑to‑video architecture and adds a conditional control mechanism so that the generated clip matches both the first and the last frame precisely. Training used data prepared specifically for the first‑and‑last‑frame setting, together with parallel strategies for the text and video encoding modules and the diffusion module, which improved training efficiency and supported high‑resolution output.
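
The article does not spell out how the conditioning is built. One common way to pin the endpoints of a clip is to hand the model a frame sequence in which only the first and last positions are filled, plus a mask marking which frames are given. The sketch below illustrates that idea in plain Python. It is an assumption made for illustration, not Wan2.1's actual implementation, and the function name is hypothetical.

```python
def build_first_last_condition(first_frame, last_frame, num_frames, blank=None):
    """Return (frames, mask) for a clip whose two endpoints are fixed.

    frames[i] holds the known image at pinned positions and `blank` elsewhere.
    mask[i] is 1 where the frame is given and 0 where it must be generated.
    """
    if num_frames < 2:
        raise ValueError("need at least two frames to pin both endpoints")
    frames = [blank] * num_frames
    mask = [0] * num_frames
    frames[0], mask[0] = first_frame, 1
    frames[-1], mask[-1] = last_frame, 1
    return frames, mask


frames, mask = build_first_last_condition("start.png", "end.png", num_frames=5)
print(mask)  # [1, 0, 0, 0, 1]
```

In a real diffusion model the frames would be image tensors or latents rather than file names, and the mask would typically be passed alongside the noisy input so the denoiser knows which positions to keep consistent with the references.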

Since the release of Wan2.1 in February, the models have topped Hugging Face rankings, earned over 10k GitHub stars, and amassed more than 2.2 million downloads, making them among the most popular open‑source large models.

Links

https://tongyi.aliyun.com/wanxiang/videoCreation

https://github.com/Wan-Video/Wan2.1

https://huggingface.co/Wan-AI/Wan2.1-FLF2V-14B-720P

https://www.modelscope.cn/models/Wan-AI/Wan2.1-FLF2V-14B-720P

Model Experience

Model screenshot

Technical Analysis

Technical diagram

Effect Demonstration

Demo gif 1
Demo gif 2
Demo gif 3
Demo gif 4
Demo gif 5
Demo gif 6
Demo gif 7
Demo gif 8
Demo gif 9

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
