How the New 14B First‑and‑Last‑Frame Video Model Generates Custom 720p Clips from Two Images

The open‑sourced 14‑billion‑parameter Tongyi Wanxiang video model generates 720p video that connects a user‑provided start image to a user‑provided end image. Prompts steer camera motion for more controllable, personalized results, and the model is available through its website, GitHub, Hugging Face, and ModelScope.

Alibaba Cloud Developer

Apr 18, 2025

Model Release

Last night, the 14‑billion‑parameter Tongyi Wanxiang first‑and‑last‑frame video model (Wan2.1‑FLF2V‑14B) was open‑sourced.

Key Capabilities

The release is presented as the industry’s first open‑source first‑and‑last‑frame video model at this parameter scale. It generates 720p video that transitions smoothly from a user‑provided start image to a user‑provided end image, which makes effects such as timelapses and transformations easier to control and customize.

Users simply upload two pictures, and the model can produce complex, personalized videos, including subject‑specific visual effects and scene‑changing camera motions.

By entering a prompt, users can further control camera movements like rotation, pan, and zoom, ensuring visual richness while keeping consistency with the reference images.

Access and Deployment

The model can be tried for free on the official website, or downloaded from GitHub, Hugging Face, or ModelScope for local deployment and further development.
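For local deployment, the weights first have to be fetched from one of the hubs. The sketch below derives the repository ID from the Hugging Face and ModelScope URLs listed under Links and passes it to `snapshot_download` from the third‑party `huggingface_hub` package. The helper names `repo_id_from_url` and `download_weights`, and the target directory, are illustrative assumptions and not part of the Wan2.1 tooling.

```python
from urllib.parse import urlparse

# URLs copied from the Links section of this article.
HF_URL = "https://huggingface.co/Wan-AI/Wan2.1-FLF2V-14B-720P"
MODELSCOPE_URL = "https://www.modelscope.cn/models/Wan-AI/Wan2.1-FLF2V-14B-720P"


def repo_id_from_url(url: str) -> str:
    """Return the 'owner/name' repository ID embedded in a hub URL (hypothetical helper)."""
    parts = urlparse(url).path.strip("/").split("/")
    # ModelScope URLs carry a leading 'models' path segment; Hugging Face URLs do not.
    if parts and parts[0] == "models":
        parts = parts[1:]
    if len(parts) < 2:
        raise ValueError(f"no owner/name pair found in {url!r}")
    return "/".join(parts[:2])


def download_weights(local_dir: str) -> str:
    """Download the full repository snapshot from Hugging Face (hypothetical helper).

    Requires `pip install huggingface_hub`; the import is deferred so the
    URL helper above works without the package installed.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id_from_url(HF_URL), local_dir=local_dir)


if __name__ == "__main__":
    print(repo_id_from_url(HF_URL))
    print(repo_id_from_url(MODELSCOPE_URL))
```

Calling `download_weights("./Wan2.1-FLF2V-14B-720P")` would start the actual download. A 14‑billion‑parameter checkpoint is large, so check available disk space before running it.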

Technical Details

The model builds on the Wan2.1 text‑to‑video architecture and adds extra conditional control mechanisms so that the generated video transitions precisely from the first frame to the last. For training, the team used data prepared specifically for first‑and‑last‑frame generation and applied parallel strategies to the text and video encoding modules and to the diffusion module, which improved efficiency and supported high‑resolution output.
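The article does not spell out how the conditional control is implemented. As a generic illustration only, one common way to condition a video generator on known frames is to pair the frame sequence with a binary mask that marks which positions are given and which must be generated. The sketch below builds such a mask for a first‑and‑last‑frame setup; `FrameCondition` and `build_first_last_condition` are hypothetical names, and nothing here is taken from the Wan2.1 code.

```python
from dataclasses import dataclass
from typing import List, Optional

# A toy grayscale image: rows of pixel intensities.
Frame = List[List[float]]


@dataclass
class FrameCondition:
    frames: List[Optional[Frame]]  # the given frame, or None where the model must generate
    mask: List[int]                # 1 = frame is given, 0 = frame is to be generated


def build_first_last_condition(first: Frame, last: Frame, num_frames: int) -> FrameCondition:
    """Place the two reference images at the ends of an otherwise empty clip."""
    if num_frames < 2:
        raise ValueError("need at least two frames to hold both reference images")
    frames: List[Optional[Frame]] = [None] * num_frames
    mask = [0] * num_frames
    frames[0], mask[0] = first, 1
    frames[-1], mask[-1] = last, 1
    return FrameCondition(frames=frames, mask=mask)


if __name__ == "__main__":
    cond = build_first_last_condition([[0.0]], [[1.0]], num_frames=5)
    print(cond.mask)  # [1, 0, 0, 0, 1]
```

In a setup like this, the generator fills every position where the mask is 0 while being held to the reference images where it is 1, which is the behaviour the article describes from the user's point of view.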

Since the release of Wan2.1 in February, the models have topped Hugging Face rankings, earned over 10k GitHub stars, and amassed more than 2.2 million downloads, making them among the most popular open‑source large models.

Links

https://tongyi.aliyun.com/wanxiang/videoCreation

https://github.com/Wan-Video/Wan2.1

https://huggingface.co/Wan-AI/Wan2.1-FLF2V-14B-720P

https://www.modelscope.cn/models/Wan-AI/Wan2.1-FLF2V-14B-720P
