How to Create AI-Generated Videos with Tongyi Wanxiang and DeepSeek: A Step‑by‑Step Guide

This article explains the fundamentals of AI video technology, details the features of Alibaba Cloud's Tongyi Wanxiang platform, demonstrates how to use DeepSeek for script generation, and provides a complete workflow—including code examples—for producing high‑quality AI‑generated videos.

MaGe Linux Operations

Mar 28, 2025

Preface

Breakthroughs in large AI models have energized AI video, turning what once looked like a daunting technical barrier into an accessible creative space. Short videos are now part of daily life, so this article walks through AI video creation using a Chinese model, Tongyi Wanxiang, as a concrete example.

AI Video Overview

What is AI video

AI video refers to the process and results of generating, editing, enhancing, or analyzing video content using artificial intelligence technologies. AI improves production efficiency and expands creative possibilities by leveraging machine learning, computer vision, and natural language processing.

Core characteristics of AI video

AI video has six core traits:

Automation

Automatic clipping of video segments.

Automatic subtitle and dubbing generation.

Automatic content recognition and classification.

Intelligence

Object, scene, face, and action recognition.

Emotion tone analysis (e.g., joy, sadness, tension).

Personalized content recommendation based on user preferences.

Efficiency

Rapid generation of high‑quality video content.

Batch processing such as transcoding, compression, and enhancement.

Real‑time stream handling (e.g., live subtitles or effects).

Innovation

Generation of virtual characters or deep‑fake videos.

Creation of realistic effects and animations.

Conversion of text or images into dynamic video.

Personalization

Custom ad video generation for different users.

Interest‑based video recommendations.

Style‑specific video creation.

High quality

Resolution enhancement (e.g., low‑res to HD).

Restoration of old or damaged footage.

Automatic color, lighting, and stabilization adjustment.

In short, automation, intelligence, efficiency (including real‑time processing), innovation, personalization, and high quality are the traits through which AI video is reshaping the video industry.

AI video application scenarios

AI video technology is applied across entertainment, education, commercial marketing, and more:

Entertainment & Social Media – users generate fun short videos for sharing, create dynamic covers, and attract attention with personalized content.

Content creation & Film production – AI automates key scene detection, smart editing, virtual scene generation, and script drafting.

E‑commerce & Advertising – AI analyzes user behavior to embed relevant ads, generates product showcase videos, and creates personalized ad content.

Education & Science Popularization – AI produces animated science videos and historical reenactments to enhance learning engagement.

Short drama & Film – human‑AI co‑creation improves narrative tension and emotional expression and shortens production cycles.

Other innovative uses – real‑time subtitles, translation, virtual anchors, and background generation in live streams.

AI video applications are extensive, transforming creation, distribution, and consumption across many domains.

Tongyi Wanxiang Introduction

Overview

What is Tongyi Wanxiang

Tongyi Wanxiang is Alibaba Cloud’s multimodal content generation platform focused on intelligent image and video creation, built on the Tongyi large‑model family.

Core capabilities

Tongyi Wanxiang offers multiple AI generation abilities:

Text‑to‑image – generate images from textual descriptions in various artistic styles.

Image style transfer – transform uploaded images into a target style.

Video generation – supports text‑to‑video and image‑to‑video, producing cinematic‑grade HD videos with Chinese‑style optimization.

Similar image generation – create content or style‑similar artworks from an input picture.

Complex motion generation – simulate realistic physics for large‑scale motion scenes.

Technical characteristics

Key technical advantages include:

Based on Alibaba Tongyi large model – combines diffusion models and Transformer architecture for high‑quality output.

Multimodal support – handles both image and video generation.

High controllability – the Composer model enables fine‑grained control over color, layout, and style.

Chinese optimization – native support for long Chinese prompts, producing content aligned with Chinese culture and aesthetics.

Open‑source support – the 2.1 model is fully open‑source, with code and weights available on GitHub and HuggingFace.

These technical strengths position Tongyi Wanxiang as a prominent Chinese AIGC platform for creators and enterprises.

Application scenarios

Typical use cases span:

Art creation – personalized artwork and style transfer for individuals and commercial projects.

Advertising & Marketing – generate product posters, ad videos, and personalized ad placements.

Film & Game Development – produce VFX, backgrounds, and game assets, accelerating production pipelines.

Social Media content – generate short videos and dynamic covers to boost engagement.

Commercial design & display – create product showcase videos and virtual store previews.

DeepSeek + Tongyi Wanxiang AI video workflow

DeepSeek advantages

DeepSeek generates professional, in‑depth content. Through multi‑turn dialogue and reasoning, it can draft and refine high‑quality video scripts.

Tongyi Wanxiang video generation advantages

High‑quality video generation – produces cinematic‑grade HD (1080p) videos with smooth, realistic motion.

Chinese optimization & localization – accurately interprets long Chinese prompts and generates culturally appropriate content.

Complex motion & physics simulation – realistic rain, splashes, and other physical effects.

Multimodal generation – supports both text‑to‑video and image‑to‑video modes.

Rich visual effects – offers transition, particle, and artistic text effects.

Ease of use & high efficiency – intuitive interface lowers the creation barrier for all users.

Open‑source & ecosystem support – the 2.1 model and SDK are publicly available.

Operation process

Generate script with DeepSeek

Provide a prompt such as:

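`我想做一个治愈系的名山大川的短视频，视频中的元素包括蔚蓝的天空，广阔的山河湖泊，飞鸟，无人机拍摄视角以及特写镜头，以国家地理纪录片的风格，时长30秒`

(In English: "I want to make a soothing short video of famous mountains and great rivers. Its elements include an azure sky, vast mountains, rivers, and lakes, flying birds, drone shots, and close‑ups, in the style of a National Geographic documentary, 30 seconds long.")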

DeepSeek returns analysis, suggestions, and optimization points.

Further prompts refine the storyboard until satisfactory.
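
The same request can also be sent programmatically instead of through the chat page. The following is a minimal sketch, not something the original article shows: it assumes DeepSeek's OpenAI‑compatible chat completions endpoint (`https://api.deepseek.com/chat/completions`), the model name `deepseek-chat`, and an API key stored in an environment variable named `DEEPSEEK_API_KEY`. All three should be checked against the DeepSeek API documentation. The class and method names are illustrative.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class DeepSeekScriptRequest {
    // Assumed endpoint and model name; verify both against the DeepSeek API docs.
    static final String ENDPOINT = "https://api.deepseek.com/chat/completions";
    static final String MODEL = "deepseek-chat";

    /** Escapes a string so it can be embedded in a JSON string literal. */
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Builds a chat-completions request body containing a single user message. */
    static String buildBody(String userPrompt) {
        return "{\"model\":\"" + MODEL + "\","
            + "\"messages\":[{\"role\":\"user\",\"content\":\""
            + jsonEscape(userPrompt) + "\"}]}";
    }

    public static void main(String[] args) throws Exception {
        String prompt = "Write a 30-second storyboard for a soothing short video of famous "
            + "mountains and rivers, in the style of a National Geographic documentary.";
        String apiKey = System.getenv("DEEPSEEK_API_KEY"); // hypothetical variable name
        if (apiKey == null || apiKey.isBlank()) {
            // Without a key, only show the request body that would be sent.
            System.out.println(buildBody(prompt));
            return;
        }
        HttpRequest request = HttpRequest.newBuilder(URI.create(ENDPOINT))
            .header("Content-Type", "application/json")
            .header("Authorization", "Bearer " + apiKey)
            .POST(HttpRequest.BodyPublishers.ofString(buildBody(prompt)))
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // raw JSON; the script is in the reply message
    }
}
```

Refinement rounds would append the previous reply and the follow‑up request to the `messages` array; the sketch sends only the first turn.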

Generate video with Tongyi Wanxiang

Paste the storyboard into the video generation UI, adjust parameters such as aspect ratio, and click “Generate”.

After processing, the generated video appears on the right side and can be viewed or downloaded.

Together, these steps form a complete AI video creation pipeline: DeepSeek drafts and refines the storyboard, and Tongyi Wanxiang renders it into video.
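
Before pasting, it can help to keep the storyboard in a structured form so that shot durations are checked against the 30‑second target from the prompt. The sketch below is purely illustrative: the `Shot` structure, the example shots, and the "Shot n (Xs): ..." line format are assumptions, not a format required by DeepSeek or Tongyi Wanxiang.

```java
import java.util.List;
import java.util.StringJoiner;

class StoryboardPrompt {
    /** One storyboard shot: what is on screen and for how long (hypothetical structure). */
    static final class Shot {
        final String description;
        final int seconds;

        Shot(String description, int seconds) {
            this.description = description;
            this.seconds = seconds;
        }
    }

    /** Sums the shot durations so the total can be compared with the target length. */
    static int totalSeconds(List<Shot> shots) {
        int total = 0;
        for (Shot shot : shots) {
            total += shot.seconds;
        }
        return total;
    }

    /** Joins the shots into one paste-ready prompt, one numbered shot per line. */
    static String toPrompt(List<Shot> shots) {
        StringJoiner joiner = new StringJoiner("\n");
        int index = 1;
        for (Shot shot : shots) {
            joiner.add("Shot " + index + " (" + shot.seconds + "s): " + shot.description);
            index++;
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        List<Shot> shots = List.of(
            new Shot("Drone view rising over a mountain lake under a clear blue sky", 10),
            new Shot("Close-up of a bird gliding past snow-capped peaks", 8),
            new Shot("Wide shot of a river winding through a valley, documentary style", 12));
        int target = 30; // target length from the example prompt
        if (totalSeconds(shots) != target) {
            System.out.println("Warning: storyboard runs " + totalSeconds(shots)
                + "s, expected " + target + "s");
        }
        System.out.println(toPrompt(shots));
    }
}
```

Whether a single combined prompt or one generation per shot works better depends on the clip length the platform supports, which the article does not specify.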

Other Tongyi Wanxiang features

Image‑to‑video

Upload an image; the system parses its elements and generates a corresponding video.
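
For readers who prefer an API call to the web page, an image‑to‑video request might be assembled as sketched below. Everything specific here is an assumption to verify against the official DashScope API reference: the HTTP endpoint, the `wanx2.1-i2v-turbo` model name, the `img_url` input field, and the `X-DashScope-Async` header. The image URL and API key are placeholders, and the sketch only builds the request; it does not send it.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class Image2VideoRequest {
    // Assumed DashScope HTTP endpoint and model name; confirm both in the API reference.
    static final String ENDPOINT =
        "https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis";
    static final String MODEL = "wanx2.1-i2v-turbo";

    /** Escapes quotes and backslashes; enough for single-line prompts and URLs. */
    static String esc(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds the JSON body: a model name plus a prompt and a source image URL. */
    static String buildBody(String prompt, String imageUrl) {
        return "{\"model\":\"" + MODEL + "\",\"input\":{"
            + "\"prompt\":\"" + esc(prompt) + "\","
            + "\"img_url\":\"" + esc(imageUrl) + "\"}}";
    }

    /** Builds the POST request without sending it. */
    static HttpRequest buildRequest(String apiKey, String prompt, String imageUrl) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
            .header("Content-Type", "application/json")
            .header("Authorization", "Bearer " + apiKey)
            .header("X-DashScope-Async", "enable") // assumed: asks for an asynchronous task
            .POST(HttpRequest.BodyPublishers.ofString(buildBody(prompt, imageUrl)))
            .build();
    }

    public static void main(String[] args) {
        String prompt = "一只小猫在月光下奔跑"; // "A kitten running in the moonlight"
        String imageUrl = "https://example.com/cat.png"; // placeholder image URL
        HttpRequest request = buildRequest("sk-placeholder", prompt, imageUrl);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(buildBody(prompt, imageUrl));
    }
}
```

If the endpoint behaves as assumed, sending this request returns a task identifier rather than the video itself, and the task status has to be polled until the video link is available.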

Text‑to‑image

Provide a descriptive prompt to generate images:

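`生成一张猫和狗快乐玩耍的温馨图片，图中小猫伸出爪子去挠小狗的头，旁边有草坪，有几只蝴蝶，有盛开的花朵，蓝色的天空，风格为写实风格`

(In English: "Generate a heartwarming picture of a cat and a dog playing happily. The kitten reaches out a paw to scratch the puppy's head; nearby are a lawn, a few butterflies, and blooming flowers under a blue sky. The style is realistic.")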

Java API integration

Add the DashScope Java SDK (version 2.18.2 or later, per the comment in the sample) to the project, obtain an API key from Alibaba Cloud, and run the following code:

package com.congge.chat;

// Requires the DashScope SDK >= 2.18.2
import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesis;
import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesisParam;
import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesisResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.JsonUtils;

public class Text2Video {
    /** Submits a text-to-video request and prints the result as JSON. */
    public static void text2Video() throws ApiException, NoApiKeyException, InputRequiredException {
        VideoSynthesis vs = new VideoSynthesis();
        VideoSynthesisParam param = VideoSynthesisParam.builder()
            .model("wanx2.1-t2v-turbo")
            // Read the key from the environment instead of hard-coding it in source.
            .apiKey(System.getenv("DASHSCOPE_API_KEY"))
            // Prompt: "A kitten running in the moonlight"
            .prompt("一只小猫在月光下奔跑")
            // Output size as width*height in pixels
            .size("1280*720")
            .build();
        System.out.println("please wait...");
        VideoSynthesisResult result = vs.call(param);
        System.out.println(JsonUtils.toJson(result));
    }

    public static void main(String[] args) {
        try {
            text2Video();
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            // Report failures on stderr so they are not mistaken for results.
            System.err.println(e.getMessage());
        }
        System.exit(0);
    }
}

Running the program prints the result as JSON. The result includes a link to the generated video, which can be opened in a browser and downloaded.
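
Downloading can be automated as well. The sketch below pulls the link out of the printed JSON and saves the file using only the JDK. It assumes the link appears in a string field named `video_url` and that the JSON may contain Unicode escape sequences for characters such as `=` and `&`; both assumptions should be checked against the actual output. The class name and command‑line arguments are illustrative.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VideoResultDownloader {
    // Assumed field name for the video link in the result JSON.
    private static final Pattern VIDEO_URL =
        Pattern.compile("\"video_url\"\\s*:\\s*\"([^\"]+)\"");
    // Matches a backslash, the letter u, and four hex digits.
    private static final Pattern UNICODE_ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    /** Replaces JSON Unicode escape sequences with the characters they stand for. */
    static String unescapeUnicode(String s) {
        return UNICODE_ESCAPE.matcher(s).replaceAll(m -> Matcher.quoteReplacement(
            String.valueOf((char) Integer.parseInt(m.group(1), 16))));
    }

    /** Returns the video link from the result JSON, if one is present. */
    static Optional<String> extractVideoUrl(String resultJson) {
        Matcher m = VIDEO_URL.matcher(resultJson);
        return m.find() ? Optional.of(unescapeUnicode(m.group(1))) : Optional.empty();
    }

    /** Downloads the video to the given file and returns the file path. */
    static Path download(String url, Path target) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<Path> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofFile(target));
        if (response.statusCode() != 200) {
            throw new IOException("Download failed: HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("Usage: VideoResultDownloader <result-json> <output-file>");
            return;
        }
        Optional<String> url = extractVideoUrl(args[0]);
        if (url.isEmpty()) {
            System.out.println("No video_url field found in the result");
            return;
        }
        System.out.println("Saved to " + download(url.get(), Path.of(args[1])));
    }
}
```

A regular expression keeps the sketch dependency‑free; in a real project, a JSON library or the SDK's own result object would be a more robust way to read the link.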

Conclusion

The article provides a comprehensive tutorial on leveraging Tongyi Wanxiang and DeepSeek to produce AI‑generated videos, covering workflow steps, platform features, application scenarios, and Java SDK integration.
