Create AI Videos with DeepSeek + Tongyi Wanxiang: Step-by-Step Guide

This article explains how to leverage the Chinese AI multimodal platform Tongyi Wanxiang together with DeepSeek to generate high-quality AI videos, covering AI video fundamentals, core features, application scenarios, detailed workflow, script creation, video synthesis, and Java API integration with code examples.

Raymond Ops

1. Introduction

The rapid rise of large AI models has drawn attention to AI video technology. Video production used to have a high barrier to entry, but AI models now make short‑video creation far more accessible. This article uses Tongyi Wanxiang, a large model developed in China, as a case study.

2. AI Video Overview

2.1 What Is AI Video?

AI video refers to the process and results of generating, editing, enhancing, or analyzing video content using artificial intelligence technologies. By leveraging machine learning, computer vision, and natural language processing, AI improves efficiency and expands creative possibilities.

2.2 Core Characteristics of AI Video

Automation

Automatic clipping of video segments.

Automatic subtitle and dubbing generation.

Automatic content recognition and classification.

Intelligence

Object, scene, face, and action recognition.

Emotion tone analysis (e.g., joy, sadness, tension).

Personalized content recommendation.

Efficiency

Rapid generation of high‑quality video content.

Batch processing (transcoding, compression, enhancement).

Real‑time streaming processing (live subtitles, effects).

Innovation

Generation of virtual characters or deepfake videos.

Creation of realistic effects and animation.

Conversion of text or images into dynamic video.

Personalization

Personalized ad videos for different users.

Recommendation of related videos based on interests.

Generation of videos matching specific styles or themes.

High Quality

Resolution enhancement (e.g., low‑res to HD).

Restoration of old or damaged footage.

Automatic color, lighting, and stabilization adjustments.

AI video’s core traits are automation, intelligence, efficiency, innovation, personalization, and high quality. Combined with real‑time processing and data‑driven insights, these traits are reshaping the video industry.

2.3 Application Scenarios

Entertainment & Social Media

Users generate fun short videos for sharing.

Dynamic covers and personalized clips boost engagement.

Content & Film Production

Intelligent editing and scene optimization.

AI‑generated virtual scenes and effects reduce costs.

Script generation from text descriptions.

E‑commerce & Advertising

Automatic ad insertion based on user behavior.

High‑quality product showcase videos.

Education & Science Communication

AI‑generated educational animations.

Dynamic teaching videos for history, science, etc.

Short Drama & Film

Human‑AI co‑creation improves narrative and emotion.

AI assists script, scene, effect, and editing.

Other Innovative Uses

Real‑time subtitles, translation, and virtual backgrounds in live streams.

Virtual anchors and AI digital humans.

AI video applications span entertainment, education, marketing, and more, fundamentally changing how video is created, distributed, and consumed.

3. Tongyi Wanxiang Overview

3.1 What Is Tongyi Wanxiang?

Tongyi Wanxiang is Alibaba Cloud’s AI multimodal content generation platform focused on image and video creation. Built on the Tongyi large‑model family, it offers efficient, innovative visual content generation.

Web entry: Tongyi Wanxiang AI Creative Drawing (Alibaba Cloud)

3.2 Core Features

Text‑to‑Image: Generates images from textual prompts in various artistic styles.

Image Style Transfer: Applies a chosen style to an uploaded image.

Video Generation: Supports text‑to‑video and image‑to‑video, producing cinematic‑grade HD videos with Chinese‑style optimization.

Similar Image Generation: Creates artworks similar in content or style to an uploaded picture.

Complex Motion Generation: Simulates realistic physics for dynamic scenes.

3.3 Technical Characteristics

Based on Alibaba Tongyi Large Model: Combines diffusion models and Transformer architecture for high‑quality generation.

Multimodal Support: Handles both image and video creation.

High Controllability: Uses the Composer model to finely control color, layout, and style.

Chinese Optimization: Native support for long Chinese prompts, excelling in Chinese‑style video generation.

Open‑Source Support: The Wanxiang 2.1 model is open‑source on GitHub and HuggingFace, so developers can run text‑to‑video and image‑to‑video tasks themselves.

Tongyi Wanxiang’s advantages lie in multimodal generation, Chinese optimization, high‑quality output, fine‑grained control, technical innovation, broad application scenarios, and open‑source ecosystem.

3.4 Application Scenarios

Art Creation: Personalized artwork and style transfer for creators.

Advertising & Marketing: Automated ad material generation and personalized ad placement.

Film & Game Development: Realistic VFX, scene, and character generation to accelerate production.

Social Media Content: Short video and dynamic cover creation for higher engagement.

Commercial Design: Product showcase videos and virtual store previews.

4. DeepSeek + Tongyi Wanxiang Video Production Workflow

4.1 Advantages

4.1.1 DeepSeek Strengths

DeepSeek provides professional, in‑depth content generation, including up‑to‑date knowledge via web search. It excels at producing detailed video scripts and storyboards, which can be fed into AI video platforms for rapid creation.

4.1.2 Tongyi Wanxiang Video Generation Strengths

High‑Quality Video: Generates cinematic‑grade HD videos (1080p) with smooth motion.

Chinese Optimization: Accurately interprets long Chinese prompts, producing culturally appropriate videos.

Complex Motion & Physics: Simulates realistic dynamics such as rain droplets or shattered glass.

Multimodal Generation: Supports both text‑to‑video and image‑to‑video modes.

Rich Visual Effects: Offers transition, particle, and artistic text effects.

Ease of Use: Simple UI lowers the creation barrier for non‑technical users.

Open‑Source Ecosystem: The Wanxiang 2.1 model is open‑source, encouraging community collaboration.

4.2 Operational Steps

4.2.1 Generate Video Script with DeepSeek

Provide the following prompt to DeepSeek to obtain a detailed script:

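`我想做一个治愈系的名山大川的短视频,视频中的元素包括蔚蓝的天空,广阔的山河湖泊,飞鸟,无人机拍摄视角以及特写镜头,以国家地理纪录片的风格,时长30秒`

In English: "I want to make a soothing short video of famous mountains and rivers. Elements include an azure sky, vast mountains, rivers and lakes, flying birds, drone perspectives, and close‑up shots, in the style of a National Geographic documentary, 30 seconds long."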

DeepSeek returns analysis, suggestions, and optimization points for the script.
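
The same prompt can also be sent programmatically instead of through the chat page. The following is a minimal sketch, not an official client. It assumes DeepSeek's OpenAI‑compatible chat completions endpoint (`https://api.deepseek.com/chat/completions`), the model name `deepseek-chat`, and an API key stored in a `DEEPSEEK_API_KEY` environment variable. None of these come from this article, so check them against DeepSeek's API documentation. The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class DeepSeekScriptRequest {
    // Assumed endpoint and model name; verify against DeepSeek's API documentation.
    static final String ENDPOINT = "https://api.deepseek.com/chat/completions";
    static final String MODEL = "deepseek-chat";

    /** Escapes characters that must not appear raw inside a JSON string. */
    static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become 4-digit hex escapes.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Builds a single-turn chat request body that carries the user's prompt. */
    static String buildBody(String prompt) {
        return "{\"model\":\"" + MODEL + "\",\"messages\":[{\"role\":\"user\",\"content\":\""
                + escapeJson(prompt) + "\"}]}";
    }

    public static void main(String[] args) throws Exception {
        // English rendering of the article's prompt; the Chinese original can be used instead.
        String prompt = "I want to make a soothing 30-second short video of famous mountains "
                + "and rivers in the style of a National Geographic documentary.";
        String body = buildBody(prompt);

        String apiKey = System.getenv("DEEPSEEK_API_KEY"); // assumed variable name
        if (apiKey == null) {
            // Without a key, only show what would be sent.
            System.out.println(body);
            return;
        }
        HttpRequest request = HttpRequest.newBuilder(URI.create(ENDPOINT))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```

Building the body by hand keeps the sketch dependency‑free; in a real project a JSON library would be the safer choice.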


Further refine the script by asking DeepSeek for a storyboard, which yields a detailed shot list.


If the initial script is unsatisfactory, iterate through additional dialogue until the desired result is achieved.
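
While iterating, it can help to hold the shot list as data so each revision can be checked against the 30‑second target before it is pasted into the video tool. The sketch below is one way to do that. The `Shot` type, the timings, and the shot descriptions are hypothetical illustrations, not DeepSeek's actual output.

```java
import java.util.List;
import java.util.stream.Collectors;

class Storyboard {
    /** One storyboard entry: a time window in seconds plus a shot description. */
    static final class Shot {
        final int startSec;
        final int endSec;
        final String description;

        Shot(int startSec, int endSec, String description) {
            if (endSec <= startSec) {
                throw new IllegalArgumentException("shot must have positive length");
            }
            this.startSec = startSec;
            this.endSec = endSec;
            this.description = description;
        }

        int duration() {
            return endSec - startSec;
        }
    }

    /** Sums the length of all shots in seconds. */
    static int totalDuration(List<Shot> shots) {
        return shots.stream().mapToInt(Shot::duration).sum();
    }

    /** True when the first shot starts at 0 and each shot starts where the previous one ends. */
    static boolean isContiguous(List<Shot> shots) {
        int expectedStart = 0;
        for (Shot s : shots) {
            if (s.startSec != expectedStart) {
                return false;
            }
            expectedStart = s.endSec;
        }
        return true;
    }

    /** Flattens the shot list into one text block, one shot per line. */
    static String toPrompt(List<Shot> shots) {
        return shots.stream()
                .map(s -> s.startSec + "-" + s.endSec + "s: " + s.description)
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        // Hypothetical shot list for the 30-second landscape video.
        List<Shot> shots = List.of(
                new Shot(0, 8, "Drone shot rising over a mountain lake under an azure sky"),
                new Shot(8, 18, "Wide aerial pass along a river valley, birds crossing the frame"),
                new Shot(18, 25, "Close-up of ripples and reflected peaks on the water"),
                new Shot(25, 30, "Slow pull-back revealing the full mountain range"));

        System.out.println("Total seconds: " + totalDuration(shots));
        System.out.println("Contiguous: " + isContiguous(shots));
        System.out.println(toPrompt(shots));
    }
}
```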

4.2.2 Generate Video with Tongyi Wanxiang

Paste the finalized storyboard into Tongyi Wanxiang’s video generation interface, adjust parameters such as aspect ratio (e.g., 3:4 for short‑form platforms), and click “Generate”. After processing, the generated video appears on the right side and can be previewed or downloaded.
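
When the service is called through the API rather than the web page, the Java sample in section 5.3 passes the output size as a `width*height` string such as `1280*720` instead of an aspect ratio. The helper below is a small sketch for reducing such a size string to its aspect ratio, which makes it easy to confirm that a chosen size matches the ratio selected in the UI. The class and method names are hypothetical, and the sketch does not say which sizes the service accepts.

```java
class SizeRatio {
    /** Greatest common divisor via the Euclidean algorithm. */
    static int gcd(int a, int b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    /** Reduces a "width*height" size string to an aspect ratio such as "16:9". */
    static String aspectRatio(String size) {
        String[] parts = size.split("\\*");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected width*height but got: " + size);
        }
        int width = Integer.parseInt(parts[0].trim());
        int height = Integer.parseInt(parts[1].trim());
        if (width <= 0 || height <= 0) {
            throw new IllegalArgumentException("width and height must be positive: " + size);
        }
        int g = gcd(width, height);
        return (width / g) + ":" + (height / g);
    }

    public static void main(String[] args) {
        System.out.println(aspectRatio("1280*720")); // landscape size used in section 5.3
        System.out.println(aspectRatio("720*1280")); // the same size rotated to portrait
    }
}
```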

Following these steps completes the end‑to‑end workflow from script generation to final video, which is the typical process many content creators use with AI models.

5. Additional Tongyi Wanxiang Features

5.1 Image‑to‑Video

Upload an image and let Tongyi Wanxiang automatically generate a video based on the visual content.


The resulting video depicts a UFO‑like object flying at low altitude.


5.2 Text‑to‑Image (Creative Drawing)

Enter a detailed textual description to generate images. Example prompt:

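`生成一张猫和狗快乐玩耍的温馨图片,图中小猫伸出爪子去挠小狗的头,旁边有草坪,有几只蝴蝶,有盛开的花朵,蓝色的天空,风格为写实风格`

In English: "Generate a heartwarming picture of a cat and a dog playing happily. The kitten reaches out a paw to scratch the puppy's head, with a lawn beside them, a few butterflies, blooming flowers, and a blue sky, in a realistic style."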

By default, the system returns four images.


5.3 Java API Integration

Tongyi Wanxiang provides an API for programmatic access. Below is a sample Java snippet using the Alibaba DashScope SDK to invoke the video synthesis service.

package com.congge.chat;

// Copyright (c) Alibaba, Inc. and its affiliates.

import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesis;
import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesisParam;
import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesisResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.JsonUtils;

public class Text2Video {
    /**
     * Create a video synthesis task and wait for the task to complete.
     */
    public static void text2Video() throws ApiException, NoApiKeyException, InputRequiredException {
        VideoSynthesis vs = new VideoSynthesis();
        VideoSynthesisParam param = VideoSynthesisParam.builder()
                .model("wanx2.1-t2v-turbo")
                // Replace with your own API key; avoid committing real keys to source control.
                .apiKey("your-api-key")
                // Prompt: "A kitten running in the moonlight"
                .prompt("一只小猫在月光下奔跑")
                // Output resolution as width*height
                .size("1280*720")
                .build();
        System.out.println("please wait...");
        // call() blocks until the task completes and returns the result.
        VideoSynthesisResult result = vs.call(param);
        // Print the full response as JSON; it contains the generated video URL.
        System.out.println(JsonUtils.toJson(result));
    }

    public static void main(String[] args) {
        try {
            text2Video();
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

Running the code prints a JSON response containing the generated video URL, which can be opened in a browser and downloaded.
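
To avoid copying the URL by hand, the printed JSON can be searched for the video URL and the file saved directly. The sketch below assumes the response carries the URL in a string field named `video_url`; that field name is an assumption, so confirm it against the JSON your own run prints. The sample JSON is made up, and the class and method names are hypothetical. The regular expression returns the raw string value and does not decode JSON escape sequences, so a JSON library is the more robust choice for production code.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VideoUrlExtractor {
    // Assumed field name; confirm against the JSON your run actually prints.
    private static final Pattern VIDEO_URL =
            Pattern.compile("\"video_url\"\\s*:\\s*\"([^\"]+)\"");

    /** Returns the first video_url string value found in the JSON text, if any. */
    static Optional<String> extractVideoUrl(String json) {
        Matcher m = VIDEO_URL.matcher(json);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Downloads the file at the given URL to the target path. */
    static Path download(String url, Path target) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<Path> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofFile(target));
        if (response.statusCode() != 200) {
            throw new IOException("download failed with HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) {
        // Made-up response for illustration only.
        String sample = "{\"output\":{\"task_status\":\"SUCCEEDED\","
                + "\"video_url\":\"https://example.com/demo.mp4\"}}";
        Optional<String> url = extractVideoUrl(sample);
        System.out.println(url.orElse("no video_url found"));
        // With a real URL: download(url.get(), Path.of("output.mp4"));
    }
}
```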

The Tongyi Wanxiang homepage offers many more features; explore them for further experimentation.

6. Conclusion

This article detailed the usage of Tongyi Wanxiang, demonstrated a complete AI video creation workflow with DeepSeek, and showed how to integrate the service via Java SDK. Thank you for reading.
