How to Build a Text‑to‑Video Workflow in Dify Using LLMs

This guide walks you through creating a Dify workflow that turns user prompts into videos by chaining LLM‑generated descriptions with a Text‑to‑Video model, covering workflow types, system variables, model setup, node configuration, plugin installation, and final testing steps.

Instant Consumer Technology Team

Introduction

Workflows break complex tasks into smaller steps, reducing reliance on prompt engineering and improving performance, interpretability, stability, and fault tolerance of LLM applications.

Dify Workflow Classification

Dify offers two workflow types:

Chatflow: for conversational scenarios such as customer service, semantic search, or any multi‑step, dialogue‑based app.

Workflow: for automation and batch processing, suitable for high‑quality translation, data analysis, content generation, email automation, etc.

For the text‑to‑video use case, this guide uses the Workflow type, because each run is a single generation task rather than a conversation.

Creating the Workflow

The workflow consists of the following steps:

The user enters a keyword (the prompt).

LLM expands the user input into a richer description.

TEXT TO VIDEO receives the LLM output and generates a video.

LLM2 rewrites the TEXT TO VIDEO node's text output into a user‑friendly reply, which gives control over the final result.

The End node returns LLM2's output.
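The chain of nodes above can be sketched as plain function composition. This is only an illustration of the data flow: `expand_prompt`, `text_to_video`, and `format_reply` are hypothetical stand‑ins for the LLM, TEXT TO VIDEO, and LLM2 nodes, and they return canned values (including a placeholder example.com URL) instead of calling any model.

```python
# Sketch of the node chain: Start -> LLM -> TEXT TO VIDEO -> LLM2 -> End.
# All node functions are hypothetical stubs; in Dify these steps are
# configured nodes on the canvas, not Python code.

def expand_prompt(query: str) -> str:
    """Stand-in for the LLM node: enrich the user's keyword."""
    return f"A detailed, vivid scene of {query}."

def text_to_video(description: str) -> dict:
    """Stand-in for the TEXT TO VIDEO tool node (placeholder URL, no real call)."""
    return {
        "text": f"Video generated for: {description}",
        "json": [{"type": "video", "url": "https://example.com/video.mp4"}],
    }

def format_reply(video_result: dict) -> str:
    """Stand-in for LLM2: turn the tool output into a user-facing message."""
    url = video_result["json"][0]["url"]
    return f"Your video is ready: {url}"

def run_workflow(query: str) -> str:
    # Each node consumes the previous node's output.
    description = expand_prompt(query)
    video_result = text_to_video(description)
    return format_reply(video_result)  # the End node returns LLM2's output

if __name__ == "__main__":
    print(run_workflow("a kitten swimming"))
```

Keeping each node as a separate function mirrors why the workflow is easier to debug than one large prompt: each step's output can be inspected on its own.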

Example input prompt: “a kitten swimming”.

System Variables

| Variable Name | Data Type | Description | Notes |
|---|---|---|---|
| sys.files | Array[File] | Files uploaded by the user. | Enable file upload in the app's feature settings. |
| sys.user_id | String | Unique identifier for each user. | |
| sys.app_id | String | Unique identifier for the application. | |
| sys.workflow_id | String | Identifier for the workflow, used to track node information. | |
| sys.workflow_run_id | String | Identifier for a single execution (run) of the workflow. | |

Step‑by‑Step Setup

1. Add a Model

In Studio, click your avatar, select Settings, and add a model provider (e.g., Tongyi Qianwen or SiliconFlow). You must enter the provider's API key for the models to work.

2. Create an Application

Navigate to “Studio → Create Blank App”, choose “Workflow”, name it (e.g., “Text‑to‑Video Workflow”), and open the workflow editor.

3. Add User Input Variable

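In the Start node, click the + button and add an input field of type Text with the variable name query and a maximum length of 256. At run time, the Start node exposes this input together with the system variables: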

{
  "query": "a kitten swimming",
  "sys.files": [],
  "sys.user_id": "9347ec70-d7db-4943-9e69-b0bed251bc54",
  "sys.app_id": "a5cba457-16ce-4493-901c-3a78408cbef4",
  "sys.workflow_id": "5708888f-e53d-4000-bebd-af01b2fee66b",
  "sys.workflow_run_id": "2a9ffa7b-0a7c-4619-9885-161f90f0af7e"
}
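To make the Start node's behaviour concrete, here is a small Python sketch that enforces the 256‑character limit on query and assembles a dictionary shaped like the JSON above. The `build_start_inputs` helper is hypothetical, and it assumes the limit counts characters; Dify fills in the sys.* values itself, so the random UUID here only mimics their format.

```python
import uuid

# Matches the max length configured for `query` in the Start node
# (assumed here to count characters).
MAX_QUERY_LENGTH = 256

def build_start_inputs(query: str, user_id: str, app_id: str,
                       workflow_id: str) -> dict:
    """Hypothetical helper that mirrors the Start node's output variables."""
    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError(f"query exceeds {MAX_QUERY_LENGTH} characters")
    return {
        "query": query,
        "sys.files": [],  # stays empty unless file upload is enabled
        "sys.user_id": user_id,
        "sys.app_id": app_id,
        "sys.workflow_id": workflow_id,
        # A new run ID for every execution, in the same UUID format.
        "sys.workflow_run_id": str(uuid.uuid4()),
    }

if __name__ == "__main__":
    inputs = build_start_inputs("a kitten swimming", str(uuid.uuid4()),
                                str(uuid.uuid4()), str(uuid.uuid4()))
    print(inputs)
```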

4. Add LLM Node

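Click “Add Node → LLM”, select the DeepSeek‑V3 model, and set the system prompt to “Moderately expand the user's input into a richer description, without making it too long.”, followed by a line that inserts the query variable. The node's resolved configuration for the example run: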

{
  "model_mode": "chat",
  "prompts": [{
    "role": "system",
    "text": "Moderately expand the user's input into a richer description, without making it too long.\nHere is the user's input: a kitten swimming",
    "files": []
  }],
  "model_provider": "langgenius/siliconflow/siliconflow",
  "model_name": "deepseek-ai/DeepSeek-V3"
}
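Outside Dify, this LLM node corresponds roughly to an OpenAI‑style chat completion request. The sketch below only builds that request with the standard library and mirrors the prompt shape in the JSON above (instruction first, then the user's input, in one system message). The endpoint URL and the SILICONFLOW_API_KEY environment variable are assumptions to check against SiliconFlow's own documentation, and `build_expand_request` is a hypothetical helper.

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible endpoint; verify against SiliconFlow's docs.
API_URL = "https://api.siliconflow.cn/v1/chat/completions"
MODEL = "deepseek-ai/DeepSeek-V3"
SYSTEM_PROMPT = ("Moderately expand the user's input into a richer "
                 "description, without making it too long.")

def build_expand_request(query: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat request mirroring the LLM node config."""
    payload = {
        "model": MODEL,
        "messages": [{
            "role": "system",
            # Same shape as the node's prompt: instruction, then user input.
            "content": f"{SYSTEM_PROMPT}\nHere is the user's input: {query}",
        }],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

if __name__ == "__main__":
    req = build_expand_request("a kitten swimming",
                               os.environ.get("SILICONFLOW_API_KEY", ""))
    # To actually send it: urllib.request.urlopen(req)
    print(req.full_url)
    print(json.loads(req.data)["messages"][0]["content"])
```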

5. Install Text‑to‑Video Plugin

Search the Dify Marketplace for “Doubao Image and Video Generator” and install it. The plugin requires an API key from Volcano Engine's Ark service.

6. Add Text‑to‑Video Node

Select the plugin's “Text to Video” tool and map the LLM node's output to its prompt. The node's input and output for the example run:

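Input:
{
  "prompt": "A fluffy little ginger kitten happily paddles its tiny paws in a pool, splashing up sparkling droplets. Its round eyes are half-closed, its little ears shake off drops of water now and then, and its fluffy tail sways through the water like a propeller. Sunlight filtering through the surface casts dappled light and shadow; the kitten alternately dives to chase bubbles and pops back up with a soft, milky \"meow\", every bit the happy little swimming champion."
}
Output:
{
  "text": "Generating video with the Doubao API... Video generated successfully! Video link: https://.../video.mp4",
  "files": [],
  "json": [{"type": "video", "url": "https://.../video.mp4"}]
}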
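If a later node, or code that consumes the workflow result, needs the video link itself, the structured json field is easier to rely on than parsing the message text. A small Python sketch, assuming the output shape shown above; `extract_video_url` is a hypothetical helper, not part of the plugin.

```python
import re
from typing import Optional

def extract_video_url(tool_output: dict) -> Optional[str]:
    """Return the first video URL from a TEXT TO VIDEO node output, if any."""
    # Prefer the structured `json` field, e.g. [{"type": "video", "url": ...}].
    for item in tool_output.get("json", []):
        if item.get("type") == "video" and item.get("url"):
            return item["url"]
    # Fall back to scanning the human-readable `text` field for an .mp4 link.
    match = re.search(r"https?://\S+?\.mp4", tool_output.get("text", ""))
    return match.group(0) if match else None
```

Checking json first and text second means a change in the plugin's wording does not break link extraction as long as the structured field is present.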

7. Add Second LLM Node (LLM2)

Add a second LLM node (LLM2) that takes the TEXT TO VIDEO node's text output and turns it into a user‑friendly message. Its configuration and output:

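{
  "model_mode": "chat",
  "prompts": [{
    "role": "system",
    "text": "Tell the user the video was generated successfully and provide the playback link.",
    "files": []
  }],
  "model_provider": "langgenius/siliconflow/siliconflow",
  "model_name": "deepseek-ai/DeepSeek-V3"
}
{
  "result": "Your video has been generated successfully! 🎬\nThe video link is ready; click the play button below to watch it directly:\n[Play video](https://.../video.mp4)\nIf playback fails, make sure your network is stable or copy the link into your browser. The link is valid until July 15, 2025."
}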
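LLM2 gives flexible wording, but it can also add details that the tool never returned: the link‑validity date in LLM2's output above does not appear in the TEXT TO VIDEO output shown earlier. If a fixed reply is acceptable, a template produces the same kind of message deterministically; Dify's Template node is one place to do this. The Python sketch below illustrates that alternative with `string.Template`; it is not the setup used in this guide, and `format_video_reply` is hypothetical.

```python
from string import Template

# Fixed reply text; only the URL varies between runs.
REPLY = Template(
    "Your video has been generated successfully!\n"
    "[Play video]($url)\n"
    "If playback fails, copy the link into your browser."
)

def format_video_reply(video_url: str) -> str:
    """Hypothetical deterministic replacement for LLM2's formatting step."""
    return REPLY.substitute(url=video_url)
```

The trade‑off is tone and adaptability versus predictability: a template never invents content, while LLM2 can adjust wording or language to the user.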

8. Add End Node

Add an End node and set LLM2's output as the workflow's output variable to finish the workflow.

Finalizing and Testing

Publish the workflow, then run it with a prompt such as “a kitten swimming”. The workflow executes the steps above and returns LLM2's message with a link to the generated video.
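A published workflow can also be called over Dify's HTTP API. The Python sketch below builds a blocking POST to the workflows/run endpoint with the query input; the API base (Dify Cloud here; self‑hosted installs use their own host), the DIFY_APP_KEY environment variable, and the End node output name result (taken from the JSON in step 7) are assumptions. Video generation can be slow, so check whether streaming mode suits you better, since long blocking requests may time out.

```python
import json
import os
import urllib.request

# Assumption: Dify Cloud's API base; replace with your own host if self-hosted.
DIFY_API_BASE = "https://api.dify.ai/v1"

def build_run_request(query: str, api_key: str,
                      user: str) -> urllib.request.Request:
    """Build a blocking run request for the published workflow (not sent)."""
    body = {
        "inputs": {"query": query},   # matches the Start node variable
        "response_mode": "blocking",  # wait for the complete result
        "user": user,                 # caller-chosen end-user identifier
    }
    return urllib.request.Request(
        f"{DIFY_API_BASE}/workflows/run",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

def read_result(response_body: dict) -> str:
    # Assumes the End node's output variable is named "result".
    return response_body["data"]["outputs"]["result"]

if __name__ == "__main__":
    api_key = os.environ.get("DIFY_APP_KEY")
    if api_key:
        req = build_run_request("a kitten swimming", api_key, "demo-user")
        with urllib.request.urlopen(req, timeout=600) as resp:
            print(read_result(json.load(resp)))
    else:
        print("Set DIFY_APP_KEY to run the workflow.")
```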

Multiple videos can be generated; screenshots of the results are shown.

If you are unfamiliar with Dify workflows, refer to the documentation: https://docs.dify.ai/zh-hans/guides/workflow/readme

For more Dify articles, see the linked post about local deployment.
