How Alibaba’s Pixelle-Video Generates Full Videos from a Single Sentence (22K Stars)

Pixelle-Video, an open‑source AI tool from Alibaba’s AIDC‑AI team, turns a single typed topic into a complete short video. A fully automated pipeline produces the script, images, voice‑over, background music and final MP4, and it runs locally or in the cloud.

java1234

What a single sentence can do

Enter a topic and the system automatically generates a storyboard script, selects or creates matching images or video clips, synthesizes voice‑over, adds background music, and renders a final MP4 without manual editing.

Why the project gained rapid attention

Full automation – all stages from script writing to final rendering are handled by the AI pipeline.

Open‑source and locally runnable – Windows users can download a pre‑built bundle and run it without configuring a Python environment.

Flexible model combination – the script can be written by large language models such as Tongyi Qianwen (Qwen), GPT, DeepSeek, or a local model served by Ollama; images and video can be generated through ComfyUI workflows or through APIs such as DashScope, Kling (KeLing), and Seedance.

Continuous feature iteration – recent additions include digital‑human talking‑head videos, image‑to‑video, motion transfer, and custom asset upload.
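
One way to picture the "flexible model combination" point is a small provider table. The sketch below is not Pixelle-Video's configuration format, which the article does not show; `LLMEndpoint`, `ENDPOINTS`, and `resolve_endpoint` are hypothetical names. It assumes each provider is reached through an OpenAI-compatible endpoint, and the base URLs are the ones the providers publish to the best of my knowledge, so check them against current documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMEndpoint:
    """Connection settings for one OpenAI-compatible chat endpoint (hypothetical type)."""
    base_url: str
    needs_api_key: bool


# Assumed base URLs; verify against each provider's documentation.
ENDPOINTS = {
    "ollama": LLMEndpoint("http://localhost:11434/v1", needs_api_key=False),
    "deepseek": LLMEndpoint("https://api.deepseek.com", needs_api_key=True),
    "qwen": LLMEndpoint(
        "https://dashscope.aliyuncs.com/compatible-mode/v1", needs_api_key=True
    ),
    "openai": LLMEndpoint("https://api.openai.com/v1", needs_api_key=True),
}


def resolve_endpoint(name: str, api_key: str = "") -> dict:
    """Return client settings for the named provider, checking the key requirement."""
    try:
        endpoint = ENDPOINTS[name]
    except KeyError:
        raise ValueError(f"unknown provider: {name!r}") from None
    if endpoint.needs_api_key and not api_key:
        raise ValueError(f"provider {name!r} requires an API key")
    # A local Ollama server ignores the key, but OpenAI-style clients
    # generally expect a non-empty string, so a placeholder is supplied.
    return {"base_url": endpoint.base_url, "api_key": api_key or "ollama"}
```

Under these assumptions, switching from a paid cloud model to a free local one is a matter of changing the provider name passed to `resolve_endpoint`, which is the kind of swap the article describes.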

Core capabilities

AI script writing – input a topic to obtain a storyboard script.

AI image / video generation – generate an image for each narration line or create dynamic video frames.

AI voice‑over – supports Edge‑TTS, Index‑TTS and voice cloning.

Background music – built‑in BGM library or user‑uploaded tracks.

Multiple templates – vertical, horizontal, static image or video templates can be selected.

Custom assets – upload personal photos or videos; the AI analyzes them to generate scripts.

Digital human – upload a reference image to generate a talking‑head video.

Image‑to‑video / motion transfer – advanced modes for creative users.

Automatic video creation pipeline

Script generation – a large model writes a storyboard based on the input topic.

Image planning – each narration line is matched with a suitable visual.

Frame‑by‑frame processing – images or video clips are generated and voice‑over is synthesized.

Video composition – a template, background music, and subtitles are applied to produce the final MP4.

The model or style used at each stage can be swapped, so the same topic can yield very different visuals simply by changing the template.

Getting started

Windows (quickest)

Download the bundled package from the Releases page.

Extract the archive and double‑click start.bat.

Open a browser at http://localhost:8501.

Enter LLM and image‑service API keys in the system configuration.

Type a topic and click “Generate”.

Mac / Linux

git clone https://github.com/AIDC-AI/Pixelle-Video.git
cd Pixelle-Video
uv run streamlit run web/app.py

On first run, configure the LLM and image services in the web UI. The project supports a completely free local setup (e.g., Ollama + ComfyUI) as well as low‑cost cloud options such as Tongyi Qianwen, which need an API key.
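
When launching from a script, it can help to wait until the web UI is actually accepting connections before opening the browser. The helper below is a minimal sketch using only the standard library; `wait_for_port` is a hypothetical name, and the default port 8501 is taken from the Windows instructions above on the assumption that the Mac / Linux launch uses the same port.

```python
import socket
import time


def wait_for_port(
    host: str = "localhost",
    port: int = 8501,
    timeout: float = 30.0,
    interval: float = 0.5,
) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port()` returns `True` as soon as the server is listening and `False` if nothing answers within 30 seconds. It only checks that the port is open, not that the application behind it has finished loading.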

Project repository

https://github.com/AIDC-AI/Pixelle-Video

Written by

java1234

Former senior programmer at a Fortune Global 500 company, dedicated to sharing Java expertise. Visit Feng's site: Java Knowledge Sharing, www.java1234.com
