How Alibaba’s Pixelle-Video Generates Full Videos from a Single Sentence (22K Stars)
Pixelle-Video, an open‑source AI tool from Alibaba’s AIDC‑AI team, lets users type a single topic and automatically creates a complete short video—including script, images, voice‑over, background music and final MP4—through a fully automated pipeline that runs locally or in the cloud.
What a single sentence can do
Enter a topic and the system automatically generates a storyboard script, selects or creates matching images or video clips, synthesizes voice‑over, adds background music, and renders a final MP4 without manual editing.
Why the project gained rapid attention
Full automation – all stages from script writing to final rendering are handled by the AI pipeline.
Open‑source and locally runnable – Windows users can download a pre‑built bundle and run it without configuring a Python environment.
Flexible model combination – large language models such as Tongyi Qianwen, GPT, DeepSeek, or local Ollama can be used; image and video generation can be performed via ComfyUI workflows or APIs like DashScope, KeLing, Seedance.
Continuous feature iteration – recent additions include digital‑human voice‑over (talking‑head video), image‑to‑video, motion transfer, and custom asset upload.
Core capabilities
AI script writing – input a topic to obtain a storyboard script.
AI image / video generation – generate an image for each narration line or create dynamic video frames.
AI voice‑over – supports Edge‑TTS, Index‑TTS and voice cloning.
Background music – built‑in BGM library or user‑uploaded tracks.
Multiple templates – vertical, horizontal, static image or video templates can be selected.
Custom assets – upload personal photos or videos; the AI analyzes them to generate scripts.
Digital‑human voice‑over – upload a reference image to generate a talking‑head video.
Image‑to‑video / motion transfer – advanced modes for creative users.
Automatic video creation pipeline
Script generation – a large model writes a storyboard based on the input topic.
Image planning – each narration line is matched with a suitable visual.
Frame‑by‑frame processing – images or video clips are generated and voice‑over is synthesized.
Video composition – a template, background music, and subtitles are applied to produce the final MP4.
Models and styles can be swapped at each stage, and changing only the template is enough to give the same topic a very different look.
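The four stages above can be pictured as a chain of interchangeable functions. The following is a minimal sketch of that idea, not Pixelle-Video's actual code: the names Scene, Pipeline, demo_pipeline and every stage function are hypothetical, and the stand-in stages only return labels instead of calling an LLM, an image service, or a TTS engine.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scene:
    """One storyboard frame (hypothetical structure, for illustration only)."""
    narration: str          # narration line written by the script stage
    visual_prompt: str = ""  # filled in by the planning stage
    visual_path: str = ""    # filled in by the frame stage
    audio_path: str = ""     # filled in by the frame stage


@dataclass
class Pipeline:
    # Each stage is a plain callable, so a different model or style
    # can be swapped in without touching the other stages.
    write_script: Callable[[str], List[str]]
    plan_visual: Callable[[str], str]
    render_visual: Callable[[str], str]
    synthesize_voice: Callable[[str], str]
    compose: Callable[[List[Scene], str], str]

    def run(self, topic: str, template: str) -> str:
        # 1. Script generation: topic -> narration lines
        scenes = [Scene(narration=line) for line in self.write_script(topic)]
        # 2. Image planning: narration line -> description of a visual
        for scene in scenes:
            scene.visual_prompt = self.plan_visual(scene.narration)
        # 3. Frame-by-frame processing: create the visual and the voice-over
        for scene in scenes:
            scene.visual_path = self.render_visual(scene.visual_prompt)
            scene.audio_path = self.synthesize_voice(scene.narration)
        # 4. Video composition: template (plus BGM and subtitles) -> final file
        return self.compose(scenes, template)


def demo_pipeline() -> Pipeline:
    """Build a pipeline from stand-in stages that only return labels."""
    return Pipeline(
        write_script=lambda topic: [f"{topic}: hook", f"{topic}: takeaway"],
        plan_visual=lambda line: f"illustration of {line}",
        render_visual=lambda prompt: f"image<{prompt}>",
        synthesize_voice=lambda line: f"audio<{line}>",
        compose=lambda scenes, template: f"{template}-{len(scenes)}-scenes.mp4",
    )


if __name__ == "__main__":
    print(demo_pipeline().run("coffee", "vertical"))
```

In this sketch, passing "horizontal" instead of "vertical" changes only the composition step, which mirrors how a template change alters the result while the script, visuals, and audio stages stay the same.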
Getting started
Windows (quickest)
Download the bundled package from the Releases page.
Extract the archive and double‑click start.bat.
Open a browser at http://localhost:8501.
Enter LLM and image‑service API keys in the system configuration.
Type a topic and click “Generate”.
Mac / Linux
git clone https://github.com/AIDC-AI/Pixelle-Video.git
cd Pixelle-Video
uv run streamlit run web/app.py
The first run requires API‑key configuration. The project supports a completely free local setup (e.g., Ollama + ComfyUI) and low‑cost cloud options such as Tongyi Qianwen.
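If the browser page does not load after starting the app, it can help to check whether anything is listening on the web UI port at all. The sketch below is a generic TCP check and not part of Pixelle-Video. It assumes the UI is served at localhost:8501, the address given in the Windows steps and Streamlit's default port; ui_is_listening is a hypothetical helper name.

```python
import socket


def ui_is_listening(host: str = "localhost", port: int = 8501,
                    timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if ui_is_listening():
        print("Web UI port is open; try http://localhost:8501 in a browser")
    else:
        print("Nothing is listening on port 8501 yet")
```

A successful check only shows that the port is open, not that the API keys are configured or that generation will succeed.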
Project repository
https://github.com/AIDC-AI/Pixelle-Video
java1234
Former senior programmer at a Fortune Global 500 company, dedicated to sharing Java expertise. Visit Feng's site: Java Knowledge Sharing, www.java1234.com