Getting Started with CogVideoX API for Text‑to‑Video Generation Using Diffusion Transformers
This guide introduces CogVideoX, a diffusion‑transformer based video generation model, explains its training and inference pipelines, and provides step‑by‑step instructions with API endpoints, required parameters, and example cURL commands for creating short AI‑generated videos.
Hello everyone, I'm Fei! While many still think large models only handle text, OpenAI's Sora demonstrated that they can also understand and generate complex video content.
Following Sora, Zhipu AI released CogVideoX on July 26, offering the first open‑source video generation model with an API.
Source code: https://huggingface.co/spaces/THUDM/CogVideoX
CogVideoX adopts the same Diffusion Transformer (DiT) architecture as Sora. Its training pipeline involves collecting large video datasets, compressing videos into lower‑dimensional representations, converting them to 1‑D sequences for the Transformer, and training a diffusion model.
1. Collect and annotate video data, then reduce its dimensionality.
2. Compress videos spatially and temporally, producing low‑dimensional data for the DiT to fit.
3. Flatten the compressed data into a 1‑D sequence for Transformer processing, yielding a trained diffusion model.
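The steps above can be sketched as one toy diffusion training step on compressed video latents. This is a minimal NumPy illustration; all shapes, names, and the noise‑schedule value are assumptions for exposition, not CogVideoX internals.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(latents, noise, alpha_bar_t):
    """Forward diffusion: mix clean latents with Gaussian noise."""
    return np.sqrt(alpha_bar_t) * latents + np.sqrt(1.0 - alpha_bar_t) * noise

# A clip compressed by the VAE and flattened into a 1-D token sequence
# (batch, tokens, channels) -- shapes are made up for illustration.
latents = rng.standard_normal((1, 256, 16))
noise = rng.standard_normal(latents.shape)
alpha_bar_t = 0.5  # noise-schedule value at a sampled timestep

noisy = add_noise(latents, noise, alpha_bar_t)

# The DiT would be trained to predict `noise` from `noisy`; a dummy
# predictor stands in for the Transformer here.
predicted = np.zeros_like(noise)
loss = np.mean((predicted - noise) ** 2)
print(noisy.shape)
```

In real training, `predicted` comes from the Transformer and the loss gradient updates its weights; everything else in the loop looks like this sketch.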
During generation, the model interprets the user prompt, iteratively refines noise via the Transformer’s attention mechanism, and decodes the result back into video frames.
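The generation loop can be pictured as repeated refinement of pure noise. The update rule and the stand-in `predict_noise` below are deliberately simplified assumptions, not the model's actual sampler.

```python
import numpy as np

rng = np.random.default_rng(1)

def predict_noise(x, t):
    # Stand-in for the Transformer's noise prediction at timestep t.
    return 0.1 * x

x = rng.standard_normal((1, 256, 16))  # start from Gaussian noise
for t in reversed(range(10)):          # a few coarse denoising steps
    x = x - predict_noise(x, t)        # simplified refinement update

# `x` would then go through the VAE decoder to become video frames.
print(x.shape)
```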
Zhipu AI also introduced an efficient 3D VAE that compresses videos to 2% of their original size, along with a 3D RoPE positional encoding to capture long‑range dependencies across frames.
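To get a feel for what a 2% compression ratio means, here is back-of-envelope arithmetic for a raw 6‑second, 16 fps, 1440×960 RGB clip (the sizes are illustrative; actual latent sizes depend on the VAE's layout):

```python
frames = 6 * 16                   # 96 frames in a 6-second, 16 fps clip
bytes_per_frame = 1440 * 960 * 3  # uncompressed 8-bit RGB
raw_bytes = frames * bytes_per_frame
latent_bytes = raw_bytes * 0.02   # 3D VAE keeps ~2% of the volume

print(raw_bytes)            # 398131200 (~380 MiB raw)
print(int(latent_bytes))    # 7962624 (~7.6 MiB compressed)
```

So roughly 380 MiB of raw pixels shrinks to under 8 MiB of latents, which is what makes fitting a Transformer over whole clips tractable.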
CogVideoX provides a convenient API (no queue) and supports 6‑second videos at 1440×960 resolution and 16 fps.
To use the API, register on the Zhipu AI portal ( https://bigmodel.cn/ ), obtain an API key, and choose between HTTP requests or the official SDK (SDK recommended for production).
Key HTTP parameters:
Endpoint: https://open.bigmodel.cn/api/paas/v4/videos/generations
Authorization: Bearer <your‑API‑key>
Model: cogvideox
Prompt: your text description
image_url: optional image URL or base64 for image‑to‑video
Example cURL request to generate a video:
curl --location 'https://open.bigmodel.cn/api/paas/v4/videos/generations' \
--header 'Authorization: Bearer {your‑API‑key}' \
--header 'Content-Type: application/json' \
--data '{
"model": "cogvideox",
    "prompt": "Humanity's interstellar battleships have reached Mars and launched the final all-out assault on the Martians"
}'
After submission, you receive a task ID. Retrieve the result with:
curl --location 'https://open.bigmodel.cn/api/paas/v4/async-result/{id}' \
--header 'Authorization: Bearer {your‑API‑key}'
The successful response contains a cover image URL and an MP4 video URL, e.g.:
{
"model": "cogvideox",
"request_id": "8893032770717091555",
"task_status": "SUCCESS",
"video_result": [
{
"cover_image_url": "https://sfile.chatglm.cn/testpath/video_cover/911fad1c-b99c-5dbc-9f8b-5da7c6b7e408_cover_0.png",
"url": "https://sfile.chatglm.cn/testpath/video/911fad1c-b99c-5dbc-9f8b-5da7c6b7e408_0.mp4"
}
]
}
You can open the video URL directly in a browser to view or download the generated clip.
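The submit-then-poll flow above can also be scripted with Python's standard library. A minimal sketch: the payload fields match the parameters listed earlier, but the exact shape of the submission response (in particular which field carries the task ID) should be checked against Zhipu AI's API reference.

```python
import json
import urllib.request

API_KEY = "your-api-key"  # replace with the key from bigmodel.cn
SUBMIT_URL = "https://open.bigmodel.cn/api/paas/v4/videos/generations"
RESULT_URL = "https://open.bigmodel.cn/api/paas/v4/async-result/{task_id}"

def build_payload(prompt, image_url=None):
    """Assemble the JSON body for the generation endpoint."""
    payload = {"model": "cogvideox", "prompt": prompt}
    if image_url is not None:
        payload["image_url"] = image_url  # switches to image-to-video
    return payload

def post_json(url, payload):
    """POST a JSON payload with the Bearer token; return the parsed reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def fetch_result(task_id):
    """Poll the async-result endpoint for the finished video URLs."""
    req = urllib.request.Request(
        RESULT_URL.format(task_id=task_id),
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# No network call here; just show the payload that would be submitted.
print(build_payload("A battleship lands on Mars"))
```

The submission reply contains the task ID to pass to `fetch_result`; keep polling until `task_status` turns `SUCCESS`, then read `video_result` as in the JSON example above.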
CogVideoX also supports image‑to‑video generation; providing an image URL or base64 yields animated results, as demonstrated with a personal avatar and various creative scenes.
While the model produces impressive videos of landscapes, animals, and characters, occasional artifacts (e.g., mismatched limbs) still occur, highlighting ongoing challenges.
Overall, AI‑generated video is rapidly advancing, and technologies like CogVideoX are poised to drive significant societal and creative transformations.
Refining Core Development Skills
Fei has over 10 years of development experience at Tencent and Sogou. Through this account, he shares his deep insights on performance.