One-Click Ad Video from Assets + Brief, plus Baidu’s 8B Text-to-Image – An AI Toolbox

The article introduces three open‑source AI tools—a video editor that turns raw footage and a brief into a finished ad, Baidu's 8‑billion‑parameter text‑to‑image model that runs on 24 GB GPUs, and a weekly AI‑developer digest that auto‑generates Chinese reports—detailing their workflows, benchmarks, usage commands, and target users.

Geek Labs

01 | agentic-video-editor

Manually cutting raw footage into a 30‑second advertisement typically takes about half a day: selecting shots, setting the pacing, exporting, and reviewing.

The tool replaces this workflow with an AI Agent pipeline consisting of:

Original footage + creative brief
    ↓
[Pre‑processing] – scene detection, speech‑to‑text, shot indexing
    ↓
[Director Agent] – AI searches footage, selects shots, creates an edit plan
    ↓
[Refinement Agent] – fine‑tunes start/end of each shot
    ↓
[Edit Agent] – FFmpeg renders MP4
    ↓
[Review Agent] – scores relevance, rhythm, visual quality, viewing experience, overall (0‑1 each)
    ↓
If overall score < threshold → feedback to Director Agent (max 3 retries)
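
The review-and-retry loop at the end of the pipeline can be sketched in Python as follows. This is a minimal illustration, not the project's actual code: `direct`, `render_and_review`, the `Review` dataclass, and the 0.7 threshold are all hypothetical, and the refinement and FFmpeg stages are collapsed into a single stub. It also assumes "max 3 retries" means three retries after the initial attempt.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Review:
    overall: float  # 0-1 overall score from the Review Agent
    feedback: str   # notes handed back to the Director Agent


def run_with_retries(
    direct: Callable[[Optional[str]], str],
    render_and_review: Callable[[str], Review],
    threshold: float = 0.7,  # assumed value; the article does not state one
    max_retries: int = 3,
) -> Tuple[str, Review, int]:
    """Return (plan, review, attempts) for the first passing cut, or the last cut tried."""
    feedback: Optional[str] = None
    for attempt in range(1, max_retries + 2):  # one initial attempt plus retries
        plan = direct(feedback)
        review = render_and_review(plan)
        if review.overall >= threshold:
            break
        feedback = review.feedback  # feed the critique into the next plan
    return plan, review, attempt


# Stub agents standing in for the real LLM-backed ones.
scores = iter([0.4, 0.6, 0.8])


def fake_direct(feedback: Optional[str]) -> str:
    return "plan" if feedback is None else f"plan revised after: {feedback}"


def fake_review(plan: str) -> Review:
    return Review(overall=next(scores), feedback="tighten the hook")


plan, review, attempts = run_with_retries(fake_direct, fake_review)
print(attempts, review.overall)
```

With the stubbed scores the loop stops on the third attempt, the first one to clear the assumed threshold.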

Running the editor requires a single command:

ave edit \
  --footage-dir /path/to/your/footage \
  --brief '{"product": "My Product", "audience": "Women 25-45", "tone": "authentic", "duration_seconds": 30}' \
  --pipeline pipelines/ugc-ad.yaml \
  --style styles/dtc-testimonial.yaml
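
When the brief comes from another program, hand-quoting JSON in a shell is easy to get wrong. The Python sketch below builds the same invocation from a dictionary, using only the flags shown in the command above. The helper name `build_ave_command` is made up for illustration, and the snippet only assembles the command; it does not run the tool.

```python
import json
import shlex


def build_ave_command(footage_dir, brief, pipeline, style):
    """Assemble the argument list for the `ave edit` call shown above."""
    return [
        "ave", "edit",
        "--footage-dir", footage_dir,
        "--brief", json.dumps(brief),  # serialise the brief as one JSON argument
        "--pipeline", pipeline,
        "--style", style,
    ]


brief = {
    "product": "My Product",
    "audience": "Women 25-45",
    "tone": "authentic",
    "duration_seconds": 30,
}
argv = build_ave_command(
    "/path/to/your/footage",
    brief,
    "pipelines/ugc-ad.yaml",
    "styles/dtc-testimonial.yaml",
)
print(shlex.join(argv))  # shell-safe string, handy for logging
```

Passing `argv` to `subprocess.run(argv, check=True)` would execute it without any shell quoting at all.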

The built‑in DTC template follows the hook → problem → solution → social proof → CTA structure; custom YAML pipelines can be authored to combine agents differently.

02 | ERNIE‑Image

ERNIE‑Image is Baidu’s open‑source diffusion‑transformer (DiT) model with 8 B parameters. It is reported to achieve state‑of‑the‑art results among open‑weight text‑to‑image models.

Reported benchmark scores:

GenEval overall: 0.8856 (higher than Qwen‑Image at 0.8683 and FLUX.2‑klein‑9B at 0.8481)

LongTextBench (Chinese long‑text): 0.9733, slightly below Seedream 4.5 at 0.9882
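
To put these numbers in perspective, the gaps can be computed directly. The short Python snippet below uses only the scores quoted above; the variable and function names are illustrative.

```python
geneval = {
    "ERNIE-Image": 0.8856,
    "Qwen-Image": 0.8683,
    "FLUX.2-klein-9B": 0.8481,
}
long_text = {"ERNIE-Image": 0.9733, "Seedream 4.5": 0.9882}


def margin(scores, model, other):
    """Score of `model` minus score of `other`, rounded to 4 decimals like the reported values."""
    return round(scores[model] - scores[other], 4)


print(margin(geneval, "ERNIE-Image", "Qwen-Image"))
print(margin(geneval, "ERNIE-Image", "FLUX.2-klein-9B"))
print(margin(long_text, "ERNIE-Image", "Seedream 4.5"))
```

On a 0-100 scale that is a lead of roughly 1.7 points over Qwen‑Image and 3.8 points over FLUX.2‑klein‑9B on GenEval, and a deficit of about 1.5 points to Seedream 4.5 on LongTextBench.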

Key strengths identified in the source:

Text rendering – long paragraphs, dense typography, layout‑rich images (posters, infographics, UI mockups)

Complex instruction compliance – accurate handling of multi‑object, relational, knowledge‑intensive prompts

Structured generation – posters, comics, storyboards, multi‑panel graphics

Consumer‑grade deployment – runs on a single GPU with 24 GB VRAM
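
A back-of-the-envelope check shows why 24 GB is plausible for an 8 B‑parameter model. The Python sketch below counts only the DiT weights in bfloat16 (2 bytes per parameter). The text encoder, VAE, and activations are not included, so treat the result as a lower bound rather than a measured figure.

```python
PARAMS = 8_000_000_000  # 8 B parameters, as stated for ERNIE-Image
BYTES_PER_PARAM = 2     # bfloat16 stores each weight in 2 bytes
GIB = 1024 ** 3

weights_gib = PARAMS * BYTES_PER_PARAM / GIB
headroom_gib = 24 - weights_gib  # treats "24 GB" as 24 GiB for simplicity

print(f"weights: {weights_gib:.1f} GiB, headroom: {headroom_gib:.1f} GiB")
```

The weights alone come to about 14.9 GiB, leaving roughly 9 GiB for everything else on a 24 GB card.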

Two released variants:

ERNIE‑Image (SFT version) – 50 inference steps, guidance scale 4.0

ERNIE‑Image‑Turbo (DMD+RL accelerated) – 8 inference steps, guidance scale 1.0
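
Each variant comes with its own recommended sampling settings. One way to keep them straight in code is a small preset table, sketched here in Python. The dictionary keys and the `sampling_kwargs` helper are hypothetical; only the step counts and guidance scales come from the list above.

```python
PRESETS = {
    "sft": {"num_inference_steps": 50, "guidance_scale": 4.0},
    "turbo": {"num_inference_steps": 8, "guidance_scale": 1.0},
}


def sampling_kwargs(variant, **overrides):
    """Return a copy of the preset for `variant`, with any overrides applied on top."""
    if variant not in PRESETS:
        raise ValueError(f"unknown variant: {variant!r}")
    return {**PRESETS[variant], **overrides}


# Ratio of denoising steps only; this is not a measured wall-clock speedup.
step_ratio = PRESETS["sft"]["num_inference_steps"] / PRESETS["turbo"]["num_inference_steps"]

print(sampling_kwargs("turbo"))
print(f"Turbo uses {step_ratio:.2f}x fewer denoising steps")
```

The returned dictionary can be unpacked into a pipeline call with `**sampling_kwargs("turbo")`, which keeps the two settings from being mixed across variants.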

Example usage via Hugging Face Diffusers:

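import torch
from diffusers import ErnieImagePipeline

# Load the SFT checkpoint in bfloat16 and move it to the GPU
pipe = ErnieImagePipeline.from_pretrained(
    "baidu/ERNIE-Image",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Sampling settings match the SFT variant: 50 steps, guidance scale 4.0
image = pipe(
    prompt="a black-and-white Chinese countryside dog",
    height=1024, width=1024,
    num_inference_steps=50,
    guidance_scale=4.0,
    use_pe=True,
).images[0]

# Save the generated PIL image to disk
image.save("ernie_image.png")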

03 | ai-influence-digest

The tool monitors the public activity of more than 65 AI developers, filters for posts that are immediately useful to content creators, and produces a structured weekly briefing in Chinese, all without relying on the X (Twitter) API.

Core features:

No X API dependency – fully compliant and avoids account bans

Coverage of tools, workflows, tutorials, prompts across 65+ developers

Automatic rendering of Xiaohongshu‑style long‑image screenshots for easy sharing

Markdown‑formatted Chinese summary output

Three‑step workflow:

# Step 1: Scan candidate posts
python3 scripts/scan_x_weekly.py \
  --accounts references/accounts_65.txt \
  --days 7 \
  --outdir ./output/ai-influence-digest

# Step 2: Human review and assemble Markdown weekly report
# (filter criteria in references/filters.md)

# Step 3: Render Xiaohongshu‑style report screenshot
bash scripts/render_weekly_screenshots.sh \
  ./output/ai-influence-digest/weekly_report.md \
  ./output/ai-influence-digest/weekly_report.png \
  "2026-04-18"

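Steps 1 and 3 are scriptable, with the manual review sitting between them. The Python sketch below assembles both commands, using only the script paths and arguments quoted above. The function names are invented for illustration, and the third argument to the render script is assumed to be a report date label. Nothing is executed here.

```python
from pathlib import Path

OUTDIR = Path("output/ai-influence-digest")


def scan_command(days=7, accounts="references/accounts_65.txt", outdir=OUTDIR):
    """Step 1: argument list for the weekly scan script."""
    return [
        "python3", "scripts/scan_x_weekly.py",
        "--accounts", accounts,
        "--days", str(days),
        "--outdir", outdir.as_posix(),
    ]


def render_command(report_date, outdir=OUTDIR):
    """Step 3: argument list for the screenshot renderer."""
    return [
        "bash", "scripts/render_weekly_screenshots.sh",
        (outdir / "weekly_report.md").as_posix(),
        (outdir / "weekly_report.png").as_posix(),
        report_date,  # assumed to be the date shown on the report
    ]


print(" ".join(scan_command()))
print(" ".join(render_command("2026-04-18")))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the repository root, running the render step only after the Step 2 review has produced `weekly_report.md`.
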
Summary

agentic-video-editor – automates raw footage editing into ads via an AI Agent pipeline with automatic review and up to three retry cycles.

ERNIE‑Image – 8 B diffusion‑transformer delivering state‑of‑the‑art text‑to‑image generation on a single 24 GB GPU; excels at Chinese text rendering and structured graphics.

ai-influence-digest – continuously tracks 65+ AI developers, filters high‑value updates, and produces a ready‑to‑share Chinese weekly briefing.

All projects are open source. Repository URLs: https://github.com/poseljacob/agentic-video-editor, https://github.com/baidu/ERNIE-Image, https://github.com/koffuxu/ai-influence-digest.

Written by

Geek Labs
