How to Use Harness and AI Agents to Auto‑Generate Knowledge‑Explainer Videos

This tutorial walks through building a controllable video‑generation pipeline that lets an AI Agent turn an article into a scripted, outlined, and visually animated web‑based presentation, covering skill design, execution phases, parallel development, self‑checks, and audio synthesis.

Sohu Tech Products

May 27, 2026

Introduction

The author explains a workflow that lets an AI Agent automatically create a knowledge‑explainer video from a raw article, emphasizing controllability over using black‑box video‑generation models.

Why Build the Video with a Web Page

The video is rendered as a web page created with Vibe Coding. Controllable elements such as fonts, colors, step durations, and precise on‑screen numbers are edited by changing a few lines of code, making the result more stable and cheaper than using video‑generation models.

Example of a Previous Video

The input was an Anthropic blog post (https://claude.com/blog/lessons-from-building-claude-code-prompt-caching-is-everything). The output web page split the article into 13 chapters and over 100 fine‑grained steps, each with a visual demonstration. The canvas is fixed at 16:9, a hidden progress bar appears on hover, and there are no headers, page numbers, or brand marks, giving a clean video‑like appearance.

Key Workflow

Convert article to script: Transform formal technical sentences into conversational, second‑person narration suitable for speaking.

Split script into development outline: Each sentence maps to a visual step; groups of steps form chapters, each focusing on a single topic.

Create visual demos: Draw sequence diagrams for TCP handshakes, node links for DNS queries, or score gauges for anti‑spam judgments instead of merely overlaying text.

Align steps with narration: The web page’s animation timeline follows the script so the final video feels natural.
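The sentence-to-step and step-to-chapter mapping above can be sketched as a small data model. This is only an illustration: the `Step` and `Chapter` classes and both helper functions are hypothetical names, not part of the Skill itself.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    narration: str  # one spoken sentence from the script
    visual: str     # short note on what the step shows on screen


@dataclass
class Chapter:
    title: str  # the single topic this chapter covers
    steps: list[Step] = field(default_factory=list)


def build_outline(chapters: dict[str, list[tuple[str, str]]]) -> list[Chapter]:
    """Turn {chapter title: [(sentence, visual note), ...]} into Chapter objects."""
    return [
        Chapter(title, [Step(narration, visual) for narration, visual in pairs])
        for title, pairs in chapters.items()
    ]


def narration_timeline(outline: list[Chapter]) -> list[str]:
    """Flatten the outline into the order in which the page plays its steps."""
    return [step.narration for chapter in outline for step in chapter.steps]
```

Because every step carries exactly one narration sentence, the flattened timeline and the script stay in the same order by construction.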

Why Wrap the Process in a Skill

Although the model can perform the tasks, the challenge is to make it repeatable for any article, any user, and any theme without relying on luck. A Skill defines a set of rules—boundaries, state management, checkpoints, and error handling—that the Agent must obey.

Core Parts of the Skill

The Skill consists of six components:

Context Management: Determines what the model sees at each phase.

Tool System: Exposes file‑read/write, API calls, code execution, and browser automation.

Execution Orchestration: Decides the next action for the model.

State & Memory: Persists task boundaries, outline, and script across steps.

Evaluation & Observation: Self‑checks and quality metrics.

Constraints & Recovery: Handles errors and prevents drift.

Execution & Orchestration

The Skill splits the end‑to‑end flow into four phases with two manual checkpoints:

Phase 1 – Content Writing: The Agent produces both the narration script and the development outline.

Checkpoint Plan: Human reviews five items (script, outline, theme, assets, and development mode) before proceeding.

Phase 2 – Development: The first chapter is reviewed and becomes the baseline; subsequent chapters can be built sequentially or in parallel, each passing a checklist.

Checkpoint Audio: Decide whether to synthesize audio or record voice manually.

Phase 3 – Audio Synthesis: Extract TTS instructions from the script and generate per‑step audio files.

Phase 4 – Screen Recording: Open the web page in full‑screen auto‑play mode, record, and trim the video.

The focus is on keeping the Agent from deviating at any stage.
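One way to picture the four phases and two checkpoints is as a linear state machine that refuses to move past a checkpoint without human approval. The sketch below is an assumption about how such gating could look, not the Skill's actual implementation; `Stage` and `next_stage` are hypothetical names, and the branch where the user chooses manual voice recording is left out for brevity.

```python
from enum import Enum


class Stage(Enum):
    CONTENT_WRITING = "Phase 1 - Content Writing"
    CHECKPOINT_PLAN = "Checkpoint Plan"
    DEVELOPMENT = "Phase 2 - Development"
    CHECKPOINT_AUDIO = "Checkpoint Audio"
    AUDIO_SYNTHESIS = "Phase 3 - Audio Synthesis"
    SCREEN_RECORDING = "Phase 4 - Screen Recording"


ORDER = list(Stage)  # Enum iteration follows definition order
CHECKPOINTS = {Stage.CHECKPOINT_PLAN, Stage.CHECKPOINT_AUDIO}


def next_stage(current: Stage, human_approved: bool = False) -> Stage:
    """Advance one stage; a checkpoint only advances with human approval."""
    if current in CHECKPOINTS and not human_approved:
        return current  # stay put until a person signs off
    index = ORDER.index(current)
    return ORDER[min(index + 1, len(ORDER) - 1)]  # the last stage is terminal
```

Keeping the transition rule in one function means the Agent cannot skip a checkpoint by accident: the only path forward goes through `human_approved=True`.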

Context Management Details

All information is split into separate markdown files, each loaded only in its designated phase to avoid attention dilution. The outline.md (development plan) serves as the long‑term memory, recording chapter boundaries, visual decisions, and user feedback.
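Phase-scoped loading can be sketched as a lookup from phase to the files it may see. The file names come from this article, but the exact phase-to-file split below is an assumption made for illustration, as are the names `PHASE_FILES` and `build_context`.

```python
# Assumed mapping: the article names these files but not this exact split.
PHASE_FILES = {
    "content_writing": ["article.md"],
    "development": ["outline.md", "CHAPTER-CRAFT.md"],
    "audio_synthesis": ["script.md"],
}


def build_context(phase: str, documents: dict[str, str]) -> str:
    """Concatenate only the documents that the given phase is allowed to see."""
    if phase not in PHASE_FILES:
        raise ValueError(f"unknown phase: {phase}")
    parts = []
    for name in PHASE_FILES[phase]:
        if name in documents:  # skip files that do not exist yet
            parts.append(f"## {name}\n{documents[name]}")
    return "\n\n".join(parts)
```

Anything not listed for a phase never reaches the prompt, which is the point of the design: the model's attention is spent only on material relevant to the current task.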

Tool System Details

No special external tools are required; the Agent uses its built‑in file‑read/write capabilities. Parallel development is supported by isolating each chapter in its own folder (e.g., src/chapters/01-intro/) and using unique CSS prefixes to avoid class‑name clashes. Theme tokens ensure visual consistency across parallel agents.
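A possible way to derive a unique CSS prefix from each chapter folder is shown below. The naming scheme (a `ch` prefix followed by the folder name) is an assumption, not the Skill's documented convention; `css_prefix` and `scoped` are hypothetical helpers.

```python
import re


def css_prefix(chapter_dir: str) -> str:
    """Derive a class-name prefix from a folder such as 'src/chapters/01-intro/'."""
    name = chapter_dir.rstrip("/").split("/")[-1]  # e.g. '01-intro'
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    # Leading 'ch' keeps the class name from starting with a digit.
    return f"ch{slug}-"


def scoped(chapter_dir: str, class_name: str) -> str:
    """Prefix a class name so two chapters can both define e.g. 'title'."""
    return css_prefix(chapter_dir) + class_name
```

With this scheme, two agents working in parallel can each define a `title` class without colliding, since one emits `ch01-intro-title` and the other a name built from its own folder.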

Constraints & Recovery

When feedback indicates a problem (e.g., “the rhythm is too fast”), the Skill applies a minimal‑slice fix instead of re‑generating the entire chapter. Each problem layer has a corresponding fix:

Rhythm – adjust narration length or split/merge steps.

Visual – modify only the affected CSS or animation logic.

Content – edit the article.md snippet for the specific step.

Code – locate the file and line number and patch directly.
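The layer-to-fix list above can be read as a routing table. The sketch below assumes the feedback has already been classified into one of the four layers; `MINIMAL_FIXES` and `plan_fix` are hypothetical names used only for illustration.

```python
# Layers and fixes as listed above; keys are lower-cased for lookup.
MINIMAL_FIXES = {
    "rhythm": "adjust narration length or split/merge steps",
    "visual": "modify only the affected CSS or animation logic",
    "content": "edit the article.md snippet for the specific step",
    "code": "locate the file and line number and patch directly",
}


def plan_fix(layer: str, chapter: str, step: int) -> str:
    """Return a narrowly scoped instruction instead of a full regeneration."""
    key = layer.strip().lower()
    if key not in MINIMAL_FIXES:
        raise ValueError(f"unknown problem layer: {layer}")
    return f"[{chapter} step {step}] {MINIMAL_FIXES[key]}"
```

Rejecting unknown layers is deliberate in this sketch: feedback that cannot be classified is escalated rather than answered with a blanket rewrite.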

Evaluation & Observation

Each key output has a hard self‑check list:

script.md: Conversational tone, Bilibili‑style delivery, ≥60% information retention, natural to read aloud.

outline.md: Step count, duration, information pool, no animation details, script‑outline timing error <10%.

CHAPTER‑CRAFT.md: Animated, non‑template visuals, large fonts, step‑by‑step reveal, real assets, isolated code.
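The timing criterion in the outline.md checklist can be made concrete with a small calculation. The article does not say how the error is measured, so the sketch below assumes it is the relative gap between the narrated duration and the duration planned in the outline; both function names are hypothetical.

```python
def timing_error(script_seconds: float, outline_seconds: float) -> float:
    """Relative gap between narrated duration and planned duration (assumed definition)."""
    if outline_seconds <= 0:
        raise ValueError("outline duration must be positive")
    return abs(script_seconds - outline_seconds) / outline_seconds


def passes_timing_check(
    script_seconds: float, outline_seconds: float, limit: float = 0.10
) -> bool:
    """True when the script and the outline agree to within the limit."""
    return timing_error(script_seconds, outline_seconds) < limit
```

For example, a chapter planned at 100 seconds whose narration runs 95 seconds has a 5% error and passes, while one that runs 120 seconds has a 20% error and is sent back for revision.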

Three execution strategies are offered:

Optimal: Use Agent Teams (currently supported only by Claude Code) to create a dedicated Reviewer Agent that runs the full checklist.

Sub‑Agent: When Agent Teams is unavailable, run the same flow with a Sub‑Agent.

Fallback: The main Agent performs a strict self‑check without visual inspection.

Environment Setup

The required tools are installed via the following commands:

curl -fsSL https://claude.ai/install.sh | bash

claude -v

MiniMax is used for its Token Plan and for TTS. After obtaining an API key, CC Switch configures the model, and the MiniMax CLI is installed with a prompt such as:

Help me install the MiniMax CLI: https://github.com/MiniMax-AI/cli
My key is sk‑cp‑xxxxx

The Skill is cloned from the GitHub repository https://github.com/ConardLi/garden-skills and placed under .claude/skills. After installation, typing /web-video in Claude Code should suggest the Skill.

Practical Walk‑through

Phase 1 – Generate Script and Outline

Feed the original article to Claude Code with the web‑video‑presentation Skill. The Agent creates a Content Writer sub‑agent that produces script.md and outline.md, then two Reviewer agents perform quality checks.

Human Confirmation

The user decides whether to edit the script or outline and selects a visual theme from options such as paper‑press, warm‑keynote, midnight‑press, blueprint, etc.

Phase 2 – First Chapter Development

After confirming the script, outline, theme, and development mode, the Agent builds the first chapter. The first chapter serves as a benchmark; the user reviews it in the browser to verify layout, pacing, and visual quality before proceeding.

Parallel Development of Remaining Chapters

Once the first chapter passes, the remaining chapters are developed in parallel using Agent Teams or Sub‑Agents (at most three concurrent agents are recommended). Each agent receives its chapter outline, theme tokens, and the first chapter as reference.
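The concurrency cap can be illustrated with a bounded worker pool. This is a sketch only: `build_chapter` is a hypothetical stand-in for handing one chapter to an agent, and in the real workflow the dispatching is done by Claude Code rather than by Python threads.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_AGENTS = 3  # the recommended upper bound


def build_chapter(chapter: str, theme: str, reference: str) -> str:
    """Hypothetical stand-in for dispatching one chapter to an agent."""
    return f"{chapter}: built with theme '{theme}', following {reference}"


def build_remaining(chapters: list[str], theme: str, reference: str) -> list[str]:
    """Build chapters with at most MAX_CONCURRENT_AGENTS running at once."""
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_AGENTS) as pool:
        # pool.map returns results in the same order as the input chapters.
        return list(pool.map(lambda c: build_chapter(c, theme, reference), chapters))
```

Every worker receives the same theme and the same reference chapter, which mirrors how the first chapter acts as the shared baseline for all parallel agents.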

Phase 3 – Audio Synthesis

After all chapters are approved, the MiniMax CLI synthesizes Mandarin “Gentleman” voice audio for each step. The process extracts a narration list, the user verifies it, then runs batch TTS commands. Generated audio files are stored alongside their corresponding steps.
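The narration list that the user verifies before batch synthesis could look like the manifest built below. The audio path layout is an assumption, `narration_manifest` is a hypothetical helper, and the sketch stops short of calling the MiniMax CLI because its commands are not given in this article.

```python
def narration_manifest(chapters: dict[str, list[str]]) -> list[dict]:
    """List every step's narration text with the audio file it should produce."""
    manifest = []
    for chapter_dir, sentences in chapters.items():
        for number, text in enumerate(sentences, start=1):
            manifest.append(
                {
                    "chapter": chapter_dir,
                    "step": number,
                    "text": text,
                    # Assumed layout: audio stored next to the chapter's steps.
                    "audio_path": f"{chapter_dir}/audio/step-{number:02d}.mp3",
                }
            )
    return manifest
```

Reviewing a flat list like this before running TTS makes it cheap to catch a wrong sentence, which is far less costly than regenerating audio afterwards.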

Phase 4 – Playback Modes and Recording

The final web page supports three modes:

Manual (no audio): Click through steps, suitable for manual voice‑over.

Audio (add ?audio=1): Click or press keys to advance while audio plays.

Auto (add ?auto=1): Press space once to bypass autoplay restrictions, then the page plays automatically according to the audio timeline. Recording with OBS yields the final video.
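The three modes are selected by URL query parameters, which can be sketched as follows. The actual page would do this in browser code; the Python version below only illustrates the logic. The precedence when both parameters are present is an assumption, and the local URL in the usage note is hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def playback_mode(url: str) -> str:
    """Pick 'auto', 'audio', or 'manual' from the page URL's query string."""
    params = parse_qs(urlparse(url).query)
    if params.get("auto") == ["1"]:
        return "auto"  # assumed to win if both parameters are set
    if params.get("audio") == ["1"]:
        return "audio"
    return "manual"  # no parameter: click through without audio
```

For instance, `playback_mode("http://localhost:5173/?auto=1")` selects the auto mode used for the OBS recording, while the bare URL falls back to manual click-through.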

Archiving

All assets—original article, script, outline, web code, and audio files—are version‑controlled, enabling reuse for new topics by simply swapping the source article.

Conclusion

The whole pipeline demonstrates a Harness practice: orchestrating powerful AI models (Claude Code, MiniMax), built‑in tooling, and human checkpoints into a stable, repeatable production system for knowledge‑explainer videos.
