How to Use Harness and AI Agents to Auto‑Generate Knowledge‑Explainer Videos
This tutorial walks through building a controllable video‑generation pipeline that lets an AI Agent turn an article into a scripted, outlined, and visually animated web‑based presentation, covering skill design, execution phases, parallel development, self‑checks, and audio synthesis.
Introduction
The author explains a workflow that lets an AI Agent automatically create a knowledge‑explainer video from a raw article, emphasizing controllability over using black‑box video‑generation models.
Why Build the Video with a Web Page
The video is rendered as a web page created with Vibe Coding. Controllable elements such as fonts, colors, step durations, and precise on‑screen numbers are edited by changing a few lines of code, making the result more stable and cheaper than using video‑generation models.
Example of a Previous Video
The input was an Anthropic blog post (https://claude.com/blog/lessons-from-building-claude-code-prompt-caching-is-everything). The output web page split the article into 13 chapters and over 100 fine‑grained steps, each with a visual demonstration. The canvas is fixed at 16:9, a hidden progress bar appears on hover, and there are no headers, page numbers, or brand marks, giving a clean video‑like appearance.
Key Workflow
Convert article to script : Transform formal technical sentences into conversational, second‑person narration suitable for speaking.
Split script into development outline : Each sentence maps to a visual step; groups of steps form chapters, each focusing on a single topic.
Create visual demos : Draw sequence diagrams for TCP handshakes, node links for DNS queries, or score gauges for anti‑spam judgments instead of merely overlaying text.
Align steps with narration : The web page’s animation timeline follows the script so the final video feels natural.
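The mapping from script to outline can be pictured as a small data model. The following is a minimal sketch, not the Skill's actual format: the `Step` and `Chapter` classes, the `build_chapters` helper, and the sample sentences are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One narrated sentence paired with one visual step."""
    index: int
    narration: str


@dataclass
class Chapter:
    """A group of steps that covers a single topic."""
    title: str
    steps: list[Step] = field(default_factory=list)


def build_chapters(script: dict[str, list[str]]) -> list[Chapter]:
    """Turn {chapter title: [sentences]} into chapters, one step per sentence."""
    chapters = []
    for title, sentences in script.items():
        steps = [Step(index=i + 1, narration=s) for i, s in enumerate(sentences)]
        chapters.append(Chapter(title=title, steps=steps))
    return chapters


# Hypothetical sample input: one chapter with two spoken sentences.
demo = build_chapters({
    "Why caching matters": [
        "You pay for every token you send.",
        "So the prefix you repeat is where the savings are.",
    ],
})
print(demo[0].title, len(demo[0].steps))
```

Keeping the narration on the step itself is one way to make the later alignment between animation and audio a lookup rather than a separate matching pass.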
Why Wrap the Process in a Skill
Although the model can perform the tasks, the challenge is to make it repeatable for any article, any user, and any theme without relying on luck. A Skill defines a set of rules—boundaries, state management, checkpoints, and error handling—that the Agent must obey.
Core Parts of the Skill
The Skill consists of six components:
Context Management : Determines what the model sees at each phase.
Tool System : Exposes file‑read/write, API calls, code execution, and browser automation.
Execution Orchestration : Decides the next action for the model.
State & Memory : Persists task boundaries, outline, and script across steps.
Evaluation & Observation : Self‑checks and quality metrics.
Constraints & Recovery : Handles errors and prevents drift.
Execution & Orchestration
The Skill splits the end‑to‑end flow into four phases with two manual checkpoints:
Phase 1 – Content Writing : The Agent produces both the narration script and the development outline.
Checkpoint Plan : Human reviews five items—script, outline, theme, assets, and development mode—before proceeding.
Phase 2 – Development : The first chapter is reviewed and becomes the baseline; subsequent chapters can be built sequentially or in parallel, each passing a checklist.
Checkpoint Audio : Decide whether to synthesize audio or record voice manually.
Phase 3 – Audio Synthesis : Extract TTS instructions from the script and generate per‑step audio files.
Phase 4 – Screen Recording : Open the web page in full‑screen auto‑play mode, record, and trim the video.
The focus is on keeping the Agent from deviating at any stage.
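The phase order and the two checkpoints can be sketched as a plain function. This is an illustration under assumptions: `run_pipeline` and its two boolean parameters are hypothetical, and it assumes that choosing manual voice recording skips Phase 3 while still proceeding to Phase 4.

```python
def run_pipeline(plan_approved: bool, synthesize_audio: bool) -> list[str]:
    """Return the stages that would run, in order, for the given human decisions."""
    stages = ["Phase 1: Content Writing", "Checkpoint: Plan"]
    if not plan_approved:
        # The flow waits at the checkpoint until the human signs off.
        return stages

    stages += ["Phase 2: Development", "Checkpoint: Audio"]
    if synthesize_audio:
        stages.append("Phase 3: Audio Synthesis")
    # Assumption: recording still happens when the voice is recorded manually.
    stages.append("Phase 4: Screen Recording")
    return stages


for stage in run_pipeline(plan_approved=True, synthesize_audio=True):
    print(stage)
```

Encoding the checkpoints as explicit stages, rather than as side conditions inside a phase, keeps the point where the Agent must stop and wait visible in the flow itself.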
Context Management Details
All information is split into separate markdown files, each loaded only in its designated phase to avoid attention dilution. The outline.md (development plan) serves as the long‑term memory, recording chapter boundaries, visual decisions, and user feedback.
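Phase-scoped loading can be sketched as a lookup table from phase to files. The file names come from this article, but the assignment of files to phases in `PHASE_FILES`, the phase keys, and the `load_context` helper are assumptions made for illustration; the Skill's real mapping may differ.

```python
from pathlib import Path

# Assumed mapping: which markdown files are visible in which phase.
PHASE_FILES = {
    "content_writing": ["article.md"],
    "development": ["outline.md", "CHAPTER-CRAFT.md"],
    "audio_synthesis": ["script.md"],
}


def load_context(phase: str, root: Path) -> dict[str, str]:
    """Return only the files assigned to `phase`; everything else stays unread."""
    context = {}
    for name in PHASE_FILES[phase]:
        path = root / name
        if path.exists():
            context[name] = path.read_text(encoding="utf-8")
    return context
```

Because each phase reads from an explicit list, adding a new rules file cannot silently widen what an earlier phase sees.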
Tool System Details
No special external tools are required; the Agent uses its built‑in file‑read/write capabilities. Parallel development is supported by isolating each chapter in its own folder (e.g., src/chapters/01-intro/) and using unique CSS prefixes to avoid class‑name clashes. Theme tokens ensure visual consistency across parallel agents.
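The prefix rule can be checked mechanically. The sketch below assumes a naming scheme that the article does not specify: the prefix is derived from the chapter folder name, with a leading `ch` added because a CSS class name cannot start with a bare digit. Both helper names are hypothetical.

```python
import re


def css_prefix(chapter_dir: str) -> str:
    """Derive a class-name prefix from a folder such as 'src/chapters/01-intro/'."""
    name = chapter_dir.rstrip("/").split("/")[-1]
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return f"ch{slug}-"


def unprefixed_classes(chapter_dir: str, class_names: list[str]) -> list[str]:
    """List the class names that would be free to clash with other chapters."""
    prefix = css_prefix(chapter_dir)
    return [c for c in class_names if not c.startswith(prefix)]


print(css_prefix("src/chapters/01-intro/"))
print(unprefixed_classes("src/chapters/01-intro/", ["ch01-intro-title", "title"]))
```

A check like this could run as part of each chapter's checklist, so a parallel agent that forgets the prefix is caught before chapters are merged.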
Constraints & Recovery
When feedback indicates a problem (e.g., “the rhythm is too fast”), the Skill applies a minimal‑slice fix instead of re‑generating the entire chapter. Each problem layer has its own fix:
Rhythm – adjust narration length or split/merge steps.
Visual – modify only the affected CSS or animation logic.
Content – edit the article.md snippet for the specific step.
Code – locate the file and line number and patch directly.
Evaluation & Observation
Each key output has a hard self‑check list:
script.md : Conversational tone, Bilibili‑style delivery, ≥60% information retention, natural to read aloud.
outline.md : Step count, duration, information pool, no animation details, script‑outline timing error <10%.
CHAPTER‑CRAFT.md : Animated, non‑template visuals, large fonts, step‑by‑step reveal, real assets, isolated code.
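Two of the thresholds above are numeric and can be checked in code. The article gives the limits (at least 60% retention, under 10% timing error) but not the formulas, so the definitions below are assumptions: retention is the share of the article's key points that also appear in the script, and timing error is measured relative to the outline's duration.

```python
def retention(article_points: set[str], script_points: set[str]) -> float:
    """Share of the article's key points that survive into the script."""
    return len(article_points & script_points) / len(article_points)


def timing_error(script_seconds: float, outline_seconds: float) -> float:
    """Relative gap between estimated narration time and the outline's plan."""
    return abs(script_seconds - outline_seconds) / outline_seconds


def passes_checks(article_points: set[str], script_points: set[str],
                  script_seconds: float, outline_seconds: float) -> bool:
    """Apply both thresholds from the checklists: >= 60% retention, < 10% error."""
    return (retention(article_points, script_points) >= 0.60
            and timing_error(script_seconds, outline_seconds) < 0.10)


# Hypothetical numbers: 3 of 5 points kept, 95 s of narration against a 100 s plan.
print(passes_checks({"a", "b", "c", "d", "e"}, {"a", "b", "c"}, 95.0, 100.0))
```

How key points are extracted and how narration time is estimated are left open here; those inputs would have to come from the Agent or a reviewer.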
Three execution strategies are offered:
Optimal : Use Agent Teams (currently supported only by Claude Code) to create a dedicated Reviewer Agent that runs a full checklist.
Sub‑Agent : When Agent Teams is unavailable, run the same flow with a Sub‑Agent.
Fallback : The main Agent performs a strict self‑check without visual inspection.
Environment Setup
The required tools are installed as follows. First install Claude Code and confirm the version:
curl -fsSL https://claude.ai/install.sh | bash
claude -v
MiniMax is used for its token plan and for TTS. After obtaining an API key, CC Switch configures the model. The MiniMax CLI is then installed by giving the Agent a prompt such as:
Help me install the MiniMax CLI: https://github.com/MiniMax-AI/cli
My key is sk‑cp‑xxxxx
The Skill is cloned from the GitHub repository https://github.com/ConardLi/garden-skills and placed under .claude/skills. After installation, typing /web-video in Claude Code should suggest the Skill.
Practical Walk‑through
Phase 1 – Generate Script and Outline
Feed the original article to Claude Code with the web‑video‑presentation Skill. The Agent creates a Content Writer sub‑agent that produces script.md and outline.md, then two Reviewer agents perform quality checks.
Human Confirmation
The user decides whether to edit the script or outline and selects a visual theme from options such as paper‑press, warm‑keynote, midnight‑press, blueprint, etc.
Phase 2 – First Chapter Development
After confirming the script, outline, theme, and development mode, the Agent builds the first chapter. The first chapter serves as a benchmark; the user reviews it in the browser to verify layout, pacing, and visual quality before proceeding.
Parallel Development of Remaining Chapters
Once the first chapter passes, the remaining chapters are developed in parallel using Agent Teams or Sub‑Agents (up to three concurrent agents is recommended). Each agent receives its chapter outline, theme tokens, and the first chapter as reference.
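The cap of three concurrent agents can be illustrated with a semaphore. This sketch does not call any real agent API: `develop_chapter` is a hypothetical placeholder that only sleeps, and the chapter names are made up.

```python
import asyncio

MAX_CONCURRENT = 3  # the concurrency cap recommended in the text


async def develop_chapter(name: str) -> str:
    """Placeholder for handing one chapter to an agent (hypothetical)."""
    await asyncio.sleep(0.01)  # stands in for the real work
    return f"{name}: done"


async def develop_all(chapters: list[str]) -> list[str]:
    limiter = asyncio.Semaphore(MAX_CONCURRENT)

    async def guarded(name: str) -> str:
        async with limiter:  # at most MAX_CONCURRENT chapters in flight
            return await develop_chapter(name)

    # gather returns results in the same order as the input chapters
    return await asyncio.gather(*(guarded(c) for c in chapters))


results = asyncio.run(develop_all(["02-basics", "03-cache", "04-tools", "05-wrapup"]))
print(results)
```

The fourth chapter here only starts once one of the first three finishes, which is the behaviour a concurrency cap is meant to give.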
Phase 3 – Audio Synthesis
After all chapters are approved, the MiniMax CLI synthesizes Mandarin “Gentleman” voice audio for each step. The process extracts a narration list, the user verifies it, then runs batch TTS commands. Generated audio files are stored alongside their corresponding steps.
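The narration list that the user verifies before synthesis can be sketched as a manifest. The folder layout (`audio/step-NN.mp3` inside the chapter folder) and the `narration_manifest` helper are assumptions for illustration. The TTS call itself is deliberately left out, since the MiniMax CLI arguments are not given in this article.

```python
from pathlib import PurePosixPath


def narration_manifest(chapter_dir: str, narrations: list[str]) -> list[dict]:
    """Pair each step's narration with the audio file it should produce."""
    base = PurePosixPath(chapter_dir)
    return [
        {
            "step": i,
            "text": text,
            # Assumed layout: audio lives next to the chapter's own code.
            "audio": str(base / "audio" / f"step-{i:02d}.mp3"),
        }
        for i, text in enumerate(narrations, start=1)
    ]


manifest = narration_manifest("src/chapters/01-intro", ["Hi there.", "Let's begin."])
for entry in manifest:
    print(entry["step"], entry["audio"], entry["text"])
```

Reviewing a manifest like this before running batch synthesis is cheaper than regenerating audio after a wording mistake is found.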
Phase 4 – Playback Modes and Recording
The final web page supports three modes:
Manual (no audio): Click through steps, suitable for manual voice‑over.
Audio (add ?audio=1): Click or press keys to advance while audio plays.
Auto (add ?auto=1): Press space once to bypass autoplay restrictions, then the page plays automatically according to the audio timeline. Recording with OBS yields the final video.
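The mode selection and the auto-play timeline can be sketched as follows. The real page would do this in browser code; this Python version only mirrors the logic. The query parameters come from the list above, while the helper names, the precedence of `auto` over `audio` when both are present, and the localhost URL are assumptions.

```python
from itertools import accumulate
from urllib.parse import parse_qs, urlparse


def playback_mode(url: str) -> str:
    """Pick the playback mode from the page URL's query string."""
    params = parse_qs(urlparse(url).query)
    if params.get("auto") == ["1"]:
        return "auto"
    if params.get("audio") == ["1"]:
        return "audio"
    return "manual"


def step_start_times(durations: list[float]) -> list[float]:
    """Start time of each step when steps play back to back in auto mode."""
    if not durations:
        return []
    return [0.0] + list(accumulate(durations))[:-1]


print(playback_mode("http://localhost:5173/?auto=1"))
print(step_start_times([2.0, 3.5, 1.5]))
```

Deriving step starts from the audio durations is what lets the animation follow the narration without hand-tuned delays.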
Archiving
All assets—original article, script, outline, web code, and audio files—are version‑controlled, enabling reuse for new topics by simply swapping the source article.
Conclusion
The whole pipeline demonstrates a Harness practice: orchestrating powerful AI models (Claude Code, MiniMax), built‑in tooling, and human checkpoints into a stable, repeatable production system for knowledge‑explainer videos.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.