Creating a Full AI‑Generated Music Video with Large‑Model Agents
This article documents the end‑to‑end workflow of using large multimodal models and specialized agents to automatically generate a storyboard, compose original music and lyrics, produce keyframes, and assemble a complete music video, while highlighting the remaining manual steps and future automation possibilities.
Overview
The author records the entire process of using large AI models to create both a song and its accompanying video, demonstrating how prompts, agents, and multimodal generation can replace many traditional production steps.
Traditional MV Production vs. AI‑Driven Workflow
Traditional music video creation follows a script → storyboard → keyframe → animation → dubbing pipeline. The AI‑driven approach re‑examines this flow and splits it into three interaction modes:
Pure manual: Tasks such as final video editing remain human‑performed.
Human‑in‑the‑loop: where a model offers no API, its interactive (web) version is operated by hand (e.g., music composition, image‑to‑video).
Agent automation: prompt engineering and generation for the script and storyboard can be fully delegated to AI agents.
As multimodal model interfaces become fully open, a future multi‑agent system could produce an entire MV without human intervention.
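The three modes can be made concrete with a small Python sketch that tags each pipeline stage with its interaction mode and lists the stages that still block a fully automated MV. The stage names, the helper functions, and the choice to treat keyframe generation as agent‑driven are assumptions for illustration; the article does not define such a structure.

```python
from enum import Enum


class Mode(Enum):
    """The three interaction modes described in the article."""
    MANUAL = "pure manual"
    HUMAN_IN_LOOP = "human-in-the-loop"
    AGENT = "agent automation"


# Stage -> mode. Script, storyboard, music, image-to-video and final edit
# follow the article; treating keyframes as agent-driven is an assumption.
PIPELINE = [
    ("script", Mode.AGENT),
    ("storyboard", Mode.AGENT),
    ("keyframes", Mode.AGENT),
    ("music", Mode.HUMAN_IN_LOOP),
    ("image-to-video", Mode.HUMAN_IN_LOOP),
    ("final edit", Mode.MANUAL),
]


def blocking_stages(pipeline):
    """Return the stages that still need a human, i.e. block full automation."""
    return [name for name, mode in pipeline if mode is not Mode.AGENT]


def promote(pipeline, stage):
    """Return a copy with `stage` switched to AGENT, e.g. once its model gets an API."""
    return [(name, Mode.AGENT if name == stage else mode) for name, mode in pipeline]


if __name__ == "__main__":
    print(blocking_stages(PIPELINE))
    print(blocking_stages(promote(PIPELINE, "music")))
```

Each time a model's interface opens up, `promote` moves one more stage out of the blocking list; the fully automated multi‑agent MV corresponds to that list being empty.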
Agent Design
Three specialized agents are defined to handle different aspects of the production:
Director Agent
# Role
You are a professional anime sound and vocal director, skilled at handling anime voice‑over work, crafting all kinds of sound effects, composing background music, and creating compelling theme songs.
Art Agent
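# Role
You are a professional art director with outstanding talent in youth campus anime, 2D (ACG‑style) anime, and fantasy‑style anime. You can skillfully turn a storyboard script into storyboard sketches and, from those sketches, precisely draw high‑quality keyframe images.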
Vocal Director Agent
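# Role
You are a professional vocal director who can generate singing that matches the intended emotion from the provided lyrics and style prompts. This agent creates vocal tracks using services such as Suno.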
Storyboard Script
The director agent generates a dream‑themed script, which is then broken down into numbered shots. Example scenes include:
01 – Wide shot of the protagonist lying in bed, falling asleep (no dialogue).
02 – Wide shot of the protagonist in a moonlit mysterious forest.
03 – Medium shot of the protagonist exploring the forest and asking, “Where am I?”
04 – Wide shot of a distant castle under moonlight.
05 – Medium shot of the protagonist approaching the castle door.
06 – Close‑up of the ancient door opening with a creaking sound.
07 – Wide shot of a dark castle interior illuminated by a single candle.
08 – Close‑up of the protagonist’s shadow near the candle.
09 – Medium shot of the protagonist waking up in terror.
10 – Wide shot of the protagonist reflecting on the dream.
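Because every shot follows the same pattern (shot number, dash, framing, "of", description, optional quoted dialogue), the list can be parsed into structured records before it is handed to the art agent. A minimal sketch, assuming exactly that line format; the `Shot` fields and the regular expressions are assumptions, not part of the article.

```python
import re
from dataclasses import dataclass
from typing import Optional

# "NN - <framing> of <description>", accepting an ASCII hyphen or an en dash.
SHOT_RE = re.compile(r"^(\d+)\s*[-\u2013]\s*(.+?)\s+of\s+(.+?)\.?$")
# Dialogue in straight or curly double quotes.
DIALOGUE_RE = re.compile(r'["\u201c](.+?)["\u201d]')


@dataclass
class Shot:
    number: int
    framing: str               # e.g. "Wide shot", "Close-up"
    description: str
    dialogue: Optional[str]    # None when the shot has no spoken line


def parse_shot(line):
    """Parse one storyboard line into a Shot; raise ValueError if it does not fit."""
    match = SHOT_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised shot line: {line!r}")
    number, framing, description = match.groups()
    quote = DIALOGUE_RE.search(description)
    return Shot(int(number), framing, description, quote.group(1) if quote else None)


if __name__ == "__main__":
    storyboard = [
        "01 - Wide shot of the protagonist lying in bed, falling asleep (no dialogue).",
        '03 - Medium shot of the protagonist exploring the forest and asking, "Where am I?"',
        "06 - Close-up of the ancient door opening with a creaking sound.",
    ]
    for shot in map(parse_shot, storyboard):
        print(shot.number, shot.framing, shot.dialogue)
```

The framing match is non‑greedy, so it stops at the first " of "; descriptions that themselves contain "of" are kept whole.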
Keyframe Generation and Video Assembly
The art agent produces a keyframe for each shot. To keep the protagonist consistent from shot to shot, a global character prompt is added to every keyframe prompt. The keyframes are then fed into Runway's free image‑to‑video tool to create short clips.
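The global‑character‑prompt technique amounts to prepending one fixed character description to every per‑shot prompt. A sketch in which the character and style text are hypothetical placeholders (the article does not publish its actual prompt wording), and `keyframe_prompt` is an illustrative helper name:

```python
# Hypothetical global prompts: the article mentions a global character prompt
# but does not publish its wording.
CHARACTER_PROMPT = (
    "same protagonist in every frame: teenage student, short black hair, "
    "blue striped pajamas, anime style"
)
STYLE_PROMPT = "2D anime keyframe, moonlit, soft cinematic lighting"


def keyframe_prompt(framing, description, character=CHARACTER_PROMPT, style=STYLE_PROMPT):
    """Compose one shot's image prompt with the shared character block first.

    Repeating identical character text in every prompt helps reduce drift in
    the protagonist's appearance between independently generated frames.
    """
    parts = [character, f"{framing}: {description}", style]
    return ", ".join(part.strip() for part in parts if part.strip())


if __name__ == "__main__":
    shots = [
        ("Wide shot", "the protagonist lying in bed, falling asleep"),
        ("Medium shot", "the protagonist exploring a moonlit forest"),
    ]
    for framing, description in shots:
        print(keyframe_prompt(framing, description))
```

Prompt text alone does not guarantee identical faces; with self‑hosted tools such as Stable Diffusion, it is typically combined with other consistency aids.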
Music and Lyrics Production
The vocal director agent uses Suno (https://suno.com/create) with prompts that combine lyrics, style tags, and a song structure (intro, verse, pre‑chorus, chorus, bridge, outro). An example lyric set from the article follows; each section is marked with a bracketed tag such as [Verse] or [Chorus].
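The prompt structure can be assembled programmatically: a style string built from tags, plus a lyrics block with one bracketed tag per section, the same convention the article's lyrics use. This is a sketch: the function names, tag set, and sample style tags are assumptions; it calls no Suno API, and the resulting text would be pasted into Suno's web interface by hand, consistent with the human‑in‑the‑loop mode.

```python
# Section tags from the song structure described above; names are assumptions.
SECTION_TAGS = {"Intro", "Verse", "Pre-Chorus", "Chorus", "Bridge", "Outro"}


def build_lyrics(sections):
    """Join (tag, lines) pairs into a lyrics block with one [Tag] header per section.

    Tags are validated but their order is left to the caller, so a song can
    repeat sections (e.g. two verses before the chorus, as in the article).
    """
    blocks = []
    for tag, lines in sections:
        if tag not in SECTION_TAGS:
            raise ValueError(f"unknown section tag: {tag}")
        blocks.append("\n".join([f"[{tag}]", *lines]))
    return "\n\n".join(blocks)


def build_style(tags):
    """Normalise style tags to lower case, dropping blanks and duplicates in order."""
    cleaned = (tag.strip().lower() for tag in tags)
    return ", ".join(dict.fromkeys(t for t in cleaned if t))


if __name__ == "__main__":
    # Hypothetical style tags; the article does not list the ones it used.
    print(build_style(["Dream Pop", "female vocals", "dream pop", " "]))
    print(build_lyrics([
        ("Verse", ["Woke up from a strange scene last night",
                   "Chasing shadows under moonlight"]),
        ("Chorus", ["In dreams I find myself", "A place I can be free"]),
    ]))
```

Generating the lyrics block from structured data lets the vocal director agent rearrange or regenerate individual sections without re‑typing the whole song.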
[Verse]
Woke up from a strange scene last night
Chasing shadows under moonlight
In my dreams I get so lost
Floating through a world that costs
[Verse]
Reality's a weight I can't bear
Whispers in my ear everywhere
In the night my fears take flight
Inner darkness out in plain sight
[Chorus]
In dreams I find myself
A place I can be free
But the waking world pulls me down
A prison I can't see
Lost in dreams and realities
Final Editing
Simple video editors (e.g., iMovie, Jianying) are used to align the audio, lyric, and video tracks, trimming clip lengths to match the music.
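Matching shot lengths to the song is simple arithmetic: split the track's duration across shots in proportion to how long each should last, rounding to whole frames so the cuts still add up to the full song. A sketch using the largest‑remainder method; the helper name, the 24 fps default, the song length, and the weights are all assumptions, and the resulting durations would still be applied by hand in the editor.

```python
def allocate_durations(song_seconds, weights, fps=24):
    """Split a song's length across shots in proportion to `weights`.

    Works in whole frames and uses the largest-remainder method, so the
    returned durations always sum to the song length rounded to the nearest frame.
    """
    if song_seconds <= 0 or not weights or any(w <= 0 for w in weights):
        raise ValueError("need a positive song length and positive weights")
    total_frames = round(song_seconds * fps)
    total_weight = sum(weights)
    exact = [total_frames * w / total_weight for w in weights]
    frames = [int(x) for x in exact]  # floor, since every value is positive
    # Give the frames lost to flooring to the shots with the largest fractional parts.
    leftover = total_frames - sum(frames)
    by_remainder = sorted(range(len(weights)), key=lambda i: exact[i] - frames[i], reverse=True)
    for i in by_remainder[:leftover]:
        frames[i] += 1
    return [f / fps for f in frames]


if __name__ == "__main__":
    # Hypothetical: a 150-second song, ten shots, shots 07 and 08 weighted double.
    weights = [1, 1, 1, 1, 1, 1, 2, 2, 1, 1]
    durations = allocate_durations(150, weights)
    for number, seconds in enumerate(durations, start=1):
        print(f"shot {number:02d}: {seconds:.2f} s")
```

With ten equally weighted shots, `allocate_durations(150, [1] * 10)` gives 15 s per shot; uneven weights let climactic shots hold longer without changing the total.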
Alternative Tools
Keyframe generation: Midjourney, Stable Diffusion.
Video generation: Runway, Pika.
Audio effects: Audiocraft (self‑hosted).
Voice synthesis: ChatTTS.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
