How to Build an AI Comic‑Generating Agent with LangGraphGo and Skills
This article walks through constructing a multi‑step AI comic‑generation agent using the LangGraphGo framework and the GoSkills plugin system, covering architecture design, declarative tool definitions, automatic configuration discovery, parameter conversion, code implementation, common pitfalls, best practices, and performance optimizations.
Introduction
In AI application development, agent architectures are becoming increasingly important because they can plan execution steps, invoke external tools, and dynamically adjust strategies based on results. This guide demonstrates how to build a comic‑generation agent that creates storyboard scripts, generates images for each page, and merges them into a PDF.
Technology Stack Overview
The core components are:
LangGraphGo : a Go implementation of the LangGraph framework providing state‑graph capabilities.
GoSkills v0.6.1+ : a plugin system that wraps scripts as LLM‑callable tools.
TypeScript scripts : executed with npx tsx for storyboard and image generation.
ERNIE 5.0 Thinking Preview : Baidu's large language model used for planning and tool orchestration.
Project Architecture
The repository layout is:
comic_skill_example/
├── main.go                  # Entry point, creates and runs the Agent
├── go.mod                   # Go module dependencies
└── skills/                  # Directory of skill packages
    ├── baoyu-comic/         # Storyboard generation skill
    │   ├── SKILL.md
    │   └── scripts/
    │       ├── generate-comic.ts
    │       └── merge-to-pdf.ts
    ├── baoyu-image-gen/     # Image generation skill
    │   ├── SKILL.md
    │   └── scripts/
    │       └── main.ts
    └── pdf/                 # PDF processing skill (Python)
        ├── SKILL.md
        └── scripts/
            ├── check_bounding_boxes.py
            ├── convert_pdf_to_images.py
            └── ...
The system automatically loads every skill package under skills/, but the comic agent only uses three core tools: generate_comic_storyboard, generate_comic_image, and merge_comic_to_pdf, which come from the baoyu-comic and baoyu-image-gen packages.
Skill Definition System (SKILL.md)
Each skill is described in a SKILL.md file using YAML front‑matter. The file declares the tool name, script path, description, and parameters. For example, the storyboard skill defines a topic string (required), an optional style, pages integer, and aspect string.
---
name: baoyu-comic
description: Knowledge comic creator supporting multiple styles
tools:
  - name: generate_comic_storyboard
    script: scripts/generate-comic.ts
    description: Create a complete storyboard and prompts
    parameters:
      topic:
        type: string
        description: Comic theme
        required: true
      style:
        type: string
        description: Visual style (e.g., warm, classic)
        required: false
      pages:
        type: integer
        description: Number of pages
        required: false
      aspect:
        type: string
        description: Aspect ratio (e.g., 3:4, 16:9)
        required: false
---
Design Highlights
Declarative tool definition – all tool metadata lives in SKILL.md instead of hard‑coded Go.
Automatic OpenAPI schema generation – the system builds JSON schemas from the parameter definitions.
Zero Go code changes – adding a new tool only requires editing SKILL.md and adding the script.
Core Implementation Details
The adapter adapter/goskills/goskills.go reads SKILL.md and builds a ToolConfig structure. The previous hard‑coded approach required over 60 lines of Go; the new auto‑discovery approach needs zero lines of manual configuration.
// buildToolConfigFromSkill reads a SKILL.md file and constructs a ToolConfig
func buildToolConfigFromSkill(skill *goskills.SkillPackage) *ToolConfig {
	if len(skill.Meta.Tools) == 0 {
		return nil
	}
	config := &ToolConfig{
		NameMapping:          make(map[string]string),
		DescriptionOverrides: make(map[string]string),
		SchemaOverrides:      make(map[string]map[string]any),
	}
	for _, toolDef := range skill.Meta.Tools {
		// name mapping
		config.NameMapping[toolDef.Name] = toolDef.Name
		// description
		if toolDef.Description != "" {
			config.DescriptionOverrides[toolDef.Name] = toolDef.Description
		}
		// schema generation
		schema := map[string]any{"type": "object", "properties": make(map[string]any)}
		var required []string
		for paramName, param := range toolDef.Parameters {
			prop := map[string]any{"type": param.Type}
			if param.Description != "" {
				prop["description"] = param.Description
			}
			schema["properties"].(map[string]any)[paramName] = prop
			if param.Required {
				required = append(required, paramName)
			}
		}
		if len(required) > 0 {
			schema["required"] = required
		}
		schema["additionalProperties"] = false
		config.SchemaOverrides[toolDef.Name] = schema
	}
	return config
}

// SkillsToTools converts a SkillPackage into a slice of tools, using the auto‑generated config.
func SkillsToTools(skill *goskills.SkillPackage, opts ...SkillsToToolsOptions) ([]tools.Tool, error) {
	var config *ToolConfig
	// 1. Try auto‑building from SKILL.md
	skillConfig := buildToolConfigFromSkill(skill)
	if skillConfig != nil {
		config = skillConfig
	}
	// 2. Merge user‑provided overrides if any
	if len(opts) > 0 && opts[0].ToolConfig != nil {
		// merge logic …
	}
	// 3. Generate tools …
	return nil, nil
}
Advantages comparison
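Code size: about 60 lines of manual configuration (hard‑coded) vs. 0 lines (auto‑discovery).
Maintenance cost: high, because every change is edited in two places (hard‑coded) vs. low, with SKILL.md as the single source (auto‑discovery).
Extensibility: requires recompilation (hard‑coded) vs. no Go changes (auto‑discovery).
Type safety: compile‑time checks (hard‑coded) vs. runtime checks (auto‑discovery).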
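To make the schema generation step concrete, here is a minimal, self-contained sketch that builds the schema for the generate_comic_storyboard parameters and prints it as JSON. The Param type and buildSchema helper are hypothetical stand-ins for the adapter's own types, which may differ. One detail worth noting: Go randomizes map iteration order, so this sketch sorts the required list to keep the output stable between runs.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Param mirrors the SKILL.md parameter fields used in this article.
// Hypothetical type: the real GoSkills definition may differ.
type Param struct {
	Type        string
	Description string
	Required    bool
}

// buildSchema turns a parameter map into a JSON-Schema-style object.
func buildSchema(params map[string]Param) map[string]any {
	props := make(map[string]any, len(params))
	var required []string
	for name, p := range params {
		prop := map[string]any{"type": p.Type}
		if p.Description != "" {
			prop["description"] = p.Description
		}
		props[name] = prop
		if p.Required {
			required = append(required, name)
		}
	}
	// Map iteration order is random in Go; sort for deterministic output.
	sort.Strings(required)

	schema := map[string]any{
		"type":                 "object",
		"properties":           props,
		"additionalProperties": false,
	}
	if len(required) > 0 {
		schema["required"] = required
	}
	return schema
}

func main() {
	schema := buildSchema(map[string]Param{
		"topic":  {Type: "string", Description: "Comic theme", Required: true},
		"style":  {Type: "string", Description: "Visual style (e.g., warm, classic)"},
		"pages":  {Type: "integer", Description: "Number of pages"},
		"aspect": {Type: "string", Description: "Aspect ratio (e.g., 3:4, 16:9)"},
	})
	out, err := json.MarshalIndent(schema, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Running it prints one schema object with four properties and topic as the only required field, which is the shape an LLM function-calling API typically expects.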
Parameter Conversion (JSON → CLI)
LLM output is JSON, but the TypeScript scripts expect command‑line flags. The adapter converts named parameters to flags according to a mapping table.
{
  "topic": "采蘑菇的小姑娘",
  "style": "warm",
  "pages": 1
}
npx tsx generate-comic.ts --topic "采蘑菇的小姑娘" --style "warm" --pages 1
func (t *SkillTool) Call(ctx context.Context, input string) (string, error) {
	// Parse JSON
	var namedParams map[string]any
	err := json.Unmarshal([]byte(input), &namedParams)
	if err == nil && len(namedParams) > 0 {
		// Convert to CLI args
		paramMapping := map[string]string{
			"topic":     "--topic",
			"style":     "--style",
			"pages":     "--pages",
			"aspect":    "--aspect",
			"path":      "--image",
			"prompt":    "--prompt",
			"ar":        "--ar",
			"quality":   "--quality",
			"directory": "--directory",
		}
		paramOrder := []string{"topic", "style", "pages", "aspect", "path", "prompt", "ar", "quality", "directory"}
		var args []string
		for _, key := range paramOrder {
			if value, ok := namedParams[key]; ok && value != nil {
				if flag, ok := paramMapping[key]; ok {
					args = append(args, flag, fmt.Sprintf("%v", value))
				}
			}
		}
		// Execute script based on extension
		if strings.HasSuffix(scriptPath, ".py") {
			return goskills.RunPythonScript(scriptPath, args)
		} else if strings.HasSuffix(scriptPath, ".ts") || strings.HasSuffix(scriptPath, ".js") {
			return langgraphtool.RunTypeScriptScript(scriptPath, args)
		}
		return langgraphtool.RunShellScript(scriptPath, args)
	}
	// Fallback to old args format …
	return "", fmt.Errorf("unsupported input format")
}
TypeScript Execution Layer
// RunTypeScriptScript executes a .ts or .js file via npx tsx without compilation.
func RunTypeScriptScript(scriptPath string, args []string) (string, error) {
	cmdArgs := append([]string{"tsx", scriptPath}, args...)
	cmd := exec.Command("npx", cmdArgs...)
	var stdout, stderr bytes.Buffer
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		return "", fmt.Errorf("failed to run typescript script: %w\nStdout: %s\nStderr: %s", err, stdout.String(), stderr.String())
	}
	return stdout.String() + stderr.String(), nil
}
Why Choose tsx?
✅ No pre‑compilation, fast development.
✅ Supports TypeScript and ESM.
✅ Fully compatible with the Node.js ecosystem.
✅ Allows use of the latest JavaScript syntax.
Supported Script Types
TypeScript (.ts) : business logic and image generation – executed with npx tsx script.ts.
JavaScript (.js) : simple scripts – executed with npx tsx script.js.
Python (.py) : data processing and PDF handling – executed with python script.py.
Shell (.sh) : system operations – executed with bash script.sh.
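The mapping from script type to interpreter can be sketched as a single lookup on the file extension. This is an illustration only: interpreterFor is a hypothetical helper, not a function from GoSkills or LangGraphGo, and scripts/setup.sh is a made-up filename. The fallback to bash follows the article's own code, which hands anything that is not .py, .ts, or .js to the shell runner.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// interpreterFor returns the program and its leading arguments for a script,
// chosen by file extension. Hypothetical helper for illustration.
func interpreterFor(scriptPath string) (string, []string) {
	switch strings.ToLower(filepath.Ext(scriptPath)) {
	case ".ts", ".js":
		// Both TypeScript and JavaScript go through tsx.
		return "npx", []string{"tsx", scriptPath}
	case ".py":
		return "python", []string{scriptPath}
	default:
		// Everything else is treated as a shell script.
		return "bash", []string{scriptPath}
	}
}

func main() {
	scripts := []string{
		"scripts/main.ts",
		"scripts/convert_pdf_to_images.py",
		"scripts/setup.sh", // hypothetical file
	}
	for _, s := range scripts {
		prog, args := interpreterFor(s)
		fmt.Println(prog, strings.Join(args, " "))
	}
}
```

The returned pair can be passed to exec.Command(prog, append(args, cliArgs...)...); because no shell is involved, argument values containing spaces or Chinese characters need no extra quoting.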
Complete Workflow
The agent receives a user request, plans the steps, calls the storyboard tool, then the image generation tool for each page, and finally merges the images into a PDF.
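As a rough illustration of the expected call order, the sketch below lists the tool calls for a comic with a given number of pages. It is not how the agent is implemented: at run time the LLM decides each call. The plannedCalls helper is hypothetical; the tool names and the marker string are the ones used in the system prompt in main.go.

```go
package main

import (
	"fmt"
	"strings"
)

// imageMarker is the string the system prompt tells the agent to look for
// in the storyboard tool's output.
const imageMarker = "=== IMAGE_GENERATION_REQUIRED ==="

// plannedCalls returns the tool names in the order the workflow expects.
// Hypothetical helper: it only models the sequence, it calls nothing.
func plannedCalls(storyboardOutput string, pages int) []string {
	calls := []string{"generate_comic_storyboard"}
	if strings.Contains(storyboardOutput, imageMarker) {
		// One image-generation call per page.
		for i := 0; i < pages; i++ {
			calls = append(calls, "generate_comic_image")
		}
	}
	return append(calls, "merge_comic_to_pdf")
}

func main() {
	output := "storyboard saved\n" + imageMarker + "\n"
	for i, name := range plannedCalls(output, 3) {
		fmt.Printf("%d. %s\n", i+1, name)
	}
}
```

With three pages this yields five calls, which is why the agent in main.go is given a step limit well above the page count.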
                 User Input
                      │
                      ▼
┌───────────────────────────────────────┐
│ Agent Node: LLM planning + tool calls │
│ Input: user request + tool definitions│
│ Output: structured tool calls         │
└─────────────────────┬─────────────────┘
                      │
                      ▼
              ┌───────────────┐
              │  Tools Node   │
              └───────────────┘
                      │
          ┌───────────┴───────────┐
          ▼                       ▼
Storyboard generation     Image generation
 (generate-comic.ts)          (main.ts)
          │                       │
          ▼                       ▼
   Storyboard JSON          Comic images
          │                       │
          └───────────┬───────────┘
                      ▼
              PDF merge script
              (merge-to-pdf.ts)
                      │
                      ▼
             Complete PDF comic
Main Program Code (main.go)
// main.go
package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"github.com/smallnest/goskills"
	adapter "github.com/smallnest/langgraphgo/adapter/goskills"
	"github.com/smallnest/langgraphgo/prebuilt"
	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/openai"
	"github.com/tmc/langchaingo/tools"
)

func main() {
	// The user request is read from the first command‑line argument.
	if len(os.Args) < 2 {
		log.Fatalf("usage: %s \"<comic request>\"", os.Args[0])
	}
	// 1. Initialize LLM (recommend ERNIE 5.0 Thinking Preview)
	// Set environment variables: OPENAI_API_KEY and OPENAI_BASE_URL accordingly.
	llm, err := openai.New()
	if err != nil {
		log.Fatal(err)
	}
	// 2. Load skill packages from the skills directory
	skillsDir := "./skills"
	if _, err := os.Stat(skillsDir); os.IsNotExist(err) {
		skillsDir = "comic_skill_example/skills"
	}
	packages, err := goskills.ParseSkillPackages(skillsDir)
	if err != nil {
		log.Fatalf("Failed to parse skill packages: %v", err)
	}
	if len(packages) == 0 {
		log.Fatal("No skills found in " + skillsDir)
	}
	// 3. Convert skills to tools (auto‑read SKILL.md)
	var allTools []tools.Tool
	for _, skill := range packages {
		fmt.Printf("Loading skill: %s - %s\n", skill.Meta.Name, skill.Meta.Description)
		skillTools, err := adapter.SkillsToTools(skill)
		if err != nil {
			log.Printf("Failed to convert skill %s: %v", skill.Meta.Name, err)
			continue
		}
		allTools = append(allTools, skillTools...)
	}
	// 4. Filter comic‑related tools
	var comicTools []tools.Tool
	for _, t := range allTools {
		if t.Name() == "generate_comic_storyboard" || t.Name() == "generate_comic_image" || t.Name() == "merge_comic_to_pdf" {
			comicTools = append(comicTools, t)
		}
	}
	// 5. System prompt describing the agent's responsibilities
	systemMsg := `You are a helpful assistant that can call tools to create comics. When a user asks for a comic, you must call generate_comic_storyboard first.
Available functions:
- generate_comic_storyboard: create a full storyboard and prompts
- generate_comic_image: generate a single comic page (needs prompt and path)
- merge_comic_to_pdf: merge all pages into a PDF
Workflow:
1. Call generate_comic_storyboard
2. If the output contains '=== IMAGE_GENERATION_REQUIRED ===', call generate_comic_image for each page
3. Call merge_comic_to_pdf to produce the final PDF
Always call the function; never return a textual description.`
	// 6. Create the agent with a 20‑step limit
	agent, err := prebuilt.CreateAgentMap(llm, comicTools, 20, prebuilt.WithSystemMessage(systemMsg))
	if err != nil {
		log.Fatal(err)
	}
	// 7. Invoke the agent with the user request passed as a command‑line argument
	ctx := context.Background()
	resp, err := agent.Invoke(ctx, map[string]any{"messages": []llms.MessageContent{llms.TextParts(llms.ChatMessageTypeHuman, os.Args[1])}})
	if err != nil {
		log.Fatal(err)
	}
	// 8. Print the resulting messages
	if messages, ok := resp["messages"].([]llms.MessageContent); ok {
		for _, msg := range messages {
			fmt.Printf("[%s] %s\n", msg.Role, msg.Parts)
		}
	}
}
Pitfalls and Solutions
DeepSeek V3 tool‑call instability
Problem: DeepSeek V3 sometimes returns malformed tool‑call markers, causing parsing failures.
<|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>generate_comic_storyboard{...}<|tool▁call▁end|>
Solution: Switch to ERNIE 5.0 Thinking Preview, which provides stable tool‑call formatting.
llm, err := openai.New(
	openai.WithToken("your-ernie-api-key"),
	openai.WithBaseURL("https://aip.baidubce.com/rpc/2.0/ai_custom/v1/wenxinworkshop/chat/ernie-5.0-thinking-preview"),
)
TypeScript script execution issues
Problem: Bun‑specific APIs (e.g., Bun.write) are not compatible with Node.js.
Solution:
Remove Bun‑only calls.
Replace them with Node.js APIs such as fs.writeFileSync.
Execute scripts with npx tsx instead of bun run.
Parameter format conversion
Problem: LLM returns JSON while scripts expect CLI flags.
Solution: Implement automatic conversion in the tool execution layer (see the code in the “Parameter Conversion” section).
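One detail to watch in such a conversion: encoding/json decodes every JSON number into float64 when the target is map[string]any, and formatting a float64 with %v switches to exponent notation for larger values (1000000 becomes 1e+06). The sketch below is one way to avoid that, using json.Decoder with UseNumber so numbers keep their original text. The toCLIArgs helper is hypothetical and is not the adapter's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// toCLIArgs converts a JSON object into ordered "--flag value" pairs.
// Keys that are missing, null, or absent from flags are skipped.
// Hypothetical helper for illustration.
func toCLIArgs(input string, order []string, flags map[string]string) ([]string, error) {
	dec := json.NewDecoder(strings.NewReader(input))
	dec.UseNumber() // keep numbers as json.Number instead of float64
	var params map[string]any
	if err := dec.Decode(&params); err != nil {
		return nil, fmt.Errorf("parse tool input: %w", err)
	}
	var args []string
	for _, key := range order {
		value, ok := params[key]
		if !ok || value == nil {
			continue
		}
		flag, ok := flags[key]
		if !ok {
			continue
		}
		// json.Number prints exactly as it appeared in the input.
		args = append(args, flag, fmt.Sprint(value))
	}
	return args, nil
}

func main() {
	flags := map[string]string{"topic": "--topic", "style": "--style", "pages": "--pages"}
	order := []string{"topic", "style", "pages"}
	args, err := toCLIArgs(`{"topic": "采蘑菇的小姑娘", "style": "warm", "pages": 1}`, order, flags)
	if err != nil {
		panic(err)
	}
	fmt.Println(strings.Join(args, " "))
}
```

Iterating over a fixed order slice rather than the map keeps the generated command line stable from one call to the next.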
Chinese filename support in PDF merging
Problem: The regular expression used to match page filenames does not include Unicode Chinese characters.
// Add Unicode Chinese range
const pagePattern = /^(\d+)-(cover|page)(-[\w\u4e00-\u9fff-]+)?\.(png|jpg|jpeg)$/i;
Best Practices Summary
Single Responsibility : each skill handles one concern (storyboard, image generation, PDF merging).
Declarative Configuration : define tools in SKILL.md to avoid hard‑coding.
Language Choice : use TypeScript for image‑related logic and Python for PDF handling to leverage ecosystem strengths.
Performance Optimizations
Concurrent Image Generation
// Generate all pages concurrently with a semaphore limiting concurrency to 3
var wg sync.WaitGroup
semaphore := make(chan struct{}, 3)
for _, page := range pages {
	wg.Add(1)
	go func(p Page) {
		defer wg.Done()
		semaphore <- struct{}{}
		defer func() { <-semaphore }()
		generateImage(p)
	}(page)
}
wg.Wait()
Cache Mechanism for Repeated Requests
type CacheKey struct {
	Topic string
	Style string
	Pages int
}
var storyboardCache sync.Map // map[CacheKey]StoryboardResult
Future Directions
Multimodal Input : support images or video as source material.
Style Transfer : one‑click switching of comic visual styles.
Interactive Editing : allow users to intervene and adjust generation mid‑process.
Distributed Deployment : offload heavy image generation to multiple machines.
References
LangGraphGo GitHub – https://github.com/smallnest/langgraphgo
GoSkills v0.6.1+ – https://github.com/smallnest/goskills
LangChain Chinese documentation – https://www.langchain.com.cn/
DeepSeek API documentation – https://platform.deepseek.com/api-docs/
langgraphgo comic skill example – https://github.com/smallnest/langgraphgo/tree/master/examples/comic_skill_example
BirdNest Tech Talk
Author of the rpcx microservice framework, original book author, and chair of Baidu's Go CMC committee.
