How to Build an AI Comic‑Generating Agent with LangGraphGo and Skills

This article walks through constructing a multi‑step AI comic‑generation agent using the LangGraphGo framework and the GoSkills plugin system, covering architecture design, declarative tool definitions, automatic configuration discovery, parameter conversion, code implementation, common pitfalls, best practices, and performance optimizations.

BirdNest Tech Talk

Introduction

In AI application development, agent architectures are becoming increasingly important because they can plan execution steps, invoke external tools, and dynamically adjust strategies based on results. This guide demonstrates how to build a comic‑generation agent that creates storyboard scripts, generates images for each page, and merges them into a PDF.

Technology Stack Overview

The core components are:

LangGraphGo : a Go implementation of the LangGraph framework providing state‑graph capabilities.

GoSkills v0.6.1+ : a plugin system that wraps scripts as LLM‑callable tools.

TypeScript scripts : executed with npx tsx for storyboard and image generation.

ERNIE 5.0 Thinking Preview : Baidu's large language model used for planning and tool orchestration.

Technology Stack Diagram

Project Architecture

The repository layout is:

comic_skill_example/
├── main.go               # Entry point, creates and runs the Agent
├── go.mod                # Go module dependencies
└── skills/               # Directory of skill packages
    ├── baoyu-comic/      # Storyboard generation skill
    │   ├── SKILL.md
    │   └── scripts/
    │       ├── generate-comic.ts
    │       └── merge-to-pdf.ts
    ├── baoyu-image-gen/  # Image generation skill
    │   ├── SKILL.md
    │   └── scripts/
    │       └── main.ts
    └── pdf/               # PDF processing skill (Python)
        ├── SKILL.md
        └── scripts/
            ├── check_bounding_boxes.py
            ├── convert_pdf_to_images.py
            └── ...

The system automatically loads every skill package under skills/, but the comic agent only uses three core tools: generate_comic_storyboard, generate_comic_image, and merge_comic_to_pdf, which come from the baoyu-comic and baoyu-image-gen packages.

Skill Definition System (SKILL.md)

Each skill is described in a SKILL.md file using YAML front‑matter. The file declares the tool name, script path, description, and parameters. For example, the storyboard skill defines a required topic (string) plus three optional parameters: style (string), pages (integer), and aspect (string).

---
name: baoyu-comic
description: Knowledge comic creator supporting multiple styles
tools:
- name: generate_comic_storyboard
  script: scripts/generate-comic.ts
  description: Create a complete storyboard and prompts
  parameters:
    topic:
      type: string
      description: Comic theme
      required: true
    style:
      type: string
      description: Visual style (e.g., warm, classic)
      required: false
    pages:
      type: integer
      description: Number of pages
      required: false
    aspect:
      type: string
      description: Aspect ratio (e.g., 3:4, 16:9)
      required: false
---
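As a rough illustration of the first step a loader has to perform, the sketch below separates the YAML front‑matter from the Markdown body using only the standard library. The function name splitFrontMatter is hypothetical, and this is not necessarily how GoSkills parses SKILL.md; the extracted text would still need to be passed to a YAML decoder.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// splitFrontMatter (hypothetical helper) returns the YAML front-matter and the
// Markdown body of a SKILL.md document. The front-matter must open with "---"
// on the first line and close at the next line that is exactly "---".
func splitFrontMatter(doc string) (frontMatter, body string, err error) {
	lines := strings.Split(strings.ReplaceAll(doc, "\r\n", "\n"), "\n")
	if strings.TrimSpace(lines[0]) != "---" {
		return "", "", errors.New("missing opening front-matter delimiter")
	}
	for i := 1; i < len(lines); i++ {
		if strings.TrimSpace(lines[i]) == "---" {
			frontMatter = strings.Join(lines[1:i], "\n")
			body = strings.Join(lines[i+1:], "\n")
			return frontMatter, body, nil
		}
	}
	return "", "", errors.New("missing closing front-matter delimiter")
}

func main() {
	doc := "---\nname: baoyu-comic\ndescription: Knowledge comic creator\n---\n# Usage notes\n"
	fm, body, err := splitFrontMatter(doc)
	if err != nil {
		panic(err)
	}
	fmt.Printf("front-matter:\n%s\n\nbody:\n%s", fm, body)
}
```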

Design Highlights

Declarative tool definition – all tool metadata lives in SKILL.md instead of hard‑coded Go.

Automatic schema generation – the system builds a JSON Schema for each tool from its parameter definitions.

Zero Go code changes – adding a new tool only requires editing SKILL.md and adding the script.

Core Implementation Details

The adapter adapter/goskills/goskills.go reads SKILL.md and builds a ToolConfig structure. The previous hard‑coded approach required over 60 lines of Go; the new auto‑discovery approach needs zero lines of manual configuration.

// buildToolConfigFromSkill reads a SKILL.md file and constructs a ToolConfig
func buildToolConfigFromSkill(skill *goskills.SkillPackage) *ToolConfig {
    if len(skill.Meta.Tools) == 0 {
        return nil
    }
    config := &ToolConfig{
        NameMapping:          make(map[string]string),
        DescriptionOverrides: make(map[string]string),
        SchemaOverrides:      make(map[string]map[string]any),
    }
    for _, toolDef := range skill.Meta.Tools {
        // name mapping
        config.NameMapping[toolDef.Name] = toolDef.Name
        // description
        if toolDef.Description != "" {
            config.DescriptionOverrides[toolDef.Name] = toolDef.Description
        }
        // schema generation
        schema := map[string]any{"type": "object", "properties": make(map[string]any)}
        var required []string
        for paramName, param := range toolDef.Parameters {
            prop := map[string]any{"type": param.Type}
            if param.Description != "" {
                prop["description"] = param.Description
            }
            schema["properties"].(map[string]any)[paramName] = prop
            if param.Required {
                required = append(required, paramName)
            }
        }
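        if len(required) > 0 {
            // Go randomizes map iteration order; sort (package "sort") so the schema is stable across runs.
            sort.Strings(required)
            schema["required"] = required
        }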
        schema["additionalProperties"] = false
        config.SchemaOverrides[toolDef.Name] = schema
    }
    return config
}
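To make the result concrete, the following self‑contained sketch replays the same schema‑building loop for the storyboard parameters and prints the JSON. The Param type and buildSchema function are stand‑ins written for this example, not the adapter's actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Param is a hypothetical mirror of one entry under "parameters" in SKILL.md.
type Param struct {
	Type        string
	Description string
	Required    bool
}

// buildSchema turns parameter definitions into a JSON Schema object.
func buildSchema(params map[string]Param) map[string]any {
	props := make(map[string]any, len(params))
	var required []string
	for name, p := range params {
		prop := map[string]any{"type": p.Type}
		if p.Description != "" {
			prop["description"] = p.Description
		}
		props[name] = prop
		if p.Required {
			required = append(required, name)
		}
	}
	sort.Strings(required) // keep the output deterministic

	schema := map[string]any{
		"type":                 "object",
		"properties":           props,
		"additionalProperties": false,
	}
	if len(required) > 0 {
		schema["required"] = required
	}
	return schema
}

func main() {
	schema := buildSchema(map[string]Param{
		"topic":  {Type: "string", Description: "Comic theme", Required: true},
		"style":  {Type: "string", Description: "Visual style (e.g., warm, classic)"},
		"pages":  {Type: "integer", Description: "Number of pages"},
		"aspect": {Type: "string", Description: "Aspect ratio (e.g., 3:4, 16:9)"},
	})
	out, err := json.MarshalIndent(schema, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Because additionalProperties is false, a model that invents an undeclared parameter produces input that fails schema validation instead of being silently passed to the script.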

// SkillsToTools converts a SkillPackage into a slice of tools, using the auto‑generated config.
func SkillsToTools(skill *goskills.SkillPackage, opts ...SkillsToToolsOptions) ([]tools.Tool, error) {
    var config *ToolConfig
    // 1. Try auto‑building from SKILL.md
    skillConfig := buildToolConfigFromSkill(skill)
    if skillConfig != nil {
        config = skillConfig
    }
    // 2. Merge user‑provided overrides if any
    if len(opts) > 0 && opts[0].ToolConfig != nil {
        // merge logic …
    }
    // 3. Generate tools …
    return nil, nil
}
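The merge step is elided in the snippet above. One plausible policy, shown here purely as an assumption, is that user‑provided entries replace auto‑generated entries with the same key while everything else is kept. The ToolConfig type below only mirrors the three maps seen in the article; mergeToolConfig is a hypothetical name.

```go
package main

import "fmt"

// ToolConfig mirrors the three maps used in the article's snippet.
type ToolConfig struct {
	NameMapping          map[string]string
	DescriptionOverrides map[string]string
	SchemaOverrides      map[string]map[string]any
}

// mergeToolConfig (hypothetical) returns a new config in which entries from
// override replace entries from base that share the same key. Neither input
// is modified, and either may be nil.
func mergeToolConfig(base, override *ToolConfig) *ToolConfig {
	merged := &ToolConfig{
		NameMapping:          make(map[string]string),
		DescriptionOverrides: make(map[string]string),
		SchemaOverrides:      make(map[string]map[string]any),
	}
	// Apply base first, then override, so later writes win.
	for _, src := range []*ToolConfig{base, override} {
		if src == nil {
			continue
		}
		for k, v := range src.NameMapping {
			merged.NameMapping[k] = v
		}
		for k, v := range src.DescriptionOverrides {
			merged.DescriptionOverrides[k] = v
		}
		for k, v := range src.SchemaOverrides {
			merged.SchemaOverrides[k] = v
		}
	}
	return merged
}

func main() {
	auto := &ToolConfig{
		NameMapping:          map[string]string{"generate_comic_storyboard": "generate_comic_storyboard"},
		DescriptionOverrides: map[string]string{"generate_comic_storyboard": "Create a complete storyboard and prompts"},
	}
	user := &ToolConfig{
		DescriptionOverrides: map[string]string{"generate_comic_storyboard": "Plan the comic page by page"},
	}
	merged := mergeToolConfig(auto, user)
	fmt.Println(merged.DescriptionOverrides["generate_comic_storyboard"])
}
```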

Advantages comparison (hard‑coded approach vs. auto‑discovery)

Code size: hard‑coded approach ~60 lines vs. auto‑discovery 0 lines.

Maintenance cost: high (duplicate edits) vs. low (single source).

Extensibility: requires recompilation vs. no Go changes.

Type safety: compile‑time checks vs. runtime checks.

Parameter Conversion (JSON → CLI)

LLM output is JSON, but the TypeScript scripts expect command‑line flags. The adapter converts named parameters to flags according to a mapping table.

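Example LLM output (the topic is a Chinese title, "The Little Girl Picking Mushrooms"):

{
  "topic": "采蘑菇的小姑娘",
  "style": "warm",
  "pages": 1
}

Equivalent command line:

npx tsx generate-comic.ts --topic "采蘑菇的小姑娘" --style "warm" --pages 1

Conversion code:

func (t *SkillTool) Call(ctx context.Context, input string) (string, error) {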
    // Parse JSON
    var namedParams map[string]any
    err := json.Unmarshal([]byte(input), &namedParams)
    if err == nil && len(namedParams) > 0 {
        // Convert to CLI args
        paramMapping := map[string]string{
            "topic": "--topic",
            "style": "--style",
            "pages": "--pages",
            "aspect": "--aspect",
            "path": "--image",
            "prompt": "--prompt",
            "ar": "--ar",
            "quality": "--quality",
            "directory": "--directory",
        }
        paramOrder := []string{"topic", "style", "pages", "aspect", "path", "prompt", "ar", "quality", "directory"}
        var args []string
        for _, key := range paramOrder {
            if value, ok := namedParams[key]; ok && value != nil {
                if flag, ok := paramMapping[key]; ok {
                    args = append(args, flag, fmt.Sprintf("%v", value))
                }
            }
        }
        // Execute script based on extension
        if strings.HasSuffix(scriptPath, ".py") {
            return goskills.RunPythonScript(scriptPath, args)
        } else if strings.HasSuffix(scriptPath, ".ts") || strings.HasSuffix(scriptPath, ".js") {
            return langgraphtool.RunTypeScriptScript(scriptPath, args)
        }
        return langgraphtool.RunShellScript(scriptPath, args)
    }
    // Fallback to old args format …
    return "", fmt.Errorf("unsupported input format")
}
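One detail worth watching in this conversion: encoding/json decodes every JSON number into float64, and formatting a float64 with %v switches to exponent notation for large values (1000000 is printed as 1e+06). The sketch below is a variation on the article's loop, not the adapter's code. It formats numbers explicitly and, instead of a fixed paramOrder slice, visits keys in sorted order; buildArgs and formatValue are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// formatValue renders a decoded JSON value as a single CLI argument.
// Numbers arrive as float64 and are printed without an exponent.
func formatValue(v any) string {
	switch x := v.(type) {
	case string:
		return x
	case float64:
		return strconv.FormatFloat(x, 'f', -1, 64)
	case bool:
		return strconv.FormatBool(x)
	default:
		return fmt.Sprint(x)
	}
}

// buildArgs converts named parameters into "--flag value" pairs. Parameters
// without a flag mapping and nil values are skipped. Keys are visited in
// sorted order so the resulting command line is reproducible.
func buildArgs(params map[string]any, flags map[string]string) []string {
	keys := make([]string, 0, len(params))
	for k := range params {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var args []string
	for _, k := range keys {
		flag, ok := flags[k]
		if !ok || params[k] == nil {
			continue
		}
		args = append(args, flag, formatValue(params[k]))
	}
	return args
}

func main() {
	params := map[string]any{"topic": "mushroom picking", "style": "warm", "pages": float64(1)}
	flags := map[string]string{"topic": "--topic", "style": "--style", "pages": "--pages"}
	fmt.Println(buildArgs(params, flags))
}
```

Passing the values as separate argv entries, as both versions do, also avoids shell quoting problems, since no shell is involved when the command is started with exec.Command.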

TypeScript Execution Layer

// RunTypeScriptScript executes a .ts or .js file via npx tsx without compilation.
func RunTypeScriptScript(scriptPath string, args []string) (string, error) {
    cmdArgs := append([]string{"tsx", scriptPath}, args...)
    cmd := exec.Command("npx", cmdArgs...)
    var stdout, stderr bytes.Buffer
    cmd.Stdout = &stdout
    cmd.Stderr = &stderr
    if err := cmd.Run(); err != nil {
        return "", fmt.Errorf("failed to run typescript script: %w\nStdout: %s\nStderr: %s", err, stdout.String(), stderr.String())
    }
    return stdout.String() + stderr.String(), nil
}

Why Choose tsx?

✅ No pre‑compilation, fast development.

✅ Supports TypeScript and ESM.

✅ Fully compatible with the Node.js ecosystem.

✅ Allows use of the latest JavaScript syntax.

Supported Script Types

TypeScript (.ts) : business logic and image generation – executed with npx tsx script.ts.

JavaScript (.js) : simple scripts – executed with npx tsx script.js.

Python (.py) : data processing and PDF handling – executed with python script.py.

Shell (.sh) : system operations – executed with bash script.sh.
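The mapping above can be expressed as a small dispatch function. This is a sketch of the idea using filepath.Ext, with commandFor as a hypothetical name; the adapter shown earlier uses strings.HasSuffix checks instead.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// commandFor returns the executable and argv tail used to run a script,
// chosen by file extension. Unknown extensions fall back to bash.
func commandFor(scriptPath string, args []string) (string, []string) {
	switch strings.ToLower(filepath.Ext(scriptPath)) {
	case ".ts", ".js":
		return "npx", append([]string{"tsx", scriptPath}, args...)
	case ".py":
		return "python", append([]string{scriptPath}, args...)
	default:
		return "bash", append([]string{scriptPath}, args...)
	}
}

func main() {
	for _, script := range []string{"scripts/main.ts", "scripts/convert_pdf_to_images.py", "scripts/setup.sh"} {
		name, argv := commandFor(script, []string{"--help"})
		fmt.Println(name, argv)
	}
}
```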

Complete Workflow

The agent receives a user request, plans the steps, calls the storyboard tool, then the image generation tool for each page, and finally merges the images into a PDF.

User Input
  │
  ▼
┌───────────────────────────────────────┐
│ Agent Node: LLM planning + tool calls │
│ Input: user request + tool definitions│
│ Output: structured tool calls        │
└─────────────────────┬─────────────────┘
                      │
                      ▼
                ┌───────────────┐
                │ Tools Node   │
                └───────────────┘
                      │
          ┌───────────┴───────────┐
          ▼                       ▼
  Storyboard generation   Image generation
  (generate-comic.ts)    (main.ts)
          │                       │
          ▼                       ▼
  Storyboard JSON          Comic images
          │                       │
          └───────────┬───────────┘
                      ▼
            PDF merge script
            (merge-to-pdf.ts)
                      │
                      ▼
               Complete PDF comic

Main Program Code (main.go)

// main.go
package main

import (
    "context"
    "fmt"
    "log"
    "os"

    "github.com/smallnest/goskills"
    adapter "github.com/smallnest/langgraphgo/adapter/goskills"
    "github.com/smallnest/langgraphgo/prebuilt"
    "github.com/tmc/langchaingo/llms"
    "github.com/tmc/langchaingo/llms/openai"
    "github.com/tmc/langchaingo/tools"
)

func main() {
    // 1. Initialize LLM (recommend ERNIE 5.0 Thinking Preview)
    // Set environment variables: OPENAI_API_KEY and OPENAI_BASE_URL accordingly.
    llm, err := openai.New()
    if err != nil { log.Fatal(err) }

    // 2. Load skill packages from the skills directory
    skillsDir := "./skills"
    if _, err := os.Stat(skillsDir); os.IsNotExist(err) {
        skillsDir = "comic_skill_example/skills"
    }
    packages, err := goskills.ParseSkillPackages(skillsDir)
    if err != nil { log.Fatalf("Failed to parse skill packages: %v", err) }
    if len(packages) == 0 { log.Fatal("No skills found in " + skillsDir) }

    // 3. Convert skills to tools (auto‑read SKILL.md)
    var allTools []tools.Tool
    for _, skill := range packages {
        fmt.Printf("Loading skill: %s - %s\n", skill.Meta.Name, skill.Meta.Description)
        skillTools, err := adapter.SkillsToTools(skill)
        if err != nil { log.Printf("Failed to convert skill %s: %v", skill.Meta.Name, err); continue }
        allTools = append(allTools, skillTools...)
    }

    // 4. Filter comic‑related tools
    var comicTools []tools.Tool
    for _, t := range allTools {
        if t.Name() == "generate_comic_storyboard" || t.Name() == "generate_comic_image" || t.Name() == "merge_comic_to_pdf" {
            comicTools = append(comicTools, t)
        }
    }

    // 5. System prompt describing the agent's responsibilities
    systemMsg := `You are a helpful assistant that can call tools to create comics. When a user asks for a comic, you must call generate_comic_storyboard first.

Available functions:
- generate_comic_storyboard: create a full storyboard and prompts
- generate_comic_image: generate a single comic page (needs prompt and path)
- merge_comic_to_pdf: merge all pages into a PDF

Workflow:
1. Call generate_comic_storyboard
2. If the output contains '=== IMAGE_GENERATION_REQUIRED ===', call generate_comic_image for each page
3. Call merge_comic_to_pdf to produce the final PDF

Always call the function; never return a textual description.`

    // 6. Create the agent with a 20‑step limit
    agent, err := prebuilt.CreateAgentMap(llm, comicTools, 20, prebuilt.WithSystemMessage(systemMsg))
    if err != nil { log.Fatal(err) }

    // 7. Invoke the agent with the user request passed as a command‑line argument
    if len(os.Args) < 2 { log.Fatalf("usage: %s \"<comic request>\"", os.Args[0]) }
    ctx := context.Background()
    resp, err := agent.Invoke(ctx, map[string]any{"messages": []llms.MessageContent{llms.TextParts(llms.ChatMessageTypeHuman, os.Args[1])}})
    if err != nil { log.Fatal(err) }

    // 8. Print the resulting messages
    if messages, ok := resp["messages"].([]llms.MessageContent); ok {
        for _, msg := range messages {
            fmt.Printf("[%s] %s\n", msg.Role, msg.Parts)
        }
    }
}
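As the tool list grows, the chained comparisons in step 4 become awkward. A possible alternative, sketched here with a local namedTool interface and a stub type so that it runs without external modules, is an allow‑list lookup. filterByName is a hypothetical helper; any type with a Name() string method, including tools.Tool, would satisfy the constraint.

```go
package main

import "fmt"

// namedTool is the only behaviour the filter needs from a tool.
type namedTool interface{ Name() string }

// filterByName keeps the tools whose names appear in allowed, preserving order.
func filterByName[T namedTool](all []T, allowed ...string) []T {
	allow := make(map[string]struct{}, len(allowed))
	for _, n := range allowed {
		allow[n] = struct{}{}
	}
	var kept []T
	for _, t := range all {
		if _, ok := allow[t.Name()]; ok {
			kept = append(kept, t)
		}
	}
	return kept
}

// stubTool stands in for a real tool in this example.
type stubTool string

func (s stubTool) Name() string { return string(s) }

func main() {
	all := []stubTool{"generate_comic_storyboard", "convert_pdf_to_images", "generate_comic_image", "merge_comic_to_pdf"}
	comic := filterByName(all, "generate_comic_storyboard", "generate_comic_image", "merge_comic_to_pdf")
	for _, t := range comic {
		fmt.Println(t.Name())
	}
}
```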

Pitfalls and Solutions

DeepSeek V3 tool‑call instability

Problem: DeepSeek V3 sometimes returns malformed tool‑call markers, causing parsing failures.

<|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>generate_comic_storyboard{...}<|tool▁call▁end|>

Solution: Switch to ERNIE 5.0 Thinking Preview, which provides stable tool‑call formatting.

llm, err := openai.New(
    openai.WithToken("your-ernie-api-key"),
    openai.WithBaseURL("https://aip.baidubce.com/rpc/2.0/ai_custom/v1/wenxinworkshop/chat/ernie-5.0-thinking-preview"),
)
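Whichever model is used, it can help to detect this failure mode explicitly rather than letting raw markers reach the user. The check below is a minimal sketch: the marker substrings are taken from the sample output quoted above, and hasLeakedToolCall is a hypothetical helper rather than a LangGraphGo feature.

```go
package main

import (
	"fmt"
	"strings"
)

// leakedMarkers are substrings of the raw tool-call markers quoted in the
// article. The surrounding "<|" and "|>" are left out on purpose so the check
// does not depend on which vertical-bar character the model emits.
var leakedMarkers = []string{"tool▁calls▁begin", "tool▁call▁begin", "tool▁sep"}

// hasLeakedToolCall reports whether text contains raw tool-call markers,
// which suggests the model's tool call was returned as plain text.
func hasLeakedToolCall(text string) bool {
	for _, m := range leakedMarkers {
		if strings.Contains(text, m) {
			return true
		}
	}
	return false
}

func main() {
	reply := "<|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>generate_comic_storyboard{...}<|tool▁call▁end|>"
	if hasLeakedToolCall(reply) {
		fmt.Println("model returned a raw tool-call marker; retry or switch models")
	}
}
```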

TypeScript script execution issues

Problem: Bun‑specific APIs (e.g., Bun.write) are not compatible with Node.js.

Solution:

Remove Bun‑only calls.

Replace them with Node.js APIs such as fs.writeFileSync.

Execute scripts with npx tsx instead of bun run.

Parameter format conversion

Problem: LLM returns JSON while scripts expect CLI flags.

Solution: Implement automatic conversion in the tool execution layer (see the code in the “Parameter Conversion” section).

Chinese filename support in PDF merging

Problem: The regular expression used to match page filenames does not include Unicode Chinese characters.

// Add the CJK Unified Ideographs range (\u4e00-\u9fff) to the allowed filename characters
const pagePattern = /^(\d+)-(cover|page)(-[\w\u4e00-\u9fff-]+)?\.(png|jpg|jpeg)$/i;
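For readers who want to try the idea from Go, here is a port of the intended pattern. It assumes page files are named like 01-cover-<title>.png or 02-page-<title>.jpg; the leading \d+ and the escaped dot are assumptions about the original expression. Go's regexp package writes code points as \x{4e00} rather than \u4e00.

```go
package main

import (
	"fmt"
	"regexp"
)

// pagePattern accepts an optional "-<title>" suffix made of word characters,
// hyphens, and CJK Unified Ideographs (U+4E00 to U+9FFF).
var pagePattern = regexp.MustCompile(`(?i)^(\d+)-(cover|page)(-[\w\x{4e00}-\x{9fff}-]+)?\.(png|jpg|jpeg)$`)

func main() {
	for _, name := range []string{"01-cover-采蘑菇.png", "02-page.JPG", "03-page-forest.gif", "cover.png"} {
		fmt.Printf("%-22s %v\n", name, pagePattern.MatchString(name))
	}
}
```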

Best Practices Summary

Single Responsibility : each skill handles one concern (storyboard, image generation, PDF merging).

Declarative Configuration : define tools in SKILL.md to avoid hard‑coding.

Language Choice : use TypeScript for image‑related logic and Python for PDF handling to leverage ecosystem strengths.

Performance Optimizations

Concurrent Image Generation

// Generate all pages concurrently with a semaphore limiting concurrency to 3
var wg sync.WaitGroup
semaphore := make(chan struct{}, 3)
for _, page := range pages {
    wg.Add(1)
    go func(p Page) {
        defer wg.Done()
        semaphore <- struct{}{}
        defer func() { <-semaphore }()
        generateImage(p)
    }(page)
}
wg.Wait()

Cache Mechanism for Repeated Requests

type CacheKey struct {
    Topic string
    Style string
    Pages int
}
var storyboardCache sync.Map // map[CacheKey]StoryboardResult
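A lookup helper around this cache could be sketched as follows. StoryboardResult is given a single placeholder field here, and cachedStoryboard is a hypothetical name. Note that two concurrent callers with the same key may both run the generator; LoadOrStore only guarantees that they end up with the same stored value.

```go
package main

import (
	"fmt"
	"sync"
)

type CacheKey struct {
	Topic string
	Style string
	Pages int
}

// StoryboardResult is a placeholder; the real result type is not shown in the article.
type StoryboardResult struct {
	Script string
}

var storyboardCache sync.Map // map[CacheKey]StoryboardResult

// cachedStoryboard returns the cached result for key, calling generate only
// on a cache miss. Failed generations are not cached.
func cachedStoryboard(key CacheKey, generate func(CacheKey) (StoryboardResult, error)) (StoryboardResult, error) {
	if v, ok := storyboardCache.Load(key); ok {
		return v.(StoryboardResult), nil
	}
	res, err := generate(key)
	if err != nil {
		return StoryboardResult{}, err
	}
	actual, _ := storyboardCache.LoadOrStore(key, res)
	return actual.(StoryboardResult), nil
}

func main() {
	calls := 0
	generate := func(k CacheKey) (StoryboardResult, error) {
		calls++
		return StoryboardResult{Script: fmt.Sprintf("%d-page %s comic about %s", k.Pages, k.Style, k.Topic)}, nil
	}
	key := CacheKey{Topic: "mushroom picking", Style: "warm", Pages: 1}
	for i := 0; i < 3; i++ {
		if _, err := cachedStoryboard(key, generate); err != nil {
			panic(err)
		}
	}
	fmt.Println("generator calls:", calls)
}
```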

Future Directions

Multimodal Input : support images or video as source material.

Style Transfer : one‑click switching of comic visual styles.

Interactive Editing : allow users to intervene and adjust generation mid‑process.

Distributed Deployment : offload heavy image generation to multiple machines.

References

LangGraphGo GitHub – https://github.com/smallnest/langgraphgo

GoSkills v0.6.1+ – https://github.com/smallnest/goskills

LangChain Chinese documentation – https://www.langchain.com.cn/

DeepSeek API documentation – https://platform.deepseek.com/api-docs/

langgraphgo comic skill example – https://github.com/smallnest/langgraphgo/tree/master/examples/comic_skill_example

Written by

BirdNest Tech Talk

Author of the rpcx microservice framework, original book author, and chair of Baidu's Go CMC committee.
