How to Engineer Industrial‑Scale Function Call Data for AI Agents

This article explains why handcrafted SFT data fails for function‑call agents, introduces a controllable data‑sandbox approach, details label and variable design, shows code for seed generation and full dialogue synthesis, and demonstrates how the resulting dataset improves model routing, multi‑turn handling, tool sequencing, and error resilience.

Wu Shixiong's Large Model Academy

Dec 6, 2025

Function‑call agents face several practical challenges: low coverage of handcrafted data, fragile JSON formats, and the inability to learn procedural flows. To build production‑grade agents, the training data must be highly engineered and reproducible.

Why a Data Sandbox Is Essential

The remedy is to construct a controllable, extensible, and reproducible "data sandbox".

The sandbox defines every variable, label, and workflow at the data level, then automatically generates a complete, multi‑turn conversation dataset.

Three Perspectives Covered

Why function‑call data needs a sandbox

The complete label system and variables

From sandbox to real dialogues

1) Interview Question: Why Not Hand‑Write SFT Data?

Hand‑written examples suffer from:

Coverage too low – even hundreds of examples cannot span all real scenarios.

Format errors – missing JSON encoding, wrong tool_calls structure, role switches, mismatched tool_call_id, etc.

No process learning – the model sees only isolated snippets and never the end-to-end flow of asking, calling a tool, reading the result, and answering.
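The format errors listed above are mechanical, so they can be caught by a script rather than by eye. The following is a minimal sketch of such a check, assuming OpenAI-style message dictionaries; the function name validate_conversation and the exact set of checks are illustrative and not taken from the original project.

```python
import json

def validate_conversation(messages):
    """Return a list of format problems found in an OpenAI-style message list."""
    problems = []
    pending_ids = set()  # tool_call ids that still await a matching tool message
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in {"system", "user", "assistant", "tool"}:
            problems.append(f"message {i}: unknown role {role!r}")
            continue
        if role == "assistant":
            for call in msg.get("tool_calls", []):
                args = call.get("function", {}).get("arguments")
                # Arguments must be a JSON-encoded string, not a dict.
                if not isinstance(args, str):
                    problems.append(f"message {i}: arguments is not a string")
                else:
                    try:
                        json.loads(args)
                    except json.JSONDecodeError:
                        problems.append(f"message {i}: arguments is not valid JSON")
                pending_ids.add(call.get("id"))
        elif role == "tool":
            call_id = msg.get("tool_call_id")
            if call_id in pending_ids:
                pending_ids.discard(call_id)
            else:
                problems.append(f"message {i}: unmatched tool_call_id {call_id!r}")
    for call_id in sorted(pending_ids, key=str):
        problems.append(f"tool call {call_id!r} never received a tool message")
    return problems
```

Running a check like this over every generated or hand-written sample gives an empty list for well-formed conversations and a readable list of defects otherwise.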

2) Data Sandbox Architecture

Using a travel‑assistant agent as an example, the sandbox contains four core workflows:

Travel planning (RAG + weather)

Navigation (map)

Hotel search (recommendation + reviews)

Chit‑chat / rejection

The sandbox must enumerate all mutable variables across these workflows.

Step 1: Define the Tag System (the soul of function calls)

Travel planning – no follow‑up (320 examples)

Travel planning – needs follow‑up (40)

Navigation – no follow‑up (80)

Navigation – needs follow‑up (16)

Hotel query – no follow‑up (160)

Hotel query – needs follow‑up (32)

Travel‑related chit‑chat (80)

Non‑travel rejection (80)

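Fixing a quota per tag makes the distribution explicit: every workflow and every follow-up branch receives a known number of examples, so coverage is a design decision rather than an accident of what someone happened to write.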
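One way to make the tag system executable is to store the quotas as data and expand them into a generation plan. This is a sketch only: the counts are the ones listed above, while the English tag names and the build_plan helper are assumptions made for illustration.

```python
# (workflow, needs_follow_up) -> number of examples, as listed in Step 1.
# The English tag names are hypothetical stand-ins for the original labels.
TAG_QUOTAS = {
    ("travel_planning", False): 320,
    ("travel_planning", True): 40,
    ("navigation", False): 80,
    ("navigation", True): 16,
    ("hotel_query", False): 160,
    ("hotel_query", True): 32,
    ("travel_chitchat", False): 80,
    ("non_travel_rejection", False): 80,
}

def build_plan(quotas):
    """Expand quotas into one (workflow, needs_follow_up) entry per example."""
    plan = []
    for (workflow, needs_follow_up), count in quotas.items():
        plan.extend([(workflow, needs_follow_up)] * count)
    return plan

plan = build_plan(TAG_QUOTAS)
print(len(plan))  # 808, the sum of the eight quotas above
```

Changing the data mix then means editing one dictionary instead of rewriting examples.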

Step 2: Define Business Variables (user profile)

Every data point includes a system prompt with user information such as name, city ID, departure date, and starting coordinates:

{
  "role": "system",
  "content": "## User Info\n- Name: 吴师兄\n- City ID: 1012510801\n- Departure Date: 2025-12-02\n- Start Coord: 100.479921,59.1237401"
}
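A system message like the one above is easiest to keep valid when it is built from fields and serialized by a JSON library, which escapes the line breaks automatically. A minimal sketch, assuming a hypothetical helper named build_system_message:

```python
import json

def build_system_message(name, city_id, departure_date, start_coord):
    """Assemble the user-profile system prompt from individual fields."""
    content = "\n".join([
        "## User Info",
        f"- Name: {name}",
        f"- City ID: {city_id}",
        f"- Departure Date: {departure_date}",
        f"- Start Coord: {start_coord}",
    ])
    return {"role": "system", "content": content}

msg = build_system_message("吴师兄", "1012510801", "2025-12-02", "100.479921,59.1237401")
# json.dumps escapes each newline as \n, so the output is valid JSON.
print(json.dumps(msg, ensure_ascii=False, indent=2))
```

Because the profile values are parameters, the same function can be called with sampled names, cities, dates, and coordinates for every record.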

Step 3: Query Templates (semantic perturbations)

For hotel search, 30 different phrasings are generated, e.g.:

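Find me a hotel near the Bund in Shanghai for around 2,000
Recommend hotels with a good view of the Bund
I'm going to Shanghai on Friday for two nights with a budget of 2,000; can you recommend something?
Where is a convenient place to stay in Modu (魔都, a nickname for Shanghai)?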
…
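Phrasing variants of this kind can be produced by filling templates with slot values. The sketch below assumes a small, hypothetical set of templates and slot values; the original 30 phrasings are not reproduced in the source, so none of the strings here should be read as the author's actual template list.

```python
import itertools
import random

# Hypothetical templates and slot values, for illustration only.
TEMPLATES = [
    "Find me a hotel near {place} for around {budget}",
    "Recommend hotels with a good view of {place}",
    "I'm staying {nights} nights near {place}, budget {budget}, any suggestions?",
]
PLACES = ["the Bund in Shanghai", "West Lake in Hangzhou"]
BUDGETS = ["2,000", "500"]
NIGHTS = ["two", "three"]

def expand_queries(k, seed=0):
    """Return k distinct phrasings, sampled reproducibly from all combinations."""
    rng = random.Random(seed)
    combos = itertools.product(TEMPLATES, PLACES, BUDGETS, NIGHTS)
    # str.format ignores unused keyword arguments, so one call fits all templates.
    queries = {t.format(place=p, budget=b, nights=n) for t, p, b, n in combos}
    # Sort before sampling so the result does not depend on set ordering.
    return rng.sample(sorted(queries), k)

for query in expand_queries(3):
    print(query)
```

Templates cover structural variation; nicknames, omitted slots, and colloquial wording still need to be written or generated separately and added to the pool.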

Step 4: Tool Result Variations

Tool returns empty

Returns 1 hotel

Returns 3 hotels

Returns 10 hotels

Returns invalid review

Returns error
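The variations above can be generated by a small function that maps a scenario name to a synthetic tool payload. This is a sketch under assumptions: the scenario names, the payload fields (hotel_id, name, price, reviews, error), and the helper names are invented for illustration and are not the original tool schema.

```python
import json

SCENARIOS = ["empty", "one", "three", "ten", "invalid_review", "error"]

def fake_hotel(i):
    """Build one placeholder hotel record (hypothetical fields)."""
    return {"hotel_id": f"H{i:03d}", "name": f"Hotel {i}", "price": 200 + 10 * i}

def synth_tool_result(scenario):
    """Return the string content of a tool message for one scenario."""
    if scenario == "error":
        payload = {"error": "service unavailable"}
    elif scenario == "invalid_review":
        # A hotel is found but its review data is unusable.
        payload = {"hotels": [fake_hotel(1)], "reviews": None}
    else:
        count = {"empty": 0, "one": 1, "three": 3, "ten": 10}[scenario]
        payload = {"hotels": [fake_hotel(i) for i in range(1, count + 1)]}
    return json.dumps(payload, ensure_ascii=False)

for name in SCENARIOS:
    print(name, len(synth_tool_result(name)))
```

Pairing every query with several scenarios exposes the model to the same request ending in success, partial data, and failure.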

Step 5: Dialogue Branches (follow‑up vs direct)

Missing destination → must ask

Missing dates → must ask

Complete info → direct tool chain
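The branching rule above amounts to a slot check: if a required field is missing, ask for it; otherwise start the tool chain. A minimal sketch, assuming a hypothetical table of required slots per workflow (the source names only destination and dates as examples):

```python
# Required slots per workflow; this table is an assumption for illustration.
REQUIRED_SLOTS = {
    "travel_planning": ["destination", "dates"],
    "hotel_query": ["destination", "dates"],
    "navigation": ["destination"],
}

def next_action(workflow, slots):
    """Decide whether to ask a follow-up question or call tools directly."""
    missing = [s for s in REQUIRED_SLOTS[workflow] if not slots.get(s)]
    if missing:
        # Ask for the first missing slot; the rest are handled in later turns.
        return {"action": "ask", "slot": missing[0]}
    return {"action": "call_tools"}

print(next_action("hotel_query", {"destination": "Wuhan"}))
print(next_action("hotel_query", {"destination": "Wuhan", "dates": "2025-12-02"}))
```

In the sandbox the same check runs at generation time, so each seed record is routed to the follow-up or direct branch deterministically.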

Data Generation Pipeline

Stage 1 – Seed Data Generation (generate_dataset.py)

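import random
from datetime import date, timedelta


class DatasetGenerator:
    def __init__(self):
        # Candidate pools; their contents are elided in the source.
        self.names = [...]
        self.cities = {...}  # assumed to map city name -> city ID
        self.hotel_queries = [...]
        self.travel_queries = [...]
        self.route_queries = [...]

    def generate_travel_plan_no_ask(self):
        # The source does not show how these three values are produced;
        # the sampling below is an illustrative assumption.
        city_id = random.choice(list(self.cities.values()))
        departure = date.today() + timedelta(days=random.randint(1, 30))
        coord = f"{random.uniform(-180, 180):.6f},{random.uniform(-90, 90):.6f}"
        return {
            "用户问题": random.choice(self.travel_queries),  # user query
            "用户名字": random.choice(self.names),  # user name
            "用户所处城市": city_id,  # user's city ID
            "出发日期": departure.isoformat(),  # departure date
            "起点坐标": coord,  # start coordinate, "lon,lat"
            "类型": "旅行规划-不需要反问",  # tag: travel planning, no follow-up
            "是否追问": "否",  # follow-up needed: no
        }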

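Each tag has its own generator function like the one above. The source reports 1,010 seed records in total; the eight per-tag counts listed in Step 1 add up to 808, so the remaining records are not accounted for in that breakdown.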

Stage 2 – Convert Seeds to Full Dialogues

The core of a function‑call project is converting seed data into OpenAI‑compatible conversation JSON:

[
  {"role": "system", "content": "..."},
  {"role": "user", "content": "I need a hotel in Wuhan, budget 200-300 yuan"},
  {"role": "assistant", "tool_calls": [{"id": "call_9e45f8c7", "type": "function", "function": {"name": "recommend_hotels", "arguments": "{\"requirements\": \"Wuhan, budget 200-300 yuan\"}"}}]},
  {"role": "tool", "content": "...tool output...", "tool_call_id": "call_9e45f8c7"},
  {"role": "assistant", "content": "Based on your budget, I found the following hotels…"}
]

The generation algorithm follows these steps:

Select workflow based on label.

Determine whether a follow‑up is required.

Automatically generate the tool‑call chain.

Generate synthetic tool responses.

Generate the final natural‑language answer.

All steps are executed by code, eliminating manual dialogue writing.
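The five steps can be sketched as a single function that turns one seed record into a message list. This is not the original implementation: the English seed keys, the WORKFLOW_TOOLS table, the fixed follow-up question, and the placeholder tool output are all assumptions, and only the hotel workflow is shown.

```python
import json
import random

# Hypothetical mapping from workflow tag to its first tool; only the hotel
# workflow is filled in because it is the one the article names.
WORKFLOW_TOOLS = {"hotel_query": "recommend_hotels"}

def new_call_id(rng):
    """Create an id such as call_9e45f8c7 from a seeded generator."""
    return "call_" + "".join(rng.choice("0123456789abcdef") for _ in range(8))

def seed_to_dialogue(seed, rng):
    messages = [
        {"role": "system", "content": seed["system_prompt"]},
        {"role": "user", "content": seed["query"]},
    ]
    tool_name = WORKFLOW_TOOLS[seed["workflow"]]          # step 1: pick workflow
    if seed["needs_follow_up"]:                           # step 2: branch
        messages.append({"role": "assistant",
                         "content": "Which dates will you be staying?"})
        return messages
    call_id = new_call_id(rng)                            # step 3: tool call
    arguments = json.dumps({"requirements": seed["query"]}, ensure_ascii=False)
    messages.append({"role": "assistant", "tool_calls": [{
        "id": call_id, "type": "function",
        "function": {"name": tool_name, "arguments": arguments},
    }]})
    hotels = [{"name": "Hotel 1", "price": 260}]          # step 4: fake result
    messages.append({"role": "tool", "content": json.dumps({"hotels": hotels}),
                     "tool_call_id": call_id})
    answer = f"Based on your budget, I found {hotels[0]['name']} at {hotels[0]['price']} per night."
    messages.append({"role": "assistant", "content": answer})  # step 5: answer
    return messages

seed = {"system_prompt": "...", "query": "I need a hotel in Wuhan, budget 200-300 yuan",
        "workflow": "hotel_query", "needs_follow_up": False}
print(json.dumps(seed_to_dialogue(seed, random.Random(0)), ensure_ascii=False, indent=2))
```

Generating the call id once and reusing it for the tool message is what keeps tool_call_id consistent, which is one of the errors hand-written data tends to get wrong.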

What the Model Learns

Intent routing: Given "I want a hotel for two nights under 2,000 yuan", the model selects the hotel-search intent, detects that the dates are missing, and follows the "needs follow-up" branch.

Multi-turn follow-up: The assistant asks for the missing dates and then processes the user's answer.

Tool sequencing: For hotel search, the model calls recommend_hotels, then get_hotel_reviews, and finally produces the answer.

Handling empty tool results: The sandbox includes scenarios where a tool returns nothing, teaching the model to reply with a graceful apology.

Rejecting out-of-domain queries: Non-travel questions trigger a domain-specific refusal.
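Tool sequencing and empty-result handling can be illustrated together. The sketch below builds the assistant and tool messages for the hotel chain, stopping with an apology when recommend_hotels returns nothing. The tool names come from the article; the argument schema (hotel_id), the payload fields, the fixed call ids, and the reply wording are assumptions.

```python
import json

def tool_call(call_id, name, args):
    """Assistant message that requests one function call."""
    return {"role": "assistant", "tool_calls": [{
        "id": call_id, "type": "function",
        "function": {"name": name, "arguments": json.dumps(args, ensure_ascii=False)},
    }]}

def tool_result(call_id, payload):
    """Tool message carrying the result for a given call id."""
    return {"role": "tool", "content": json.dumps(payload, ensure_ascii=False),
            "tool_call_id": call_id}

def hotel_chain(requirements, hotels, reviews):
    messages = [
        tool_call("call_0001", "recommend_hotels", {"requirements": requirements}),
        tool_result("call_0001", {"hotels": hotels}),
    ]
    if not hotels:
        # Empty result: apologize and do not call the second tool.
        messages.append({"role": "assistant",
                         "content": "Sorry, I could not find any hotels matching your request."})
        return messages
    top = hotels[0]
    messages.append(tool_call("call_0002", "get_hotel_reviews", {"hotel_id": top["hotel_id"]}))
    messages.append(tool_result("call_0002", {"reviews": reviews}))
    summary = reviews[0] if reviews else "no reviews yet"
    messages.append({"role": "assistant",
                     "content": f"I recommend {top['name']}. Guest feedback: {summary}"})
    return messages

chain = hotel_chain("Wuhan, budget 200-300 yuan",
                    [{"hotel_id": "H001", "name": "Hotel 1"}], ["clean rooms"])
print([m["role"] for m in chain])
```

Training on both outcomes of the same chain is what lets the model learn when the second call is appropriate and when it should stop.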

Ensuring Coverage and Quality

Because every variable and branch is defined in the sandbox, the dataset achieves:

High controllability – each conversation’s logic is explicit.

Full coverage – every label has enough examples.

Reproducibility – the same code can regenerate the entire set.
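Coverage and reproducibility are both checkable in code. The following sketch, with hypothetical function names and a reduced quota table, regenerates seed records from a fixed random seed and reports any tag that falls short of its quota.

```python
import random
from collections import Counter

def generate_seeds(quotas, seed):
    """Generate placeholder seed records deterministically from a fixed seed."""
    rng = random.Random(seed)
    records = [{"tag": tag, "user_id": rng.randrange(10**6)}
               for tag, count in quotas.items() for _ in range(count)]
    rng.shuffle(records)
    return records

def coverage_gaps(records, quotas):
    """Return {tag: number of missing examples} for every under-filled tag."""
    counts = Counter(record["tag"] for record in records)
    return {tag: count - counts[tag] for tag, count in quotas.items()
            if counts[tag] < count}

quotas = {"hotel_query_no_follow_up": 160, "hotel_query_follow_up": 32}
first = generate_seeds(quotas, seed=42)
second = generate_seeds(quotas, seed=42)
print(first == second)               # same seed, same dataset
print(coverage_gaps(first, quotas))  # empty dict when every quota is met
```

Using a dedicated random.Random instance rather than the module-level functions keeps the dataset stable even if other code draws random numbers in between.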

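Data engineered this way is intended to raise the success rate of function-call agents in real-world deployments, because the model has already seen every branch, message format, and failure mode during training.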
