Boosting LLM Function Call: Data, Training, and Agent Optimization Strategies

This presentation by Yao Yitong of China Telecom AI Research Institute explains why Function Call is essential for LLM deployment, outlines data‑centric and training‑centric optimization methods, discusses common pitfalls and reward‑function design for reinforcement learning, and showcases practical Agent application patterns for real‑world tasks.

DataFunSummit

Sep 18, 2025

01 Overview

Yao Yitong, an algorithm engineer at the China Telecom Artificial Intelligence Research Institute, has worked extensively on the chat capabilities and post-training optimization of the TeleChat large model. This talk explains how to improve the basic Function Call ability of LLMs, covering optimization from both the data and training perspectives and analyzing common challenges and their solutions.

Main Topics

Why Function Call is the key to LLM deployment

Core algorithm optimization – data

Core algorithm optimization – training

Agent application solutions

Why Function Call Is Critical

Function Call enables a large model to output tool-invocation commands in a predefined JSON format, specifying the tool name and required parameters. By parsing this JSON, external frameworks can execute the corresponding tool, turning a pure text model into an executable engine. This lets LLMs move beyond their static training knowledge: they can call APIs dynamically (e.g., search, financial data) and drive complete automation pipelines (e.g., booking a flight).
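To make the parse-and-execute step concrete, here is a minimal sketch in Python. The JSON shape (`name` plus `arguments`), the `get_weather` tool, and the `dispatch` helper are all assumptions chosen for illustration; the talk does not specify a particular schema or framework.

```python
import json

# Hypothetical tool: a real deployment would call an external API here.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

# Registry mapping tool names to Python callables (illustrative only).
TOOLS = {"get_weather": get_weather}

def dispatch(model_output: str) -> str:
    """Parse a JSON tool call emitted by the model and run the matching tool."""
    call = json.loads(model_output)
    name = call["name"]
    arguments = call.get("arguments", {})
    if name not in TOOLS:
        # The model named a tool that does not exist (a hallucinated API).
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**arguments)

# Assumed model output for the query "What's the weather in Beijing?"
result = dispatch('{"name": "get_weather", "arguments": {"city": "Beijing"}}')
print(result)
```

The model itself never runs anything; it only emits text that the surrounding framework chooses to interpret as a call.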

Challenges in Real‑World Use

Common issues include parameter errors (e.g., the user asks for a ticket to Shanghai but the model passes Beijing as the destination), hallucinated APIs (calls to tools that do not exist), and tool-dependency ordering problems (calling a tool before the tool whose output it needs). These errors highlight the need for robust data and training strategies.

Data‑Centric Optimization

Function Call data is more complex than ordinary QA data. It must capture user intent, tool selection, parameter extraction, and call ordering. Data can be categorized as:

Successful tool calls (further split into single‑tool, dependent‑tool, and parallel‑tool calls)

Unsuccessful calls (information‑missing or tool‑missing scenarios)

Non‑tool calls (pure text generation, e.g., storytelling)
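One way to picture these categories is as labels derived from the shape of a sample's reference answer. The sketch below assumes a hypothetical record layout that is not from the talk: `steps` is a list of turns, each turn a list of calls issued together, and `no_call_reason` marks why no tool was called.

```python
def categorize(sample: dict) -> str:
    """Label a training sample by the shape of its reference answer."""
    steps = sample.get("steps", [])
    if not steps:
        reason = sample.get("no_call_reason")
        if reason in ("missing_info", "missing_tool"):
            # The user needs a tool, but the call cannot be made.
            return f"unsuccessful:{reason}"
        # Plain text generation, e.g. storytelling.
        return "non_tool"
    if len(steps) > 1:
        # Later calls are issued only after earlier results come back.
        return "successful:dependent"
    # One turn: several calls at once are parallel, one call is single.
    return "successful:parallel" if len(steps[0]) > 1 else "successful:single"

samples = [
    {"steps": [[{"name": "get_weather"}]]},
    {"steps": [[{"name": "get_weather"}, {"name": "get_news"}]]},
    {"steps": [[{"name": "search_flights"}], [{"name": "book_flight"}]]},
    {"steps": [], "no_call_reason": "missing_info"},
    {"steps": []},
]
labels = [categorize(s) for s in samples]
print(labels)
```

Counting these labels over a dataset gives a quick check that every category is represented.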

Construction steps include:

Tool construction (real APIs vs. fictional tools)

Task construction (generating user queries based on tool lists, ensuring coverage of diverse scenarios)

Answer construction (high‑quality answers via model generation or human annotation, with optional multi‑source aggregation)

Validation (format and content checks, ensuring JSON compliance and correct tool/parameter usage)
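The validation step can be sketched as a function that returns a list of problems for one generated call. The schema layout in `TOOL_SCHEMAS` and the `book_flight` tool are assumptions for illustration; a production pipeline would more likely validate against full JSON Schema definitions.

```python
import json

# Hypothetical tool schema: parameter name -> expected Python type.
TOOL_SCHEMAS = {
    "book_flight": {
        "required": {"origin": str, "destination": str},
        "optional": {"date": str},
    },
}

def validate_call(raw: str, schemas: dict = TOOL_SCHEMAS) -> list[str]:
    """Return a list of format/content errors; an empty list means the call passes."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]  # format check
    if not isinstance(call, dict) or "name" not in call:
        return ["missing tool name"]
    schema = schemas.get(call["name"])
    if schema is None:
        return [f"hallucinated tool: {call['name']}"]  # tool not in the list
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return ["arguments must be an object"]
    allowed = {**schema["required"], **schema.get("optional", {})}
    errors = []
    for key in schema["required"]:
        if key not in args:
            errors.append(f"missing parameter: {key}")
    for key, value in args.items():
        if key not in allowed:
            errors.append(f"unknown parameter: {key}")
        elif not isinstance(value, allowed[key]):
            errors.append(f"wrong type for {key}")
    return errors

ok = validate_call('{"name": "book_flight", "arguments": {"origin": "Beijing", "destination": "Shanghai"}}')
bad = validate_call('{"name": "book_flight", "arguments": {"origin": "Beijing"}}')
print(ok, bad)
```

Checks like these catch structural errors only. Whether `destination` holds the city the user actually asked for still requires comparison with a reference answer or a judge.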

Tool graphs can be built to represent dependencies, enabling difficulty‑based sampling for more challenging tasks.
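A rough sketch of difficulty-based sampling over a tool graph follows. It assumes, purely for illustration, that difficulty is the length of the longest dependency chain ending at a tool and that the graph is acyclic; the tool names and the weighting scheme are hypothetical, not the speaker's method.

```python
import random

# Hypothetical dependency graph: each tool maps to the tools whose output it needs.
DEPENDS_ON = {
    "search_flights": [],
    "get_weather": [],
    "book_flight": ["search_flights"],
    "send_itinerary": ["book_flight"],
}

def depth(tool: str, graph: dict) -> int:
    """Longest dependency chain ending at `tool` (1 means no dependencies).

    Assumes the graph has no cycles.
    """
    return 1 + max((depth(dep, graph) for dep in graph[tool]), default=0)

def sample_by_difficulty(graph: dict, k: int, rng: random.Random) -> list[str]:
    """Sample k target tools, weighting deeper (harder) chains more heavily."""
    tools = sorted(graph)
    weights = [depth(tool, graph) for tool in tools]
    return rng.choices(tools, weights=weights, k=k)

picks = sample_by_difficulty(DEPENDS_ON, k=5, rng=random.Random(0))
print(picks)
```

Each sampled tool, together with its dependency chain, can then seed a task whose solution requires calling the whole chain in order.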

Training‑Centric Optimization

LLM training typically consists of pre‑training, followed by supervised fine‑tuning (SFT) and reinforcement learning (RL). For Function Call, SFT should inject large amounts of high‑quality Function Call data, balancing the proportion of tool‑call versus non‑tool data to avoid over‑calling. RL can further refine the model using reward functions that consider output format correctness, tool selection accuracy, and parameter matching. Reward design may be strict (exact match), relaxed (partial overlap scoring), or model‑based (using a judge model).
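The difference between strict and relaxed rewards can be sketched as follows. The 0.5/0.5 split between tool selection and parameter matching, and the penalty for extra parameters, are assumptions made for this example rather than values from the talk. A model-based reward would replace these functions with a call to a judge model.

```python
import json

def strict_reward(pred: dict, ref: dict) -> float:
    """Exact match: full credit only if tool name and all arguments are identical."""
    return 1.0 if pred == ref else 0.0

def relaxed_reward(pred: dict, ref: dict) -> float:
    """Partial credit: half for the right tool, half scaled by parameter overlap."""
    if pred.get("name") != ref.get("name"):
        return 0.0
    ref_args = ref.get("arguments", {})
    pred_args = pred.get("arguments", {})
    if not ref_args:
        return 1.0 if not pred_args else 0.5
    matched = sum(1 for k, v in ref_args.items() if k in pred_args and pred_args[k] == v)
    extra = sum(1 for k in pred_args if k not in ref_args)
    return 0.5 + 0.5 * matched / (len(ref_args) + extra)

def reward(raw_output: str, ref: dict, mode: str = "relaxed") -> float:
    """Format check first: unparsable or malformed output earns nothing."""
    try:
        pred = json.loads(raw_output)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(pred, dict) or not isinstance(pred.get("arguments", {}), dict):
        return 0.0
    return strict_reward(pred, ref) if mode == "strict" else relaxed_reward(pred, ref)

ref = {"name": "book_flight", "arguments": {"origin": "Beijing", "destination": "Shanghai"}}
raw = '{"name": "book_flight", "arguments": {"origin": "Beijing", "destination": "Beijing"}}'
print(reward(raw, ref, "strict"), reward(raw, ref, "relaxed"))  # 0.0 0.75
```

The strict variant gives a sparse but unambiguous signal. The relaxed variant is denser, but it rewards a call that would still book the wrong flight, which is the trade-off to weigh when choosing between them.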

Key RL challenges include:

Complex scenarios with single, dependent, or parallel tool calls

Scarcity of high‑quality Function Call datasets with clear reference answers

Designing reward functions that handle multi‑turn interactions and nested tool calls

Two RL approaches are discussed:

Optimizing single‑step tool calls by selecting high‑quality data, filtering for standard answers and difficulty distribution, and constructing precise reward signals.

Optimizing multi‑turn Agent interactions by integrating environment feedback into the reward loop, though this requires building stable interaction environments (e.g., code sandboxes, search indexes).
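For the second approach, the sketch below shows how environment feedback might enter the reward. `ToyBookingEnv` is a hypothetical stand-in for a real sandbox or search index, the scripted policy stands in for the model, and the 0.1 penalty per failed call is an arbitrary illustrative choice.

```python
class ToyBookingEnv:
    """Hypothetical stand-in for a real interaction environment."""

    def __init__(self):
        self.flights = {"F1": ("Beijing", "Shanghai")}
        self.booked = None

    def step(self, call: dict) -> dict:
        """Execute one tool call and return an observation."""
        name, args = call["name"], call.get("arguments", {})
        if name == "search_flights":
            route = (args.get("origin"), args.get("destination"))
            hits = [fid for fid, r in self.flights.items() if r == route]
            return {"ok": True, "flights": hits}
        if name == "book_flight" and args.get("flight_id") in self.flights:
            self.booked = args["flight_id"]
            return {"ok": True}
        return {"ok": False, "error": "invalid call"}

def rollout(policy, env, max_turns: int = 4) -> list:
    """Run the policy against the environment, recording (call, observation) pairs."""
    history = []
    for _ in range(max_turns):
        call = policy(history)
        if call is None:  # the policy decides the task is finished
            break
        history.append((call, env.step(call)))
    return history

def episode_reward(env, history, goal: str = "F1") -> float:
    """Outcome reward from the final environment state, minus a penalty per failed call."""
    success = 1.0 if env.booked == goal else 0.0
    failed = sum(1 for _, obs in history if not obs["ok"])
    return success - 0.1 * failed

def scripted_policy(history):
    """Stand-in for the model: search first, then book the first hit."""
    if not history:
        return {"name": "search_flights",
                "arguments": {"origin": "Beijing", "destination": "Shanghai"}}
    if len(history) == 1:
        flight_id = history[0][1]["flights"][0]
        return {"name": "book_flight", "arguments": {"flight_id": flight_id}}
    return None

env = ToyBookingEnv()
history = rollout(scripted_policy, env)
print(episode_reward(env, history))
```

Because the reward depends on what the environment actually did, an unstable environment (flaky APIs, changing search results) would inject noise directly into training, which is why stability matters here.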

Evaluation and Iteration

Two benchmark suites are highlighted:

BFCL (Berkeley Function-Calling Leaderboard) – includes single-turn, multi-turn, parallel, and hallucination tests.

τ-bench (tau-bench) – a more challenging suite with retail and airline scenarios, emphasizing human-interaction constraints and complex tool usage.

These benchmarks guide iterative improvements.

Agent Application Solutions

Modern Agent systems (e.g., products from MiniMax and ByteDance) follow a hierarchical design: a planner receives the user query, decomposes it into sub-tasks, dispatches them to specialized sub-agents (research, code, etc.), aggregates the results, and decides whether the overall task is complete. Effective Agent design requires:

Context engineering – selecting and preserving useful information across turns.

Robust prompt design for each node.

Strong planning, tool‑calling, coding, and long‑context understanding capabilities.

The underlying model must be strong in all of these areas (planning, Function Call, code generation, and long-document comprehension) for the Agent to be reliable.
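The hierarchical planner and sub-agent pattern can be sketched in a few lines. Everything here is a stub under stated assumptions: the `plan` function returns a fixed decomposition where a real planner would query the model, and the sub-agents return canned strings where real ones would wrap an LLM with its own prompt and tools.

```python
from typing import Callable

# Hypothetical sub-agents (stubs for illustration).
def research_agent(task: str) -> str:
    return f"[research] notes on: {task}"

def code_agent(task: str) -> str:
    return f"[code] script for: {task}"

SUB_AGENTS: dict[str, Callable[[str], str]] = {
    "research": research_agent,
    "code": code_agent,
}

def plan(query: str) -> list[tuple[str, str]]:
    """Stub planner: decompose the query into (sub-agent, sub-task) pairs."""
    return [
        ("research", f"gather background for '{query}'"),
        ("code", f"analyze data for '{query}'"),
    ]

def run(query: str) -> dict:
    """Dispatch each sub-task, aggregate the results, and decide completion."""
    context: list[str] = []  # results carried across steps
    for agent_name, task in plan(query):
        context.append(SUB_AGENTS[agent_name](task))
    done = all(context)  # toy completion check: every sub-task produced output
    return {"done": done, "answer": "\n".join(context)}

out = run("Q3 sales")
print(out["answer"])
```

In a real system the completion check would itself be a model decision, and the `context` list is where context engineering applies: choosing which intermediate results to keep, compress, or drop before the next step.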

Conclusion

Improving LLM Function Call capability involves systematic data construction, careful training data balancing, sophisticated reward design for RL, and thorough evaluation using benchmarks. These advances enable powerful Agent applications that can reliably orchestrate multiple tools to solve complex real‑world problems.
