Boosting LLM Function Call: Data, Training, and Agent Optimization Strategies
This presentation by Yao Yitong of China Telecom AI Research Institute explains why Function Call is essential for LLM deployment, outlines data‑centric and training‑centric optimization methods, discusses common pitfalls and reward‑function design for reinforcement learning, and showcases practical Agent application patterns for real‑world tasks.
Overview
Yao Yitong, an algorithm engineer at the China Telecom Artificial Intelligence Research Institute, has been closely involved in developing the chat capabilities of the TeleChat large model and in its post‑training optimization. This talk introduces how to improve the fundamental Function Call ability of LLMs, covering optimization from both the data and training perspectives and analyzing common challenges and solutions.
Main Topics
Why Function Call is the key to LLM deployment
Core algorithm optimization – data
Core algorithm optimization – training
Agent application solutions
Why Function Call Is Critical
Function Call enables a large model to output tool‑invocation commands in a predefined JSON format, specifying the tool name and required parameters. By parsing this JSON, an external framework can execute the corresponding tool, turning a pure text model into an executable engine. This lets LLMs move past the limits of static knowledge: they can call APIs dynamically (e.g., search, financial data) and drive complete automation pipelines (e.g., booking a flight).
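The parse-and-execute step described above can be sketched as follows. This is a minimal illustration, not the speaker's framework: the `get_weather` tool, the `TOOLS` registry, and the `name`/`arguments` field names are assumptions chosen for the example.

```python
import json

# Hypothetical tool; a real deployment would call an external API here.
def get_weather(city):
    return f"Sunny in {city}"

# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"get_weather": get_weather}

def dispatch(model_output):
    """Parse a JSON tool call emitted by the model and run the matching tool."""
    call = json.loads(model_output)
    name = call["name"]
    args = call.get("arguments", {})
    if name not in TOOLS:
        # The model asked for a tool that was never registered.
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**args)

result = dispatch('{"name": "get_weather", "arguments": {"city": "Beijing"}}')
```

Keeping the registry separate from the parser means new tools can be added without touching the dispatch logic.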
Challenges in Real‑World Use
Common failure modes include parameter errors (e.g., the user asks for a ticket to Shanghai but the model fills in Beijing as the destination), hallucinated APIs (calls to tools that are not in the provided tool list), and tool‑dependency ordering problems (calling a tool before the tool whose output it needs). These errors highlight the need for robust data and training strategies.
Data‑Centric Optimization
Function Call data is more complex than ordinary QA data. It must capture user intent, tool selection, parameter extraction, and call ordering. Data can be categorized as:
Successful tool calls (further split into single‑tool, dependent‑tool, and parallel‑tool calls)
Unsuccessful calls (information‑missing or tool‑missing scenarios)
Non‑tool calls (pure text generation, e.g., storytelling)
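One way to make the categories above explicit in a dataset is to tag every sample with its type, so that coverage can be checked before training. The sketch below is illustrative only; the class names, field names, and example samples are assumptions, not a format from the talk.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum

class Category(Enum):
    SINGLE_TOOL = "single_tool"
    DEPENDENT_TOOL = "dependent_tool"
    PARALLEL_TOOL = "parallel_tool"
    INFORMATION_MISSING = "information_missing"
    TOOL_MISSING = "tool_missing"
    NON_TOOL = "non_tool"

@dataclass
class Sample:
    query: str
    category: Category
    # Expected tool calls; empty for unsuccessful and non-tool samples.
    tool_calls: list = field(default_factory=list)

def category_counts(samples):
    """Count samples per category to spot gaps in coverage."""
    return Counter(sample.category for sample in samples)

# Hypothetical examples, one per top-level kind of data.
SAMPLES = [
    Sample("Weather in Beijing?", Category.SINGLE_TOOL,
           [{"name": "get_weather", "arguments": {"city": "Beijing"}}]),
    Sample("Book me a flight.", Category.INFORMATION_MISSING),
    Sample("Tell me a story.", Category.NON_TOOL),
]
counts = category_counts(SAMPLES)
```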
Construction steps include:
Tool construction (real APIs vs. fictional tools)
Task construction (generating user queries based on tool lists, ensuring coverage of diverse scenarios)
Answer construction (high‑quality answers via model generation or human annotation, with optional multi‑source aggregation)
Validation (format and content checks, ensuring JSON compliance and correct tool/parameter usage)
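The validation step can be sketched as a small checker that first tests JSON compliance and then tool and parameter usage. The schema layout, the `search_flights` tool, and its parameters are hypothetical; a production pipeline might use a full JSON Schema validator instead.

```python
import json

# Hypothetical schema: parameter name -> accepted Python type(s).
SCHEMAS = {
    "search_flights": {
        "required": {"origin": str, "destination": str, "date": str},
        "optional": {"max_price": (int, float)},
    }
}

def validate_call(raw, schemas):
    """Return a list of problems found in one generated tool call (empty if valid)."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(call, dict):
        return ["top-level value must be an object"]
    name = call.get("name")
    if not isinstance(name, str) or name not in schemas:
        return [f"unknown tool: {name!r}"]
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return ["arguments must be an object"]

    spec = schemas[name]
    allowed = {**spec["required"], **spec.get("optional", {})}
    errors = []
    for key in spec["required"]:
        if key not in args:
            errors.append(f"missing parameter: {key}")
    for key, value in args.items():
        if key not in allowed:
            errors.append(f"unexpected parameter: {key}")
        elif not isinstance(value, allowed[key]):
            errors.append(f"wrong type for {key}")
    return errors
```

Returning a list of messages rather than a boolean makes it easier to log why a generated sample was rejected.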
Tool graphs can be built to represent dependencies, enabling difficulty‑based sampling for more challenging tasks.
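As a rough illustration of difficulty‑based sampling, the sketch below treats the length of a tool's longest dependency chain as its difficulty and samples deeper tools more often. The graph contents and the choice of depth as the difficulty measure are assumptions; the talk does not specify a metric.

```python
import random

# Hypothetical dependency graph: each tool maps to the tools whose output it needs.
# The graph is assumed to be acyclic.
DEPENDS_ON = {
    "search_flights": [],
    "get_user_profile": [],
    "book_flight": ["search_flights", "get_user_profile"],
    "send_confirmation": ["book_flight"],
}

def depth(tool, graph, cache=None):
    """Length of the longest dependency chain ending at `tool` (1 = no dependencies)."""
    cache = {} if cache is None else cache
    if tool not in cache:
        cache[tool] = 1 + max(
            (depth(dep, graph, cache) for dep in graph[tool]), default=0
        )
    return cache[tool]

def sample_by_difficulty(graph, k, rng):
    """Sample k target tools, weighting each tool by its dependency depth."""
    tools = sorted(graph)
    weights = [depth(tool, graph) for tool in tools]
    return rng.choices(tools, weights=weights, k=k)

targets = sample_by_difficulty(DEPENDS_ON, k=5, rng=random.Random(0))
```

Each sampled target, together with its chain of prerequisites, can then seed a task that requires the whole chain to be called in order.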
Training‑Centric Optimization
LLM training typically consists of pre‑training, followed by supervised fine‑tuning (SFT) and reinforcement learning (RL). For Function Call, SFT should inject large amounts of high‑quality Function Call data, balancing the proportion of tool‑call versus non‑tool data to avoid over‑calling. RL can further refine the model using reward functions that consider output format correctness, tool selection accuracy, and parameter matching. Reward design may be strict (exact match), relaxed (partial overlap scoring), or model‑based (using a judge model).
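The difference between strict and relaxed rewards can be sketched as below. The 0.5/0.5 split between tool name and parameters, and the penalty for extra parameters, are illustrative assumptions rather than the scoring used in the talk; a model‑based reward would replace these functions with a call to a judge model.

```python
def strict_reward(pred, ref):
    """Exact match: 1.0 only if tool name and all arguments are identical."""
    return 1.0 if pred == ref else 0.0

def relaxed_reward(pred, ref):
    """Partial credit: 0.5 for the right tool, plus up to 0.5 for matching arguments."""
    if pred is None or pred.get("name") != ref["name"]:
        # Unparseable output or wrong tool earns nothing.
        return 0.0
    pred_args = pred.get("arguments", {})
    ref_args = ref.get("arguments", {})
    all_keys = set(pred_args) | set(ref_args)
    if not all_keys:
        return 1.0
    matched = sum(
        1 for key in ref_args
        if key in pred_args and pred_args[key] == ref_args[key]
    )
    # Dividing by the union of keys also penalizes invented extra parameters.
    return 0.5 + 0.5 * matched / len(all_keys)

ref = {"name": "search_flights",
       "arguments": {"origin": "Beijing", "destination": "Shanghai"}}
pred = {"name": "search_flights",
        "arguments": {"origin": "Beijing", "destination": "Beijing"}}
```

With the example above, the strict reward gives no signal for a call that has the right tool and one correct parameter, while the relaxed reward still distinguishes it from a completely wrong call.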
Key RL challenges include:
Complex scenarios with single, dependent, or parallel tool calls
Scarcity of high‑quality Function Call datasets with clear reference answers
Designing reward functions that handle multi‑turn interactions and nested tool calls
Two RL approaches are discussed:
Optimizing single‑step tool calls by selecting high‑quality data, filtering for standard answers and difficulty distribution, and constructing precise reward signals.
Optimizing multi‑turn Agent interactions by integrating environment feedback into the reward loop, though this requires building stable interaction environments (e.g., code sandboxes, search indexes).
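To illustrate the second approach, the toy loop below lets a policy call tools in an environment and derives the reward from the environment's final state instead of from a reference answer. Everything here is a hypothetical stand‑in: `ToyBookingEnv` replaces a real sandbox, `scripted_policy` replaces the model, and the flight ID is made up.

```python
class ToyBookingEnv:
    """Hypothetical environment that executes tool calls and tracks state."""

    def __init__(self):
        self.booked = None

    def step(self, call):
        if call["name"] == "search_flights":
            return {"flights": ["F100"]}
        if call["name"] == "book_flight":
            if call["arguments"].get("flight_id") == "F100":
                self.booked = "F100"
                return {"status": "ok"}
            return {"error": "unknown flight"}
        return {"error": "unknown tool"}

    def goal_reached(self):
        return self.booked is not None

def scripted_policy(history):
    """Stand-in for the model: search first, then book the first result."""
    if not history:
        return {"name": "search_flights", "arguments": {"destination": "Shanghai"}}
    if len(history) == 1:
        flight = history[0][1]["flights"][0]
        return {"name": "book_flight", "arguments": {"flight_id": flight}}
    return None  # No further calls: the policy considers the task finished.

def rollout(policy, env, max_turns=4):
    """Run one episode and return the trajectory plus an outcome-based reward."""
    history = []
    for _ in range(max_turns):
        call = policy(history)
        if call is None:
            break
        observation = env.step(call)
        history.append((call, observation))
    reward = 1.0 if env.goal_reached() else 0.0
    return history, reward

trajectory, reward = rollout(scripted_policy, ToyBookingEnv())
```

Because the reward depends on what the environment reports, the environment has to behave deterministically and stay available throughout training, which is the stability requirement noted above.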
Evaluation and Iteration
Two benchmark suites are highlighted:
BFCL (Berkeley Function‑Calling Leaderboard) – includes single‑turn, multi‑turn, parallel, and hallucination tests.
τ‑bench (Tau‑Bench) – a more challenging suite with retail and airline scenarios, emphasizing human‑interaction constraints and complex tool usage.
These benchmarks guide iterative improvements.
Agent Application Solutions
Modern Agent systems (e.g., products from MiniMax and ByteDance) follow a hierarchical design: a planner receives the user query, decomposes it into sub‑tasks, dispatches specialized sub‑agents (research, code, etc.), aggregates the results, and decides whether the overall task is complete. Effective Agent design requires:
Context engineering – selecting and preserving useful information across turns.
Robust prompt design for each node.
Strong planning, tool‑calling, coding, and long‑context understanding capabilities.
High‑quality models must excel in planning, Function Call, code generation, and long‑document comprehension to deliver reliable Agent experiences.
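The hierarchical pattern described in this section can be sketched with plain functions standing in for the planner and sub‑agents. In a real system each of these would be an LLM call with its own prompt; the agent names, the fixed two‑step plan, and the completion check are assumptions made for illustration.

```python
def research_agent(task):
    # Placeholder for an LLM-backed research sub-agent.
    return f"[research] findings for: {task}"

def code_agent(task):
    # Placeholder for an LLM-backed coding sub-agent.
    return f"[code] script for: {task}"

# Hypothetical registry of specialized sub-agents.
SUB_AGENTS = {"research": research_agent, "code": code_agent}

def plan(query):
    """Hypothetical planner: decompose the query into (agent, sub-task) steps."""
    return [
        ("research", f"background on {query}"),
        ("code", f"prototype for {query}"),
    ]

def run_agent(query):
    """Dispatch each planned step, aggregate results, and judge completion."""
    steps = plan(query)
    results = [SUB_AGENTS[name](task) for name, task in steps]
    # Simplistic completion check: every step produced a non-empty result.
    done = len(results) == len(steps) and all(results)
    return {"done": done, "results": results}

outcome = run_agent("flight price tracker")
```

Only the sub‑task text is passed to each sub‑agent here; deciding what additional context each node receives is where context engineering comes in.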
Conclusion
Improving LLM Function Call capability involves systematic data construction, careful training data balancing, sophisticated reward design for RL, and thorough evaluation using benchmarks. These advances enable powerful Agent applications that can reliably orchestrate multiple tools to solve complex real‑world problems.
