Why Function Calling Fails in AI Agents and How to Fix It
The article examines the three main challenges of complex AI projects—knowledge organization, data‑AI interaction, and intent recognition—highlighting why tool (function) calling often breaks, and presents practical engineering strategies such as prompt refinement, clearer tool schemas, intent convergence, and evaluation loops to improve reliability.
Complex AI projects face three major hurdles: organizing knowledge and data, ensuring reliable data‑AI interaction, and accurately recognizing user intent. The third hurdle—intent recognition—frequently leads to chaotic tool (function) calling, as illustrated by real‑world failures in student agent projects.
Agent's Core: Function Calling
Current large language models expose a simple text-in, text-out API, but useful answers often depend on domain knowledge and live data that the model does not have. Agents such as Manus use function calling to retrieve that external data, for example weather information.
from openai import OpenAI  # assumes the OpenAI Python SDK, which provides client.responses.create
client = OpenAI()
tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Retrieves current weather for the given location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City and country e.g. Bogotá, Colombia"
            }
            # ...
        }
    }
}]
response = client.responses.create(
    model="gpt-5",
    tools=tools,
    input="What is the weather like in Chengdu today?"
)
The model selects a tool by matching the user query against each tool's description and parameter schema.
# User input
user_query = "What is the weather like in Beijing today?"
# Model analysis
# - "weather" matches the get_weather description
# - "Beijing" maps to the location parameter
# -> Call get_weather
Tool selection is a black box: we cannot directly intervene in the decision of whether to call a tool, which tool to use, or how to fill its parameters. We can only shape it indirectly through prompts, tool definitions, and context.
In production, multiple tools and complex conversational context increase the risk of missed calls, wrong calls, and parameter extraction errors.
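Because the selection itself cannot be controlled, one option is to check each proposed call before executing it. The following is a minimal sketch, not the article's method: the `REQUIRED_PARAMS` registry and the `validate_tool_call` helper are hypothetical, and only `get_weather` and its `location` parameter come from the example above. It catches unknown tools and missing parameters; a missed call cannot be detected this way, since there is no call to inspect.

```python
import json

# Hypothetical registry mapping each tool name to its required parameters.
# Only get_weather and "location" come from the article.
REQUIRED_PARAMS = {"get_weather": ["location"]}


def validate_tool_call(name, arguments_json):
    """Return (ok, reason) for a proposed tool call before executing it."""
    if name not in REQUIRED_PARAMS:
        return False, f"unknown tool: {name}"
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError:
        return False, "arguments are not valid JSON"
    if not isinstance(args, dict):
        return False, "arguments must be a JSON object"
    # Treat empty strings the same as absent values.
    missing = [p for p in REQUIRED_PARAMS[name] if not args.get(p)]
    if missing:
        return False, f"missing parameters: {', '.join(missing)}"
    return True, "ok"


print(validate_tool_call("get_weather", '{"location": "Beijing, China"}'))
print(validate_tool_call("get_weather", "{}"))
print(validate_tool_call("get_flights", '{"city": "Beijing"}'))
```

A rejected call can be logged and retried, or turned into a clarifying question for the user, rather than being executed with bad arguments.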
Intent Convergence
To reduce errors, first structure the user request into a clear intent before tool selection. Example transformation:
{
  "task_type": "check_weather",
  "city": "Beijing",
  "need_flight_info": true
}
Downstream logic then chooses the appropriate tool set based on task_type, applying the single-responsibility principle to keep each tool focused on a single task.
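The downstream selection step might look like the sketch below. It assumes the structured intent has already been produced. The `manage_order` task type, the `search_flights` tool name, and the `tools_for_intent` helper are illustrative assumptions, and tool names stand in for full tool schemas.

```python
# Hypothetical tool packages keyed by task_type. Tool names are placeholders
# for full tool schemas; "manage_order" is an assumed task type.
TOOL_PACKAGES = {
    "check_weather": ["get_weather"],
    "manage_order": ["query_order", "cancel_order"],
}
FLIGHT_TOOLS = ["search_flights"]  # assumed name, not from the article


def tools_for_intent(intent):
    """Pick the tool names to expose to the model for one structured intent."""
    tools = list(TOOL_PACKAGES.get(intent.get("task_type"), []))
    if intent.get("need_flight_info"):
        tools += FLIGHT_TOOLS
    return tools


intent = {"task_type": "check_weather", "city": "Beijing", "need_flight_info": True}
print(tools_for_intent(intent))  # ['get_weather', 'search_flights']
```

An unrecognized task_type yields an empty list, which gives the caller a natural point to ask the user a clarifying question instead of exposing every tool.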
Tool Convergence
Three practical tactics:
Single-Responsibility Principle: avoid multi-purpose tools.
# ❌ Bad description
"description": "Get weather and flight information"
# ✅ Good description
"description": "Get real-time weather information for a specified city"
Scene-Based Tool Packages: load only the tools relevant to the detected intent.
weather_tools = [get_weather]
order_tools = [query_order, cancel_order]
# Attach the appropriate package per request
Clear Descriptions & Names: ensure the model can read and understand when a tool should be used.
# Bad description
"description": "Retrieves current weather for the given location."
# Good description
"description": "Get real-time weather information for a specified city. Use when the user asks about current temperature, humidity, or wind speed. Not for historical weather or climate characteristics."
Tool Evaluation Set
Every agent needs a systematic test set to catch failures. Build it by logging each call and performing human review:
log_data = {
    "user_input": "Beijing weather today",
    "model_tool_call": "get_weather",
    "model_arguments": {"location": "Beijing"},
    "tool_result": {"temperature": 25},
    "final_response": "Beijing is 25 degrees and sunny today",
    "success": True
}
Review samples against three questions: Should a tool have been called? Was the right tool chosen? Were the parameters correct? Aggregate metrics such as miss-call rate, wrong-call rate, and parameter error rate, then iterate on prompts or tool definitions accordingly.
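The aggregation step can be sketched as follows. This assumes each logged sample has been extended during human review with `expected_tool` and `expected_arguments` fields; those fields, the `tool_call_metrics` function, and the choice to count an unnecessary call as a wrong call are assumptions, not part of the log format shown above.

```python
def tool_call_metrics(samples):
    """Aggregate reviewed samples into miss-call, wrong-call, and parameter error rates.

    Each sample pairs the reviewer's expectation ("expected_tool",
    "expected_arguments") with what the model did ("model_tool_call",
    "model_arguments"). None means no tool call.
    """
    counts = {"miss_call": 0, "wrong_call": 0, "param_error": 0}
    for s in samples:
        expected, actual = s["expected_tool"], s["model_tool_call"]
        if expected is not None and actual is None:
            counts["miss_call"] += 1
        elif expected != actual:
            # Covers both a different tool and a call that was not needed.
            counts["wrong_call"] += 1
        elif expected is not None and s.get("expected_arguments") != s.get("model_arguments"):
            counts["param_error"] += 1
    total = len(samples) or 1  # avoid division by zero on an empty set
    return {f"{name}_rate": count / total for name, count in counts.items()}


expected = {"expected_tool": "get_weather", "expected_arguments": {"location": "Beijing"}}
samples = [
    {**expected, "model_tool_call": "get_weather", "model_arguments": {"location": "Beijing"}},
    {**expected, "model_tool_call": None, "model_arguments": None},
    {**expected, "model_tool_call": "query_order", "model_arguments": {}},
    {**expected, "model_tool_call": "get_weather", "model_arguments": {"location": "Chengdu"}},
]
print(tool_call_metrics(samples))
# {'miss_call_rate': 0.25, 'wrong_call_rate': 0.25, 'param_error_rate': 0.25}
```

Tracking the three rates separately matters because they point at different fixes: miss calls usually suggest a weak description or prompt, wrong calls suggest overlapping tools, and parameter errors suggest an unclear schema.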
Skills Strategy
Claude’s “Skills” approach routes the user query first to a high‑level skill (coarse intent) and then executes a small, well‑defined tool set within that skill, reducing the chance of random tool selection.
# Before Skills
User → Model scans dozens of tools → Miss/Wrong/No-call
# After Skills
User → Skill selector → Limited tool set + SOP
Skills help with tool selection, timing, and post-call data handling, though they do not solve vague user intents or poorly designed schemas.
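The two-stage flow can be illustrated with a small sketch. This is not how Claude implements Skills. The `Skill` dataclass, the keyword matching (a crude stand-in for a model-based skill selector), and the SOP strings are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    keywords: list  # crude stand-in for a model-based skill selector
    sop: str        # standard operating procedure passed along as instructions
    tools: list = field(default_factory=list)


# Hypothetical skills; only the tool names come from the article.
SKILLS = [
    Skill("weather", ["weather", "temperature"],
          "Resolve the city, then call get_weather once.", ["get_weather"]),
    Skill("orders", ["order", "refund"],
          "Look up the order before cancelling it.", ["query_order", "cancel_order"]),
]


def select_skill(query):
    """Return the first skill whose keywords appear in the query, else None."""
    text = query.lower()
    for skill in SKILLS:
        if any(keyword in text for keyword in skill.keywords):
            return skill
    return None


skill = select_skill("What's the weather like in Beijing today?")
print(skill.name, skill.tools)  # weather ['get_weather']
```

Once a skill is chosen, only its tools and SOP are sent to the model, so the fine-grained tool choice happens over a handful of candidates rather than dozens.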
Conclusion
Most production AI agents fail because intent recognition collapses, leading to chaotic tool calls. Engineering solutions include intent convergence, tool convergence, strict single‑responsibility design, detailed logging, evaluation loops, and optionally a Skills layer. When these measures still fall short, consider model upgrades or more sophisticated context‑providing techniques.
IT Services Circle
