How to Make LLM Agents’ Function Calls Stable and Accurate: 5 Proven Strategies
This article breaks down why function‑call reliability is the biggest bottleneck for LLM agents and presents a systematic five‑step loop—schema quality, prompt context, sampling, training data, and runtime defenses—plus concrete optimization techniques such as dynamic tool routing, plan‑execute, validation layers, memory injection, and log‑driven tuning, illustrated with real‑world cases.
Why Function Calls Need Systematic Optimization
In real projects, about 70% of an agent’s reliability issues stem from incorrect tool usage. Accurate function calls depend on five key factors:
Schema definition quality (unambiguous function and parameter names)
Prompt context (clean system and user messages)
Sampling strategy (temperature too high leads to random tool selection)
Training data & model capability (covered in a separate article)
Runtime defense mechanisms (retry, reflection, JSON validation)
This article focuses on factors 1, 2, 3, and 5.
Systematic Optimization Methods
1) Dynamic Tool Routing
When many tools are exposed, the model wastes tokens, expands decision space, and raises error probability. A lightweight intent classifier selects a subset of relevant tools before the model generates a call.
Example: for a weather query, only [get_weather] is exposed. Result: token usage drops ~80% and the error rate falls sharply.
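The routing idea can be sketched as a keyword-based intent classifier. This is a minimal illustration, not a production router; the tool names and keyword lists below are assumptions made up for the example.

```python
# Hypothetical tool registry: names and keywords are illustrative only.
ALL_TOOLS = {
    "get_weather":    {"keywords": ["weather", "temperature", "forecast"]},
    "search_flights": {"keywords": ["flight", "fly", "airfare"]},
    "book_flight":    {"keywords": ["book flight"]},
    "search_hotels":  {"keywords": ["hotel", "stay"]},
    "book_hotel":     {"keywords": ["book hotel"]},
    "get_news":       {"keywords": ["news", "headline"]},
}

def route_tools(user_query: str, max_tools: int = 2) -> list[str]:
    """Return only the tools whose keywords match the query.

    Falls back to the full tool list when nothing matches, so the
    model is never left without options.
    """
    q = user_query.lower()
    scored = []
    for name, spec in ALL_TOOLS.items():
        score = sum(kw in q for kw in spec["keywords"])
        if score > 0:
            scored.append((score, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:max_tools]] or list(ALL_TOOLS)

print(route_tools("What's the weather in Beijing tomorrow?"))
# → ['get_weather']  — the model sees one tool instead of six
```

In production the keyword matcher would typically be replaced by an embedding similarity search or a small classifier, but the contract is the same: shrink the tool list before the model sees it.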
2) CoT + Plan‑Execute
For complex tasks, force the model to output Thought and Plan before executing actions, then iterate with Observation and Correction (ReAct).
Example workflow:
Step 1: Search flights
Step 2: Book flight
Step 3: Search hotels
Step 4: Book hotel
If an API returns an error such as Payment Failed, the model self-corrects with a fallback payment method.
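The plan-execute loop with ReAct-style recovery can be sketched as follows. The tool stubs, the plan, and the fallback payment method are illustrative assumptions; a real agent would call actual APIs and let the model generate the Thought.

```python
def execute(step: str, payment_method: str = "card") -> str:
    """Stubbed tool execution; simulates a payment failure on first try."""
    if step == "book_flight" and payment_method == "card":
        return "error: Payment Failed"
    return f"{step}: ok"

def run_plan(plan: list[str]) -> list[str]:
    """Execute each planned step, observing results and self-correcting."""
    log = []
    for step in plan:
        observation = execute(step)
        if observation.startswith("error"):
            # Reflection step: record the thought, retry with a fallback.
            log.append(f"Thought: {step} failed, trying alternative payment")
            observation = execute(step, payment_method="alipay")
        log.append(observation)
    return log

trace = run_plan(["search_flights", "book_flight", "search_hotels", "book_hotel"])
for line in trace:
    print(line)
```

The key design point is that every action produces an Observation that feeds back into the loop, so a failure becomes a branch rather than a dead end.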
3) Validation Layer
A validation layer checks both the model’s generated calls and the API responses. It performs parameter checks, schema validation, error‑type classification, and retries or reflection as needed.
Example of a missing parameter:
search_flights(destination="北京", date="明天")
Validation feedback: origin missing, please provide.
The model then regenerates the correct call.
When API returns a wrong type (e.g., temperature as a string instead of a number), the layer can auto‑clean, retry, or ask the model to reflect.
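A minimal validation layer might look like this. The schema and field names are assumptions for illustration; in practice the required-parameter list would come from the tool's JSON Schema.

```python
# Hypothetical schema: required parameters per tool (illustrative).
SCHEMA = {
    "search_flights": {"required": ["origin", "destination", "date"]},
}

def validate_call(tool: str, args: dict) -> list[str]:
    """Return the missing required parameters (empty list means valid)."""
    required = SCHEMA.get(tool, {}).get("required", [])
    return [p for p in required if p not in args]

def clean_response(resp: dict) -> dict:
    """Auto-clean a response where a numeric field came back as a string."""
    if isinstance(resp.get("temperature"), str):
        resp["temperature"] = float(resp["temperature"])
    return resp

missing = validate_call("search_flights", {"destination": "北京", "date": "明天"})
if missing:
    # This message is fed back to the model so it can regenerate the call.
    print(f"{', '.join(missing)} missing, please provide.")
```

Checks that can be done deterministically (missing fields, type coercion) should be handled in code; only genuinely ambiguous failures need to be escalated back to the model for reflection.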
4) Long‑Term Memory & Prompt Variable Injection
In multi‑turn dialogs, store key information in a memory store and inject it into the system prompt each turn, so the model always has up‑to‑date context.
Current itinerary:
- Origin: 上海
- Destination: 北京
- Date: 2023-10-30
The model can then call:
search_flights(origin="上海", destination="北京", date="2023-10-30")
This effectively gives the agent a working long-term memory.
5) Log‑Driven Optimization
Collect logs for each interaction (user query, chosen tool, parameters, error type, LLM thought). Cluster bad cases to identify high‑frequency failure modes and iteratively improve schema, prompts, or routing.
For example, clustering might reveal:
30% of errors: date parsing
20%: ambiguous hotel location
10%: tool name confusion
5%: model hallucination
Closing the loop with logs yields systematic, measurable improvements.
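The logging loop can be sketched as one record per interaction plus a frequency count over failures. The log fields and error labels are illustrative assumptions; real pipelines would also cluster free-text errors rather than rely on pre-assigned labels.

```python
from collections import Counter

# One record per interaction (illustrative fields and labels).
logs = [
    {"query": "book flight tomorrow",   "tool": "search_flights", "error": "date_parsing"},
    {"query": "hotel near the station", "tool": "search_hotels",  "error": "ambiguous_location"},
    {"query": "weather in Beijing",     "tool": "get_weather",    "error": None},
    {"query": "fly to 北京 next Friday", "tool": "search_flights", "error": "date_parsing"},
]

def failure_modes(records: list[dict]) -> list[tuple[str, int]]:
    """Rank error types by frequency across failed interactions."""
    errors = Counter(r["error"] for r in records if r["error"])
    return errors.most_common()

print(failure_modes(logs))
# → [('date_parsing', 2), ('ambiguous_location', 1)]
```

The ranked list tells you where to spend effort first: here, a date-normalization pass in the validation layer would address the largest cluster.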
Three Real‑World Cases
Case 1: Dynamic Routing Reduces Errors by 80%
User asks “What’s the weather in Beijing tomorrow?” Instead of exposing all six tools, the intent classifier selects only get_weather, and the model outputs the correct call directly:
get_weather(city="北京", date="明天")
Stability improves dramatically.
Case 2: CoT + Plan‑Execute Handles Complex Booking
For “Book a flight and a hotel,” the model first plans the sequence, then executes each step, using ReAct to recover from a payment failure.
“Payment failed, trying an alternative method.”
This self‑correction is essential for reliable agents.
Case 3: Validation Layer Cuts “Hallucination” Errors by 90%
When the model omits the required origin parameter, the validation layer rejects the call and prompts the model to add the missing field, sharply reducing hallucinated or malformed calls.
origin missing, please provide.
Conclusion: Systemic Thinking Determines Stability
A robust agent architecture combines clean schema, concise prompts, intelligent routing, step‑by‑step planning, strict validation, log‑driven iteration, and memory injection. Mastering these systematic practices, rather than memorizing isolated tricks, is what separates production‑grade agents from experimental prototypes.
Wu Shixiong's Large Model Academy
We continuously share large‑model know‑how, helping you master core skills—LLM, RAG, fine‑tuning, deployment—from zero to job offer, tailored for career‑switchers, autumn recruiters, and those seeking stable large‑model positions.