How to Make LLM Agents’ Function Calls Stable and Accurate: 5 Proven Strategies
This article breaks down why function‑call reliability is the biggest bottleneck for LLM agents and presents a systematic five‑step loop—schema quality, prompt context, sampling, training data, and runtime defenses—plus concrete optimization techniques such as dynamic tool routing, plan‑execute, validation layers, memory injection, and log‑driven tuning, illustrated with real‑world cases.
Why Function Calls Need Systematic Optimization
In real projects, about 70% of an agent’s reliability issues stem from incorrect tool usage. Accurate function calls depend on five key factors:
Schema definition quality (unambiguous function and parameter names)
Prompt context (clean system and user messages)
Sampling strategy (temperature too high leads to random tool selection)
Training data & model capability (covered in a separate article)
Runtime defense mechanisms (retry, reflection, JSON validation)
This article focuses on factors 1, 2, 3, and 5.
Systematic Optimization Methods
1) Dynamic Tool Routing
When many tools are exposed, the model wastes tokens, expands decision space, and raises error probability. A lightweight intent classifier selects a subset of relevant tools before the model generates a call.
Example: for a weather query, only [get_weather] is exposed. Result: token usage drops ~80% and the error rate falls sharply.
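The routing idea can be sketched as a keyword-based intent classifier. This is a minimal illustration, not a production router; the tool names and keyword lists below are assumptions made up for the example.

```python
# Hypothetical tool registry: names and keywords are illustrative only.
ALL_TOOLS = {
    "get_weather":    {"keywords": ["weather", "temperature", "forecast"]},
    "search_flights": {"keywords": ["flight", "fly", "airfare"]},
    "book_flight":    {"keywords": ["book flight"]},
    "search_hotels":  {"keywords": ["hotel", "stay"]},
    "book_hotel":     {"keywords": ["book hotel"]},
    "get_news":       {"keywords": ["news", "headline"]},
}

def route_tools(user_query: str, max_tools: int = 2) -> list[str]:
    """Return only the tools whose keywords match the query.

    Falls back to the full tool list when nothing matches, so the
    model is never left without options.
    """
    q = user_query.lower()
    scored = []
    for name, spec in ALL_TOOLS.items():
        score = sum(kw in q for kw in spec["keywords"])
        if score > 0:
            scored.append((score, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:max_tools]] or list(ALL_TOOLS)

print(route_tools("What's the weather in Beijing tomorrow?"))
# → ['get_weather']  — the model sees one tool instead of six
```

In production the keyword matcher would typically be replaced by an embedding similarity search or a small classifier, but the contract is the same: shrink the tool list before the model sees it.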
2) CoT + Plan‑Execute
For complex tasks, force the model to output Thought and Plan before executing actions, then iterate with Observation and Correction (ReAct).
Example workflow:
Step 1: Search flights
Step 2: Book flight
Step 3: Search hotels
Step 4: Book hotel
If an API returns an error such as Payment Failed, the model self-corrects with a fallback payment method.
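The plan-execute loop with ReAct-style recovery can be sketched as follows. The tool stubs, the plan, and the fallback payment method are illustrative assumptions; a real agent would call actual APIs and let the model generate the Thought.

```python
def execute(step: str, payment_method: str = "card") -> str:
    """Stubbed tool execution; simulates a payment failure on first try."""
    if step == "book_flight" and payment_method == "card":
        return "error: Payment Failed"
    return f"{step}: ok"

def run_plan(plan: list[str]) -> list[str]:
    """Execute each planned step, observing results and self-correcting."""
    log = []
    for step in plan:
        observation = execute(step)
        if observation.startswith("error"):
            # Reflection step: record the thought, retry with a fallback.
            log.append(f"Thought: {step} failed, trying alternative payment")
            observation = execute(step, payment_method="alipay")
        log.append(observation)
    return log

trace = run_plan(["search_flights", "book_flight", "search_hotels", "book_hotel"])
for line in trace:
    print(line)
```

The key design point is that every action produces an Observation that feeds back into the loop, so a failure becomes a branch rather than a dead end.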
3) Validation Layer
A validation layer checks both the model’s generated calls and the API responses. It performs parameter checks, schema validation, error‑type classification, and retries or reflection as needed.
Example of a missing parameter:
search_flights(destination="北京", date="明天")
Validation feedback: origin missing, please provide.
The model then regenerates the correct call.
When API returns a wrong type (e.g., temperature as a string instead of a number), the layer can auto‑clean, retry, or ask the model to reflect.
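A minimal validation layer might look like this. The schema and field names are assumptions for illustration; in practice the required-parameter list would come from the tool's JSON Schema.

```python
# Hypothetical schema: required parameters per tool (illustrative).
SCHEMA = {
    "search_flights": {"required": ["origin", "destination", "date"]},
}

def validate_call(tool: str, args: dict) -> list[str]:
    """Return the missing required parameters (empty list means valid)."""
    required = SCHEMA.get(tool, {}).get("required", [])
    return [p for p in required if p not in args]

def clean_response(resp: dict) -> dict:
    """Auto-clean a response where a numeric field came back as a string."""
    if isinstance(resp.get("temperature"), str):
        resp["temperature"] = float(resp["temperature"])
    return resp

missing = validate_call("search_flights", {"destination": "北京", "date": "明天"})
if missing:
    # This message is fed back to the model so it can regenerate the call.
    print(f"{', '.join(missing)} missing, please provide.")
```

Checks that can be done deterministically (missing fields, type coercion) should be handled in code; only genuinely ambiguous failures need to be escalated back to the model for reflection.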
4) Long‑Term Memory & Prompt Variable Injection
In multi‑turn dialogs, store key information in a memory store and inject it into the system prompt each turn, so the model always has up‑to‑date context.
Current itinerary:
- Origin: 上海
- Destination: 北京
- Date: 2023-10-30
The model can then call:
search_flights(origin="上海", destination="北京", date="2023-10-30")
This effectively gives the agent a working long-term memory.
5) Log‑Driven Optimization
Collect logs for each interaction (user query, chosen tool, parameters, error type, LLM thought). Cluster bad cases to identify high‑frequency failure modes and iteratively improve schema, prompts, or routing.
For example, clustering might reveal:
30% of errors: date parsing
20%: ambiguous hotel location
10%: tool name confusion
5%: model hallucination
Closing the loop with logs yields systematic, measurable improvements.
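The logging loop can be sketched as one record per interaction plus a frequency count over failures. The log fields and error labels are illustrative assumptions; real pipelines would also cluster free-text errors rather than rely on pre-assigned labels.

```python
from collections import Counter

# One record per interaction (illustrative fields and labels).
logs = [
    {"query": "book flight tomorrow",   "tool": "search_flights", "error": "date_parsing"},
    {"query": "hotel near the station", "tool": "search_hotels",  "error": "ambiguous_location"},
    {"query": "weather in Beijing",     "tool": "get_weather",    "error": None},
    {"query": "fly to 北京 next Friday", "tool": "search_flights", "error": "date_parsing"},
]

def failure_modes(records: list[dict]) -> list[tuple[str, int]]:
    """Rank error types by frequency across failed interactions."""
    errors = Counter(r["error"] for r in records if r["error"])
    return errors.most_common()

print(failure_modes(logs))
# → [('date_parsing', 2), ('ambiguous_location', 1)]
```

The ranked list tells you where to spend effort first: here, a date-normalization pass in the validation layer would address the largest cluster.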
Three Real‑World Cases
Case 1: Dynamic Routing Reduces Errors by 80%
User asks “What’s the weather in Beijing tomorrow?” Instead of exposing all six tools, the intent classifier selects only get_weather, and the model outputs the correct call directly:
get_weather(city="北京", date="明天")
Stability improves dramatically.
Case 2: CoT + Plan‑Execute Handles Complex Booking
For “Book a flight and a hotel,” the model first plans the sequence, then executes each step, using ReAct to recover from a payment failure.
“Payment failed, trying an alternative method.”
This self‑correction is essential for reliable agents.
Case 3: Validation Layer Cuts “Hallucination” Errors by 90%
When the model omits the required origin parameter, the validation layer rejects the call and prompts the model to add the missing field, sharply reducing hallucinated or malformed calls.
origin missing, please provide.
Conclusion: Systemic Thinking Determines Stability
A robust agent architecture combines clean schema, concise prompts, intelligent routing, step‑by‑step planning, strict validation, log‑driven iteration, and memory injection. Mastering these systematic practices, rather than memorizing isolated tricks, is what separates production‑grade agents from experimental prototypes.
Wu Shixiong's Large Model Academy
We continuously share large‑model know‑how, helping you master core skills—LLM, RAG, fine‑tuning, deployment—from zero to job offer, tailored for career‑switchers, autumn recruiters, and those seeking stable large‑model positions.