How Spring AI’s Dynamic Tool Discovery Cuts Token Usage by 34%‑64%

This article explains how Spring AI’s recursive advisors enable dynamic tool discovery as a replacement for the traditional all‑tools‑in‑prompt approach. In preliminary benchmarks the technique reduced token consumption by 34%‑64% while keeping the full tool library reachable. The article covers the benchmark setup, code examples, and configurable search strategies.

Spring Full-Stack Practical Cases

1. Introduction

As AI agents connect to an increasing number of services (Slack, GitHub, Jira, Minecraft servers, etc.), the tool library grows rapidly. A typical multi‑server setup can expose more than 50 tools, whose definitions consume over 55,000 tokens before the conversation even starts. The model’s accuracy also drops when it has to choose among dozens of similarly named tools.

Anthropic’s Tool Search Tool (TST) pattern solves this by letting the model discover tools on demand instead of loading all definitions at once. Spring AI can replicate this behavior for any LLM using its Recursive Advisors abstraction.

2. Core Idea

Spring AI provides a portable abstraction layer that makes dynamic tool discovery work with OpenAI, Anthropic, Gemini, Ollama, Azure OpenAI, and other supported providers. Preliminary benchmarks show a 34%‑64% reduction in token usage on OpenAI, Anthropic, and Gemini models, while hundreds of tools remain accessible to the model.

3. How the Tool Search Tool Works

The standard ToolCallAdvisor sends every registered tool definition to the LLM, causing three problems:

Context bloat: massive token consumption before the dialogue begins.

Tool confusion: the model struggles to pick the correct tool among many similar ones.

Higher cost: tokens are spent on unused tool definitions.

Spring AI extends ToolCallAdvisor with ToolSearchToolCallAdvisor. The advisor intercepts the tool‑calling loop, queries a ToolSearcher when the model needs a capability, and injects only the matching tool definitions into the next request.

Workflow

Build index : At conversation start, all registered tools are indexed in a ToolSearcher but not sent to the LLM.

Initial request : Only the TST definition is sent, saving context.

Discovery call : When the LLM needs a specific ability, it calls the TST with a search query.

Search & expand : The ToolSearcher finds matching tools (e.g., “Tool XYZ”) and adds their definitions to the next request.

Tool execution : The LLM can now call the discovered tool; the result is returned to the model.

Generate response : The LLM produces the final answer using the tool output.
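
The workflow above can be simulated in plain Java without any Spring AI types. This is a minimal sketch, not the library's implementation: ToolDiscoveryDemo, ToolDef, the toolSearchTool name, the currentTime description, and the naive substring matching are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical simulation of the discovery loop; no Spring AI classes are involved.
class ToolDiscoveryDemo {

    // A tool definition as the model would see it: a name plus a description.
    record ToolDef(String name, String description) {}

    // Assumed name for the search tool that is always sent to the model.
    static final String SEARCH_TOOL = "toolSearchTool";

    // Build index: every registered tool is stored here, but not sent to the model.
    private final Map<String, ToolDef> index = new LinkedHashMap<>();

    // Initial request: only the search tool definition is attached.
    private final List<String> activeTools = new ArrayList<>(List.of(SEARCH_TOOL));

    void register(ToolDef tool) {
        index.put(tool.name(), tool);
    }

    // Discovery call + search & expand: matching tools join the next request.
    List<String> search(String query, int maxResults) {
        String[] terms = query.toLowerCase(Locale.ROOT).trim().split("\\s+");
        List<String> hits = new ArrayList<>();
        for (ToolDef tool : index.values()) {
            if (hits.size() >= maxResults) break;
            String haystack = (tool.name() + " " + tool.description()).toLowerCase(Locale.ROOT);
            for (String term : terms) {
                if (!term.isEmpty() && haystack.contains(term)) {
                    hits.add(tool.name());
                    break;
                }
            }
        }
        for (String hit : hits) {
            if (!activeTools.contains(hit)) activeTools.add(hit);
        }
        return hits;
    }

    List<String> activeTools() {
        return List.copyOf(activeTools);
    }

    // Sample index with three illustrative tools.
    static ToolDiscoveryDemo sample() {
        ToolDiscoveryDemo demo = new ToolDiscoveryDemo();
        demo.register(new ToolDef("weather", "Get weather for a location at a specific time"));
        demo.register(new ToolDef("clothing", "Get currently open clothing stores in a location"));
        demo.register(new ToolDef("currentTime", "Get the current date and time"));
        return demo;
    }

    public static void main(String[] args) {
        ToolDiscoveryDemo demo = sample();
        System.out.println("Before discovery: " + demo.activeTools());
        System.out.println("Search hits: " + demo.search("weather forecast", 5));
        System.out.println("After discovery: " + demo.activeTools());
    }
}
```

The point of the sketch is the size of activeTools: it grows only when a search returns a match, so definitions that are never needed never reach the request.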

4. Code Example

var toolSearchToolCallAdvisor = ToolSearchToolCallAdvisor.builder()
    .toolSearcher(toolSearcher)
    .maxResults(5)
    .build();

ChatClient chatClient = chatClientBuilder
    // Hundreds of tools are registered but not sent initially
    .defaultTools(new MyTools(), ...)
    // Activate the Tool Search Tool
    .defaultAdvisors(toolSearchToolCallAdvisor)
    .build();

Pluggable Search Strategies

VectorToolSearcher – semantic search using vector stores (natural‑language queries, fuzzy matching).

LuceneToolSearcher – keyword search for exact matches and known tool names.

RegexToolSearcher – regular‑expression matching for tool name patterns (e.g., get_*_data).
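
Note that a pattern such as get_*_data is written in glob style; as a regular expression it would be get_.*_data. The sketch below shows one way to translate such a glob into an anchored regex and apply it to tool names, using only java.util.regex. ToolNameMatchDemo and its methods are hypothetical helpers, not part of RegexToolSearcher, whose actual pattern syntax should be checked against the library.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: glob-style tool-name matching with java.util.regex.
class ToolNameMatchDemo {

    // Converts a glob such as get_*_data into a regex equivalent to get_.*_data.
    // Literal segments are quoted so characters like '.' are not treated as regex syntax.
    static Pattern globToRegex(String glob) {
        String[] parts = glob.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) regex.append(".*");
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // Returns the names that match the glob in full (not just a substring).
    static List<String> matchNames(List<String> names, String glob) {
        Pattern pattern = globToRegex(glob);
        return names.stream()
                .filter(name -> pattern.matcher(name).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of("get_user_data", "get_order_data", "set_user_data", "get_data");
        System.out.println(matchNames(names, "get_*_data"));
    }
}
```

Full-string matching is used here so that set_user_data or get_data are not picked up by accident; a searcher that matches substrings would behave differently.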

5. Practical Example

5.1 Dependencies

<dependency>
  <groupId>org.springaicommunity</groupId>
  <artifactId>tool-search-tool</artifactId>
  <version>1.0.1</version>
</dependency>

<dependency>
  <groupId>org.springaicommunity</groupId>
  <artifactId>tool-searcher-lucene</artifactId>
  <version>1.0.1</version>
</dependency>

Version 1.0.x works with Spring AI 1.1.x / Spring Boot 3; version 2.0.x works with Spring AI 2.x / Spring Boot 4.

5.2 Tool Definitions

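import java.util.List;

import org.springframework.ai.tool.annotation.Tool;
import org.springframework.ai.tool.annotation.ToolParam;

public class MyTools {
  @Tool(description = "Get weather for a location at a specific time")
  public String weather(@ToolParam(description = "Location") String location,
                        @ToolParam(description = "YYYY-MM-DD") String atTime) {
    // Log the call so it is visible when the model invokes the tool
    System.err.println("%s, %s weather: xxx".formatted(location, atTime));
    // Hard-coded stub result for the demo
    return "32℃, sunny, low UV";
  }

  @Tool(description = "Get currently open clothing stores in a location")
  public List<String> clothing(@ToolParam(description = "Location") String location) {
    // Hard-coded stub store names for the demo
    return List.of("劲霸男装", "恒源祥服装");
  }
  // ...more tools
}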

5.3 Configuration

@Configuration
public class ToolsConfig {
  @Bean
  ToolSearcher toolSearcher() {
    return new LuceneToolSearcher();
  }
}

5.4 Runner

@Component
public class DynamicToolRunner implements CommandLineRunner {
  private final ChatClient chatClient;

  public DynamicToolRunner(ChatClient.Builder builder, ToolSearcher toolSearcher) {
    var advisor = ToolSearchToolCallAdvisor.builder()
        .toolSearcher(toolSearcher)
        .build();
    this.chatClient = builder.defaultTools(new MyTools())
        .defaultAdvisors(advisor)
        .build();
  }

  @Override
  public void run(String... args) throws Exception {
    var answer = chatClient.prompt("""
        Look up the weather in Chengdu on 2026-05-03.
        """).call().content();
    System.out.println(answer);
  }
}

6. Performance Test

Disclaimer: The numbers come from a small number of manual runs and are for demonstration only; they are not statistically averaged.

Test setup:

Task: “Plan today’s outfit in Amsterdam and recommend a few open clothing stores.”

Total tools: 28 (3 relevant – weather, clothing, currentTime – plus 25 unrelated placeholder tools).

Search strategies: Lucene (keyword) and VectorStore (semantic).

Models: Gemini‑3‑pro‑preview, OpenAI gpt‑5‑mini‑2025‑08‑07, Anthropic claude‑sonnet‑4‑5‑20250929.

Token usage was measured with a custom TokenCounterAdvisor that aggregates token counts.
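
The source does not show the TokenCounterAdvisor itself. As a rough illustration of the aggregation idea only, the following plain-Java sketch sums prompt and completion tokens across the calls in a run and computes the percentage reduction between two runs. TokenTallyDemo and its methods are hypothetical, and the figures in main are round placeholder values, not benchmark results.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical token tally; not the TokenCounterAdvisor used in the benchmark.
class TokenTallyDemo {

    private final AtomicLong promptTokens = new AtomicLong();
    private final AtomicLong completionTokens = new AtomicLong();

    // Adds the usage reported for one model call to the running totals.
    void add(long prompt, long completion) {
        promptTokens.addAndGet(prompt);
        completionTokens.addAndGet(completion);
    }

    long total() {
        return promptTokens.get() + completionTokens.get();
    }

    // Percentage saved by the optimized run relative to the baseline run.
    static double reductionPercent(long baselineTotal, long optimizedTotal) {
        if (baselineTotal <= 0) {
            throw new IllegalArgumentException("baselineTotal must be positive");
        }
        return 100.0 * (baselineTotal - optimizedTotal) / baselineTotal;
    }

    public static void main(String[] args) {
        // Placeholder values: all tool definitions sent on every call.
        TokenTallyDemo baseline = new TokenTallyDemo();
        baseline.add(4_500, 500);
        baseline.add(4_500, 500);

        // Placeholder values: only discovered tool definitions sent.
        TokenTallyDemo withSearch = new TokenTallyDemo();
        withSearch.add(1_000, 250);
        withSearch.add(1_500, 250);
        withSearch.add(1_750, 250);

        System.out.printf("Reduction: %.1f%%%n",
                reductionPercent(baseline.total(), withSearch.total()));
    }
}
```

Because dynamic discovery adds extra round trips (the search call itself), comparing totals across the whole conversation, as done here, is fairer than comparing a single request.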

The benchmark shows a 34%‑64% reduction in token consumption across the three models, while every one of the 28 registered tools remains discoverable on demand.

7. Conclusion

By leveraging Spring AI’s recursive advisors and a pluggable ToolSearcher, developers can implement dynamic tool discovery that dramatically cuts token usage, avoids context bloat, reduces cost, and improves tool‑selection accuracy in LLM‑driven applications.
