Turn LLMs into Real Assistants: Build a Tool‑Using Agent in Minutes

This article explains why large language models alone can hallucinate, introduces the tool‑using agent architecture, and provides a step‑by‑step Python tutorial using LangChain, LangGraph, and Tavily to create, run, and evaluate a real‑time web‑search capable AI assistant.


Why LLMs Need Tools

Large language models (LLMs) are powerful but act as closed inference engines whose knowledge is frozen at training time, leading to hallucinations and an inability to access up‑to‑date or private data. Giving them "hands"—the ability to invoke external tools—solves this limitation.

What Is a Tool‑Using Agent?

A tool‑using agent equips an LLM with callable functions or APIs (the "tools"). When the model cannot answer a query directly, it decides to call the most appropriate tool, receives the result, and synthesizes a final answer.

Typical Workflow

Receive task (e.g., "What did the latest WWDC announce?")

Autonomously decide a tool is needed

Invoke the tool (e.g., web_search) with a precise query

Get tool feedback (search results)

Combine the result with language generation to produce an answer
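
Conceptually, this loop looks like the following minimal pseudocode sketch (illustrative only; the real LangGraph implementation is built step by step later in this tutorial):

# Illustrative agent loop (pseudocode, not the LangGraph version built below)
def run_agent(llm, tools_by_name, user_query):
    messages = [("user", user_query)]
    while True:
        response = llm.invoke(messages)      # model decides: answer or call a tool
        messages.append(response)
        if not response.tool_calls:          # no tool requested: final answer
            return response.content
        for call in response.tool_calls:     # run each requested tool
            result = tools_by_name[call["name"]].invoke(call["args"])
            messages.append(("tool", str(result)))  # feed the result back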

Use Cases

Research assistant – fetch latest news or papers

Enterprise assistant – query internal databases

Math & scientific computation – call a calculator tool for reliable results
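
For instance, a calculator tool takes only a few lines with LangChain's @tool decorator (an illustrative sketch; the rest of this tutorial uses a web-search tool instead):

from langchain_core.tools import tool

@tool
def calculator(expression: str) -> str:
    """Evaluate a basic arithmetic expression, e.g. '2 * (3 + 4)'."""
    # eval() is acceptable for a demo; a production tool should use a safe parser
    return str(eval(expression))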

Pros and Cons

Pros

Reduces hallucinations by grounding answers in real‑time data

Extensible: new capabilities can be plugged in as tools, much like installing apps

Cons

Integration effort: define tools, manage API keys, handle retries

Tool quality directly impacts agent performance

Step‑by‑Step Implementation

0. Preparation

# Install required libraries
!pip install -q -U langchain-nebius langchain langgraph rich python-dotenv tavily-python

Create a .env file with your Nebius, LangSmith, and Tavily API keys and load them in Python.
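
A minimal .env might look like this (placeholder values; the LANGCHAIN_TRACING_V2 flag is optional and only needed if you want LangSmith tracing):

NEBIUS_API_KEY=your-nebius-key
LANGCHAIN_API_KEY=your-langsmith-key
TAVILY_API_KEY=your-tavily-key
LANGCHAIN_TRACING_V2=true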

import os
from dotenv import load_dotenv
from rich.console import Console

console = Console()  # styled console output, used throughout the tutorial
load_dotenv()

# Verify keys
for key in ["NEBIUS_API_KEY", "LANGCHAIN_API_KEY", "TAVILY_API_KEY"]:
    if not os.getenv(key):
        print(f"⚠️ {key} missing")
print("✅ Environment loaded")

1. Define the Tool

We use TavilySearchResults for real‑time web search, limiting results to two to keep context short.

from langchain_community.tools.tavily_search import TavilySearchResults

search_tool = TavilySearchResults(max_results=2)
search_tool.name = "web_search"
search_tool.description = "Search the web for the latest information, such as news or event scores. Returns real-time content."
tools = [search_tool]
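
Before wiring the tool into an agent, a quick sanity check confirms the key and tool work (the exact shape of the results may vary by version):

# Optional: invoke the tool directly and inspect the raw results
results = search_tool.invoke("latest WWDC announcements")
print(results)  # a short list of result dicts (url, content, ...)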

2. Build the Agent with LangGraph

Define a state to hold the message history.

from typing import Annotated, List, TypedDict
from langchain_core.messages import AnyMessage
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    messages: Annotated[List[AnyMessage], add_messages]

Bind the tool to the LLM.

from langchain_nebius import ChatNebius
llm = ChatNebius(model="meta-llama/Meta-Llama-3.1-8B-Instruct", temperature=0)
llm_with_tools = llm.bind_tools(tools)
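
A quick way to verify the binding is to ask a time-sensitive question and inspect the tool calls the model emits (an illustrative check):

# The model should request web_search instead of answering from stale memory
msg = llm_with_tools.invoke("What happened at the latest WWDC?")
print(msg.tool_calls)  # e.g. [{'name': 'web_search', 'args': {'query': ...}, ...}]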

Define the brain node, tool node, and routing logic.

def agent_node(state: AgentState):
    """Brain: decide next action"""
    console.print("🤔 --- Agent thinking ... ---")
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response]}

from langgraph.prebuilt import ToolNode

# ToolNode runs any tool calls on the last AI message and appends the
# results to the state as tool messages
tool_node = ToolNode(tools)

from langgraph.graph import END

def router_function(state: AgentState) -> str:
    """Router: choose between calling a tool or finishing"""
    last_message = state["messages"][-1]
    if last_message.tool_calls:
        console.print("🔧 --- Router: call tool ---")
        return "call_tool"
    console.print("🎉 --- Router: finish, output answer ---")
    return END  # END is the built-in "__end__" terminal marker

Assemble the graph.

from langgraph.graph import StateGraph
graph_builder = StateGraph(AgentState)
graph_builder.add_node("agent", agent_node)
graph_builder.add_node("call_tool", tool_node)
graph_builder.set_entry_point("agent")
graph_builder.add_conditional_edges("agent", router_function)
graph_builder.add_edge("call_tool", "agent")
tool_agent_app = graph_builder.compile()
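
As a sanity check of the wiring, recent LangGraph versions can render the compiled graph as a Mermaid diagram (the API may differ slightly across versions):

# Optional: print a Mermaid diagram of the graph (paste into mermaid.live to view)
print(tool_agent_app.get_graph().draw_mermaid())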

3. Run the Agent

user_query = "What were the main announcements at the latest WWDC?"
initial_input = {"messages": [("user", user_query)]}
console.print(f"🚀 Starting agent with query: '{user_query}'")
for chunk in tool_agent_app.stream(initial_input, stream_mode="values"):
    chunk["messages"][-1].pretty_print()
    console.print("\n---\n")
console.print("\n✅ Workflow completed!")

The output shows the full reasoning trace: user input → agent decides to call web_search → tool returns structured results → agent synthesizes a concise answer.

4. Evaluation

We define a Pydantic model to score tool selection, input quality, and synthesis quality, then use a second LLM as a judge.

from pydantic import BaseModel, Field
class ToolUseEvaluation(BaseModel):
    tool_selection_score: int = Field(description="1-5: Did the agent pick the right tool?")
    tool_input_score: int = Field(description="1-5: Was the tool input precise?")
    synthesis_quality_score: int = Field(description="1-5: How good is the final answer?")
    justification: str = Field(description="Reason for the scores")

judge_llm = llm.with_structured_output(ToolUseEvaluation)
final_answer = tool_agent_app.invoke(initial_input)
conversation_trace = "\n".join(
    f"{m.type}: {m.content or ''}{getattr(m, 'tool_calls', '')}"
    for m in final_answer["messages"]
)

def evaluate_tool_use(trace: str):
    prompt = f"You are an AI agent reviewer. Score the following conversation trace:\n{trace}"
    return judge_llm.invoke(prompt)

evaluation = evaluate_tool_use(conversation_trace)
console.print(evaluation.model_dump_json(indent=2))

Sample scores (5,5,4) demonstrate that the agent correctly chose the web‑search tool, used an accurate query, and produced a high‑quality answer.

Conclusion

By combining a language model with external tools via LangGraph, we transform a static LLM into a dynamic, real‑world assistant capable of fetching up‑to‑date information. The key takeaways are the importance of clear tool descriptions, the value of tracing the agent’s reasoning, and the usefulness of automated evaluation for continuous improvement.
