Unlocking LLM Power: A Hands‑On Guide to Function Calling with Mistral, Llama, and Qwen
This tutorial explains how large language models can use function calling to access real‑time data, walks through setting up a Flask endpoint, demonstrates integration with Mistral Small, Llama 3.2‑1B, and Qwen models, and provides complete Python code examples for end‑to‑end execution.
LLM Fundamentals
Function calling is a mechanism that lets large language models (LLMs) invoke external functions or APIs to perform predefined tasks, effectively delegating work the model cannot handle on its own.
For example, asking an LLM "What is Tesla's current stock price?" without function calling may produce a fabricated answer or admit lack of real‑time data.
The limitation is easy to demonstrate with Qwen2.5‑0.5B Instruct:
<code>from transformers import AutoTokenizer, AutoModelForCausalLM
model_name = "Qwen/Qwen2.5-0.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
prompt = "What's the current stock price of Tesla?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True)
print(tokenizer.decode(outputs[0]))
# Output:
# What's the current stock price of Tesla?
# I'm sorry, but as an AI language model, I don't have access to real-time financial data. The stock price of Tesla can change rapidly.
</code>With function calling, the model can recognize the need for real‑time stock data, trigger a financial service API, and return an answer such as "Tesla's current stock price is $279.24."
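To make the stock example concrete, a function‑calling setup would first declare a schema describing the callable function. The sketch below is illustrative only: the `get_stock_price` name and its `ticker` parameter are hypothetical, not a real API.

```python
# Hypothetical schema for the stock example; "get_stock_price" and its
# "ticker" parameter are invented for illustration.
stock_tools = [
    {
        "type": "function",
        "function": {
            "name": "get_stock_price",
            "description": "Fetch the latest stock price for a ticker symbol",
            "parameters": {
                "type": "object",
                "properties": {
                    "ticker": {
                        "type": "string",
                        "description": "Stock ticker symbol, e.g. TSLA"
                    }
                },
                "required": ["ticker"]
            }
        }
    }
]
print(stock_tools[0]["function"]["name"])  # prints: get_stock_price
```

Given this schema, the model can emit a structured call such as `{"ticker": "TSLA"}` instead of hallucinating a price; the application then queries a real financial service and feeds the result back.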
OpenAI's GPT‑4 and many Mistral models (e.g., Mistral Small, Mistral Nemo) natively support function calling via their APIs.
"Mistral models are underrated; a few tries reveal how excellent they are."
First, install the Mistral AI Python client:
<code>pip3 install mistralai
pip3 list | grep mistralai
# Output
# mistralai 1.6.0
</code>Next, create a Flask endpoint that lists files in a given directory:
<code>from flask import Flask, request
from os import path, listdir

app = Flask(__name__)

@app.route('/files')
def list_files():
    directory = request.args.get('directory', '.')
    files = [{
        'name': f
    } for f in listdir(directory) if path.isfile(path.join(directory, f))]
    return files

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8000)
</code>Test the endpoint with curl:
<code>curl "http://localhost:8000/files?directory=/tmp"
# Output
# [{"name":"api.py"}]
</code>Define the function schema for the LLM (using Mistral Small as an example):
<code>available_functions = [
    {
        "type": "function",
        "function": {
            "name": "files",
            "description": "List files in a directory",
            "parameters": {
                "type": "object",
                "properties": {
                    "directory": {
                        "type": "string",
                        "description": "The absolute path of a directory"
                    }
                },
                "required": ["directory"]
            }
        }
    }
]
</code>Pass available_functions together with the user prompt to the Mistral client:
<code>from mistralai import Mistral

api_key = "..."
model = "mistral-small-2503"
mistral = Mistral(api_key=api_key)

response = mistral.chat.complete(
    model=model,
    messages=[{"role": "user", "content": "Give me all the files in the /tmp directory."}],
    tools=available_functions,
    tool_choice="any"
)
print(response.choices[0].message.tool_calls[0])
# Output shows a function call to "files" with argument {"directory": "/tmp"}
</code>After receiving a non‑empty tool_calls array, invoke the function manually with requests:
<code>import json, requests

if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    if tool_call.function.name == "files":
        output = requests.get(
            "http://localhost:8000/files?directory="
            + json.loads(tool_call.function.arguments)["directory"]
        ).json()
        print(output)
# Output
# [{"name": "qq1.html"}]
</code>Note that the arguments arrive as a JSON string and must be parsed with json.loads(). In this flow the LLM itself produces no final textual answer; our code prints the function's result directly.
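To have the model phrase a natural‑language answer instead, the tool result is typically appended to the conversation as a "tool" role message and the chat is completed a second time. A minimal sketch follows; the tool_call_id value and the output list are stand‑ins for the objects returned in the steps above (in the real flow, the id comes from the tool call Mistral returned, and must match it):

```python
import json

# Stand-ins for objects obtained above; in the real flow these come from
# response.choices[0].message.tool_calls[0] and the requests call.
tool_call_id = "abc123"             # id Mistral assigned to the tool call
output = [{"name": "qq1.html"}]     # result of the /files endpoint

# Conversation so far: the user prompt, the assistant's tool call, then
# the tool result serialized back as a "tool" role message.
followup_messages = [
    {"role": "user", "content": "Give me all the files in the /tmp directory."},
    {"role": "assistant", "tool_calls": [{
        "id": tool_call_id,
        "type": "function",
        "function": {"name": "files", "arguments": json.dumps({"directory": "/tmp"})}
    }]},
    {"role": "tool", "name": "files", "content": json.dumps(output),
     "tool_call_id": tool_call_id},
]

# A second completion would then yield a natural-language answer, e.g.:
# response = mistral.chat.complete(model=model, messages=followup_messages)
print(followup_messages[-1]["role"])  # prints: tool
```

This round trip is what lets the model answer "The /tmp directory contains qq1.html" rather than leaving the raw JSON to the caller.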
For local open‑source models, the same function schema can be supplied. Example with Llama 3.2‑1B Instruct:
<code>from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "meta-llama/Llama-3.2-1B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What are the files in /tmp"}
]
# add_generation_prompt=True cues the model to respond rather than continue the prompt
template = tokenizer.apply_chat_template(
    messages, tools=available_functions, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(template, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True)  # room for the full tool-call JSON
print(tokenizer.decode(outputs[0]))
# Model outputs a JSON‑like function call wrapped in <|python_tag|> ... <|eom_id|>
</code>Extract the function call using a regular expression:
<code>import json, re
generated_text = tokenizer.decode(outputs[0])
matched = re.search(r"<\|python_tag\|>(.*?)<\|eom_id\|>", generated_text, re.DOTALL)
function_call = json.loads(matched.group(1).strip())
print(function_call)
# Output: {'type': 'function', 'function': 'files', 'parameters': {'directory': '/tmp'}}
</code>The same approach works with Qwen, which wraps calls in <tool_call> tags.
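As a sketch of the Qwen variant, the extraction only needs a different regex; the sample output string below is illustrative rather than captured from a real run, and Qwen's JSON uses name/arguments keys rather than Llama's function/parameters layout:

```python
import json, re

# Illustrative Qwen-style generation; a real run would come from
# tokenizer.decode(outputs[0]) as in the Llama example above.
generated_text = (
    'Some assistant preamble...\n'
    '<tool_call>\n'
    '{"name": "files", "arguments": {"directory": "/tmp"}}\n'
    '</tool_call>'
)

# Pull the JSON payload out of the <tool_call> ... </tool_call> wrapper
matched = re.search(r"<tool_call>(.*?)</tool_call>", generated_text, re.DOTALL)
function_call = json.loads(matched.group(1).strip())
print(function_call)
# prints: {'name': 'files', 'arguments': {'directory': '/tmp'}}
```

From here the parsed call can be dispatched to the Flask endpoint exactly as in the Mistral example.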
Code Mala Tang
Read source code together, write articles together, and enjoy spicy hot pot together.