Tool Calling#
Tool calling (also known as function calling) enables models to interact with external tools and APIs. Furiosa-LLM supports tool calling for models trained with this capability.
Tool Calling Parsers#
The server converts model outputs into the OpenAI response format through a designated parser implementation. Tool calling parsers are model-dependent, as different models emit tool calls in different formats.
Currently, Furiosa-LLM supports the following tool calling parsers:
- hermes: For models using the Hermes tool calling format (e.g., EXAONE-4.0, Qwen3 series)
- llama: For Llama series models (e.g., Llama 3.1, Llama 3.2)
- openai: For models using the OpenAI tool calling format (e.g., gpt-oss-20b, gpt-oss-120b)
When starting the server, specify the appropriate parser using the --tool-call-parser option, for example: furiosa-llm serve furiosa-ai/Llama-3.1-8B-Instruct --tool-call-parser llama.
Tool Choice Options#
The tool_choice parameter controls how the model selects tools to call. Furiosa-LLM supports the following options:
- auto (default): The model decides whether to call a tool or respond directly based on the conversation context.
- required: Forces the model to call at least one tool. The model cannot respond without making a tool call.
- {"type": "function", "function": {"name": "<function_name>"}}: Forces the model to call the specified named function.
For more details on the tool calling specification, refer to the OpenAI Chat API documentation.
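The snippet below is a minimal sketch of the three tool_choice forms, assuming a Furiosa-LLM server with a tool call parser enabled is already running at http://localhost:8000/v1; the tool schema here is a placeholder, not part of the Furiosa-LLM API.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
model = client.models.list().data[0].id

# A placeholder tool definition; any valid tool list works here.
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
messages = [{"role": "user", "content": "What's the weather in Dallas?"}]

# auto (default): the model may answer directly or emit a tool call.
client.chat.completions.create(model=model, messages=messages, tools=tools, tool_choice="auto")

# required: the model must emit at least one tool call.
client.chat.completions.create(model=model, messages=messages, tools=tools, tool_choice="required")

# Named function: the model must call get_current_weather specifically.
client.chat.completions.create(
    model=model,
    messages=messages,
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "get_current_weather"}},
)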
Offline Example#
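The example below uses the LLM class directly with furiosa-ai/Llama-3.1-8B-Instruct: the model emits a tool call, the script parses it and executes the matching Python function, and the result is fed back so the model can produce a final answer.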
import json
import random
import string
from furiosa_llm import LLM, SamplingParams
llm = LLM("furiosa-ai/Llama-3.1-8B-Instruct")
sampling_params = SamplingParams(max_tokens=512, temperature=1.0)
def generate_random_id(length=9):
    characters = string.ascii_letters + string.digits
    random_id = "".join(random.choice(characters) for _ in range(length))
    return random_id

# simulate an API that can be called
def get_current_weather(city: str, state: str, unit: str):
    return (
        f"The weather in {city}, {state} is 85 degrees {unit}. It is "
        "partly cloudy, with highs in the 90's."
    )
tool_functions = {"get_current_weather": get_current_weather}
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city to find the weather for, e.g. 'San Francisco'",
                    },
                    "state": {
                        "type": "string",
                        "description": "the two-letter abbreviation for the state that the city is"
                        " in, e.g. 'CA' which would mean 'California'",
                    },
                    "unit": {
                        "type": "string",
                        "description": "The unit to fetch the temperature in",
                        "enum": ["celsius", "fahrenheit"],
                    },
                },
                "required": ["city", "state", "unit"],
            },
        },
    }
]
messages = [
    {
        "role": "system",
        "content": "When you receive a tool call response, use the output to format an answer to the original user question.\n\nYou are a helpful assistant with tool calling capabilities.",
    },
    {
        "role": "user",
        "content": "Can you tell me what the temperature will be in Dallas, in fahrenheit?",
    },
]
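# With tools provided, the model is expected to emit a tool call for this prompt.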
outputs = llm.chat(messages, sampling_params=sampling_params, tools=tools)
output = outputs[0].outputs[0].text.strip()
# append the assistant message
messages.append(
    {
        "role": "assistant",
        "content": output,
    }
)
# Now parse the model's output and execute it, simulating an API call
# with the function defined above.
tool_call = json.loads(output)
tool_answer = tool_functions[tool_call["name"]](**tool_call["parameters"])
# append the answer as a tool message and let the LLM give you an answer
messages.append(
    {
        "role": "tool",
        "content": {"output": tool_answer},
        "tool_call_id": generate_random_id(),
    }
)
outputs = llm.chat(messages, sampling_params=sampling_params, tools=tools)
print(outputs[0].outputs[0].text.strip())
# yields:
# "The current temperature in Dallas, TX is 85 degrees Fahrenheit.
#  It is partly cloudy, with highs in the 90's."
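Note that json.loads(output) works here because Llama 3.1 emits its tool call as a bare JSON object with name and parameters fields; when serving the same model online, the llama tool call parser performs this step and returns structured tool_calls entries instead.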
Online Example with Named Function Calling#
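This example assumes a Furiosa-LLM server launched with a tool call parser, as described above. It uses the OpenAI client to force a specific function via tool_choice.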
import json
import os
from openai import OpenAI
base_url = os.getenv("OPENAI_BASE_URL", "http://localhost:8000/v1")
api_key = os.getenv("OPENAI_API_KEY", "EMPTY")
client = OpenAI(base_url=base_url, api_key=api_key)
def get_weather(location: str, unit: str):
    return f"Getting the weather for {location} in {unit}..."

def get_time(timezone: str):
    return f"Getting the current time in {timezone}..."
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City and state, e.g., 'San Francisco, CA'"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location", "unit"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_time",
            "description": "Get the current time in a given timezone",
            "parameters": {
                "type": "object",
                "properties": {
                    "timezone": {"type": "string", "description": "Timezone, e.g., 'America/Los_Angeles'"},
                },
                "required": ["timezone"],
            },
        },
    },
]
response = client.chat.completions.create(
    model=client.models.list().data[0].id,
    messages=[{"role": "user", "content": "What's the weather and time in San Francisco?"}],
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "get_weather"}},  # Force a specific function
)
tool_call = response.choices[0].message.tool_calls[0].function
print(f"Function called: {tool_call.name}")
print(f"Arguments: {tool_call.arguments}")
assert tool_call.name == "get_weather", f"Expected 'get_weather' but got '{tool_call.name}'"
print(f"Result: {get_weather(**json.loads(tool_call.arguments))}")