Function calling is the bridge between language and action. You define a tool as a JSON schema (name, description, parameters). The LLM doesn't run the function; it decides *when* to call one and emits structured JSON describing the call. Your code parses, validates, and executes; the result is injected back into the conversation as a tool message; the LLM then responds with a grounded answer. With chaining, parallel calls, retries, and a strict permission model, this becomes the foundation of every AI assistant that actually does things.
Use function calling whenever your LLM needs to access fresh data (weather, prices, inventory, calendars), invoke an external API (send email, create ticket, schedule meeting), or perform precise computation (math, code execution, database queries). Don't use it when the answer is already available from the model's training data or the conversation context: every tool adds latency and another point of failure.
The bridge from language to action
Pre-2023, an LLM was a fluent writer trapped in a box. Ask it for the current weather and it’d say “It’s around 22°C in Tokyo, but I can’t actually check live data.” Useful, but limited.
Function calling broke the box open. Now an LLM can decide that a question needs live data and request a tool call such as get_weather(location="Tokyo").
Then your code runs the function, returns the result, and the LLM produces a grounded answer using the actual data.
The full request lifecycle
Every function-calling round trip follows the same shape:
sequenceDiagram
autonumber
participant U as User
participant App as Your code
participant LLM as LLM API
participant Tool as Tool / API
U->>App: "What's the weather in Tokyo?"
App->>LLM: messages + tool definitions
LLM-->>App: tool_call: get_weather(location="Tokyo")
Note over App: 1. Parse JSON<br/>2. Validate schema + business rules<br/>3. Check permissions
App->>Tool: GET /weather?city=Tokyo
Tool-->>App: { temp: 22, conditions: "cloudy" }
App->>LLM: messages + tool_result
LLM-->>App: "It's 22°C and cloudy in Tokyo."
App-->>U: "It's 22°C and cloudy in Tokyo."
Eight arrows in the diagram. Two model calls. One real-world action. Done thousands of times a day in any production assistant.
1. The tool definition
A tool is a JSON schema. Three required pieces:
{
  "name": "get_weather",
  "description": "Get the current weather for a given location. Use when the user asks about temperature, conditions, or forecasts for a city or region.",
  "parameters": {
    "type": "object",
    "properties": {
      "location": {
        "type": "string",
        "description": "City name or 'lat,lng', e.g. 'Tokyo' or '35.68,139.69'"
      },
      "unit": {
        "type": "string",
        "enum": ["celsius", "fahrenheit"],
        "default": "celsius"
      }
    },
    "required": ["location"]
  }
}
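When the definition is sent through the Chat Completions API used in the next step, each schema is wrapped in an envelope with "type": "function", the same shape the forced tool_choice value uses. A minimal sketch of that wrapping, assuming the schema above; the helper name make_tool is hypothetical and not part of any SDK:

```python
import json


def make_tool(name: str, description: str, parameters: dict) -> dict:
    """Wrap a function schema in the {"type": "function", ...} envelope."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
        },
    }


get_weather_tool = make_tool(
    name="get_weather",
    description=(
        "Get the current weather for a given location. Use when the user asks "
        "about temperature, conditions, or forecasts for a city or region."
    ),
    parameters={
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City name or 'lat,lng', e.g. 'Tokyo' or '35.68,139.69'",
            },
            "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "default": "celsius",
            },
        },
        "required": ["location"],
    },
)

print(json.dumps(get_weather_tool, indent=2))
```

The description does most of the work here: it is the only signal the model has for deciding when the tool applies, so it is worth writing it as carefully as a prompt.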
2. The request
You include the tool definitions in your API call alongside the user message:
import openai

response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the weather in Tokyo?"},
    ],
    tools=[get_weather_tool, get_calendar_tool, ...],
    tool_choice="auto",  # model decides
)
tool_choice options:
- "auto": model decides (default)
- "none": never call tools
- "required": must call a tool
- {"type": "function", "function": {"name": "X"}}: must call exactly tool X
3. The model’s response
For a tool-using turn, the response contains a tool_calls field instead of text content:
{
  "role": "assistant",
  "content": null,
  "tool_calls": [{
    "id": "call_abc123",
    "type": "function",
    "function": {
      "name": "get_weather",
      "arguments": "{\"location\": \"Tokyo\", \"unit\": \"celsius\"}"
    }
  }]
}
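Note that arguments is a JSON string nested inside the JSON message, so it needs its own decoding step and can be malformed. A small sketch of extracting the calls from a message shaped like the one above; the function name parse_tool_calls and the returned dict layout are illustrative assumptions:

```python
import json


def parse_tool_calls(message: dict) -> list[dict]:
    """Return one {"id", "name", "args"} dict per tool call in an assistant message."""
    parsed = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        try:
            # The arguments field is a JSON *string*, not a nested object.
            args = json.loads(fn["arguments"])
        except json.JSONDecodeError as exc:
            raise ValueError(f"malformed arguments for {fn['name']}: {exc}") from exc
        if not isinstance(args, dict):
            raise ValueError(f"arguments for {fn['name']} must be a JSON object")
        parsed.append({"id": call["id"], "name": fn["name"], "args": args})
    return parsed


assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
            "name": "get_weather",
            "arguments": "{\"location\": \"Tokyo\", \"unit\": \"celsius\"}",
        },
    }],
}

calls = parse_tool_calls(assistant_message)
print(calls)
```

A plain-text reply has no tool_calls field, so the same function returns an empty list and the caller can treat the message as the final answer.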
4. Your code parses, validates, executes
This is the trust boundary. Three things must happen in this order:
import json

# 1. Parse (arguments arrive as a JSON string)
args = json.loads(tool_call.function.arguments)
# 2. Validate
schema_validator.validate(args, get_weather_tool["parameters"])
if args["location"] not in allowed_locations:  # business rule
    raise ValueError("location not allowed")  # a bare assert is stripped under python -O
# 3. Execute
result = weather_api.fetch(args["location"], unit=args.get("unit", "celsius"))
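The schema_validator in the snippet above is left unspecified. As an illustration of what the validation step has to cover, here is a hand-rolled sketch using only the standard library; it checks required keys, unknown keys, string types, and enums, and is not a full JSON Schema implementation. The names validate_args, check_business_rules, and ALLOWED_LOCATIONS are hypothetical:

```python
WEATHER_PARAMETERS = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["location"],
}

ALLOWED_LOCATIONS = {"Tokyo", "Osaka"}  # hypothetical business rule


def validate_args(args: dict, schema: dict) -> dict:
    """Check required keys, unknown keys, string types, and enums; return args."""
    props = schema["properties"]
    for key in schema.get("required", []):
        if key not in args:
            raise ValueError(f"missing required argument: {key}")
    for key, value in args.items():
        if key not in props:
            raise ValueError(f"unexpected argument: {key}")
        spec = props[key]
        if spec.get("type") == "string" and not isinstance(value, str):
            raise ValueError(f"{key} must be a string")
        if "enum" in spec and value not in spec["enum"]:
            raise ValueError(f"{key} must be one of {spec['enum']}")
    return args


def check_business_rules(args: dict) -> None:
    """Rules the schema cannot express, e.g. which locations this user may query."""
    if args["location"] not in ALLOWED_LOCATIONS:
        raise PermissionError(f"location not allowed: {args['location']}")


args = validate_args({"location": "Tokyo", "unit": "celsius"}, WEATHER_PARAMETERS)
check_business_rules(args)
print("validated:", args)
```

Rejecting unknown keys is a deliberate choice in this sketch: a model that invents an extra argument is usually a sign the call should not be executed as-is.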
5. The tool result goes back into the conversation
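# The assistant turn that holds tool_calls must precede the tool result
messages.append(response.choices[0].message)
messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": json.dumps(result),
})
response = openai.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
)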
Now the model has the actual data. It produces a natural-language answer:
“It’s currently 22°C and cloudy in Tokyo.”
The full conversation now has five messages (system, user, assistant-with-tool-call, tool-result, assistant-final). Save all of them for the next turn so the model has full context.
6. Tool chaining — multi-step reasoning
Real tasks often need multiple tools in sequence. “Find papers on retrieval-augmented generation, summarize the top 3, email me the summary.”
The flow:
- LLM emits web_search tool call
- Your code searches, returns results
- LLM emits summarize tool call (or just summarizes in-prompt)
- Your code summarizes
- LLM emits send_email tool call
- Your code sends, returns confirmation
- LLM produces final response: “Done. Email sent.”
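The flow above amounts to a loop: call the model, execute whatever tools it asks for, append the results, and repeat until it answers in plain text. The sketch below shows that loop with a scripted stand-in for the LLM so it runs offline; fake_model, TOOLS, and run_agent are hypothetical names, and the summarize step is assumed to happen in-prompt:

```python
import json


def fake_model(messages: list[dict]) -> dict:
    """Stand-in for the LLM: scripts search -> email -> final answer."""
    done = sum(1 for m in messages if m["role"] == "tool")
    if done == 0:
        name, args = "web_search", {"query": "retrieval-augmented generation"}
    elif done == 1:
        name, args = "send_email", {"body": "Summary of the top 3 papers"}
    else:
        return {"role": "assistant", "content": "Done. Email sent."}
    return {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": f"call_{done + 1}",
            "type": "function",
            "function": {"name": name, "arguments": json.dumps(args)},
        }],
    }


# Hypothetical tool implementations returning canned data.
TOOLS = {
    "web_search": lambda query: {"results": ["paper A", "paper B", "paper C"]},
    "send_email": lambda body: {"status": "sent"},
}


def run_agent(user_text: str, max_steps: int = 5) -> list[dict]:
    """Loop until the model replies without tool calls or the step budget runs out."""
    messages = [{"role": "user", "content": user_text}]
    for _ in range(max_steps):
        reply = fake_model(messages)
        messages.append(reply)
        if not reply.get("tool_calls"):
            return messages
        for call in reply["tool_calls"]:
            fn = call["function"]
            result = TOOLS[fn["name"]](**json.loads(fn["arguments"]))
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            })
    raise RuntimeError("step budget exhausted")


history = run_agent("Find papers on RAG and email me a summary.")
print(history[-1]["content"])
```

The max_steps cap matters in practice: without it, a model that keeps requesting tools can loop indefinitely and run up cost.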
7. Parallel tool calls
When tools are independent, the model can request them in one turn:
"tool_calls": [
  {"function": {"name": "get_weather", "arguments": "..."}},
  {"function": {"name": "get_calendar", "arguments": "..."}},
  {"function": {"name": "get_traffic", "arguments": "..."}}
]
Your code can then run the calls concurrently and append one tool message per call, matched by id:
import asyncio

# Runs inside an async function; execute_tool is your own coroutine returning a JSON string.
tool_calls = response.choices[0].message.tool_calls
results = await asyncio.gather(*[
    execute_tool(call) for call in tool_calls
])
for call, result in zip(tool_calls, results):
    messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
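For a version that runs end to end, the sketch below fills in the missing pieces with simulated tools. The tool bodies, the TOOLS registry, and run_parallel are assumptions for illustration; the part that carries over is that asyncio.gather returns results in the same order as its inputs, which is what makes the zip with the original calls safe:

```python
import asyncio
import json


async def get_weather(location: str) -> dict:
    await asyncio.sleep(0.01)  # simulated network I/O
    return {"temp": 22, "conditions": "cloudy"}


async def get_calendar(day: str) -> dict:
    await asyncio.sleep(0.01)  # simulated network I/O
    return {"events": ["standup"]}


TOOLS = {"get_weather": get_weather, "get_calendar": get_calendar}


async def execute_tool(call: dict) -> str:
    """Run one tool call and return its result (or error) as a JSON string."""
    fn = call["function"]
    try:
        result = await TOOLS[fn["name"]](**json.loads(fn["arguments"]))
    except Exception as exc:  # report the failure to the model instead of crashing
        result = {"error": str(exc)}
    return json.dumps(result)


async def run_parallel(tool_calls: list[dict]) -> list[dict]:
    """Execute all calls concurrently; gather preserves input order."""
    results = await asyncio.gather(*(execute_tool(c) for c in tool_calls))
    return [
        {"role": "tool", "tool_call_id": c["id"], "content": r}
        for c, r in zip(tool_calls, results)
    ]


tool_calls = [
    {"id": "call_1", "function": {"name": "get_weather", "arguments": "{\"location\": \"Tokyo\"}"}},
    {"id": "call_2", "function": {"name": "get_calendar", "arguments": "{\"day\": \"today\"}"}},
]

tool_messages = asyncio.run(run_parallel(tool_calls))
for message in tool_messages:
    print(message)
```

Catching exceptions inside execute_tool keeps one failing tool from cancelling the whole batch, so every tool_call_id still gets a matching tool message.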
8. Failure handling
Tools fail. Network blips, rate limits, bad arguments, downstream outages. The system needs:
- Retries with exponential backoff (1s, 2s, 4s, give up)
- Fallbacks: if get_weather_v2 fails, try get_weather_v1
- Error messages back to the model: don’t just throw; tell the LLM what went wrong so it can recover
A failed tool result might be:
{
  "role": "tool",
  "tool_call_id": "call_abc",
  "content": "{\"error\": \"location not found\", \"suggestion\": \"try a more specific name\"}"
}
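One way to combine retries with error reporting is sketched below: transient failures are retried after 1s, 2s, and 4s, and whatever error survives is returned to the model as a tool message rather than raised. TransientError, call_with_retries, and tool_result_message are hypothetical names, and the suggestion text is only an example:

```python
import json
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, rate limits)."""


def call_with_retries(fn, *args, delays=(1, 2, 4), sleep=time.sleep, **kwargs):
    """Try once, then once more after each delay; re-raise the last error."""
    for delay in (*delays, None):
        try:
            return fn(*args, **kwargs)
        except TransientError:
            if delay is None:  # no delays left: give up
                raise
            sleep(delay)


def tool_result_message(call_id: str, fn, *args, **kwargs) -> dict:
    """Run a tool with retries; on failure, describe the error to the model."""
    try:
        content = call_with_retries(fn, *args, **kwargs)
    except Exception as exc:
        content = {"error": str(exc), "suggestion": "try again later or use another tool"}
    return {"role": "tool", "tool_call_id": call_id, "content": json.dumps(content)}


attempts = {"n": 0}


def flaky_weather(location: str) -> dict:
    """Simulated tool that is rate limited on its first two calls."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientError("rate limited")
    return {"temp": 22, "conditions": "cloudy"}


# sleep is replaced with a no-op here so the demo does not actually wait.
msg = tool_result_message("call_abc", flaky_weather, "Tokyo", sleep=lambda s: None)
print(attempts["n"], msg["content"])
```

Only errors marked transient are retried in this sketch; a bad argument fails immediately and goes straight back to the model, which is the party that can fix it.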
9. Security — the layered defense
Function calling is the most dangerous LLM feature, because it’s the one that takes real-world action.
flowchart TD
LLM[LLM emits tool_call] --> L1[Layer 1: Permission scoping<br/><i>Tool not in user's allowlist?<br/>Model never sees it.</i>]
L1 -->|allowed| L2[Layer 2: Argument validation<br/><i>JSON schema + business rules<br/>SQL/XSS/path-traversal patterns blocked</i>]
L2 -->|valid| L3[Layer 3: Sandboxing<br/><i>Code execution in isolated container<br/>no net, no FS, no privileges</i>]
L3 --> L4[Layer 4: Audit log<br/><i>Every call recorded:<br/>who, when, args, result</i>]
L4 --> L5{Layer 5:<br/>High-risk tool?}
L5 -->|yes| HA[Human approval<br/><i>send money, delete data,<br/>external email</i>]
L5 -->|no| EX[Execute]
HA -->|approved| EX
HA -->|denied| BLK[Block + log]
EX --> R[Result]
L1 -.blocked.-> BLK
L2 -.invalid.-> BLK
style LLM fill:#7e1d1d,stroke:#ef4444,color:#fff
style L1 fill:#1e3a8a,stroke:#3b82f6,color:#fff
style L2 fill:#0e7490,stroke:#06b6d4,color:#fff
style L3 fill:#581c87,stroke:#a855f7,color:#fff
style L4 fill:#365314,stroke:#84cc16,color:#fff
style L5 fill:#9a3412,stroke:#f97316,color:#fff
style HA fill:#9a3412,stroke:#f97316,color:#fff
style EX fill:#365314,stroke:#84cc16,color:#fff
style R fill:#1c2333,stroke:#475569,color:#e7eaf1
style BLK fill:#7f1d1d,stroke:#f43f5e,color:#fff
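A rough sketch of how layers 1, 4, and 5 from the diagram might look in code. Argument validation (layer 2) and sandboxing (layer 3) are left out, the log is written before the permission check so that blocked calls are recorded too, and every name here (HIGH_RISK, AUDIT_LOG, visible_tools, guarded_execute, the approve callback) is a hypothetical choice rather than an established API:

```python
from datetime import datetime, timezone

HIGH_RISK = {"send_money", "delete_data", "send_external_email"}  # hypothetical tool names
AUDIT_LOG: list[dict] = []  # in production this would be durable storage


def visible_tools(all_tools: dict, allowlist: set[str]) -> dict:
    """Layer 1: only allowlisted tools are ever shown to the model."""
    return {name: tool for name, tool in all_tools.items() if name in allowlist}


def guarded_execute(user: str, name: str, args: dict, tools: dict,
                    allowlist: set[str], approve) -> dict:
    """Re-check permission, record the call, and gate high-risk tools on approval."""
    entry = {
        "who": user,
        "when": datetime.now(timezone.utc).isoformat(),
        "tool": name,
        "args": args,
    }
    AUDIT_LOG.append(entry)  # Layer 4: every attempt is logged, including blocked ones
    if name not in allowlist or name not in tools:
        entry["result"] = "blocked"
        return {"error": "tool not permitted"}
    if name in HIGH_RISK and not approve(user, name, args):  # Layer 5
        entry["result"] = "denied"
        return {"error": "human approval denied"}
    result = tools[name](**args)
    entry["result"] = result
    return result


tools = {
    "get_weather": lambda location: {"temp": 22},
    "send_money": lambda to, amount: {"status": "sent"},
}
allow = {"get_weather", "send_money"}

print(sorted(visible_tools(tools, {"get_weather"})))
print(guarded_execute("alice", "send_money", {"to": "bob", "amount": 10},
                      tools, allow, approve=lambda *a: False))
```

The permission check is repeated at execution time even though layer 1 already hides disallowed tools, because the tool name in a tool_call is model output and should not be trusted on its own.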
10. The shape of every modern AI assistant
Function calling is the architectural primitive behind the assistants that actually do things: ChatGPT plugins, Claude with computer use, Cursor, GitHub Copilot Workspace, and most customer support bots built since 2023. Voice assistants such as Siri and Alexa follow the same intent-to-action shape.
The pattern is always:
- Define the actions as tools
- Let the model decide
- Validate and execute
- Feed results back
- Loop