Tool Use and Function Calling

What

A mechanism that lets LLMs invoke external functions, APIs, and code during generation. The model decides when it needs a capability beyond text (math, data lookup, code execution), emits a structured tool call, receives the result, and incorporates it into its response. This transforms LLMs from text generators into general-purpose reasoning engines that can act on the world.

Why It Matters

  • Grounds LLMs in reality: models can look up current data instead of relying on stale training knowledge. A weather question gets a real API call, not a hallucinated answer
  • Extends capabilities: LLMs can’t do reliable arithmetic, but they can call a calculator. They can’t access databases directly, but they can issue SQL queries through a tool
  • Enables agents: tool use is the foundation of AI Agents — multi-step workflows where the model plans, acts, observes, and iterates
  • Production integration: every major API provider (Anthropic, OpenAI, Google) now supports function calling natively, making it the standard way to integrate LLMs into applications

How It Works

The Tool Use Loop

1. User sends a message
2. Model receives message + tool definitions (JSON schemas)
3. Model generates a response that may include a tool_use block:
   {"name": "get_price", "input": {"ticker": "AAPL"}}
4. Client executes the tool, sends result back as tool_result
5. Model incorporates the result and continues generating
6. Repeat steps 3-5 as needed (multi-tool, multi-turn)

Defining Tools

Tools are defined as JSON schemas that tell the model what functions are available, what they do, and what parameters they accept:

tools = [
    {
        "name": "search_database",
        "description": "Search the product database by name or category. "
                       "Returns matching products with prices.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search term (product name or category)"
                },
                "max_results": {
                    "type": "integer",
                    "description": "Maximum results to return (default 5)"
                }
            },
            "required": ["query"]
        }
    }
]

Good tool descriptions are critical. The model uses them to decide when and how to call a tool. Vague descriptions lead to wrong tool calls.
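The schema is also useful on the client side: validating the model's arguments before executing a tool catches malformed calls early. A minimal hand-rolled check is sketched below (for production, a library such as jsonschema is the usual choice; `validate_input` is an illustrative helper, not part of any SDK):

```python
def validate_input(schema, input_data):
    """Minimal JSON Schema check: required keys present, types match."""
    type_map = {"string": str, "integer": int, "number": (int, float),
                "boolean": bool, "object": dict, "array": list}
    errors = []
    for key in schema.get("required", []):
        if key not in input_data:
            errors.append(f"missing required field: {key}")
    for key, value in input_data.items():
        prop = schema.get("properties", {}).get(key)
        if prop is None:
            errors.append(f"unexpected field: {key}")
        elif not isinstance(value, type_map[prop["type"]]):
            errors.append(f"{key}: expected {prop['type']}, "
                          f"got {type(value).__name__}")
    return errors

schema = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "max_results": {"type": "integer"},
    },
    "required": ["query"],
}
print(validate_input(schema, {"query": "laptops", "max_results": 5}))  # []
print(validate_input(schema, {"max_results": "5"}))
```

An empty error list means the call is safe to dispatch; otherwise the errors can be returned to the model as a tool result so it can correct the call.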

Model Context Protocol (MCP)

An open standard (introduced by Anthropic in Nov 2024, now under the Linux Foundation) for connecting LLMs to external tools and data:

  • MCP Server exposes: Tools (callable functions), Resources (data/files), Prompts (templates)
  • Transports: stdio (local processes), SSE, Streamable HTTP (remote servers)
  • Adopted by: OpenAI, Google, Microsoft, Cursor, Sourcegraph, and many more
  • Key benefit: write a tool server once, use it with any MCP-compatible client

Client (Claude, ChatGPT, IDE) ←→ MCP Protocol ←→ Server (your tools)
                                    │
                          Tools: search, execute, query
                          Resources: files, databases
                          Prompts: templates

Structured Output

Forcing models to output valid JSON (not just hoping they do):

| Approach              | How it works                                            | Used by                   |
|-----------------------|---------------------------------------------------------|---------------------------|
| API-level enforcement | Schema provided in API call, output guaranteed valid    | Anthropic, OpenAI, Google |
| Constrained decoding  | Token masking at each step — only valid tokens allowed  | Outlines, XGrammar, vLLM  |
| Grammar-guided        | Context-free grammar defines valid outputs              | llama.cpp, GBNF           |

Constrained decoding works by building a state machine from the JSON schema and masking out invalid tokens at each generation step. This guarantees syntactically valid output with zero probability of malformed JSON — though it cannot guarantee the values themselves are sensible.

Code Example

Tool Use with Anthropic API

import anthropic
import json
 
client = anthropic.Anthropic()  # uses ANTHROPIC_API_KEY env var
 
# Define tools
tools = [
    {
        "name": "calculate",
        "description": "Evaluate a mathematical expression. "
                       "Use for any arithmetic the user asks about.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "Math expression to evaluate, e.g. '2**10 + 3*7'"
                }
            },
            "required": ["expression"]
        }
    },
    {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"]
        }
    }
]
 
def execute_tool(name, input_data):
    """Execute a tool call and return the result."""
    if name == "calculate":
        # In production, use a sandboxed evaluator -- eval() is unsafe
        result = eval(input_data["expression"])
        return str(result)
    elif name == "get_weather":
        # Stub -- in production, call a weather API
        return json.dumps({"temp_c": 12, "condition": "partly cloudy"})
    return f"Error: unknown tool '{name}'"
 
def chat_with_tools(user_message):
    messages = [{"role": "user", "content": user_message}]
 
    # Initial call -- model may request tool use
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        tools=tools,
        messages=messages,
    )
 
    # Handle tool use loop
    while response.stop_reason == "tool_use":
        # Extract tool calls from response
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                })
 
        # Send tool results back
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})
 
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )
 
    # Extract final text
    return "".join(b.text for b in response.content if b.type == "text")
 
# Example
answer = chat_with_tools("What's 2^20 + the temperature in Tallinn?")
print(answer)

Key Tradeoffs

| Decision         | Option A                          | Option B                                        |
|------------------|-----------------------------------|-------------------------------------------------|
| Tool granularity | Many specific tools (precise)     | Few general tools (flexible but ambiguous)      |
| Execution        | Client-side (simple, controlled)  | Server-side MCP (reusable, standardized)        |
| Validation       | Trust model output (fast)         | Schema validation before execution (safe)       |
| Parallelism      | Sequential tool calls (simple)    | Parallel tool calls (faster, harder to debug)   |
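On the parallelism tradeoff: when a single response contains several independent tool_use blocks, the calls can be executed concurrently with the standard library. A sketch, assuming tool calls arrive as plain dicts and `execute_tool` is a hypothetical stub standing in for the real dispatcher:

```python
from concurrent.futures import ThreadPoolExecutor

def execute_tool(name, input_data):
    # Hypothetical stub -- replace with real tool dispatch
    return f"{name} result"

def run_tool_calls_parallel(tool_calls):
    """tool_calls: list of dicts like {"id": ..., "name": ..., "input": ...}.
    Executes all calls concurrently, preserving the original order."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(execute_tool, c["name"], c["input"])
                   for c in tool_calls]
        return [{"type": "tool_result",
                 "tool_use_id": c["id"],
                 "content": f.result()}
                for c, f in zip(tool_calls, futures)]

calls = [{"id": "t1", "name": "get_weather", "input": {"city": "Tallinn"}},
         {"id": "t2", "name": "get_weather", "input": {"city": "Oslo"}}]
print(run_tool_calls_parallel(calls))
```

The speedup comes from overlapping I/O-bound tool calls; the debugging cost comes from interleaved side effects, so this is worth it only for tools that are independent and read-only.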

Common Pitfalls

  • Vague tool descriptions: “Do stuff with data” — the model won’t know when to use it. Write descriptions as if explaining to a new engineer
  • Too many tools: 50+ tools confuse the model. Group related tools or use routing (pick relevant tools per query)
  • No error handling: tool calls fail. Return clear error messages so the model can retry or explain the failure
  • Executing arbitrary code: eval() on model-generated expressions is a security risk. Sandbox all code execution
  • Ignoring tool call cost: each tool call is an extra API round-trip. Design tools to return useful data in one call rather than requiring 5 sequential calls
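For the error-handling pitfall, one pattern is to wrap every tool execution so that exceptions become structured error results the model can see and react to. A minimal sketch (the `is_error` field follows Anthropic's tool_result format; `flaky_tool` is a made-up example tool):

```python
def safe_execute(name, input_data, executor):
    """Run a tool, converting exceptions into an error payload
    instead of crashing the tool-use loop."""
    try:
        return {"content": executor(name, input_data), "is_error": False}
    except Exception as exc:
        # Give the model something actionable, not a bare stack trace
        return {"content": f"Tool '{name}' failed: {exc}", "is_error": True}

def flaky_tool(name, input_data):
    # Made-up tool that fails on certain inputs
    if input_data.get("city") == "Atlantis":
        raise ValueError("unknown city")
    return "12C, partly cloudy"

print(safe_execute("get_weather", {"city": "Tallinn"}, flaky_tool))
print(safe_execute("get_weather", {"city": "Atlantis"}, flaky_tool))
```

With a clear error message in the tool result, the model can retry with corrected arguments or explain the failure to the user instead of the loop dying on an unhandled exception.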

Exercises

  1. Build a simple tool-use loop: define a search_wikipedia tool (use the Wikipedia API), connect it to Claude or OpenAI, and ask questions that require lookup
  2. Create an MCP server (using mcp Python SDK) that exposes a SQLite database as tools: query_table, list_tables, describe_table. Connect it to Claude Code
  3. Implement constrained decoding: given a simple JSON schema {"name": str, "age": int}, write a token-masking function that only allows valid tokens at each position
  4. Build a multi-tool agent: give the model a calculator, a web search tool, and a file-write tool. Ask it to research a topic, compute some statistics, and save a summary

Self-Test Questions

  1. Explain the tool-use loop in 4 steps. Why does the model need to see the tool result before continuing?
  2. What makes a good tool description? What happens with vague descriptions?
  3. How does constrained decoding guarantee valid JSON output?
  4. What is MCP and why is it better than each application defining its own tool format?
  5. Why is eval() on model-generated code dangerous? What’s the safe alternative?