gemma4.dev

Gemma 4 Function Calling Guide

Use Gemma 4's built-in tool calling (function calling) for structured JSON output and external API integration.

What is Function Calling?

Function calling (also called tool use) allows a language model to request the execution of external functions and incorporate the results into its response. Instead of making up an answer, the model can signal "I need to call get_weather('Tokyo') to answer this question" — your application runs the function, returns the result to the model, and the model then composes a final answer using the real data.

Gemma 4 has native function calling support built into all instruction-tuned variants. The model was trained to output structured tool_call JSON when it determines a function call is appropriate.

Supported Variants

Function calling works with all Gemma 4 instruction-tuned models:

  • Gemma 4 E4B (google/gemma-4-4b-it)
  • Gemma 4 26B A4B (google/gemma-4-26b-a4b-it)
  • Gemma 4 31B (google/gemma-4-31b-it)

It also works in Thinking Mode variants, where the model reasons about which tool to call before making the call.

Define Tools

Tools are defined as JSON Schema objects in the OpenAI tool calling format, which Gemma 4's tokenizer understands natively:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name, e.g. Tokyo"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

Single Tool Call Example

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import json

model_id = "google/gemma-4-4b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]

inputs = tokenizer.apply_chat_template(
    messages,
    tools=tools,
    return_tensors="pt",
    return_dict=True,
    add_generation_prompt=True,
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
response_text = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)
print(response_text)

When the model decides to call a tool, the output will contain a JSON object rather than a natural language response:

{"name": "get_weather", "arguments": {"location": "Tokyo", "unit": "celsius"}}

Parsing Tool Calls

def parse_tool_call(response_text: str) -> dict | None:
    """Extract tool call JSON from model output, if present."""
    try:
        parsed = json.loads(response_text.strip())
        if "name" in parsed and "arguments" in parsed:
            return parsed
    except json.JSONDecodeError:
        pass
    return None

tool_call = parse_tool_call(response_text)
if tool_call:
    print(f"Tool: {tool_call['name']}")
    print(f"Args: {tool_call['arguments']}")

Multi-Turn Tool Call Example

After parsing the tool call, execute the function, append the result to the message history, and call the model again to get the final answer:

# Step 1: Model requests a tool call (from above)
tool_call = parse_tool_call(response_text)

# Step 2: Execute the real function
def get_weather(location: str, unit: str = "celsius") -> dict:
    # Replace with a real weather API call
    return {"temperature": 22, "condition": "Partly cloudy", "unit": unit}

result = get_weather(**tool_call["arguments"])

# Step 3: Append tool result and get final answer
messages.append({"role": "assistant", "content": response_text})
messages.append({
    "role": "tool",
    "name": tool_call["name"],
    "content": json.dumps(result),
})

inputs2 = tokenizer.apply_chat_template(
    messages,
    tools=tools,
    return_tensors="pt",
    return_dict=True,
    add_generation_prompt=True,
).to(model.device)

final_outputs = model.generate(**inputs2, max_new_tokens=512)
final_response = tokenizer.decode(
    final_outputs[0][inputs2["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)
print(final_response)
# "The current weather in Tokyo is 22°C and partly cloudy."

Multiple Tools

You can define any number of tools. The model selects the appropriate one based on the user's request:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": { ... }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the web for current information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"]
            }
        }
    },
]
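With more than one tool defined, the application needs to route each parsed call to the matching Python function. One way to do this is a dispatch table keyed on the tool name, sketched below; the `get_weather` and `search_web` implementations are stand-ins for real API calls:

```python
import json

# Stand-in implementations -- replace with real API calls.
def get_weather(location: str, unit: str = "celsius") -> dict:
    return {"temperature": 22, "condition": "Partly cloudy", "unit": unit}

def search_web(query: str) -> dict:
    return {"results": [f"Top result for {query!r}"]}

# Map tool names (as declared in the schemas) to callables.
TOOL_REGISTRY = {
    "get_weather": get_weather,
    "search_web": search_web,
}

def execute_tool_call(tool_call: dict) -> str:
    """Run the requested tool and return its JSON-encoded result."""
    func = TOOL_REGISTRY.get(tool_call["name"])
    if func is None:
        # Return an error payload rather than raising, so the model
        # can see that the call failed and recover.
        return json.dumps({"error": f"unknown tool {tool_call['name']!r}"})
    return json.dumps(func(**tool_call["arguments"]))

print(execute_tool_call({"name": "get_weather", "arguments": {"location": "Tokyo"}}))
```

The JSON string returned by `execute_tool_call` can be placed directly into the `content` field of the `"role": "tool"` message from the multi-turn example above.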

JSON Mode (No Tools Required)

For simpler cases where you just want structured JSON output without defining formal tool schemas, instruct the model directly in the system prompt:

Respond only with valid JSON. Do not include any explanation or markdown.
Format: {"answer": string, "confidence": number, "sources": [string]}

This works reliably for extraction, classification, and structured data generation tasks. However, it does not give the model the ability to request real-time data — for that you need proper tool calling.

Parallel Tool Calls

Gemma 4 supports requesting multiple tool calls in a single turn. The output in that case is a JSON array:

[
  {"name": "get_weather", "arguments": {"location": "Tokyo"}},
  {"name": "get_weather", "arguments": {"location": "London"}}
]

Update your parsing logic to handle both a single object and an array response.
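One way to extend the earlier `parse_tool_call` helper so it handles both shapes is to normalize everything to a list (a sketch):

```python
import json

def parse_tool_calls(response_text: str) -> list[dict]:
    """Return all tool calls in the model output as a list (empty if none)."""
    try:
        parsed = json.loads(response_text.strip())
    except json.JSONDecodeError:
        return []
    # Normalize: a single tool-call object becomes a one-element list.
    calls = parsed if isinstance(parsed, list) else [parsed]
    return [c for c in calls
            if isinstance(c, dict) and "name" in c and "arguments" in c]

single = '{"name": "get_weather", "arguments": {"location": "Tokyo"}}'
parallel = ('[{"name": "get_weather", "arguments": {"location": "Tokyo"}},'
            ' {"name": "get_weather", "arguments": {"location": "London"}}]')
print(len(parse_tool_calls(single)), len(parse_tool_calls(parallel)))  # 1 2
```

Your application can then loop over the returned list, execute each call, and append one `"role": "tool"` message per result before generating the final answer.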

© 2026 gemma4.dev All Rights Reserved.