As of April 2026, Gemma 4 tool calling is broken in Ollama v0.20.0 (ollama/ollama#15241) - the tool call parser fails and streaming drops tool calls entirely. OpenCode also has issues with local OpenAI-compatible providers (anomalyco/opencode#20669, #20719).
This guide documents a working setup using:
- llama.cpp (built from source with PR #21326 template fix + PR #21343 tokenizer fix) instead of Ollama
- OpenCode built from source with PR #16531 tool-call compatibility layer
Tested on macOS Apple Silicon (M1 Max, 32GB) on April 2, 2026.
Ollama has three problems with Gemma 4 right now:
- Tool call parser crashes - `gemma4 tool call parsing failed: invalid character` (#15241)
- Streaming drops tool calls - tool call data goes into the `reasoning` field with empty `content`, so the actual tool call never reaches the client
- `<unused25>` token spam - a tokenizer bug causes garbage output
llama.cpp has fixes for all three (PRs #21326 and #21343).
Stock OpenCode can't recover when a local model:
- Returns tool calls as plain JSON text instead of proper function calls
- Sends `finish_reason: "stop"` instead of `"tool_calls"`
- Uses the legacy `function_call` format instead of the modern `tool_calls`
PR #16531 adds a toolParser compatibility layer that handles all of these.
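Conceptually, the `json` recovery mode works like this: if the model returns a tool call as plain text instead of a structured call, try to parse it back out. Below is a minimal sketch with hypothetical types and names, not OpenCode's actual implementation:

```typescript
// Sketch of a "json" tool-call recovery pass. If the model emitted its tool
// call as plain text (optionally wrapped in a markdown fence), parse it back
// into a structured call; otherwise treat the text as an ordinary reply.
type ToolCall = { name: string; arguments: Record<string, unknown> };

function recoverToolCall(text: string): ToolCall | null {
  // Strip a markdown code fence if the model wrapped the JSON in one
  const unfenced = text
    .trim()
    .replace(/^```(?:json)?\s*/, "")
    .replace(/\s*```$/, "");
  try {
    const parsed = JSON.parse(unfenced);
    if (
      typeof parsed?.name === "string" &&
      typeof parsed?.arguments === "object" &&
      parsed.arguments !== null
    ) {
      return { name: parsed.name, arguments: parsed.arguments };
    }
  } catch {
    // Not JSON - ordinary text response
  }
  return null;
}
```

For example, `recoverToolCall('{"name":"bash","arguments":{"command":"ls"}}')` yields a structured call, while ordinary prose returns `null` and passes through unchanged.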
- macOS Apple Silicon with 24GB+ RAM (32GB recommended for 26B model)
- Homebrew
- Close heavy apps (Chrome tabs, etc.) - the 26B model uses ~17GB
```
brew install git gh cmake bun
```

The Homebrew version doesn't have the Gemma 4 fixes yet. Build from source with both PRs:
```
# Clone latest llama.cpp (includes merged PR #21326)
git clone --depth 50 https://github.com/ggml-org/llama.cpp.git /tmp/llama-cpp-build
cd /tmp/llama-cpp-build

# Cherry-pick the tokenizer fix (PR #21343, not yet merged)
git fetch origin pull/21343/head:pr-21343
git cherry-pick pr-21343 --no-commit

# Build with Metal (Apple GPU) support
cmake -B build -DGGML_METAL=ON -DLLAMA_CURL=ON
cmake --build build --config Release -j$(sysctl -n hw.ncpu) -- llama-server

# Verify
./build/bin/llama-server --version
```

Pick a context size based on your available RAM. Gemma 4 26B supports up to 128k context, but a larger context uses more memory:
| Context (`-c`) | Total RAM needed | Best for |
|---|---|---|
| 32768 (32k) | ~17GB | Basic tasks, short sessions |
| 65536 (64k) | ~20GB | Longer sessions, bigger files |
| 131072 (128k) | ~25GB+ | Maximum context; may cause memory pressure on 32GB machines |
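Those RAM figures scale roughly linearly with `-c`, because the KV cache grows with context length. As a back-of-envelope estimate (the model dimensions below are illustrative placeholders, not Gemma 4's actual architecture, and total RAM also includes weights and compute buffers):

```typescript
// KV cache bytes = 2 (K and V) * layers * ctx * kvHeads * headDim * bytesPerElement.
// f16 cache = 2 bytes per element. All model dimensions here are hypothetical.
function kvCacheGiB(
  layers: number,
  ctx: number,
  kvHeads: number,
  headDim: number,
  bytesPerElem = 2,
): number {
  return (2 * layers * ctx * kvHeads * headDim * bytesPerElem) / 1024 ** 3;
}

// e.g. a hypothetical 48-layer model with 8 KV heads of dim 128 at 32k context:
const atThirtyTwoK = kvCacheGiB(48, 32768, 8, 128); // → 6 (GiB)
```

Doubling `-c` doubles this figure, which is why the jump from 32k to 128k adds several GB on top of the ~15GB of Q4_K_M weights.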
```
# 32k context
/tmp/llama-cpp-build/build/bin/llama-server \
  -hf ggml-org/gemma-4-26B-A4B-it-GGUF:Q4_K_M \
  --port 8089 -ngl 99 -c 32768 --jinja
```

```
# 64k context
/tmp/llama-cpp-build/build/bin/llama-server \
  -hf ggml-org/gemma-4-26B-A4B-it-GGUF:Q4_K_M \
  --port 8089 -ngl 99 -c 65536 --jinja
```

```
# 128k context
/tmp/llama-cpp-build/build/bin/llama-server \
  -hf ggml-org/gemma-4-26B-A4B-it-GGUF:Q4_K_M \
  --port 8089 -ngl 99 -c 131072 --jinja
```

If `-hf` doesn't work with the source build, download manually:
```
# Install the Hugging Face CLI if needed
pip install -U "huggingface_hub[cli]"

# Download the model
huggingface-cli download ggml-org/gemma-4-26B-A4B-it-GGUF gemma-4-26B-A4B-it-Q4_K_M.gguf

# Start with the local file (use any -c value from above)
/tmp/llama-cpp-build/build/bin/llama-server \
  -m ~/.cache/huggingface/hub/models--ggml-org--gemma-4-26B-A4B-it-GGUF/snapshots/*/gemma-4-26B-A4B-it-Q4_K_M.gguf \
  --port 8089 -ngl 99 -c 32768 --jinja
```

Wait for `listening on http://127.0.0.1:8089`, then verify:
```
curl http://127.0.0.1:8089/health
# Should return: {"status":"ok"}
```

| Flag | Purpose |
|---|---|
| `-hf` | Download the model from Hugging Face |
| `--port 8089` | API port (avoids conflict with Ollama's 11434) |
| `-ngl 99` | Offload all layers to the GPU (Metal) |
| `-c N` | Context window size (32k minimum for OpenCode's tool definitions) |
| `--jinja` | Enable Jinja2 chat templates (required for Gemma 4 tool calling) |
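Before wiring up OpenCode, you can smoke-test tool calling directly against the server's OpenAI-compatible `/v1/chat/completions` endpoint. The sketch below just builds a standard OpenAI-style request body (the `get_time` tool and the model name are made up for the test); POST it with `curl` or `fetch` and check that `choices[0].finish_reason` comes back as `"tool_calls"`:

```typescript
// Build an OpenAI-style chat request that advertises a single tool.
// Tool and model names are illustrative; llama-server serves whatever
// single model it has loaded regardless of the "model" field.
function buildToolCallRequest(userMessage: string) {
  return {
    model: "gemma4-26b",
    messages: [{ role: "user", content: userMessage }],
    tools: [
      {
        type: "function",
        function: {
          name: "get_time",
          description: "Get the current time",
          parameters: { type: "object", properties: {}, required: [] },
        },
      },
    ],
  };
}

// POST JSON.stringify(buildToolCallRequest("What time is it?"))
// to http://127.0.0.1:8089/v1/chat/completions
```

If `finish_reason` comes back as `"stop"` with the tool call embedded in `content` as text, that is exactly the failure mode the OpenCode compat layer in the next step works around.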
Use the E4B model instead (~9.6GB):
```
/tmp/llama-cpp-build/build/bin/llama-server \
  -hf ggml-org/gemma-4-E4B-it-GGUF:Q4_K_M \
  --port 8089 -ngl 99 -c 32768 --jinja
```

Note: E4B is much weaker at tool calling than 26B.
```
curl -fsSL https://opencode.ai/install -o /tmp/opencode-install.sh
bash /tmp/opencode-install.sh
source ~/.zshrc

# IMPORTANT: Remove the old Homebrew version if it exists (v0.0.55 shadows the new one)
brew uninstall opencode 2>/dev/null

# Verify the correct version
which opencode    # Should be ~/.opencode/bin/opencode
opencode --version
```

```
git clone https://github.com/anomalyco/opencode.git /tmp/opencode-build
cd /tmp/opencode-build

# Check out the tool-call compat PR
gh pr checkout 16531

# Install deps and build
bun install
cd packages/opencode
bun run build -- --single --skip-install

# Back up the original and install
cp ~/.opencode/bin/opencode ~/.opencode/bin/opencode.backup
cp dist/opencode-darwin-arm64/bin/opencode ~/.opencode/bin/opencode
chmod +x ~/.opencode/bin/opencode

# Verify
opencode --version
# Should show: 0.0.0-feat/custom-provider-compat-...
```

Create `opencode.json` in your project directory (or globally at `~/.config/opencode/opencode.json`):
```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "llama": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "llama.cpp (local)",
      "options": {
        "baseURL": "http://127.0.0.1:8089/v1",
        "toolParser": [
          { "type": "raw-function-call" },
          { "type": "json" }
        ]
      },
      "models": {
        "gemma4-26b": {
          "name": "Gemma 4 26B",
          "tool_call": true,
          "limit": {
            "context": 32768,
            "output": 8192
          }
        }
      }
    }
  },
  "model": "llama/gemma4-26b",
  "agent": {
    "build": {
      "prompt": "{file:./.opencode/prompts/build.txt}",
      "permission": {
        "edit": "allow",
        "bash": "allow",
        "webfetch": "allow"
      }
    }
  }
}
```

| Field | Purpose |
|---|---|
| `baseURL` | Points to llama-server on port 8089 |
| `toolParser: raw-function-call` | Rewrites tools to the legacy `functions`/`function_call` format |
| `toolParser: json` | Recovers tool calls from JSON in text responses |
| `tool_call: true` | Tells OpenCode this model supports tool calling |
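For context, converting a legacy `function_call` response into the modern `tool_calls` shape is mechanical. A minimal sketch with hypothetical types, not the actual compat-layer code:

```typescript
// Legacy OpenAI responses put a single call in message.function_call;
// the modern format uses an array of message.tool_calls entries.
type LegacyMessage = {
  content: string | null;
  function_call?: { name: string; arguments: string };
};
type ToolCallEntry = {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
};

function normalizeLegacy(
  msg: LegacyMessage,
): { content: string | null; tool_calls?: ToolCallEntry[] } {
  if (!msg.function_call) return { content: msg.content };
  return {
    content: msg.content,
    tool_calls: [
      {
        // The legacy format carries no call id, so synthesize one
        id: `call_${msg.function_call.name}_0`,
        type: "function",
        function: msg.function_call,
      },
    ],
  };
}
```

Messages without a `function_call` pass through untouched, so the same path handles both response shapes.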
Gemma 4 needs explicit instructions about exact tool parameter names. Create AGENTS.md in your project root:
The full contents of AGENTS.md and .opencode/prompts/build.txt are provided below. These teach Gemma 4 the exact parameter names for each tool, which is critical for reducing tool call errors.
**AGENTS.md**
# OpenCode Agent Instructions
You are a coding assistant with FULL access to the user's file system and terminal through tools.
CRITICAL: You MUST use tools to complete tasks. NEVER say "I don't have access". NEVER suggest the user run commands. NEVER output code snippets instead of using tools. Always take action immediately.
## Tool Schemas (EXACT parameter names - you MUST use these exactly)
### bash
Execute shell commands.
Parameters (ALL required unless noted):
- `command` (string, REQUIRED): The shell command to run
- `description` (string, REQUIRED): Short description of what the command does (5-10 words)
- `timeout` (number, optional): Timeout in milliseconds
- `workdir` (string, optional): Working directory
Example: `{"command": "ls -la", "description": "List files in current directory"}`
### write
Create or overwrite a file.
Parameters (ALL required):
- `filePath` (string, REQUIRED): Absolute path to the file
- `content` (string, REQUIRED): The content to write
Example: `{"filePath": "/Users/web/project/hello.txt", "content": "Hello world"}`
### read
Read a file.
Parameters:
- `filePath` (string, REQUIRED): Absolute path to the file
- `offset` (number, optional): Line number to start from
- `limit` (number, optional): Max lines to read
Example: `{"filePath": "/Users/web/project/hello.txt"}`
### edit
Modify an existing file by replacing text.
Parameters:
- `filePath` (string, REQUIRED): Absolute path to the file
- `oldString` (string, REQUIRED): The exact text to find and replace
- `newString` (string, REQUIRED): The replacement text
- `replaceAll` (boolean, optional): Replace all occurrences
Example: `{"filePath": "/path/to/file.ts", "oldString": "foo", "newString": "bar"}`
### glob
Find files by pattern.
Parameters:
- `pattern` (string, REQUIRED): Glob pattern like `**/*.ts`
- `path` (string, optional): Directory to search in
### grep
Search file contents.
Parameters:
- `pattern` (string, REQUIRED): Regex pattern to search for
- `path` (string, optional): Directory to search in
- `include` (string, optional): File pattern filter like `*.js`
### todowrite
Track tasks and progress. The `todos` parameter MUST be a JSON array of objects, NOT a string.
Parameters:
- `todos` (array of objects, REQUIRED): Each object has:
- `content` (string, REQUIRED): Brief description of the task
- `status` (string, REQUIRED): One of: `pending`, `in_progress`, `completed`, `cancelled`
- `priority` (string, REQUIRED): One of: `high`, `medium`, `low`
Example: `{"todos": [{"content": "Add game over screen", "status": "in_progress", "priority": "high"}, {"content": "Add sound effects", "status": "pending", "priority": "low"}]}`
IMPORTANT: `todos` MUST be an array `[...]`, NOT a string `"[...]"`. Never stringify the array.
## IMPORTANT REMINDERS
- The `bash` tool REQUIRES both `command` AND `description` fields. Always include both.
- The `write` tool parameter is `filePath` (camelCase), NOT `file_path`.
- The `edit` tool uses `oldString`/`newString` (camelCase), NOT `old_string`/`new_string`.
- Do NOT call tools that don't exist. Available tools: bash, read, write, edit, glob, grep, task, webfetch, todowrite, question, skill.
- There is NO `list` tool. To list files, use `bash` with `ls`.

**.opencode/prompts/build.txt**

```
mkdir -p .opencode/prompts
```

You are a coding assistant with FULL access to the user's file system and terminal through tools.
CRITICAL RULES:
1. You MUST use tools to complete tasks. NEVER say "I don't have access".
2. NEVER suggest the user run commands - YOU run them using your tools.
3. NEVER output code snippets as your answer - USE the tools to create/edit files.
4. Call the appropriate tool IMMEDIATELY when action is needed.
TOOL PARAMETER REFERENCE (use these exact names):
bash: {"command": "ls -la", "description": "List files in directory"}
- command (REQUIRED string): the shell command
- description (REQUIRED string): 5-10 word description of what the command does
write: {"filePath": "/absolute/path/file.txt", "content": "file content here"}
- filePath (REQUIRED string, camelCase): absolute path to the file
- content (REQUIRED string): the content to write
read: {"filePath": "/absolute/path/file.txt"}
- filePath (REQUIRED string, camelCase): absolute path
edit: {"filePath": "/path/file.txt", "oldString": "old text", "newString": "new text"}
- filePath (REQUIRED string, camelCase)
- oldString (REQUIRED string, camelCase): exact text to find
- newString (REQUIRED string, camelCase): replacement text
glob: {"pattern": "**/*.ts"}
- pattern (REQUIRED string): glob pattern
grep: {"pattern": "searchRegex"}
- pattern (REQUIRED string): regex pattern
todowrite: {"todos": [{"content": "task description", "status": "pending", "priority": "high"}]}
- todos (REQUIRED array of objects, NOT a string): each object has content, status, priority
- status: one of "pending", "in_progress", "completed", "cancelled"
- priority: one of "high", "medium", "low"
- CRITICAL: todos MUST be an array [...], NEVER a string "[...]"
IMPORTANT:
- bash REQUIRES both "command" AND "description" parameters. Always include both.
- Use camelCase for all parameter names: filePath, oldString, newString, replaceAll
- Do NOT call tools that don't exist. There is NO "list" tool. Use bash with ls instead.
- Always take action. Never just describe what could be done.
```
# Make sure llama-server is running on port 8089 (Step 2)

# Launch the OpenCode TUI
opencode

# Or run a one-off command
opencode run "create a hello.txt file with a smiley face"
```

```
# 1. Build llama.cpp with Gemma 4 fixes
git clone --depth 50 https://github.com/ggml-org/llama.cpp.git /tmp/llama-cpp-build
cd /tmp/llama-cpp-build
git fetch origin pull/21343/head:pr-21343
git cherry-pick pr-21343 --no-commit
cmake -B build -DGGML_METAL=ON -DLLAMA_CURL=ON
cmake --build build --config Release -j$(sysctl -n hw.ncpu) -- llama-server

# 2. Start the model server
./build/bin/llama-server -hf ggml-org/gemma-4-26B-A4B-it-GGUF:Q4_K_M --port 8089 -ngl 99 -c 32768 --jinja

# 3. Install and build OpenCode with the tool-call compat layer
curl -fsSL https://opencode.ai/install -o /tmp/oc.sh && bash /tmp/oc.sh
brew uninstall opencode 2>/dev/null
git clone https://github.com/anomalyco/opencode.git /tmp/opencode-build
cd /tmp/opencode-build && gh pr checkout 16531
bun install && cd packages/opencode && bun run build -- --single --skip-install
cp dist/opencode-darwin-arm64/bin/opencode ~/.opencode/bin/opencode

# 4. Configure (create opencode.json in your project - see Step 5 above)
# 5. Create AGENTS.md in your project (see Step 6 above)

# 6. Run it
opencode
```

The 26B model uses ~17GB. Close Chrome and other heavy apps. If that's too much, use E4B instead (see Step 2).
Increase `-c` when starting llama-server; 32768 is the minimum for OpenCode's tool definitions.
The model may take a few retries to get parameter names right. The toolParser compat layer and AGENTS.md with exact schemas help reduce this.
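A compat layer can also absorb the most common slip (snake_case parameter names where camelCase is expected) by rewriting known aliases before dispatching the tool. A hypothetical sketch, not OpenCode's actual code:

```typescript
// Map common snake_case slips to the camelCase names OpenCode's tools expect.
// The alias table is illustrative, keyed to the parameter names documented above.
const ALIASES: Record<string, string> = {
  file_path: "filePath",
  old_string: "oldString",
  new_string: "newString",
  replace_all: "replaceAll",
};

function normalizeArgs(args: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(args)) {
    out[ALIASES[key] ?? key] = value;
  }
  return out;
}
```

For example, `normalizeArgs({ file_path: "/tmp/a.txt", content: "x" })` yields `{ filePath: "/tmp/a.txt", content: "x" }`, so a `write` call with the wrong casing still succeeds.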
```
cp ~/.opencode/bin/opencode.backup ~/.opencode/bin/opencode
```

- ollama/ollama#15241 - Gemma 4 tool call parsing fails
- ggml-org/llama.cpp#21326 - Gemma 4 template fix (merged)
- ggml-org/llama.cpp#21343 - Gemma 4 tokenizer fix
- anomalyco/opencode#20669 - Local provider tool-call quirks
- anomalyco/opencode#16531 - Tool-call compat layer PR
Gemma 4 now works with llama.cpp out of the box if you install from the latest commit (`brew install llama.cpp --HEAD`).