@daniel-farina
Last active April 11, 2026 15:55
Running OpenCode with Gemma 4 26B on macOS (via llama.cpp)

As of April 2026, Gemma 4 tool calling is broken in Ollama v0.20.0 (ollama/ollama#15241) - the tool call parser fails and streaming drops tool calls entirely. OpenCode also has issues with local OpenAI-compatible providers (anomalyco/opencode#20669, #20719).

This guide documents a working setup using:

  • llama.cpp (built from source with PR #21326 template fix + PR #21343 tokenizer fix) instead of Ollama
  • OpenCode built from source with PR #16531 tool-call compatibility layer

Tested on macOS Apple Silicon (M1 Max, 32GB) on April 2, 2026.


Why not Ollama?

Ollama has three problems with Gemma 4 right now:

  1. Tool call parser crashes - "gemma4 tool call parsing failed: invalid character" (#15241)
  2. Streaming drops tool calls - tool call data goes into the reasoning field with empty content, so the actual tool call never reaches the client
  3. <unused25> token spam - tokenizer bug causes garbage output

llama.cpp has fixes for all three (PRs #21326 and #21343).

Why not stock OpenCode?

Stock OpenCode can't recover when a local model:

  • Returns tool calls as plain JSON text instead of proper function calls
  • Sends finish_reason: "stop" instead of "tool_calls"
  • Uses legacy function_call format instead of modern tool_calls

PR #16531 adds a toolParser compatibility layer that handles all of these.
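To make the failure modes concrete, here is a minimal sketch of the kind of recovery the json parser performs: the model returns its tool call as plain JSON text in the message content (with finish_reason: "stop"), and the client pulls it back out. Function and field names are illustrative; this is not OpenCode's actual implementation.

```python
import json
from typing import Optional

def recover_tool_call(content) -> Optional[dict]:
    """Try to recover a tool call the model emitted as plain JSON text.

    Illustrative sketch only -- not OpenCode's parser. It handles the
    common failure mode where the assistant message content is a JSON
    object like {"name": "bash", "arguments": {...}} instead of a real
    tool_calls entry.
    """
    try:
        obj = json.loads(content.strip())
    except (json.JSONDecodeError, AttributeError):
        return None  # content was ordinary prose, not a stray tool call
    if not isinstance(obj, dict) or "name" not in obj:
        return None
    args = obj.get("arguments", {})
    if isinstance(args, str):  # some models double-encode the arguments
        try:
            args = json.loads(args)
        except json.JSONDecodeError:
            return None
    return {"type": "function",
            "function": {"name": obj["name"], "arguments": json.dumps(args)}}

# A response where finish_reason was "stop" and the call arrived as text:
text = '{"name": "write", "arguments": {"filePath": "/tmp/hello.txt", "content": "hi"}}'
call = recover_tool_call(text)
```

When recovery succeeds, the client can treat the result exactly like a normal tool_calls entry; when it fails, the text is passed through as a regular assistant message.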


Prerequisites

  • macOS Apple Silicon with 24GB+ RAM (32GB recommended for the 26B model)
  • Homebrew
  • Close heavy apps (Chrome tabs, etc.) - the 26B model uses ~17GB

Install the required tools:

brew install git gh cmake bun

Step 1: Build llama.cpp from source

The Homebrew version doesn't have the Gemma 4 fixes yet. Build from source with both PRs:

# Clone latest llama.cpp (includes merged PR #21326)
git clone --depth 50 https://github.com/ggml-org/llama.cpp.git /tmp/llama-cpp-build
cd /tmp/llama-cpp-build

# Cherry-pick the tokenizer fix (PR #21343, not yet merged)
git fetch origin pull/21343/head:pr-21343
git cherry-pick pr-21343 --no-commit

# Build with Metal (Apple GPU) support
cmake -B build -DGGML_METAL=ON -DLLAMA_CURL=ON
cmake --build build --config Release -j$(sysctl -n hw.ncpu) -- llama-server

# Verify
./build/bin/llama-server --version

Step 2: Download the model and start llama-server

Pick a context size based on your available RAM. Gemma 4 26B supports up to 128k context, but larger context uses more memory:

Context (-c)    Total RAM needed   Best for
32768 (32k)     ~17GB              Basic tasks, short sessions
65536 (64k)     ~20GB              Longer sessions, bigger files
131072 (128k)   ~25GB+             Maximum context; may cause memory pressure on 32GB machines

32k context (recommended for 24-32GB Macs)

/tmp/llama-cpp-build/build/bin/llama-server \
  -hf ggml-org/gemma-4-26B-A4B-it-GGUF:Q4_K_M \
  --port 8089 -ngl 99 -c 32768 --jinja

64k context (recommended for 32GB+ Macs)

/tmp/llama-cpp-build/build/bin/llama-server \
  -hf ggml-org/gemma-4-26B-A4B-it-GGUF:Q4_K_M \
  --port 8089 -ngl 99 -c 65536 --jinja

128k context (maximum, needs 32GB+ with apps closed)

/tmp/llama-cpp-build/build/bin/llama-server \
  -hf ggml-org/gemma-4-26B-A4B-it-GGUF:Q4_K_M \
  --port 8089 -ngl 99 -c 131072 --jinja

If -hf doesn't work with the source build, download manually:

# Install huggingface CLI if needed
pip install -U "huggingface_hub[cli]"

# Download the model
huggingface-cli download ggml-org/gemma-4-26B-A4B-it-GGUF gemma-4-26B-A4B-it-Q4_K_M.gguf

# Start with local file (use any -c value from above)
/tmp/llama-cpp-build/build/bin/llama-server \
  -m ~/.cache/huggingface/hub/models--ggml-org--gemma-4-26B-A4B-it-GGUF/snapshots/*/gemma-4-26B-A4B-it-Q4_K_M.gguf \
  --port 8089 -ngl 99 -c 32768 --jinja

Wait for "listening on http://127.0.0.1:8089" then verify:

curl http://127.0.0.1:8089/health
# Should return: {"status":"ok"}
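Beyond /health, you can check that tool calling survives the chat template by POSTing an OpenAI-style request with a tools array to /v1/chat/completions. A sketch that builds such a request with the standard library (the get_weather tool is a toy definition used only to exercise the template; sending it requires the server from above to be running, so the actual call is commented out):

```python
import json
import urllib.request

# OpenAI-compatible chat completion request with a single toy tool.
payload = {
    "model": "gemma4-26b",  # llama-server serves whatever model it loaded
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # toy tool for this smoke test
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

req = urllib.request.Request(
    "http://127.0.0.1:8089/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Uncomment once llama-server is up. A healthy setup should answer with a
# message containing a tool_calls entry (not plain JSON text in content).
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0])
```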

Flags explained

Flag          Purpose
-hf           Download the model from Hugging Face
--port 8089   API port (avoids conflicts with Ollama's 11434)
-ngl 99       Offload all layers to the GPU (Metal)
-c N          Context window size (32k minimum for OpenCode's tool definitions)
--jinja       Enable Jinja2 chat templates (required for Gemma 4 tool calling)

For smaller Macs (16GB RAM)

Use the E4B model instead (~9.6GB):

/tmp/llama-cpp-build/build/bin/llama-server \
  -hf ggml-org/gemma-4-E4B-it-GGUF:Q4_K_M \
  --port 8089 -ngl 99 -c 32768 --jinja

Note: E4B is much weaker at tool calling than 26B.

Step 3: Install OpenCode

curl -fsSL https://opencode.ai/install -o /tmp/opencode-install.sh
bash /tmp/opencode-install.sh
source ~/.zshrc

# IMPORTANT: Remove old Homebrew version if it exists (v0.0.55 shadows the new one)
brew uninstall opencode 2>/dev/null

# Verify correct version
which opencode    # Should be ~/.opencode/bin/opencode
opencode --version

Step 4: Build OpenCode from source with PR #16531

git clone https://github.com/anomalyco/opencode.git /tmp/opencode-build
cd /tmp/opencode-build

# Checkout the tool-call compat PR
gh pr checkout 16531

# Install deps and build
bun install
cd packages/opencode
bun run build -- --single --skip-install

# Back up original and install
cp ~/.opencode/bin/opencode ~/.opencode/bin/opencode.backup
cp dist/opencode-darwin-arm64/bin/opencode ~/.opencode/bin/opencode
chmod +x ~/.opencode/bin/opencode

# Verify
opencode --version
# Should show: 0.0.0-feat/custom-provider-compat-...

Step 5: Configure OpenCode

Create opencode.json in your project directory (or globally at ~/.config/opencode/opencode.json):

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "llama": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "llama.cpp (local)",
      "options": {
        "baseURL": "http://127.0.0.1:8089/v1",
        "toolParser": [
          { "type": "raw-function-call" },
          { "type": "json" }
        ]
      },
      "models": {
        "gemma4-26b": {
          "name": "Gemma 4 26B",
          "tool_call": true,
          "limit": {
            "context": 32768,
            "output": 8192
          }
        }
      }
    }
  },
  "model": "llama/gemma4-26b",
  "agent": {
    "build": {
      "prompt": "{file:./.opencode/prompts/build.txt}",
      "permission": {
        "edit": "allow",
        "bash": "allow",
        "webfetch": "allow"
      }
    }
  }
}

Key config fields

Field                          Purpose
baseURL                        Points to llama-server on port 8089
toolParser: raw-function-call  Rewrites tools to the legacy functions/function_call format
toolParser: json               Recovers tool calls from JSON in text responses
tool_call: true                Tells OpenCode this model supports tool calling
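For intuition, the raw-function-call rewrite can be sketched as a request transform from the modern tools/tool_choice shape to the legacy functions/function_call shape. This is a conceptual illustration of what that parser option does, not OpenCode's actual code:

```python
def to_legacy_functions(request: dict) -> dict:
    """Rewrite a modern tools/tool_choice request into the legacy
    functions/function_call shape -- conceptually what the
    raw-function-call option does (illustrative sketch only)."""
    out = dict(request)  # shallow copy; original request is untouched
    tools = out.pop("tools", None)
    if tools:
        out["functions"] = [t["function"] for t in tools
                            if t.get("type") == "function"]
    if out.pop("tool_choice", None) == "auto":
        out["function_call"] = "auto"
    return out

modern = {
    "messages": [{"role": "user", "content": "list files"}],
    "tools": [{"type": "function",
               "function": {"name": "bash", "parameters": {"type": "object"}}}],
    "tool_choice": "auto",
}
legacy = to_legacy_functions(modern)
```

Models fine-tuned on the older format often produce cleaner calls when the request arrives in this shape, which is why the compatibility layer exists.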

Step 6: Create AGENTS.md

Gemma 4 needs explicit instructions about exact tool parameter names. Create AGENTS.md in your project root:

The full contents of AGENTS.md and .opencode/prompts/build.txt are provided below. These teach Gemma 4 the exact parameter names for each tool, which is critical for reducing tool call errors.

AGENTS.md
# OpenCode Agent Instructions

You are a coding assistant with FULL access to the user's file system and terminal through tools.

CRITICAL: You MUST use tools to complete tasks. NEVER say "I don't have access". NEVER suggest the user run commands. NEVER output code snippets instead of using tools. Always take action immediately.

## Tool Schemas (EXACT parameter names - you MUST use these exactly)

### bash
Execute shell commands.
Parameters (ALL required unless noted):
- `command` (string, REQUIRED): The shell command to run
- `description` (string, REQUIRED): Short description of what the command does (5-10 words)
- `timeout` (number, optional): Timeout in milliseconds
- `workdir` (string, optional): Working directory

Example: `{"command": "ls -la", "description": "List files in current directory"}`

### write
Create or overwrite a file.
Parameters (ALL required):
- `filePath` (string, REQUIRED): Absolute path to the file
- `content` (string, REQUIRED): The content to write

Example: `{"filePath": "/Users/web/project/hello.txt", "content": "Hello world"}`

### read
Read a file.
Parameters:
- `filePath` (string, REQUIRED): Absolute path to the file
- `offset` (number, optional): Line number to start from
- `limit` (number, optional): Max lines to read

Example: `{"filePath": "/Users/web/project/hello.txt"}`

### edit
Modify an existing file by replacing text.
Parameters:
- `filePath` (string, REQUIRED): Absolute path to the file
- `oldString` (string, REQUIRED): The exact text to find and replace
- `newString` (string, REQUIRED): The replacement text
- `replaceAll` (boolean, optional): Replace all occurrences

Example: `{"filePath": "/path/to/file.ts", "oldString": "foo", "newString": "bar"}`

### glob
Find files by pattern.
Parameters:
- `pattern` (string, REQUIRED): Glob pattern like `**/*.ts`
- `path` (string, optional): Directory to search in

### grep
Search file contents.
Parameters:
- `pattern` (string, REQUIRED): Regex pattern to search for
- `path` (string, optional): Directory to search in
- `include` (string, optional): File pattern filter like `*.js`

### todowrite
Track tasks and progress. The `todos` parameter MUST be a JSON array of objects, NOT a string.
Parameters:
- `todos` (array of objects, REQUIRED): Each object has:
  - `content` (string, REQUIRED): Brief description of the task
  - `status` (string, REQUIRED): One of: `pending`, `in_progress`, `completed`, `cancelled`
  - `priority` (string, REQUIRED): One of: `high`, `medium`, `low`

Example: `{"todos": [{"content": "Add game over screen", "status": "in_progress", "priority": "high"}, {"content": "Add sound effects", "status": "pending", "priority": "low"}]}`

IMPORTANT: `todos` MUST be an array `[...]`, NOT a string `"[...]"`. Never stringify the array.

## IMPORTANT REMINDERS

- The `bash` tool REQUIRES both `command` AND `description` fields. Always include both.
- The `write` tool parameter is `filePath` (camelCase), NOT `file_path`.
- The `edit` tool uses `oldString`/`newString` (camelCase), NOT `old_string`/`new_string`.
- Do NOT call tools that don't exist. Available tools: bash, read, write, edit, glob, grep, task, webfetch, todowrite, question, skill.
- There is NO `list` tool. To list files use `bash` with `ls`.
.opencode/prompts/build.txt

First create the prompts directory:

mkdir -p .opencode/prompts

Then save the following as .opencode/prompts/build.txt:
You are a coding assistant with FULL access to the user's file system and terminal through tools.

CRITICAL RULES:
1. You MUST use tools to complete tasks. NEVER say "I don't have access".
2. NEVER suggest the user run commands - YOU run them using your tools.
3. NEVER output code snippets as your answer - USE the tools to create/edit files.
4. Call the appropriate tool IMMEDIATELY when action is needed.

TOOL PARAMETER REFERENCE (use these exact names):

bash: {"command": "ls -la", "description": "List files in directory"}
  - command (REQUIRED string): the shell command
  - description (REQUIRED string): 5-10 word description of what the command does

write: {"filePath": "/absolute/path/file.txt", "content": "file content here"}
  - filePath (REQUIRED string, camelCase): absolute path to the file
  - content (REQUIRED string): the content to write

read: {"filePath": "/absolute/path/file.txt"}
  - filePath (REQUIRED string, camelCase): absolute path

edit: {"filePath": "/path/file.txt", "oldString": "old text", "newString": "new text"}
  - filePath (REQUIRED string, camelCase)
  - oldString (REQUIRED string, camelCase): exact text to find
  - newString (REQUIRED string, camelCase): replacement text

glob: {"pattern": "**/*.ts"}
  - pattern (REQUIRED string): glob pattern

grep: {"pattern": "searchRegex"}
  - pattern (REQUIRED string): regex pattern

todowrite: {"todos": [{"content": "task description", "status": "pending", "priority": "high"}]}
  - todos (REQUIRED array of objects, NOT a string): each object has content, status, priority
  - status: one of "pending", "in_progress", "completed", "cancelled"
  - priority: one of "high", "medium", "low"
  - CRITICAL: todos MUST be an array [...], NEVER a string "[...]"

IMPORTANT:
- bash REQUIRES both "command" AND "description" parameters. Always include both.
- Use camelCase for all parameter names: filePath, oldString, newString, replaceAll
- Do NOT call tools that don't exist. There is NO "list" tool. Use bash with ls instead.
- Always take action. Never just describe what could be done.
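The stringified-todos pitfall that the prompt warns about can also be guarded against in client code. A hypothetical normalizer (not part of OpenCode) that accepts either a proper array or the broken stringified form:

```python
import json

def normalize_todos(todos):
    """Accept todos as a real list or as a stringified JSON array
    (the failure mode warned about above) and return a real list.
    Hypothetical helper for illustration -- not part of OpenCode."""
    if isinstance(todos, str):
        todos = json.loads(todos)  # repair '"[...]"' back into [...]
    if not isinstance(todos, list):
        raise TypeError("todos must be a JSON array of objects")
    for item in todos:
        missing = {"content", "status", "priority"} - set(item)
        if missing:
            raise ValueError(f"todo missing fields: {sorted(missing)}")
    return todos

# Both forms normalize to the same list:
good = normalize_todos([{"content": "x", "status": "pending", "priority": "low"}])
repaired = normalize_todos('[{"content": "x", "status": "pending", "priority": "low"}]')
```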

Step 7: Run OpenCode

# Make sure llama-server is running on port 8089 (Step 2)

# Launch OpenCode TUI
opencode

# Or run a one-off command
opencode run "create a hello.txt file with a smiley face"

Quick Start (TL;DR)

# 1. Build llama.cpp with Gemma 4 fixes
git clone --depth 50 https://github.com/ggml-org/llama.cpp.git /tmp/llama-cpp-build
cd /tmp/llama-cpp-build
git fetch origin pull/21343/head:pr-21343
git cherry-pick pr-21343 --no-commit
cmake -B build -DGGML_METAL=ON -DLLAMA_CURL=ON
cmake --build build --config Release -j$(sysctl -n hw.ncpu) -- llama-server

# 2. Start the model server
./build/bin/llama-server -hf ggml-org/gemma-4-26B-A4B-it-GGUF:Q4_K_M --port 8089 -ngl 99 -c 32768 --jinja

# 3. Install and build OpenCode with tool-call compat
curl -fsSL https://opencode.ai/install -o /tmp/oc.sh && bash /tmp/oc.sh
brew uninstall opencode 2>/dev/null
git clone https://github.com/anomalyco/opencode.git /tmp/opencode-build
cd /tmp/opencode-build && gh pr checkout 16531
bun install && cd packages/opencode && bun run build -- --single --skip-install
cp dist/opencode-darwin-arm64/bin/opencode ~/.opencode/bin/opencode

# 4. Configure (create opencode.json in your project - see Step 5 above)
# 5. Create AGENTS.md in your project (see Step 6 above)
# 6. Run it
opencode

Troubleshooting

Memory pressure / screen flickering

The 26B model uses ~17GB. Close Chrome and other heavy apps. If it's too much, use E4B instead (see Step 2).

"Context size has been exceeded"

Increase -c when starting llama-server. 32768 is the minimum for OpenCode's tool definitions.

Model doesn't call tools

The model may take a few retries to get parameter names right. The toolParser compat layer and AGENTS.md with exact schemas help reduce this.

Rollback OpenCode

cp ~/.opencode/bin/opencode.backup ~/.opencode/bin/opencode

Related Issues

erikji commented Apr 4, 2026

Gemma4 works on llama.cpp out of the box now, if you install from the latest tagged commit (brew install llama.cpp --HEAD)

@rodgerbenham

Thanks heaps @daniel-farina, I was able to get OpenCode writing a basic hello.txt on Linux. I needed to use the gh command to point at the PR; the commands above still left master pointing at HEAD:

git clone --depth 50 https://github.com/ggml-org/llama.cpp.git /tmp/llama-cpp-build
cd /tmp/llama-cpp-build
gh pr checkout 21343
cmake -B build -DLLAMA_CURL=ON # without apple specific
cmake --build build --config Release -j8 -- llama-server # 8 cores

The current llama.cpp doesn't call tools out of the box yet, I can see there's active progress on ollama/ollama#15315

@AceCodePt

Thanks for this gist. I'll give it a go.

@AceCodePt

Also kinda silly but step 7 is missing

@AceCodePt

Sorry for the spamming... Last but not least, both llama-server PRs were merged.
